Extract KML data

KML is a really common format. GIS professionals don't usually use it, for a number of reasons, but it is really handy for passing GIS data to online mapping systems and portable devices, and it can even be created with a text editor.  If you want more information, the wiki page on Keyhole Markup Language (yes, that is the actual name) and the Google Developers KML page are good places to start.

Using KML with QGIS is super simple; however, there are a few tricks, particularly with KMZ files.  Here is a brief tutorial on opening KML files in QGIS.

So getting them into QGIS is fine, but sometimes I need to use the data within a KML file, and QGIS doesn't deal with that very well because all the descriptive fields are grouped in an HTML table in a single field.  While I'm sure it is possible to parse this in QGIS, here is a script in R, since that was where I needed the data anyway.

  1. Save the KML file as a .csv file.
  2. Open the file in R with read.csv:

         csvData <- read.csv("filename.csv")

  3. Put the description data into a separate variable:

         descriptionData <- as.character(csvData$Description)

  4. Create a function to split the character string on the line break tag '<br>' and on the colon (with space) ': ' which separates the field name from the field data:

         splitbr <- function(x) {
             # split the description into one "name: value" string per field
             y <- strsplit(x, "<br>")
             # split each of those strings into its name and its value
             lapply(y, function(z) strsplit(z, ": "))
         }

  5. Run the function on each row of data using lapply:

         descriptionList <- lapply(descriptionData, splitbr)

  6. This puts the data into a big list which needs to be converted into a data frame.  Some nested apply functions select only the data portion of each name/value pair, and the rows are then bound together:

         descriptionDF <- as.data.frame(do.call("rbind", lapply(descriptionList, function(z) sapply(z[[1]], function(y) y[2]))))

  7. Next the names or headers, taken from the first row, are added to the data frame and it is ready to be used:

         names(descriptionDF) <- sapply(descriptionList[[1]][[1]], function(z) z[[1]][1])

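To see the whole thing working end to end, here is a small sketch that runs the same steps on a made-up data frame instead of a real export. The column values, the field names (Owner, Area, Status) and the final numeric conversion are assumptions for illustration only; your own Description column will have different fields.

```r
# Hypothetical stand-in for read.csv("filename.csv"): two placemarks whose
# Description column holds "field: value" pairs separated by <br> tags.
csvData <- data.frame(
  Name = c("Site A", "Site B"),
  Description = c("Owner: Council<br>Area: 12.5<br>Status: Open",
                  "Owner: Private<br>Area: 3.2<br>Status: Closed"),
  stringsAsFactors = FALSE
)

descriptionData <- as.character(csvData$Description)

# Split each description on <br>, then split each piece into name and value
splitbr <- function(x) {
  y <- strsplit(x, "<br>")
  lapply(y, function(z) strsplit(z, ": "))
}

descriptionList <- lapply(descriptionData, splitbr)

# Keep the value (second element) of every pair, one row per placemark
descriptionDF <- as.data.frame(
  do.call("rbind", lapply(descriptionList,
                          function(z) sapply(z[[1]], function(y) y[2]))),
  stringsAsFactors = FALSE
)

# Take the field names (first element of every pair) from the first row
names(descriptionDF) <- sapply(descriptionList[[1]][[1]], function(z) z[1])

# Everything arrives as text, so convert numeric columns explicitly
descriptionDF$Area <- as.numeric(descriptionDF$Area)

print(descriptionDF)
```

Note that this relies on every row having the same fields in the same order, since the headers come from the first row only.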
Notes:
There are other ways to do this in R; however, this uses only base functions (and is still relatively simple), so there are no dependency issues.
It would probably also be possible to read the KML directly into R, which I might investigate in another post.
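One thing to watch with the script above: splitting on ': ' cuts a value short if the value itself contains a colon and space, because only the second piece is kept. A possible base R workaround, sketched here with a hypothetical helper called splitFirst and a made-up description string, is to split on the first ': ' only:

```r
# Split "name: value" on the first ": " only, so values that themselves
# contain ": " are kept whole. splitFirst is a hypothetical helper name.
splitFirst <- function(piece) {
  pos <- as.integer(regexpr(": ", piece, fixed = TRUE))
  if (pos < 0) return(c(piece, NA_character_))  # no separator found
  c(substr(piece, 1, pos - 1), substr(piece, pos + 2, nchar(piece)))
}

# Made-up description where the second value contains ": " itself
pieces <- strsplit("Name: Tank 1<br>Note: Checked: OK", "<br>", fixed = TRUE)[[1]]
pairs <- lapply(pieces, splitFirst)

# Values named by their field
values <- sapply(pairs, function(p) p[2])
names(values) <- sapply(pairs, function(p) p[1])

print(values)
```

I haven't needed this for my data, so treat it as a starting point rather than a tested replacement for splitbr.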
