Over a million developers have joined DZone.

Using R with Geospatial Data

DZone's Guide to

Using R with Geospatial Data

· Big Data Zone
Free Resource

Effortlessly power IoT, predictive analytics, and machine learning applications with an elastic, resilient data infrastructure. Learn how with Mesosphere DC/OS.

GIS, an acronym that brings joy to some and strikes fear in the heart of those not interested in buying expensive software. Luckily fight or flight can be saved for another day because you don’t need to be a GIS jock with a wad of cash to work with spatial data and make beautiful plots. A computer and internet connection should be all you need. In this post I will show how to

  • Get a machine ready to use R to work with geospatial data
  • Describe what type of data can be used and some of the exciting sources of free GIS data
  • Use data from the Washington Department of Natural Resources to generate some pretty plots by working through an example script

Getting your Machine Ready

First, if you do not have R on your computer you can download it here. Even better, download Rstudio, an incredibly efficient and easy to use interface for working with R available here.  Working with R studio is highly recommended and will be more clearly outlined in this post. R does not support working with spatial data straight out of the box so there are a couple of packages that need to be downloaded to get R working with spatial data. The two packages required are ‘sp’ and ‘rgdal’.  We will also use a third package, ‘rgeos’ for some fancy geospatial tricks. Unfortunately, the latest release of the sp package is not compatible with the latest version of R — v 3.0 at this time. When the Rstudio install prompts to install R, download version 2.15.3.

First intall and open RStudio. To add the required packages open RStudio and click on the “packages” tab in the lower right hand panel of the interface. The lower right window will show a list of packages that come with a standard download of RStudio. Some of the packages will have check marks next to them, this means that those libraries are loaded and ready to be used. If you just downloaded R for the first time sp and rdgal will not be on that list, click on the “Install Packages” button. Make sure the “Install from” option is set to “Repository (CRAN)” and type “sp” into  the “Packages” space. Check the “Install Dependencies” option and download! By checking the “Install Dependencies” option packages sp needs to function properly will automatically be downloaded. Download rgdal in the same way and you have the tools needed to start!  Download rgeos as well and you can run the portion of the example script that uses centroids.

Sp and Rgdal abilities and sources of Data

Rgdal is what allows R to understand the structure of shapefiles by providing functions to read and convert spatial data into easy-to-work-with R dataframes. Sp enables transformations and projections of the data and provides functions for working with the loaded spatial polygon objects. The take away should be that these are powerful libraries that allow R to use .shp files. The US Geological Survey, the National Park Serivce, and the Washington State Department of Natural Resources are a just a few examples of organizations that make enormous stockpiles of spatial data available to the public.

Example Script

The following code uses Watershed Resource Inventory Area (WRIA) spatial data from the Washington State Department of Ecology.  This dataset contains information on Washington State’s different water resource management areas.  Exactly what information is stored in the shapefiles will be explored using R! If a function or any of the code looks mysterious try  ”? mysteriousFunctionName()” and handy documentation will fill you in on what a function does. Lets start using R to investigate the data. Just cut and paste the following code into RStudio:

# WRIA example by Steven Brey @ mazamascience.com

# load the required libraries 

# First load the data from the Washington department of ecology website

# Data source
# Data: ftp://www.ecy.wa.gov/gis_a/hydro/wria.zip
# Metadata: http://www.ecy.wa.gov/services/gis/data/hydro/wria.htm

# Create a directory for the data
localDir <- 'R_GIS_data'
if (!file.exists(localDir)) {
# Download the 5mb file of WRIA data
url <- 'ftp://www.ecy.wa.gov/gis_a/hydro/wria.zip'
file <- paste(localDir,basename(url),sep='/')
if (!file.exists(file)) {
  download.file(url, file)
# Show the unzipped files 

# layerName is the name of the unzipped shapefile without file type extensions 
layerName <- "WRIA_poly"  
# Read in the data
data_projected <- readOGR(dsn=localDir, layer=layerName) 

# What is this thing and what's in it?
# It's an S4 "SpatialPolygonsDataFrame" object with the following slots:
# [1] "data"        "polygons"    "plotOrder"   "bbox"        "proj4string"

# What does the data look like with the default plotting command? 

We don’t know yet what the different ‘slots’ contain but Holy Smokes!  That plot looks like Washington. The plot is not pretty (yet) but in just a few lines of code you can already make a primitive map of the polygons in this dataset. The fact that this object is a ~DataFrame suggests that we can treat it like an R dataframe.  Lets find out what data it has, select the interesting columns and work with only those.  (Not surprisingly, the @data slot is where the actual dataframe exists but many dataframe methods also work on the entire object.)

We’ll go to the trouble of renaming variables, reprojecting the data to “longlat” and then saving the data to a .RData file for easy loading in the future.

# Could use names(data_projected@data) or just:
#["WRIA_ID"    "WRIA_NR"    "WRIA_AREA_" "WRIA_NM"    "Shape_Leng" "Shape_Area"

# Identify the attributes to keep and associate new names with them
attributes <- c("WRIA_NR", "WRIA_AREA_", "WRIA_NM")
# user friendly names 
newNames <- c( "number", "area", "name")

# Subset the full dataset extracting only the desired attributes
data_projected_subset <- data_projected[,attributes]

# Assign the new attribute names
names(data_projected_subset) <- newNames

# Create a dataframe name (potentially different from layerName)
data_name <- "WRIA"

# Reproject the data onto a "longlat" projection and assign it to the new name
assign(data_name,spTransform(data_projected_subset, CRS("+proj=longlat")))

# NOTE: If using assign() above gave you an error it is likely the version of 
# NOTE: R you are using does not currently support the sp package. Use R 
# NOTE: 2.15.3 instead. 

# The WRIA dataset is now projected in latitude longitude coordinates as a
# SpatialPolygonsDataFrame.  We save the converted data as .RData for faster
# loading in the future.

# Upon inspecting the metadata you can see that the first 19 areas in WRIA
# surround Puget Sound. The names of the first 19 watersheds in WRIA are

# For fun, save a subset including only only these 19 areas
PSWRIANumbers <- c(1:19)
WRIAPugetSound <- WRIA[WRIA$number %in% PSWRIANumbers,]
# Sanity check, plot WRIAPugetSound to make sure it looks like the subset we want

# Save Puget Sound data as an RData file which we can quickly "load()"
data_name <- "WRIAPugetSound"
We have now saved both WRIA and WRIAPugetSound data as R spatial objects that can easily be loaded! Now the real fun begins, those of you who have been waiting for pretty plots are about to be rewarded. The rest of the script is a walk through of some of the fun analysis and basic figure creating that can easily be done with the converted WRIA data.
# Load the data that was converted to SpatialPolygonsDataFrame
# NOTE: This can be skipped (but does not have to be) if the spatial
# NOTE: objects are still in your workspace.

# Load and show the names of the attributes in WRIA
file <- paste(localDir,"WAWRIAs.RData",sep="/")

file <- paste(localDir,"WAWRIAs.RData",sep="/")

# Sweet.  We can see that this is the WRIA dataset we saved earlier

# NOTE: For more advanced users, slotNames(WRIA) will list the structures 
# in WRIA. Using the @ command allows you to grab a particular slot from the
# spatial object.  If you really want the HUGE AND GORY details of what's
# in this object, you can examine the full structure with str(WRIA).

# Here is how you would extract the contents of slots for easier use.
WriaData <- WRIA@data
WriaBbox <- WRIA@bbox

# We have a pretty good idea of what kind of data we are working with 
# and what it looks like. Now its time for the data to answer some
# questions and tell us a story.

# What is the biggest water resource area in Washington? 
maxArea <- max(WRIA$area)
# Create a 'mask' identifying the biggest area so we can find out its name
# NOTE:  Eaach 'mask' we create is a vector of TRUE/FALSE values that we
# NOTE:  will use to subset the dataframe.
biggestAreaMask <- which(WRIA$area == maxArea)
biggestAreaName <- WRIA$name[biggestAreaMask]

# Create a SpatialPolygonsDataFrame subset
biggestArea <- WRIA[biggestAreaMask,]

# plot biggest area in blue with a title
plot(biggestArea, col="blue")

# NOTE: Many more plot arguments can be explored by investigating 
# NOTE: the "SpatialPolygons" "plot-method" in the sp package
# I have heard of a water resource management area in Washington State
# called Pend Oreille.  Where is it located in this dataframe?
which(WriaData$name == "Pend Oreille")

# Now we have isolated the watershed with the largest area as well as the
# fabled Pend Oreille.  Lets figure out how to highlight these regions when
# plotting all  regions. I have also heard that Lake Chelan is Beautiful.
# Lets isolate it as well.

# Each of the following makes a spatialPolygonsDataFrame subset, selecting 
# a specific region based on some selected attribute in WRIA.

WRIA_puget <- WRIA[WRIA$number %in% PSWRIANumbers,]
WRIA_chelan <- WRIA[WRIA$name == "Chelan",]
WRIA_Pend_Oreille <- WRIA[WRIA$name == "Pend Oreille",]

# Check out what they look like plotted individually 

# For fun we will make 8 different watersheds 8 different colors!
watersheds <- c(1:8)
watershed_colors <- c("burlywood1","forestgreen","burlywood3","darkolivegreen3",

watershed_color_variation <- WRIA[WRIA$number %in% watersheds,]

# Plot some of the created spatial objects together
plot(watershed_color_variation, add=TRUE, col=watershed_colors)

# NOTE:  gCentroid is from the 'rgeos' package
# Find the center of each region and label lat and lon of centers
centroids <- gCentroid(WRIA, byid=TRUE)
centroidLons <- coordinates(centroids)[,1]
centroidLats <- coordinates(centroids)[,2]

# Find regions with center East of -121 longitude (East of Cascade mountains) 
eastSideMask <- centroidLons > -121

# Create spatialPolygonsDataFrame for these regions
WRIA_nonPacific <- WRIA[eastSideMask,]

# Find watersheds with area 75th percentile or greater
basinAreaThirdQuartile <- quantile(WRIA$area,0.75)
largestBasinsMask <- WRIA$area >= basinAreaThirdQuartile

WRIA_largest <- WRIA[largestBasinsMask,]

# To get legend and labels to fit on the figure we can change the size of the
# area plotting. bbox(WRIA) shows the bounding lat and long of WRIA.
bBox <- bbox(WRIA)
ynudge <- 0.5
xnudge <- 3
xlim <- c(bBox[1,1] + xnudge , bBox[1,2] - xnudge)
ylim <- c(bBox[2,1] - ynudge, bBox[2,2] )

# Plot some of the different spatial objects and show lat-long axis, 
# label watersheds
text(centroidLons, centroidLats, labels=WRIA$number, col="blue", cex=.7)

title(main="Washington Watersheds")
labels <- c("Puget Sound Watersheds", "Washington's biggest watersheds",
            "drain to Pacific via Columbia river") 
label_cols <- c("navy","green1","red")
legend("bottomleft", NULL, labels, fill=label_cols, density, bty="n", cex=.8)

Who needs expensive GIS software?

Another useful package for plotting spatial data is the “maptools” library available from from the Cran repository. The code below requires the maptools package so make sure it is installed before running the code.

# Lets get a taste of what some of the built in plotting functions in the map
# tools package can do

# First load

allWashingtonAreas <- spplot(WRIA, zcol="area", identify=TRUE)

# by saving as object more than one plot can be shown on a panel
pugetSoundAreas <- spplot(WRIA_puget, zcol="area", identify=TRUE)

# Open new plot frame
print(allWashingtonAreas, position=c(0,.5,.5,1), more=T)
print(pugetSoundAreas, position=c(.5,.5,1,1), more=T)

These examples only scratch the surface of the plotting and analysis potential available in R using the sp, rgdal, and maptools packages.  Exploring the plot-methods in sp is highly recommended before investing a large amount of time into scripting a plot on your own.

Congratulations! You now know how to find out what is in geospatial data and how to make plots using R.

Happy Mapping!

Learn to design and build better data-rich applications with this free eBook from O’Reilly. Brought to you by Mesosphere DC/OS.


Published at DZone with permission of Jonathan Callahan, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.


{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}