Introduction

This is an introduction to Spatial Analysis in R. I’ve developed this code based on some common questions from friends and colleagues or ones that I’ve asked myself. There is a lot here to help you get started, but there is also a lot more to learn!

The focus here will be on raster analysis, rather than vector (shapefiles, polygons, polylines, points, etc.).

This (recently updated) tutorial is inspired by a recent paper in Nature Ecology and Evolution, Mapping the global potential for marine aquaculture. The authors used multiple constraints including ship traffic, dissolved oxygen, bottom depth and more, to limit areas suitable for aquaculture.




We are going to use a similar, but much more simplified approach here. We will map potential areas of marine aquaculture for the super cute Lump sucker fish (Eumicrotremus orbis)




We will answer this question by taking into consideration the following spatial data:

1. Sea Surface Temperature (raster data)
2. Net Primary Productivity (raster data)
3. Marine Protected Areas (vector data)

Key information for optimal growth:

  • Sea surface temperatures between 12 and 18 degrees Celsius
  • Net Primary Productivity between 2.6 and 3 mgC/m2/day

Background

Raster or gridded data are stored as a grid of value which are rendered on a map as pixels. Each pixel value represents an area on the Earth’s surface.

Examples

Some examples of raster data include oceanographic datasets such as Sea Surface Temperature, land use maps and digital elevation maps.

What is a GeoTIFF??

Raster data can come in many different formats. In this tutorial, we will use the geotiff format which has the extension .tif. A .tif file stores metadata or attributes about the file as embedded tif tags. These tags can include the following raster metadata:

  1. A Coordinate Reference System (CRS)
  2. Spatial Extent (extent)
  3. Values that represent missing data (NoDataValue)
  4. The resolution of the data

Information in this section is borrowed from NEON’s Intro to Raster Data in R tutorial, another great resource


Setup

Libraries & Settings

There are a lot of spatial packages for R, we will touch on some of them here but not all of them. Here is brief overview, taken from this site:

  • raster: Reading, writing, manipulating, analyzing and modeling of gridded spatial data
  • rgdal: Provides the most important and basic spatial functionalities. Provides bindings to Frank Warmerdam’s Geospatial Data Abstraction Library (GDAL) (>= 1.6.3, < 2) and access to projection/transformation operations from the PROJ.4 library
  • sf and sp are the most important R packages to handle vector data; sf is a successor of sp, but it’s still evolvin
  • rgeos: Provides spatial vector operations like buffer and intersect. Interface to Geometry Engine – Open Source (GEOS) using the C API for topology operations on geometries.
  • maps: This package has pre-loaded maps stored which can be added to your map plots.
  • maptools: tools for reading and writing spatial data (visualisation)
  • ncdf4: Use with NetCDF files. Note that the raster package is also able to read NetCDF files and I prefer to use raster whenever possible.

Spatial data visualization packages

  • ggplot2: With the recent incorporation of sf into ggplot2, you can now use geom_sf() to plot vector data.
  • leaflet: Use leaflet in R to create interactive maps and visualize spatial data
  • tmap: ggplot 2 for maps! A new package that gives Arc-like functionality to creating publication ready maps.
  • rasterVis: a package that can be used to visualize raster data. Provides more functionality and customization than base plot for rasters.
  • mapview: provides functions to create interactive visualisations of spatial data.

Load all libraries. If you do not have all of these libraries installed, use install.packages() to load them.

library(sf)
library(raster)       #Main raster library with nearly all functions used in this analysis
library(rgdal)        #Spatial library - most functions used from rgdal are for vectors (shapefiles)
library(dplyr)        #NOT spatial - this is a data wrangling library

#devtools::install_github("ecohealthalliance/fasterize")  #to install fasterize you need to run this line
library(fasterize)
#devtools::install_github("tidyverse/ggplot2")
library(ggplot2)

Data Prep

My first step in a spatial analysis is prepping the data, which includes the following:

  • Read in data
  • Pre-process the data is it “plays nicely”,
  • Visualize the data

Shapefile

Read in the shapefile we’ll use later on for the West Coast region.

wc_rgns <- st_read(dsn = 'shapefiles', layer = 'wc_regions_clean')
## Reading layer `wc_regions_clean' from data source `/home/afflerbach/github/spatial-analysis-R/shapefiles' using driver `ESRI Shapefile'
## Simple feature collection with 5 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -129.1635 ymin: 30.542 xmax: -117.097 ymax: 49.00031
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
plot(wc_rgns[1])

Raster

Sea Surface Temperature

In the data folder, there are 5 .tif files with the naming pattern average_annual_sst_[year].tif, which are 5 annual average sea surface temperatures for our region (2008-2012). We want just one raster file of the average SST over that time period.

Visualize

I like to visualize the raw data before running any calculation or analysis.

Create a raster of the first file by calling raster() and then plot() to visualize.

r <- raster("rasters/average_annual_sst_2008.tif")

plot(r)


Notice the data values are in Kelvin - we will change this to celsius later.

I also like to look at the distribution of data using hist()

hist(r)

Read in raster data

We want to read in all five Sea Surface Temperature files. I like to do this by first listing all the files with list.files().

sst_files = list.files('rasters', pattern='average_', full.names = T) #We need full file paths

sst_files
## [1] "rasters/average_annual_sst_2008.tif"
## [2] "rasters/average_annual_sst_2009.tif"
## [3] "rasters/average_annual_sst_2010.tif"
## [4] "rasters/average_annual_sst_2011.tif"
## [5] "rasters/average_annual_sst_2012.tif"

Stack rasters

To get a single layer of average SST in degrees Celsius we need to first stack all layers.

#stack is a function from the raster package that puts all RasterLayers into a RasterStack
sstStack = stack(sst_files)

plot(sstStack)

Raster calcuations

You can perform operations on a RasterStack by using the calc() function from the raster package. calc() lets you define a function to apply across all layers in the stack.

Calculate the mean value per cell and then convert to Celsius by subtracting 273.15.

# By adding 'filename=' R will directly save the raster into the defined file rather than memory
sstAvg = raster::calc(sstStack, fun=function(x){mean(x, na.rm=T)-273.15})#,filename='rasters/sstAvg.tif', overwrite=T) 

plot(sstAvg, main = 'Mean Sea Surface Temperature (Celsius)')

A more compact way of doing multiple raster analysis is by using pipes…you can run stack() and calc() in one call!

sstAvg = stack(sst_files)%>%
          raster::calc(., fun=function(x){mean(x, na.rm=T)-273.15})

plot(sstAvg, main = 'Mean Sea Surface Temperature (Celsius)')

Net Primary Production (NPP)

Read in raster data

Read in this data the same way as the SST data, using raster(). This data is the net primary production (mgC/m2/day).

npp = raster('rasters/annual_npp.tif');npp
## class       : RasterLayer 
## dimensions  : 145, 114, 16530  (nrow, ncol, ncell)
## resolution  : 112000, 54700  (x, y)
## extent      : -17813502, -5045502, 1526148, 9457648  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs 
## data source : /home/afflerbach/github/spatial-analysis-R/rasters/annual_npp.tif 
## names       : annual_npp 
## values      : 1.898059, 3.783287  (min, max)
plot(npp, main = 'Net Primary Production (mgC/m2/day)')

You’ll see that this is in a different projection, extent and cell size from the SST data. It is really obvious when you look at the plot, but the summary above also gives clues as to what projection/resolution/extent this data is in.

To do any sort of analysis using multiple rasters, they all need to be in the same extent, projection and cell resolution.

First look at the differences:

sstAvg
## class       : RasterLayer 
## dimensions  : 480, 408, 195840  (nrow, ncol, ncell)
## resolution  : 0.04166185, 0.04165702  (x, y)
## extent      : -131.9848, -114.9867, 29.99305, 49.98842  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 +no_defs 
## data source : in memory
## names       : layer 
## values      : 4.98, 32.895  (min, max)
npp
## class       : RasterLayer 
## dimensions  : 145, 114, 16530  (nrow, ncol, ncell)
## resolution  : 112000, 54700  (x, y)
## extent      : -17813502, -5045502, 1526148, 9457648  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs 
## data source : /home/afflerbach/github/spatial-analysis-R/rasters/annual_npp.tif 
## names       : annual_npp 
## values      : 1.898059, 3.783287  (min, max)

To get the primary productivity data in the same format as the SST data, we need to

  1. reproject
  2. crop
  3. resample

Reproject

Use projectRaster() from the raster package to reproject a RasterLayer from one projection to another. You will need to define what the new projection should be by setting a coordinate reference system.

Defining a coordinate reference system (crs) can be done in many ways. See Melanie’s great cheat sheet for more details about Coordinate Reference Systems.