library(tigris)
library(sf)
library(dplyr)
library(ggplot2)
library(janitor)
library(readr)
library(stringr)
library(ggrepel)
library(patchwork)
9.1 Learning Objectives
- Introduce the
sf
package in R for working with spatial data - Walk through a conceptual framework on how you can create a map of the US and its territories
- Explore how you can combine different layers of spatial data in one plot
- Provide additional resources for your to explore and be able to achieve the plots you want
9.2 Big Idea
The primary purpose of this workshop is to demonstrate how to use R to visualize spatial data. Particularly how to create a map of the U.S and its main territories. We will briefly review the basic concepts of working with geospatial data. Therefore we may not cover key concepts when doing spatial data analysis. For more in-depth learning on spatial analysis in R, you can check out the online book “Geocomputation with R” by Lovelace et al. (2024).
9.3 Introduction to spatial data with sf
From the sf vignette:
Simple features or simple feature access refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects. It also describes how such objects can be stored in and retrieved from databases, and which geometrical operations should be defined for them.
The sf package is an R implementation of Simple Features. This package incorporates:
- A new spatial data class system in R
- Functions for reading and writing spatial data
- Tools for spatial operations on vectors
- Most of the functions in this package starts with prefix st_ which stands for spatial and temporal.
9.4 About this lesson
In this lesson we will walk through how to plot the a map of the continental US with additional insets of US territories. For this example we use the shift_geometry()
function from the tigris
package to plot the continental US, Alaska, Hawaii and Puerto Rico in one map.
tigris
is an R package that allows users to directly download and use TIGER/Line shapefiles from the US Census Bureau.
The idea of this script is to present an example of how to work with spatial data in R using the sf
package and introduce tools and techniques that will help you create the maps you like for your project.
Large spatial data files can take some time to process and plot.If you are running this code on your local machine, be patient! Your session can freeze or take some time to see the desired outcome.
9.5 Set up
We will start by loading the packages we will be using to create our map.
9.6 Read data
Next step is to read the data we are going to be using today. In this case we will use 4 data sources.
- The US state boundaries that we can load directly from the
tigris
package. - The continental US EPA Ecoregions used in the National Aquatic Resource Surveys.
- The Level 3 EPA Ecoregions by state for the state od Alaska
- List of states and their corresponding CASC region (this data set was manually created, see us-casc-regions.R script)
9.6.1 1. Load US map from tigris
## read data
# cb = T + 20m resol, removes American Samoa, Maraina Islands, Guam and Virgin Islands
<- states(cb = TRUE,
us_states resolution = "20m")
## shift geometries for easy plotting
<- us_states %>%
us_states_shift shift_geometry()
9.6.2 2. Read in Ecoregions
a) Continental US
For the continental US we are using the aggregated ecoregions used in the National Aquatic Resource Surveys. You can download this data at the EPA website.
These data divides the continental US in 9 ecoregions.
Note: We downloade the data and saved in to the data
folder in our Rproj.
## read data
<- read_sf("data/Aggr_Ecoregions_2015.shp")
ecoregions_cont_us
unique(ecoregions_cont_us$WSA9_NAME)
st_crs(ecoregions_cont_us) #Albers, NAD83
b) Alaska L3 Ecoregions
To add additional territories, we will have to download individual files for each area. Here an example for Alaska. In this case we are downloading the shapefiles for Level III EPA ecoregions for Alaska.
Note: We downloade the data and saved in to the data
folder in our Rproj.
## Read data
<- read_sf("data/ak_eco_l3.shp")
ecoregion_ak_l3 st_crs(ecoregion_ak_l3) # EPSG",3338, Alaska Albers
plot(ecoregion_ak_l3$geometry)
9.6.3 3. US major cities
Shapefile indicating the locatio of the major cities across the US. Data downloaded form here
Note: We downloade the data and saved in to the data
folder in our Rproj.
## read
<- read_sf("data/USA_Major_Cities.shp")
us_cities st_crs(us_cities) #WGS 84 / Pseudo-Mercator
plot(us_cities$geometry)
## transform to desired crs
<- st_transform(us_cities,
us_cities_nad83 crs = st_crs(us_states))
st_crs(us_cities_nad83)
plot(us_cities_nad83$geometry)
## shift geometries for easy plotting
<- us_cities_nad83 %>%
us_cities_shift shift_geometry()
plot(us_cities_shift$geometry)
9.6.3.1 4. States by CASC region
This data was created using the code on this script.
## read data
<- read_csv("data/state_by_casc_region.csv")
us_state_casc
## merge polygons to get CASC region areas
<- us_states %>%
casc_shp left_join(us_state_casc, by= c("GEOID", "STUSPS", "NAME")) %>%
group_by(casc_region) %>%
summarise(geometry = st_union(geometry))
plot(casc_shp$geometry)
## shift geometries for easy plotting
<- casc_shp %>%
casc_shift shift_geometry()
plot(casc_shift$geometry)
9.7 Cleaning Ecoregion data
To be able to plot all the regions in one map, we need to get both ecoregions files in the same format. This means column names should match so we can bind both files.
## Rename columns to a generic name
<- ecoregions_cont_us %>%
ecoregions_cont_us_clean rename(code = WSA9,
name = WSA9_NAME)
## Select two main colums from Alaska ecoregion file and rename to match continental file + transforming to the smae CRS than continental file.
## note: choosing L1 because it is the mos broad ecoregion (4 ecoregions for alaska)
<- ecoregion_ak_l3 %>%
ecoregion_ak_clean select(code = NA_L1CODE,
name = NA_L1NAME) %>%
mutate(name = str_to_sentence(name)) %>%
st_transform(st_crs(ecoregions_cont_us_clean))
##checking outcomes
unique(ecoregion_ak_clean$name)
st_crs(ecoregion_ak_clean)
plot(ecoregion_ak_clean$geometry)
9.8 Bind ecoregions data into one file
We use the dplyr::bind_rows()
function to combine both data frames into one with all the ecoregions. One We have all the information in one place we apply the shift_geometry()
to get the polygons in the projection we want for plotting.
<- bind_rows(ecoregions_cont_us_clean,
ecoregions_all
ecoregion_ak_clean)
# shift geom
<- ecoregions_all %>%
ecoregions_shift shift_geometry()
plot(ecoregions_shift$geometry)
9.9 Plot ecoregions + cities + casc
Below is the initial plot that shows how we can layer different sf
objects in s ggplot
by using the geom_sf()
function.
<- ggplot()+
us_plot geom_sf(data = ecoregions_shift,
aes(fill = name))+
geom_sf(data = us_states_shift,
fill = NA,
color = "darkgray")+
geom_sf(data = us_cities_shift)+
geom_sf(data = casc_shift,
color = "red",
fill = NA)+
theme_void()+
scale_fill_viridis_d()
us_plot
9.10 Customize map
Finally, we are adding some customization to our plot to make it look “publication-grade”. Note that these are just a few ways we could customize this plot. ggplot2
and all the packages around ggplot
provide endless possibilities on how to go about your plots. There is lot’s of documentation about ggplot2
online, feel free to search other possibilities to add or modify from this plot.
<- ggplot()+
finalized_plot geom_sf(data = ecoregions_shift,
aes(fill = name),
alpha = 0.5)+
geom_sf(data = us_states_shift,
fill = NA,
color = "darkgray")+
geom_sf(data = us_cities_shift,
alpha = 0.3)+
geom_sf(data = casc_shift,
color = "black",
size = 6,
fill = NA)+
#an alternative to label your can use geom_sf_text, plots just the text not the rectagle aroud it.
geom_sf_label(data = casc_shift[c(1:7), ],
aes(label = casc_region),
size = 3)+
geom_label_repel(
data = casc_shift[c(8:10), ],
aes(label = casc_region,
geometry = geometry),
stat = "sf_coordinates",
size = 3,
min.segment.length = 2)+
# colour = "magenta",
# segment.colour = "magenta")+
theme_void()+
scale_fill_viridis_d(name = "Ecoregion")+
labs(title = "Cities across Ecoregions in the US",
subtitle = "Map is devided into CASC regions")+
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
finalized_plot
9.11 Additional Information
More on shift_geometry()
There are different option of output for this function. Remember you can always search for the the help by running ?shift_geometry()
in the console and get more details about how this function works. Let’s see other possible outcomes when using this function.
## Alaska, Hawaii and Puerto Rico "outside" of the continental US as opposed to below.
<- us_states %>%
us_states_outside shift_geometry(position = "outside")
plot(us_states_outside$geometry)
We can also set preserve_area = TRUE
to keep the area of each territory the relative size compared to the continental US.
<- us_states %>%
us_states_area shift_geometry(preserve_area = TRUE)
plot(us_states_area$geometry)
Or, combine both options presented above.
<- us_states %>%
us_states_outside_area shift_geometry(preserve_area = TRUE,
position = "outside")
plot(us_states_outside_area$geometry)
Exploring other ways to plot the US and its territorises
Break down one of the sf
data frames with a shift_geometry()
to inspect in more details the CRS for each territory.
## Filtering each data set to plot independently
<- us_states_shift %>%
alaska filter(NAME == "Alaska")
<- us_states_shift %>%
hawaii filter(NAME == "Hawaii")
<- us_states_shift %>%
puerto_rico filter(NAME == "Puerto Rico")
<- us_states_shift %>%
continent filter(!NAME %in% c("Alaska", "Hawaii", "Puerto Rico"))
Exploring other ways of plotting the US and its territories using patchwork
<- ggplot()+
alaska_plot geom_sf(data = alaska)+
theme_void()
<- ggplot()+
hawaii_plot geom_sf(data = hawaii)+
theme_void()
<- ggplot()+
pr_plot geom_sf(data = puerto_rico)+
theme_void()
<- ggplot()+
cont_plot geom_sf(data = continent)+
theme_void()
## combined plot
+ {
cont_plot | hawaii_plot | pr_plot +
alaska_plot plot_layout(ncol = 3)
+
} plot_layout(ncol = 1)
9.12 Resurces
patchwork
Spatial Data Visualization
Visualizing Spatial Data, NCEAS Learning Hub Lesson by Rachel King
Mapping Census data in R by Kyle Walker (author of the
tigirs
package)ggcart
, an interesting package to look into. “The goal of ggcart is to include Puerto Rico, the Virgin Islands and Guam in the traditional Albers maps”