9  Creating a map of the US and its territories

9.1 Learning Objectives

  • Introduce the sf package in R for working with spatial data
  • Walk through a conceptual framework on how you can create a map of the US and its territories
  • Explore how you can combine different layers of spatial data in one plot
  • Provide additional resources for your to explore and be able to achieve the plots you want

9.2 Big Idea

The primary purpose of this workshop is to demonstrate how to use R to visualize spatial data. Particularly how to create a map of the U.S and its main territories. We will briefly review the basic concepts of working with geospatial data. Therefore we may not cover key concepts when doing spatial data analysis. For more in-depth learning on spatial analysis in R, you can check out the online book “Geocomputation with R” by Lovelace et al. (2024).

9.3 Introduction to spatial data with sf

From the sf vignette:

Simple features or simple feature access refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects. It also describes how such objects can be stored in and retrieved from databases, and which geometrical operations should be defined for them.

The sf package is an R implementation of Simple Features. This package incorporates:

  • A new spatial data class system in R
  • Functions for reading and writing spatial data
  • Tools for spatial operations on vectors
  • Most of the functions in this package starts with prefix st_ which stands for spatial and temporal.


9.4 About this lesson

In this lesson we will walk through how to plot the a map of the continental US with additional insets of US territories. For this example we use the shift_geometry() function from the tigris package to plot the continental US, Alaska, Hawaii and Puerto Rico in one map.

tigris is an R package that allows users to directly download and use TIGER/Line shapefiles from the US Census Bureau.

The idea of this script is to present an example of how to work with spatial data in R using the sf package and introduce tools and techniques that will help you create the maps you like for your project.

Spatial Data

Large spatial data files can take some time to process and plot.If you are running this code on your local machine, be patient! Your session can freeze or take some time to see the desired outcome.

9.5 Set up

We will start by loading the packages we will be using to create our map.

library(tigris)
library(sf)
library(dplyr)
library(ggplot2)
library(janitor)
library(readr)
library(stringr)
library(ggrepel)
library(patchwork)

9.6 Read data

Next step is to read the data we are going to be using today. In this case we will use 4 data sources.

  • The US state boundaries that we can load directly from the tigris package.
  • The continental US EPA Ecoregions used in the National Aquatic Resource Surveys.
  • The Level 3 EPA Ecoregions by state for the state od Alaska
  • List of states and their corresponding CASC region (this data set was manually created, see us-casc-regions.R script)

9.6.1 1. Load US map from tigris

## read data
# cb = T + 20m resol, removes American Samoa, Maraina Islands, Guam and Virgin Islands
us_states <- states(cb = TRUE, 
                    resolution = "20m")

## shift geometries for easy plotting
us_states_shift <- us_states %>% 
  shift_geometry()

9.6.2 2. Read in Ecoregions

a) Continental US

For the continental US we are using the aggregated ecoregions used in the National Aquatic Resource Surveys. You can download this data at the EPA website.

These data divides the continental US in 9 ecoregions.

Note: We downloade the data and saved in to the data folder in our Rproj.

## read data
ecoregions_cont_us <- read_sf("data/Aggr_Ecoregions_2015.shp")

unique(ecoregions_cont_us$WSA9_NAME)
st_crs(ecoregions_cont_us) #Albers, NAD83

b) Alaska L3 Ecoregions

To add additional territories, we will have to download individual files for each area. Here an example for Alaska. In this case we are downloading the shapefiles for Level III EPA ecoregions for Alaska.

Note: We downloade the data and saved in to the data folder in our Rproj.

## Read data
ecoregion_ak_l3 <- read_sf("data/ak_eco_l3.shp")
st_crs(ecoregion_ak_l3) # EPSG",3338, Alaska Albers
plot(ecoregion_ak_l3$geometry)

9.6.3 3. US major cities

Shapefile indicating the locatio of the major cities across the US. Data downloaded form here

Note: We downloade the data and saved in to the data folder in our Rproj.

## read
us_cities <- read_sf("data/USA_Major_Cities.shp")
st_crs(us_cities) #WGS 84 / Pseudo-Mercator
plot(us_cities$geometry)

## transform to desired crs
us_cities_nad83 <- st_transform(us_cities,
                                crs = st_crs(us_states))
st_crs(us_cities_nad83)
plot(us_cities_nad83$geometry)

## shift geometries for easy plotting
us_cities_shift <- us_cities_nad83 %>% 
  shift_geometry()

plot(us_cities_shift$geometry)

9.6.3.1 4. States by CASC region

This data was created using the code on this script.

## read data
us_state_casc <- read_csv("data/state_by_casc_region.csv")

## merge polygons to get CASC region areas
casc_shp <- us_states %>% 
  left_join(us_state_casc, by= c("GEOID", "STUSPS", "NAME")) %>% 
  group_by(casc_region) %>% 
  summarise(geometry = st_union(geometry))

plot(casc_shp$geometry)

## shift geometries for easy plotting
casc_shift <- casc_shp %>% 
  shift_geometry()

plot(casc_shift$geometry)

9.7 Cleaning Ecoregion data

To be able to plot all the regions in one map, we need to get both ecoregions files in the same format. This means column names should match so we can bind both files.

## Rename columns to a generic name
ecoregions_cont_us_clean <- ecoregions_cont_us %>% 
  rename(code = WSA9,
         name = WSA9_NAME)
  
## Select two main colums from Alaska ecoregion file and rename to match continental file + transforming to the smae CRS than continental file.
## note: choosing L1 because it is the mos broad ecoregion (4 ecoregions for alaska)
ecoregion_ak_clean <- ecoregion_ak_l3 %>% 
  select(code = NA_L1CODE,
         name = NA_L1NAME) %>% 
  mutate(name = str_to_sentence(name)) %>% 
  st_transform(st_crs(ecoregions_cont_us_clean))

##checking outcomes
unique(ecoregion_ak_clean$name)
st_crs(ecoregion_ak_clean)
plot(ecoregion_ak_clean$geometry)

9.8 Bind ecoregions data into one file

We use the dplyr::bind_rows() function to combine both data frames into one with all the ecoregions. One We have all the information in one place we apply the shift_geometry() to get the polygons in the projection we want for plotting.

ecoregions_all <- bind_rows(ecoregions_cont_us_clean, 
                            ecoregion_ak_clean)


# shift geom
ecoregions_shift <- ecoregions_all %>% 
  shift_geometry()

plot(ecoregions_shift$geometry)

9.9 Plot ecoregions + cities + casc

Below is the initial plot that shows how we can layer different sf objects in s ggplot by using the geom_sf() function.

us_plot <-  ggplot()+
  geom_sf(data = ecoregions_shift,
          aes(fill = name))+
  geom_sf(data = us_states_shift,
          fill = NA,
          color = "darkgray")+
  geom_sf(data = us_cities_shift)+
  geom_sf(data = casc_shift,
          color = "red",
          fill = NA)+
  theme_void()+
  scale_fill_viridis_d()

us_plot

9.10 Customize map

Finally, we are adding some customization to our plot to make it look “publication-grade”. Note that these are just a few ways we could customize this plot. ggplot2 and all the packages around ggplot provide endless possibilities on how to go about your plots. There is lot’s of documentation about ggplot2 online, feel free to search other possibilities to add or modify from this plot.

finalized_plot <-  ggplot()+
  geom_sf(data = ecoregions_shift,
          aes(fill = name),
          alpha = 0.5)+
  geom_sf(data = us_states_shift,
          fill = NA,
          color = "darkgray")+
  geom_sf(data = us_cities_shift,
          alpha = 0.3)+
  geom_sf(data = casc_shift,
          color = "black",
          size = 6,
          fill = NA)+
  #an alternative to label your can use geom_sf_text, plots just the text not the rectagle aroud it.
  geom_sf_label(data = casc_shift[c(1:7), ],
               aes(label = casc_region),
               size = 3)+
  geom_label_repel(
    data = casc_shift[c(8:10), ],
    aes(label = casc_region, 
        geometry = geometry),
    stat = "sf_coordinates",
    size = 3,
    min.segment.length = 2)+
    # colour = "magenta",
    # segment.colour = "magenta")+
  theme_void()+
  scale_fill_viridis_d(name = "Ecoregion")+
  labs(title = "Cities across Ecoregions in the US",
       subtitle = "Map is devided into CASC regions")+
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))


finalized_plot

9.11 Additional Information

More on shift_geometry()

There are different option of output for this function. Remember you can always search for the the help by running ?shift_geometry() in the console and get more details about how this function works. Let’s see other possible outcomes when using this function.

## Alaska, Hawaii and Puerto Rico "outside" of the continental US as opposed to below.
us_states_outside <- us_states %>% 
  shift_geometry(position = "outside")

plot(us_states_outside$geometry)

We can also set preserve_area = TRUE to keep the area of each territory the relative size compared to the continental US.

us_states_area <- us_states %>% 
  shift_geometry(preserve_area = TRUE)

plot(us_states_area$geometry)

Or, combine both options presented above.

us_states_outside_area <- us_states %>% 
  shift_geometry(preserve_area = TRUE,
                 position = "outside")

plot(us_states_outside_area$geometry)

Exploring other ways to plot the US and its territorises

Break down one of the sf data frames with a shift_geometry() to inspect in more details the CRS for each territory.

## Filtering each data set to plot independently
alaska <- us_states_shift %>% 
  filter(NAME == "Alaska")

hawaii <- us_states_shift %>% 
  filter(NAME == "Hawaii")

puerto_rico <- us_states_shift %>% 
  filter(NAME == "Puerto Rico")

continent <- us_states_shift %>% 
  filter(!NAME %in% c("Alaska", "Hawaii", "Puerto Rico"))

Exploring other ways of plotting the US and its territories using patchwork

alaska_plot <- ggplot()+
  geom_sf(data = alaska)+
  theme_void()

hawaii_plot <- ggplot()+
  geom_sf(data = hawaii)+
  theme_void()

pr_plot <- ggplot()+
  geom_sf(data = puerto_rico)+
  theme_void()


cont_plot <- ggplot()+
  geom_sf(data = continent)+
  theme_void()


## combined plot
cont_plot + {
  alaska_plot | hawaii_plot | pr_plot +
    plot_layout(ncol = 3)
  } +
  plot_layout(ncol = 1)

9.12 Resurces

patchwork

Spatial Data Visualization