Learning Objectives
- Practice using common cleaning and wrangling functions
- Practice creating plots using common visualization functions in ggplot
- Practice saving and sharing data visualizations
About the data
These exercises use data on abundance, size, and trap counts (fishing pressure) of California spiny lobster (Panulirus interruptus). The data were collected along the mainland coast of the Santa Barbara Channel by Santa Barbara Coastal LTER researchers [@lter2022].
13.1 Exercise: Collaborate on an analysis and create a report to publish using GitHub Pages
13.1.1 Explore, clean and wrangle data
For this portion of the exercise, the Owner will be working with the lobster abundance and size data, and the Collaborator will be working with the lobster trap buoy counts data.
For Questions 1-3, you will work independently since you’re working with different data frames, but you’re welcome to check in with each other.
13.2 Convert missing values using mutate() and na_if()
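As a minimal sketch of the pattern named in this heading, the example below converts a placeholder value to NA on a small made-up table. The table `lobster_toy` and the sentinel value -99999 are assumptions for illustration only; check the dataset metadata for the code actually used for missing values.

```r
library(dplyr)

# Hypothetical toy data: -99999 stands in for a missing carapace size.
# The sentinel value is an assumption, not taken from the real metadata.
lobster_toy <- tibble(
  SITE = c("IVEE", "NAPL", "IVEE"),
  SIZE_MM = c(72, -99999, 65)
)

# na_if() replaces every value equal to the sentinel with NA
lobster_toy <- lobster_toy %>%
  mutate(SIZE_MM = na_if(SIZE_MM, -99999))

lobster_toy
```

The same call can be applied to the real `SIZE_MM` or `TRAPS` columns once the missing-value code is confirmed.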
13.3 filter() practice
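The following sketch shows a few common filter() patterns on a small made-up table. The table and the object names (`lobster_toy`, `ivee_only`, `recent`, `napl_large`) are hypothetical and only meant as a warm-up, not as answers to the exercise questions.

```r
library(dplyr)

# Hypothetical toy data for illustration only
lobster_toy <- tibble(
  SITE = c("IVEE", "NAPL", "IVEE", "NAPL"),
  YEAR = c(2018, 2019, 2020, 2021),
  SIZE_MM = c(60, 75, 82, 68)
)

# Keep rows from a single site
ivee_only <- lobster_toy %>%
  filter(SITE == "IVEE")

# Keep rows matching any of several values with %in%
recent <- lobster_toy %>%
  filter(YEAR %in% c(2019, 2020, 2021))

# Combine conditions: a comma (or &) means AND, | means OR
napl_large <- lobster_toy %>%
  filter(SITE == "NAPL", SIZE_MM > 70)
```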
13.5.1 Create visually appealing and informative data visualizations
Answer
# Histogram of carapace length, one panel per site
ggplot(data = lobster_abundance, aes(x = SIZE_MM)) +
  geom_histogram() +
  facet_wrap(~SITE)
First, you’ll need to create a new dataset subset called lobsters_summarize:
- Group the data by SITE AND YEAR
- Calculate the total number of lobsters observed using count()
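The two steps above can be sketched on a small made-up table as follows. The toy data and the name `lobsters_summarize_toy` are hypothetical, and the sketch assumes each row represents one observed lobster; if a row can represent several lobsters, counting rows will not give the total.

```r
library(dplyr)

# Hypothetical toy data: one row per observed lobster (an assumption)
lobster_toy <- tibble(
  SITE = c("IVEE", "IVEE", "IVEE", "NAPL", "NAPL"),
  YEAR = c(2019, 2019, 2020, 2019, 2020)
)

# Group by site and year, then count the rows in each group.
# count() stores the result in a column named n.
lobsters_summarize_toy <- lobster_toy %>%
  group_by(SITE, YEAR) %>%
  count()

lobsters_summarize_toy
```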
Next, create a line graph using ggplot() and geom_line(). Use geom_point() to make the data points more distinct, but it is ultimately up to you whether to use it. We also want SITE information on this graph; add it by specifying the variable in the color argument. Where should the color argument go? Inside or outside of aes()? Why or why not?
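One way to think about the color question is sketched below on made-up counts. The data frame `counts_toy` and its values are hypothetical. Because SITE is a column in the data, the sketch places `color = SITE` inside aes(); a fixed color such as "blue" would instead go outside aes().

```r
library(ggplot2)

# Hypothetical summarized counts for illustration only
counts_toy <- data.frame(
  SITE = c("IVEE", "IVEE", "NAPL", "NAPL"),
  YEAR = c(2019, 2020, 2019, 2020),
  n = c(12, 18, 7, 9)
)

# color = SITE maps a data column to an aesthetic, so it belongs in aes().
# ggplot2 then draws one line per site and adds a legend.
p <- ggplot(counts_toy, aes(x = YEAR, y = n, color = SITE)) +
  geom_line() +
  geom_point()

p
```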
First, you’ll need to create a new dataset subset called lobster_size_lrg:
- filter() for the years 2019, 2020, and 2021
- Add a new column called SIZE_BIN that contains the values “small” or “large”. A “small” carapace size is <= 70 mm, and a “large” carapace size is greater than 70 mm. Use mutate() and if_else(). Check your output
- Calculate the number of “small” and “large” sized lobsters using group_by() and summarize(). Check your output
- Remove the NA values from the subsetted data. Hint: check out drop_na(). Check your output
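The four steps above might be chained as in the sketch below, shown on a small made-up table. The toy data, the name `lobster_size_lrg_toy`, the `COUNT` column name, and the choice to group by SITE as well as SIZE_BIN are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, including one missing size
lobster_toy <- tibble(
  SITE = c("IVEE", "IVEE", "NAPL", "NAPL", "NAPL"),
  YEAR = c(2018, 2019, 2020, 2021, 2021),
  SIZE_MM = c(80, 70, 71, NA, 55)
)

lobster_size_lrg_toy <- lobster_toy %>%
  # keep only the three years of interest
  filter(YEAR %in% c(2019, 2020, 2021)) %>%
  # <= 70 mm is "small", anything larger is "large"; NA sizes stay NA
  mutate(SIZE_BIN = if_else(SIZE_MM <= 70, "small", "large")) %>%
  # count lobsters per site and size bin (grouping by SITE is an assumption)
  group_by(SITE, SIZE_BIN) %>%
  summarize(COUNT = n(), .groups = "drop") %>%
  # drop the group whose SIZE_BIN is NA
  drop_na()

lobster_size_lrg_toy
```

Printing the object after each step, as the list suggests, makes it easier to see where the NA group appears and disappears.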
Next, create a bar graph using ggplot() and geom_bar(). Note that geom_bar() automatically creates a stacked bar chart. Try using the argument position = "dodge" to make the bars side by side. Pick which bar position you like best.
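The difference between the two bar positions can be sketched on made-up counts as below. The data frame `size_counts_toy` and its values are hypothetical. Because the counts are already summarized, the sketch passes `stat = "identity"` so that geom_bar() uses the COUNT column as the bar height rather than counting rows.

```r
library(ggplot2)

# Hypothetical pre-summarized counts for illustration only
size_counts_toy <- data.frame(
  SITE = c("IVEE", "IVEE", "NAPL", "NAPL"),
  SIZE_BIN = c("small", "large", "small", "large"),
  COUNT = c(30, 12, 22, 17)
)

# Default position: bars for each SIZE_BIN are stacked within a site
p_stacked <- ggplot(size_counts_toy, aes(x = SITE, y = COUNT, fill = SIZE_BIN)) +
  geom_bar(stat = "identity")

# position = "dodge": bars for each SIZE_BIN sit side by side
p_dodged <- ggplot(size_counts_toy, aes(x = SITE, y = COUNT, fill = SIZE_BIN)) +
  geom_bar(stat = "identity", position = "dodge")

p_stacked
p_dodged
```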
Answer
# Histogram of trap counts, one panel per year
ggplot(data = lobster_traps, aes(x = TRAPS)) +
  geom_histogram() +
  facet_wrap(~YEAR)
First, you’ll need to create a new dataset subset called lobsters_traps_summarize:
- Group the data by SITE AND YEAR
- Calculate the total number of lobster commercial traps observed using sum(). Look up sum() if you need to. Call the new column TOTAL_TRAPS. Don’t forget about NAs here!
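The NA reminder above matters because sum() returns NA as soon as any input is NA. The sketch below shows one way to handle this with `na.rm = TRUE` on a small made-up table; the toy data and the name `lobsters_traps_summarize_toy` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with one missing trap count
traps_toy <- tibble(
  SITE = c("IVEE", "IVEE", "NAPL", "NAPL"),
  YEAR = c(2019, 2019, 2019, 2019),
  TRAPS = c(3, NA, 5, 2)
)

# Without na.rm = TRUE, the IVEE total would be NA
lobsters_traps_summarize_toy <- traps_toy %>%
  group_by(SITE, YEAR) %>%
  summarize(TOTAL_TRAPS = sum(TRAPS, na.rm = TRUE), .groups = "drop")

lobsters_traps_summarize_toy
```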
Next, create a line graph using ggplot() and geom_line(). Use geom_point() to make the data points more distinct, but it is ultimately up to you whether to use it. We also want SITE information on this graph; add it by specifying the variable in the color argument. Where should the color argument go? Inside or outside of aes()? Why or why not?
First, you’ll need to create a new dataset subset called lobster_traps_fishing_pressure:
- filter() for the years 2019, 2020, and 2021
- Add a new column called FISHING_PRESSURE that contains the values “high” or “low”. A “high” fishing pressure has 8 or more traps, and a “low” fishing pressure has fewer than 8 traps. Use mutate() and if_else(). Check your output
- Calculate the number of “high” and “low” observations using group_by() and summarize(). Check your output
- Remove the NA values from the subsetted data. Hint: check out drop_na(). Check your output
Next, create a bar graph using ggplot() and geom_bar(). Note that geom_bar() automatically creates a stacked bar chart. Try using the argument position = "dodge" to make the bars side by side. Pick which bar position you like best.
13.5.2 Collaborate on a report and publish using GitHub Pages
The final step! Time to work together again. Collaborate with your partner in lobster-report.Rmd to create a report to publish to GitHub Pages.
Make sure your R Markdown is well organized and includes the following elements:
- citation of the data
- brief summary of the abstract (i.e. 1-2 sentences) from the EDI Portal
- Owner analysis and visualizations (you choose which plots you want to include)
  - add alternative text to your plots
  - plots can be added either with the data visualization code or with Markdown syntax - it’s up to you if you want to include the code or not.
- Collaborator analysis and visualizations (you choose which plots you want to include)
  - add alternative text to your plots
  - plots can be added either with the data visualization code or with Markdown syntax - it’s up to you if you want to include the code or not.
Finally, publish on GitHub Pages (from Owner’s repository). Refer back to Chapter 9 for steps on how to publish using GitHub Pages.
13.6 Bonus: Add marine protected area (MPA) designation to the data
The sites IVEE and NAPL are marine protected areas (MPAs). Add this designation to your data set using a new function called case_when(). Then create some new plots using this new variable. Does it change how you think about the data? What new plots or analysis can you do with this new variable?
Use the object lobster_abundance and add a new column called DESIGNATION that contains “MPA” if the site is IVEE or NAPL, and “not MPA” for all other values.
Use the object lobster_traps and add a new column called DESIGNATION that contains “MPA” if the site is IVEE or NAPL, and “not MPA” for all other values.
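A minimal sketch of the case_when() pattern is shown below on a made-up table. The table `sites_toy` and the site label "OTHER" are hypothetical placeholders; the same mutate() call could be applied to `lobster_abundance` or `lobster_traps`.

```r
library(dplyr)

# Hypothetical toy data; "OTHER" is a placeholder for any non-MPA site
sites_toy <- tibble(SITE = c("IVEE", "NAPL", "OTHER"))

# case_when() checks conditions in order and uses the first one that is TRUE.
# The final TRUE acts as a catch-all for every remaining row.
sites_toy <- sites_toy %>%
  mutate(DESIGNATION = case_when(
    SITE %in% c("IVEE", "NAPL") ~ "MPA",
    TRUE ~ "not MPA"
  ))

sites_toy
```

With the new column in place, DESIGNATION can be used in facet_wrap() or as a color or fill aesthetic in the earlier plots.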