# 12 R Practice: Cleaning and Wrangling

## Learning Objectives

• Practice cleaning and wrangling data
• Practice using `dplyr` and `tidyr` functions
• Learn how to use the `tibble::tibble()` function to create a data frame
• Apply concepts learned about data visualization to plot data using `ggplot2`

## 12.1 Introduction

In this session of R practice, we will continue working with the dataset: Tobias Schwoerer, Kevin Berry, and Jorene Joe. 2022. A household survey documenting experiences with coastal hazards in a western Alaska community (2021-2022). Arctic Data Center. doi:10.18739/A29Z90D3V.

Big Idea

In this practice session, we will build upon the previous session by using `dplyr`, `tidyr`, and other packages from the tidyverse to summarize the survey answers.

## 12.2 Exercise 2

Set up
• Work in the same Qmd you used during R practice 1.
• Add the necessary headers and text to describe what you are doing during this practice:
  • Using Split-Apply-Combine strategy
  • Creating a Data Frame
  • Joining Data Frames
  • Plotting Q3 responses
• At the top of your document, under the Setup header, load the necessary packages for this practice: `dplyr`, `tidyr`, `tibble`, and `ggplot2`.
```r
library(readr)
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
```

### 12.2.1 Using Split-Apply-Combine strategy

Step 1

Use `group_by` and `summarize` to calculate how many responses there were to each unique answer for question 3.

```r
q3_tally <- survey_data %>%
  group_by(Q3) %>%
  summarize(n_responses = n())
```
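To see what the split-apply-combine pattern returns, the sketch below runs the same `group_by()` and `summarize()` calls on a small made-up tibble. The `toy_survey` data and its values are invented for illustration and are not taken from the survey. It also shows `dplyr::count()`, which is a shorthand for the same tally.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for survey_data; the values are made up
toy_survey <- tibble(Q3 = c(1, 2, 2, 3, 3, 3, NA))

# Split the rows by Q3, count the rows in each group,
# and combine the counts into one summary table
toy_tally <- toy_survey %>%
  group_by(Q3) %>%
  summarize(n_responses = n())

# count() is a shorthand for group_by() followed by summarize(n())
toy_count <- toy_survey %>%
  count(Q3, name = "n_responses")
```

Both objects hold one row per unique answer, including a row for `NA`, so missing responses are tallied rather than dropped.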

### 12.2.2 Creating a Data Frame

Step 2

Create a `data.frame` containing the definitions of the answer codes in Question 3. Use the metadata to get code-definition pairs.

One way of creating a new data frame is by using the `tribble()` or `tibble()` functions from the `tibble` package.

Tip: Search either in the help page or on the web for information about `tribble()` or `tibble()`. Then decide which one to use to create a new data frame.

```r
## Option 1: tribble() builds the table row by row
## Replace the placeholder strings with the definitions from the metadata
q3_definitions <- tribble(
  ~Q3, ~definition,
  1,   "definition of 1",
  2,   "definition of 2",
  3,   "definition of 3",
  4,   "definition of 4",
  5,   "definition of 5",
  NA,  "definition of NA"
)

## Option 2: tibble() builds the table column by column
Q3 <- c(1, 2, 3, 4, 5, NA)

definition <- c("definition of 1", "definition of 2", "definition of 3",
                "definition of 4", "definition of 5", "definition of NA")

q3_definitions <- tibble(Q3, definition)
```

### 12.2.3 Joining Data Frames

Step 3

Use a `left_join` to join your definitions table to the summarized answers.

```r
## Option 1
q3_summary <- left_join(q3_tally, q3_definitions,
                        by = "Q3")

## Option 2
q3_summary <- q3_tally %>%
  left_join(q3_definitions, by = "Q3")
```
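A left join keeps every row of the left table and fills in `NA` where the right table has no match. The sketch below illustrates this on made-up tables. The `toy_tally` and `toy_definitions` objects, the code `9`, and the placeholder definitions are all hypothetical. It also uses `anti_join()` as one possible way to check for answer codes that lack a definition.

```r
library(dplyr)
library(tibble)

# Hypothetical tally that includes a code (9) with no definition
toy_tally <- tibble(Q3 = c(1, 2, 9), n_responses = c(10L, 4L, 1L))

# Hypothetical lookup table; the definitions are placeholders
toy_definitions <- tribble(
  ~Q3, ~definition,
  1,   "placeholder for code 1",
  2,   "placeholder for code 2",
  3,   "placeholder for code 3"
)

# left_join() keeps every row of the left table (toy_tally);
# code 9 gets NA as its definition, and code 3 is not added
toy_summary <- left_join(toy_tally, toy_definitions, by = "Q3")

# anti_join() returns the tally rows that have no matching definition
missing_defs <- anti_join(toy_tally, toy_definitions, by = "Q3")
```

If `missing_defs` has any rows after a join like this, the definitions table is probably incomplete or the codes are stored with different types in the two tables.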

### 12.2.4 Data Visualization

Step 4

Use `ggplot()` to create a bar graph (`geom_col`) comparing the total number of responses for each option in Q3.

Note: The Example Code provides only the base plot. Reference the Data Visualization lesson to customize your plot. Add a `theme_`, change the labels, add a title, and maybe flip the coordinates to plot the bars horizontally. Feel free to use other functions you know or discover by searching on the web.

```r
ggplot(q3_summary,
       aes(x = Q3,
           y = n_responses)) +
  geom_col()
```
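As one possible starting point for the customizations suggested in the note, the sketch below adds labels, a title, a theme, and flipped coordinates to a bar plot. It uses a made-up `toy_summary` table with placeholder definitions and counts, so the labels and the choice of `theme_minimal()` are illustrative assumptions rather than the expected answer.

```r
library(ggplot2)
library(tibble)

# Hypothetical summary table; the definitions and counts are made up
toy_summary <- tibble(
  definition = c("placeholder A", "placeholder B", "placeholder C"),
  n_responses = c(10, 4, 7)
)

toy_plot <- ggplot(toy_summary,
                   aes(x = definition,
                       y = n_responses)) +
  geom_col() +
  coord_flip() +                         # draw the bars horizontally
  labs(title = "Responses to Q3 (toy data)",
       x = "Answer",
       y = "Number of responses") +
  theme_minimal()

toy_plot
```

Mapping the definition text to the x axis instead of the numeric code makes the bars readable without looking up the metadata.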

## 12.3 Bonus

Go Further

Explore how you might summarize other questions in these survey results.
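
One way to tally several questions at once is to reshape the data so that each row holds one question-answer pair, and then count the pairs. The sketch below assumes made-up columns `Q1` and `Q2` that share the same type. The `toy_survey` data is hypothetical, and the real survey columns may need cleaning or type conversion first.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical responses to two questions; names and values are made up
toy_survey <- tibble(
  Q1 = c(1, 1, 2, NA),
  Q2 = c(3, 3, 3, 1)
)

# Reshape to one row per (question, answer) pair,
# then tally each combination
toy_tally_all <- toy_survey %>%
  pivot_longer(cols = everything(),
               names_to = "question",
               values_to = "answer") %>%
  count(question, answer, name = "n_responses")
```

The result has one row per question and answer, which can be joined to a definitions table or plotted with `facet_wrap()` by question.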