Reproducible Research Techniques for Synthesis

22 Additional Resources: Collaboration, authorship and data policies

22.1 Resources

Example Codes of Conduct

  • NCEAS Code of Conduct
  • Arctic Data Center Code of Conduct
  • Carpentries Code of Conduct
  • Mozilla Science Code of Conduct
  • Mozilla Community Participation Guidelines
  • Ecological Society of America Code of Conduct
  • American Geophysical Union Code of Conduct

Policy Templates:

  • Authorship Policy
  • Data Policy

An example lab policy from the Wolkovich Lab that combines data management and sharing practices with authorship guidelines. Shared with permission from Elizabeth Wolkovich.

22.2 References

  • Cheruvelil, K. S., Soranno, P. A., Weathers, K. C., Hanson, P. C., Goring, S. J., Filstrup, C. T., & Read, E. K. (2014). Creating and maintaining high-performing collaborative research teams: The importance of diversity and interpersonal skills. Frontiers in Ecology and the Environment, 12(1), 31–38. https://doi.org/10.1890/130001