16  Appendix

17 Configuring Two-factor Authentication on GitHub

Learning Objectives

  • Successfully set up two-factor authentication on GitHub
  • Recognize two-factor authentication jargon

17.1 Why Set up Two-factor Authentication (2FA)

  1. Prevents unauthorized access
  2. Strengthens your web security, especially if your password has been compromised
  3. Satisfies an increasingly common requirement of websites and online applications or services

In March 2023, GitHub announced that it would require 2FA for “all developers who contribute code on GitHub.com” (GitHub Blog), with the rollout to be completed by the end of 2023.

All users have the flexibility to use their preferred 2FA method, including: TOTP, SMS, security keys, or the GitHub Mobile app. GitHub strongly recommends using security keys and TOTPs. While SMS-based 2FA is available, it does not provide the same level of protection and is no longer recommended under NIST (National Institute of Standards and Technology) Special Publication 800-63B.
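You never need to implement TOTP yourself, but it helps to see why those codes rotate. Here is a minimal sketch in R of just the time-step arithmetic (this is not a full RFC 6238 implementation; the real algorithm also combines a shared secret with this counter via HMAC):

time_step <- 30 # seconds; this is why TOTP codes regenerate every 30 seconds
counter <- floor(as.numeric(Sys.time()) / time_step)
counter # your authenticator app and GitHub derive the same counter, so their codes agree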

17.1.1 Additional information about 2FA on GitHub

For Your Reference

Review the Glossary table for a comprehensive list of two-factor authentication-related terms and definitions.

17.2 Steps for Configuring 2FA Using a TOTP App

Additional Resource

GitHub outlines these steps online in an article: Configuring two-factor authentication.

  1. Download a TOTP app
  2. Navigate to your account Settings (click your profile photo in the top right-hand corner)
  3. In the “Access” section, click “Password and authentication”
  4. In the “Two-factor authentication” section, click Enable two-factor authentication
  5. Under “Setup authenticator app”, either:
    1. Scan the QR code with your TOTP app. After scanning, the app displays a six-digit code that you can enter on GitHub
    2. If you can’t scan the QR code, click “enter this text code” to see a code that you can manually enter in your TOTP app instead
  6. On GitHub, type the code into the field under “Verify the code from the app”
  7. Under “Save your recovery codes”, click “Download” to download your recovery codes. Save them to a secure location because your recovery codes can help you get back into your account if you lose access.
    1. After saving your two-factor recovery codes, click “I have saved my recovery codes” to enable two-factor authentication for your account
  8. Configure additional 2FA methods, if desired

17.3 Glossary

| Term | Definition |
|------|------------|
| Quick Response (QR) Code | A two-dimensional matrix barcode that encodes specific information |
| Recovery Code | A unique code used to reset passwords or regain access to accounts |
| Short Message Service (SMS) | A text messaging service that allows mobile devices to exchange short text messages |
| Time-based One-time Password (TOTP) | A unique code that changes based on time; these often appear as six-digit numbers that regenerate every 30 seconds |
| Two-factor Authentication (2FA) | An identity and access management security method that requires two forms of identification to access accounts, resources, or data |

18 Check and Set your GitHub Personal Access Token (PAT)

18.1 Steps to check if your Personal Access Token is valid

  1. Log in to included-crab
  2. Open your training_LASTNAME.Rproj project
  3. In the Console, run: usethis::git_sitrep()

If your Personal Access Token is expired or unset, go ahead and reset it following the instructions below on how to set (or reset) your PAT.

18.2 Set (or reset) your PAT

  1. Run usethis::create_github_token() in the Console.
  2. A new browser window should open up to GitHub, showing all the scope options. You can review the scopes, but you don’t need to worry about which ones to select this time. Using create_github_token() automatically pre-selects some recommended scopes. Go ahead and scroll to the bottom and click “Generate Token”.
  3. Copy the generated token.
  4. Back in RStudio, run gitcreds::gitcreds_set() in the Console.
  5. Paste your PAT when the prompt asks for it.
  6. Last thing, run usethis::git_sitrep() in the Console to check your Git configuration and confirm that you’ve successfully stored your PAT.

18.3 Check that your PAT is valid

  • In the Console, run usethis::git_sitrep()
  • Expect the “GitHub” section of the output to report that a personal access token was discovered and is valid
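Putting it all together, the full sequence in the RStudio Console looks like this (a sketch; the token itself is generated in the browser and pasted interactively at the prompt):

usethis::create_github_token() # opens github.com to generate a new PAT
gitcreds::gitcreds_set()       # paste the copied token when prompted
usethis::git_sitrep()          # confirm your Git config and that the PAT is stored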

19 Regular Expressions

19.1 Introduction

Regular expressions are a fantastic tool for filtering and even extracting information out of strings of characters such as site codes, titles, or even entire documents. Regular expressions follow a custom syntax that we’ll need to learn but they are worth learning because:

  • Regular expressions can do things other methods cannot
  • Regular expressions can be used with many other languages and tools so it’s a learn-once, use-everywhere kind of tool

But you only need to learn a bit of regular expression syntax to get a lot of value out of it. I often use fairly simple patterns, like the one we used on the command line (strictly speaking a shell glob, but the same idea of matching by pattern):

ls *.qmd

19.2 Learning Outcomes

Students should:

  • Understand when regular expressions are appropriate
  • Have an introductory-level awareness of regular expression syntax
  • Have some experience executing and working with regular expressions in R

19.3 Lesson

Earlier this week, we used some simple regular expressions on the command line (terminal). The same types of operations work in R:

getwd() # Like pwd in the shell
[1] "/home/runner/work/repro-research-course/repro-research-course/nceas-training/materials"
dir() # Like ls in the shell
 [1] "_book"                "_quarto.yml"          "book.bib"            
 [4] "cover.png"            "data"                 "DESCRIPTION"         
 [7] "files"                "images"               "index.html"          
[10] "index.qmd"            "LICENSE.md"           "sections"            
[13] "session_01.html"      "session_01.qmd"       "session_02.html"     
[16] "session_02.qmd"       "session_03_files"     "session_03.html"     
[19] "session_03.qmd"       "session_04.html"      "session_04.qmd"      
[22] "session_05.html"      "session_05.qmd"       "session_06.html"     
[25] "session_06.qmd"       "session_07.html"      "session_07.qmd"      
[28] "session_08.html"      "session_08.qmd"       "session_09.html"     
[31] "session_09.qmd"       "session_10_files"     "session_10.html"     
[34] "session_10.qmd"       "session_11.html"      "session_11.qmd"      
[37] "session_12_files"     "session_12.html"      "session_12.qmd"      
[40] "session_13.html"      "session_13.qmd"       "session_14.html"     
[43] "session_14.qmd"       "session_15.qmd"       "session_16.html"     
[46] "session_16.qmd"       "session_17.qmd"       "session_17.rmarkdown"
[49] "session_18.qmd"       "session_19.qmd"       "session_20.qmd"      
[52] "shiny-demo"           "site_libs"            "slides"              
[55] "style.css"            "toc.css"             
library(stringr)
str_view_all(dir(), ".*qmd")
Warning: `str_view_all()` was deprecated in stringr 1.5.0.
ℹ Please use `str_view()` instead.
 [1] │ _book
 [2] │ _quarto.yml
 [3] │ book.bib
 [4] │ cover.png
 [5] │ data
 [6] │ DESCRIPTION
 [7] │ files
 [8] │ images
 [9] │ index.html
[10] │ <index.qmd>
[11] │ LICENSE.md
[12] │ sections
[13] │ session_01.html
[14] │ <session_01.qmd>
[15] │ session_02.html
[16] │ <session_02.qmd>
[17] │ session_03_files
[18] │ session_03.html
[19] │ <session_03.qmd>
[20] │ session_04.html
... and 36 more
str_view_all(dir(), ".*html")
 [1] │ _book
 [2] │ _quarto.yml
 [3] │ book.bib
 [4] │ cover.png
 [5] │ data
 [6] │ DESCRIPTION
 [7] │ files
 [8] │ images
 [9] │ <index.html>
[10] │ index.qmd
[11] │ LICENSE.md
[12] │ sections
[13] │ <session_01.html>
[14] │ session_01.qmd
[15] │ <session_02.html>
[16] │ session_02.qmd
[17] │ session_03_files
[18] │ <session_03.html>
[19] │ session_03.qmd
[20] │ <session_04.html>
... and 36 more
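As the deprecation warning above suggests, newer versions of stringr (>= 1.5.0) replace str_view_all() with str_view(). It highlights matches the same way, though by default it only prints the elements that match:

str_view(dir(), ".*qmd")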

Let’s start off with a simple example of where simpler methods won’t work and see how regular expressions can be used to get what we need done. Let’s say we just received some data we need to analyze and we find this:

site_data <- read.csv("data/site_data.csv", stringsAsFactors = FALSE)
site_data
                           x    temp_c
1          2000-copany bay-2  9.247435
2  2001-choctawhatchee bay-2 29.170777
3         2002-aransas bay-3 62.351057
4  2003-choctawhatchee bay-4 89.888624
5  2004-choctawhatchee bay-4 96.958163
6  2005-choctawhatchee bay-2 49.894849
7  2006-choctawhatchee bay-4 53.401312
8       2007-galveston bay-2 54.335877
9  2008-choctawhatchee bay-1  7.279786
10         2009-copany bay-3  8.806454
11 2000-choctawhatchee bay-2 21.353557
12         2001-copany bay-1  3.229220
13      2002-galveston bay-4 71.312880
14 2003-choctawhatchee bay-4 42.640502
15         2004-copany bay-1 92.070634
16      2005-galveston bay-1 95.573717
17 2006-choctawhatchee bay-3 99.215221
18         2007-copany bay-1 80.570198
19      2008-galveston bay-1 98.582018
20      2009-galveston bay-2 46.406132
21      2000-galveston bay-1 39.235548
22 2001-choctawhatchee bay-2 95.831200
23         2002-copany bay-1 49.300697
24         2003-copany bay-3 29.875656
25         2004-copany bay-1 58.682873
26         2005-copany bay-1 19.943774
27      2006-galveston bay-4 55.101811
28      2007-galveston bay-4 77.270760
29         2008-copany bay-2 25.395214
30      2009-galveston bay-4 51.966997
31 2000-choctawhatchee bay-4 83.095610
32         2001-copany bay-2 40.698851
33      2002-galveston bay-3 24.973809
34 2003-choctawhatchee bay-1 44.232596
35      2004-galveston bay-3 59.023020
36 2005-choctawhatchee bay-4 59.439810
37         2006-copany bay-3 63.713113
38         2007-copany bay-1 63.446845
39 2008-choctawhatchee bay-2 12.215281
40      2009-galveston bay-4 24.810948

It looks like the author of the dataset mixed the year of measurement, the site name (e.g., copany bay, galveston bay), and a sub-site code (e.g., 1, 2, 3) into a single column. If we wanted to, for example, calculate mean temperature by site, we’d need to split these up somehow into separate columns. How could we go about this? We could start with substr, which lets us slice a string by its indices:

substr(site_data$x, 1, 4)
 [1] "2000" "2001" "2002" "2003" "2004" "2005" "2006" "2007" "2008" "2009"
[11] "2000" "2001" "2002" "2003" "2004" "2005" "2006" "2007" "2008" "2009"
[21] "2000" "2001" "2002" "2003" "2004" "2005" "2006" "2007" "2008" "2009"
[31] "2000" "2001" "2002" "2003" "2004" "2005" "2006" "2007" "2008" "2009"
substr(site_data$x, 5, 16)
 [1] "-copany bay-" "-choctawhatc" "-aransas bay" "-choctawhatc" "-choctawhatc"
 [6] "-choctawhatc" "-choctawhatc" "-galveston b" "-choctawhatc" "-copany bay-"
[11] "-choctawhatc" "-copany bay-" "-galveston b" "-choctawhatc" "-copany bay-"
[16] "-galveston b" "-choctawhatc" "-copany bay-" "-galveston b" "-galveston b"
[21] "-galveston b" "-choctawhatc" "-copany bay-" "-copany bay-" "-copany bay-"
[26] "-copany bay-" "-galveston b" "-galveston b" "-copany bay-" "-galveston b"
[31] "-choctawhatc" "-copany bay-" "-galveston b" "-choctawhatc" "-galveston b"
[36] "-choctawhatc" "-copany bay-" "-copany bay-" "-choctawhatc" "-galveston b"

But we’d quickly find that, because the site names vary in length, no fixed set of indices can extract just the site. These are the types of problems where regular expressions come in handy.

Before we start, we’re going to use the str_view_all function from the stringr package, which shows a nice display of the result of executing a regular expression against our strings. In real use, we would use another function, such as str_extract or str_match, to actually get and work with the result.

library(stringr)
str_view_all(site_data$x, "[a-z ]+")
 [1] │ 2000-<copany bay>-2
 [2] │ 2001-<choctawhatchee bay>-2
 [3] │ 2002-<aransas bay>-3
 [4] │ 2003-<choctawhatchee bay>-4
 [5] │ 2004-<choctawhatchee bay>-4
 [6] │ 2005-<choctawhatchee bay>-2
 [7] │ 2006-<choctawhatchee bay>-4
 [8] │ 2007-<galveston bay>-2
 [9] │ 2008-<choctawhatchee bay>-1
[10] │ 2009-<copany bay>-3
[11] │ 2000-<choctawhatchee bay>-2
[12] │ 2001-<copany bay>-1
[13] │ 2002-<galveston bay>-4
[14] │ 2003-<choctawhatchee bay>-4
[15] │ 2004-<copany bay>-1
[16] │ 2005-<galveston bay>-1
[17] │ 2006-<choctawhatchee bay>-3
[18] │ 2007-<copany bay>-1
[19] │ 2008-<galveston bay>-1
[20] │ 2009-<galveston bay>-2
... and 20 more

The expression we used above, [a-z ]+, matches each consecutive run of one or more characters that are lowercase letters (a-z) or spaces. This is the type of problem regular expressions were created for!
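Remember that str_view_all() only displays the matches. To actually pull the matched site names out and work with them, we can use str_extract(), which returns the first match in each string:

str_extract(site_data$x, "[a-z ]+")
# [1] "copany bay" "choctawhatchee bay" "aransas bay" ... (one site name per row)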

19.4 Overview of Regular Expressions

Regular expressions can match things literally, e.g.,

str_detect("grouper", "striper")
[1] FALSE
str_detect("grouper", "grouper")
[1] TRUE

but they also support a large set of special characters:

  • .: Match any character
fish <- c("grouper", "striper", "sheepshead")
str_view_all(fish, ".p")
[1] │ gro<up>er
[2] │ str<ip>er
[3] │ she<ep>shead

If you actually want to match a period and not any character, you have to do what’s called escaping:

fish <- c("stripers", "striper.", "grouper")
str_view_all(fish, "striper\\.")
[1] │ stripers
[2] │ <striper.>
[3] │ grouper

See how that regular expression only matched the striper with the period at the end and not the string stripers?
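As an alternative to escaping, stringr provides fixed(), which tells a function to treat the pattern as a literal string rather than a regular expression:

fish <- c("stripers", "striper.", "grouper")
str_detect(fish, fixed("striper.")) # the "." is matched literally
[1] FALSE  TRUE FALSE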

  • []: Match any character in this set
fish <- c("grouper", "striper", "sheepshead")
str_view_all(fish, "[aeiou]")
[1] │ gr<o><u>p<e>r
[2] │ str<i>p<e>r
[3] │ sh<e><e>psh<e><a>d
  • [^]: Match any character not in this set
fish <- c("grouper", "striper", "sheepshead")
str_view_all(fish, "[^aeiou]")
[1] │ <g><r>ou<p>e<r>
[2] │ <s><t><r>i<p>e<r>
[3] │ <s><h>ee<p><s><h>ea<d>
  • \s & \S: Match any whitespace character (e.g., spaces, \t) and any non-whitespace character, respectively
fish <- c("gag grouper", "striper", "red drum")
str_view_all(fish, "\\s") # Note the double \\ before the s: in R strings, many regex special characters must be written with \\
[1] │ gag< >grouper
[2] │ striper
[3] │ red< >drum
str_view_all(fish, "\\S")
[1] │ <g><a><g> <g><r><o><u><p><e><r>
[2] │ <s><t><r><i><p><e><r>
[3] │ <r><e><d> <d><r><u><m>

Note that the lower case version \\s selects any whitespace characters, whereas the uppercase version \\S selects all non-whitespace characters. The next pattern is analogous for digits:

  • \d & \D: Match any digit (equivalent to [0-9]) and any non-digit, respectively
fish <- c("striper1", "red drum2", "tarpon123")
str_view_all(fish, "\\d")
[1] │ striper<1>
[2] │ red drum<2>
[3] │ tarpon<1><2><3>
  • \w & \W: Match any word character (equivalent to [A-Za-z0-9_]) and any non-word character, respectively
fish <- c("striper1", "red drum2", "tarpon123")
str_view_all(fish, "\\w")
[1] │ <s><t><r><i><p><e><r><1>
[2] │ <r><e><d> <d><r><u><m><2>
[3] │ <t><a><r><p><o><n><1><2><3>

We can also specify how many of a particular character or class of character to match:

  • ?: Match the preceding item zero or one time (optionality)

Say we want to get just the phone numbers out of this vector but we notice that the phone numbers take on some different formats:

phone_numbers <- c("219 733 8965", "apple", "329-293-8753 ", "123banana", "595.794.7569", "3872876718")
str_view_all(phone_numbers, "\\d\\d\\d[ \\.-]?\\d\\d\\d[ \\.-]?\\d\\d\\d\\d")
[1] │ <219 733 8965>
[2] │ apple
[3] │ <329-293-8753> 
[4] │ 123banana
[5] │ <595.794.7569>
[6] │ <3872876718>

The above regular expression matches the number parts of the phone numbers, which can be separated by zero or one space, period (.), or hyphen (-).
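Regular expressions also support counted repetition with {}: \\d{3} means exactly three digits. Using that, plus str_extract() to pull the matches out of the same phone_numbers vector, a more compact version of the pattern would be:

str_extract(phone_numbers, "\\d{3}[ \\.-]?\\d{3}[ \\.-]?\\d{4}")
# Returns the matched number for each element, and NA where there is no match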

  • +: Match the preceding item one or more times

We can use the + expression to find words with one or more vowels:

fish <- c("gag grouper", "striper", "red drum", "cobia", "sheepshead")
str_view_all(fish, "[aeiuo]+")
[1] │ g<a>g gr<ou>p<e>r
[2] │ str<i>p<e>r
[3] │ r<e>d dr<u>m
[4] │ c<o>b<ia>
[5] │ sh<ee>psh<ea>d
  • *: Match the preceding item zero or more times

For example, matching numbers that may or may not contain a decimal point:

numbers <- c("0.2", "123.1", "547")
str_view_all(numbers, "\\d*\\.?\\d*")
[1] │ <0.2><>
[2] │ <123.1><>
[3] │ <547><>
# Regular expressions are greedy
letters <- "abcdefghijkc"
str_view_all(letters, "a.*c") # Greedy
[1] │ <abcdefghijkc>
str_view_all(letters, "a.*?c") # Lazy
[1] │ <abc>defghijkc
  • (): Grouping

One of the most powerful parts of regular expressions is grouping. Grouping allows us to split up our matched expressions and do more work with them. For example, we can match the city and the state in a set of addresses and split them into components:

addresses <- c("Santa Barbara, CA", "Seattle, WA", "New York, NY")
str_view_all(addresses, "([\\w\\s]+), (\\w+)")
[1] │ <Santa Barbara, CA>
[2] │ <Seattle, WA>
[3] │ <New York, NY>
str_match_all(addresses, "([\\w\\s]+), (\\w+)")
[[1]]
     [,1]                [,2]            [,3]
[1,] "Santa Barbara, CA" "Santa Barbara" "CA"

[[2]]
     [,1]          [,2]      [,3]
[1,] "Seattle, WA" "Seattle" "WA"

[[3]]
     [,1]           [,2]       [,3]
[1,] "New York, NY" "New York" "NY"

Once we use groups, (), we can also use back references to work with the result. Back references are \ and a number, where \1 is the first group, \2 is the second group, and so on.

str_replace(addresses, "([\\w\\s]+), (\\w+)", "City: \\1, State: \\2")
[1] "City: Santa Barbara, State: CA" "City: Seattle, State: WA"      
[3] "City: New York, State: NY"     
  • ^ & $: Match the start and the end of a string, respectively

It can also be really useful to express something like “strings that start with a capital letter” or “strings that end with a period”:

possible_sentences <- c(
  "This might be a sentence.",
  "So. Might. this",
  "but this could maybe not be?",
  "Am I a sentence?",
  "maybe not",
  "Regular expressions are useful!"
)
# ^ specifies the start, so ^[A-Z] means "starts with a capital letter"
str_detect(possible_sentences, "^[A-Z]")
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
possible_sentences[str_detect(possible_sentences, "^[A-Z]")]
[1] "This might be a sentence."       "So. Might. this"                
[3] "Am I a sentence?"                "Regular expressions are useful!"
# We can also do "ends with a period"
str_detect(possible_sentences, "\\.$")
[1]  TRUE FALSE FALSE FALSE FALSE FALSE
possible_sentences[str_detect(possible_sentences, "\\.$")]
[1] "This might be a sentence."
# We can put them together:
str_detect(possible_sentences, "^[A-Z].*[\\.\\?!]$")
[1]  TRUE FALSE FALSE  TRUE FALSE  TRUE
possible_sentences[str_detect(possible_sentences, "^[A-Z].*[\\.\\?!]$")]
[1] "This might be a sentence."       "Am I a sentence?"               
[3] "Regular expressions are useful!"

19.5 Finish out our example together

Now that we’ve gone over some basics of regular expressions, let’s finish our example by splitting the various components of column x into a year, site, and sub_site column:

site_data
                           x    temp_c
1          2000-copany bay-2  9.247435
2  2001-choctawhatchee bay-2 29.170777
3         2002-aransas bay-3 62.351057
4  2003-choctawhatchee bay-4 89.888624
5  2004-choctawhatchee bay-4 96.958163
6  2005-choctawhatchee bay-2 49.894849
7  2006-choctawhatchee bay-4 53.401312
8       2007-galveston bay-2 54.335877
9  2008-choctawhatchee bay-1  7.279786
10         2009-copany bay-3  8.806454
11 2000-choctawhatchee bay-2 21.353557
12         2001-copany bay-1  3.229220
13      2002-galveston bay-4 71.312880
14 2003-choctawhatchee bay-4 42.640502
15         2004-copany bay-1 92.070634
16      2005-galveston bay-1 95.573717
17 2006-choctawhatchee bay-3 99.215221
18         2007-copany bay-1 80.570198
19      2008-galveston bay-1 98.582018
20      2009-galveston bay-2 46.406132
21      2000-galveston bay-1 39.235548
22 2001-choctawhatchee bay-2 95.831200
23         2002-copany bay-1 49.300697
24         2003-copany bay-3 29.875656
25         2004-copany bay-1 58.682873
26         2005-copany bay-1 19.943774
27      2006-galveston bay-4 55.101811
28      2007-galveston bay-4 77.270760
29         2008-copany bay-2 25.395214
30      2009-galveston bay-4 51.966997
31 2000-choctawhatchee bay-4 83.095610
32         2001-copany bay-2 40.698851
33      2002-galveston bay-3 24.973809
34 2003-choctawhatchee bay-1 44.232596
35      2004-galveston bay-3 59.023020
36 2005-choctawhatchee bay-4 59.439810
37         2006-copany bay-3 63.713113
38         2007-copany bay-1 63.446845
39 2008-choctawhatchee bay-2 12.215281
40      2009-galveston bay-4 24.810948
# I'll show you how to extract the year part
site_data$year <- str_extract(site_data$x, "\\d{4}")

# You do the rest
site_data$site <- str_extract(site_data$x, "") # <- Fill this in between the ""
site_data$sub_site <- str_extract(site_data$x, "") # <- Fill this in between the ""
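If you get stuck, here is one possible solution; it reuses the [a-z ]+ pattern from earlier, plus $ to anchor at the end of the string. Many other patterns work just as well:

site_data$site <- str_extract(site_data$x, "[a-z ]+") # run of lowercase letters and spaces
site_data$sub_site <- str_extract(site_data$x, "\\d$") # the single digit at the end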

19.6 Common R functions that use regular expressions

  • Base R
    • grep
    • gsub
    • strsplit
  • stringr package
    • str_detect
    • str_match
    • str_replace
    • str_split
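Here is a quick, minimal tour of a few of these, using a made-up vector for illustration:

library(stringr)
fish <- c("grouper", "striper", "sheepshead")
grep("per$", fish)          # base R: indices of elements ending in "per" (1 and 2)
gsub("[aeiou]", "_", fish)  # base R: replace every vowel
strsplit("a,b;c", "[,;]")   # base R: split on comma or semicolon
str_detect(fish, "^s")      # stringr: FALSE TRUE TRUE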

19.6.1 Another example

Data often come to us in strange forms and, before we can even begin analyzing them, we have to do a lot of work to sanitize what we’ve been given. One example, which I ran into just the other week, was temporal data with dates formatted like this:

dates <- c("1July17",
           "02July2017",
           "3July17",
           "4July17")

and so on. Do you see how the day of the month and the year are represented in different ways through the series? If we want to convert these strings into Date objects for further analysis, we’ll need to do some pre-cleaning first. Regular expressions work great here.

str_match_all(dates, "(\\d{1,2})([A-Za-z]+)(\\d{2,4})")
[[1]]
     [,1]      [,2] [,3]   [,4]
[1,] "1July17" "1"  "July" "17"

[[2]]
     [,1]         [,2] [,3]   [,4]  
[1,] "02July2017" "02" "July" "2017"

[[3]]
     [,1]      [,2] [,3]   [,4]
[1,] "3July17" "3"  "July" "17"

[[4]]
     [,1]      [,2] [,3]   [,4]
[1,] "4July17" "4"  "July" "17"

The regular expression above was complex. Let’s break it down into its main parts. Below, I’ve re-formatted the data and the regular expression a bit so we can see what’s going on.

| Day      | Month     | Year     |
|----------|-----------|----------|
| 1        | July      | 17       |
| 02       | July      | 2017     |
| 3        | July      | 17       |
| 4        | July      | 17       |
| \\d{1,2} | [A-Za-z]+ | \\d{2,4} |
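To finish the job, we can use the captured groups to normalize the dates vector from above and parse it into Date objects. This is a sketch: it assumes every two-digit year means 20xx and that month names parse in an English locale:

parts <- str_match(dates, "(\\d{1,2})([A-Za-z]+)(\\d{2,4})") # columns: full match, day, month, year
year <- ifelse(nchar(parts[, 4]) == 2, paste0("20", parts[, 4]), parts[, 4])
as.Date(paste(parts[, 2], parts[, 3], year), format = "%d %B %Y")
# [1] "2017-07-01" "2017-07-02" "2017-07-03" "2017-07-04"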

19.7 Summary

  • Regular expressions are a crucial tool in the data analysis toolbox
  • Regular expressions help us solve problems we may not be otherwise able to solve
  • Regular expressions are supported in many functions in R

19.8 More

  • Have the group figure out what happens when you put a quantifier like * after a character class []
  • Have the group try str_split on fixed characters and on a regex (a starting sketch follows below)
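As a starting point for the second exercise, compare splitting on a fixed separator with splitting on a regular expression:

str_split("2000-copany bay-2", fixed("-")) # fixed: split only on the literal hyphen
str_split("2000-copany bay-2", "[-\\s]")   # regex: split on hyphens or whitespace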

19.9 Resources

19.10 Appendices

Here’s the code I used to generate the fake site_data data.frame above.

# Fake data generation code
set.seed(1) # hypothetical seed, added so the fake data can be regenerated; the original didn't set one
site_data <- data.frame(year = rep(seq(2000, 2009), 4))
site_data$site <- sample(c("galveston bay", "choctawhatchee bay", "aransas bay", "copany bay"), nrow(site_data), replace = TRUE)
site_data$subsite <- sample(c(1, 2, 3, 4), nrow(site_data), replace = TRUE)
site_data$temp_c <- runif(nrow(site_data), 0, 100)
# Combine year, site, and subsite into the single messy column used above
site_data$site_code <- paste(site_data$year, site_data$site, site_data$subsite, sep = "-")
site_data <- site_data[, c("site_code", "temp_c")]
names(site_data) <- c("x", "temp_c")
write.csv(site_data, file = "data/site_data.csv", row.names = FALSE) # path matches the read.csv() call above