A string is a collection of characters placed between quotes.
A character is a single input from your keyboard (e.g. a single letter or a single punctuation mark).
string1 <- "Hi!"
string2 <- 'Hello, I am C-3PO, it is a pleasure to meet you.'
You can combine strings in a vector.
string3 <- c("It's against", "my programming", "to use inconsistent notation.")
string3
## [1] "It's against" "my programming"
## [3] "to use inconsistent notation."
stringr
library(stringr)
... but it's also included in the tidyverse!
stringr provides many tools to work with strings, including functions that
count the characters in a string: str_length()
count the matches of a pattern: str_count()
concatenate string vectors: str_c()
detect patterns: str_detect()
trim whitespace: str_trim()
Their names begin with str_
They take a vector of strings as their first argument
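str_trim() is listed above but not demonstrated elsewhere in these slides. A minimal sketch of it, which also shows that a stringr function works on every element of the vector passed as its first argument. The vector `padded` is a hypothetical example.

```r
library(stringr)

# Hypothetical strings with stray whitespace
padded <- c("  qwerty", "password  ", "  abc123  ")

# str_trim() removes whitespace from both ends by default
trimmed <- str_trim(padded)
trimmed
## [1] "qwerty" "password" "abc123"

# side = "left" trims only the start of each string
left_trimmed <- str_trim(padded, side = "left")
left_trimmed
## [1] "qwerty" "password  " "abc123  "
```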
Why doesn't the code below work?
string3 <- "I say "Hello" to the class"
## Error: <text>:1:20: unexpected symbol
## 1: string3 <- "I say "Hello
##                       ^
The second " ends the string early, so R reads Hello as an unexpected symbol. To include a double quote in a string, escape it with a backslash (\).
string4 <- "I say \"Hello\" to the class"
What if you want to include an actual backslash?
string5 <- "\\"
This may seem tedious, but it will come up later!
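R accepts both quote types (string2 above uses single quotes), so one way to avoid escaping double quotes is to wrap the string in single quotes instead. A minimal sketch comparing the two forms; the variable names are illustrative only.

```r
# Escaping the inner double quotes...
escaped <- "I say \"Hello\" to the class"
# ...gives the same string as using single quotes on the outside
single <- 'I say "Hello" to the class'
identical(escaped, single)
## [1] TRUE

# A literal backslash needs escaping whichever quotes you use
backslash <- '\\'
identical(backslash, "\\")
## [1] TRUE
```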
writeLines
writeLines() shows the contents of a string without the escape backslashes that printing displays.
string4
## [1] "I say \"Hello\" to the class"
writeLines(string4)
## I say "Hello" to the class
string5
## [1] "\\"
writeLines(string5)
## \
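The escapes shown when a string is printed are not part of the string itself. One way to confirm this, sketched below, is to count characters with stringr's str_length(): each escaped quote or backslash counts as a single character.

```r
library(stringr)

string4 <- "I say \"Hello\" to the class"
string5 <- "\\"

# Each escaped quote counts as one character
str_length(string4)
## [1] 26

# "\\" is a single backslash character
str_length(string5)
## [1] 1
```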
RockYou developed software for social media platforms such as MySpace and Facebook
Stored user passwords in plain text files
Hacked in 2009 and over 32 million passwords leaked
Let's look at the first 20
rockyou20 <- rockyou[1:20]
rockyou20
## [1] "123456" "12345" "123456789" "password" "iloveyou" "princess"
## [7] "1234567" "rockyou" "12345678" "abc123" "nicole" "daniel"
## [13] "babygirl" "monkey" "lovely" "jessica" "654321" "michael"
## [19] "ashley" "qwerty"
str_length
Given a string, return the number of characters.
password <- "qwerty"
str_length(password)
## [1] 6
Given a vector of strings, return the number of characters in each string.
str_length(rockyou20)
## [1] 6 5 9 8 8 8 7 7 8 6 6 6 8 6 6 7 6 7 6 6
rockyou20
## [1] "123456" "12345" "123456789" "password" "iloveyou" "princess"
## [7] "1234567" "rockyou" "12345678" "abc123" "nicole" "daniel"
## [13] "babygirl" "monkey" "lovely" "jessica" "654321" "michael"
## [19] "ashley" "qwerty"
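Because str_length() returns one number per string, its result can be used directly to filter a vector. A minimal sketch that keeps passwords shorter than 8 characters, using a small hypothetical vector rather than the full rockyou data.

```r
library(stringr)

# Hypothetical subset of passwords
passwords <- c("123456", "password", "qwerty", "iloveyou", "abc123")

# Logical vector: TRUE where a password has fewer than 8 characters
too_short <- str_length(passwords) < 8

passwords[too_short]
## [1] "123456" "qwerty" "abc123"
```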
str_c
Combine two or more strings.
str_c("My", "password", "is", "qwerty")
## [1] "Mypasswordisqwerty"
Use sep to specify how the strings are separated.
str_c("My", "password", "is", "qwerty", sep = " ")
## [1] "My password is qwerty"
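str_c() also has a collapse argument, which joins the elements of a single vector into one string, and it recycles a length-1 input across a longer one. A minimal sketch; the inputs are illustrative.

```r
library(stringr)

words <- c("My", "password", "is", "qwerty")

# collapse joins all elements of one vector into a single string
sentence <- str_c(words, collapse = " ")
sentence
## [1] "My password is qwerty"

# "user_" is recycled against each element of 1:3
ids <- str_c("user_", 1:3)
ids
## [1] "user_1" "user_2" "user_3"
```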
str_to_lower and str_to_upper
Convert a string to lower case or to upper case.
str_to_upper(rockyou20)
## [1] "123456" "12345" "123456789" "PASSWORD" "ILOVEYOU" "PRINCESS"
## [7] "1234567" "ROCKYOU" "12345678" "ABC123" "NICOLE" "DANIEL"
## [13] "BABYGIRL" "MONKEY" "LOVELY" "JESSICA" "654321" "MICHAEL"
## [19] "ASHLEY" "QWERTY"
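str_to_lower() works the same way in the other direction; digits and characters that are already lowercase are left unchanged. A minimal sketch with a hypothetical vector.

```r
library(stringr)

mixed <- c("PassWord", "ABC123", "qwerty")

# Every uppercase letter becomes lowercase; digits are untouched
lower <- str_to_lower(mixed)
lower
## [1] "password" "abc123" "qwerty"
```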
str_sub
Extract parts of a string from position start to position end, inclusive.
str_sub(rockyou20, 1, 4)
## [1] "1234" "1234" "1234" "pass" "ilov" "prin" "1234" "rock" "1234" "abc1"
## [11] "nico" "dani" "baby" "monk" "love" "jess" "6543" "mich" "ashl" "qwer"
str_sub(rockyou20, -4, -1)
## [1] "3456" "2345" "6789" "word" "eyou" "cess" "4567" "kyou" "5678" "c123"
## [11] "cole" "niel" "girl" "nkey" "vely" "sica" "4321" "hael" "hley" "erty"
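Negative positions count back from the end of the string, so -4 to -1 above selects the last four characters. If end is omitted it defaults to -1, the last character. A minimal sketch on a single hypothetical string.

```r
library(stringr)

pw <- "password"

# Omitting end keeps everything from position 5 onwards
str_sub(pw, 5)
## [1] "word"

# For an 8-character string, -8 to -5 is the same as 1 to 4
str_sub(pw, -8, -5)
## [1] "pass"
```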
str_sub and str_to_upper
Combine str_sub and str_to_upper to capitalize the first letter of each password.
str_sub(rockyou20, 1, 1) <- str_to_upper(str_sub(rockyou20, 1, 1))
rockyou20
## [1] "123456" "12345" "123456789" "Password" "Iloveyou" "Princess"
## [7] "1234567" "Rockyou" "12345678" "Abc123" "Nicole" "Daniel"
## [13] "Babygirl" "Monkey" "Lovely" "Jessica" "654321" "Michael"
## [19] "Ashley" "Qwerty"
str_sort
Sort a character vector. Here we sort in decreasing alphabetical order.
str_sort(rockyou20, decreasing = TRUE)
## [1] "Rockyou" "Qwerty" "Princess" "Password" "Nicole" "Monkey"
## [7] "Michael" "Lovely" "Jessica" "Iloveyou" "Daniel" "Babygirl"
## [13] "Ashley" "Abc123" "654321" "123456789" "12345678" "1234567"
## [19] "123456" "12345"
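By default str_sort() compares strings character by character, so "item10" sorts before "item2". Setting its numeric argument to TRUE compares runs of digits as numbers instead. A minimal sketch with hypothetical strings.

```r
library(stringr)

items <- c("item10", "item2", "item1")

# Character by character: "1" < "2", so "item10" comes before "item2"
str_sort(items)
## [1] "item1" "item10" "item2"

# numeric = TRUE compares the digit runs as numbers
str_sort(items, numeric = TRUE)
## [1] "item1" "item2" "item10"
```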
A regular expression is a sequence of characters that allows you to describe string patterns. We use them to search for patterns.
To demonstrate the power of regular expressions, let's find which of the 32 million leaked passwords contain the exact string "dog" (showing the first 30 matches):
str_subset(rockyou, "dog")[1:30]
## [1] "catdog" "hotdog" "bulldogs" "bulldog" "doggie"
## [6] "bigdog" "maddog" "snoopdogg" "puppydog" "doggy"
## [11] "dog123" "snoopdog" "ilovedogs" "doggies" "luckydog"
## [16] "catdog1" "dogdog" "reddog" "bulldog1" "mollydog"
## [21] "hotdog1" "bulldogs1" "dogcat" "doggy1" "hotdogs"
## [26] "dogsrule" "thedog" "catsanddogs" "topdog" "daisydog"
What about "d", then any character, then "g"? The . matches any single character.
str_subset(rockyou, "d.g")[1:30]
## [1] "asdfgh" "asdfghjkl" "catdog" "hotdog" "bulldogs"
## [6] "bulldog" "asdfg" "doggie" "bigdog" "maddog"
## [11] "digger" "digimon" "digital" "candygirl" "snoopdogg"
## [16] "puppydog" "doggy" "dog123" "snoopdog" "asdfghj"
## [21] "ilovedogs" "doggies" "asdfghjk" "luckydog" "catdog1"
## [26] "indigo" "dogdog" "madagascar" "reddog" "bulldog1"
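The . matches exactly one character, so "d.g" matches "dog" and "dig" but not "dg" (nothing in between) or "doog" (two characters in between). A minimal sketch with hypothetical strings.

```r
library(stringr)

candidates <- c("dog", "dig", "dg", "doog", "asdfg")

# TRUE where "d", any one character, then "g" appear in a row
str_detect(candidates, "d.g")
## [1] TRUE TRUE FALSE FALSE TRUE
```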
Match the start of a string using ^
str_view_all(rockyou20, "^P")
rockyou20
## [1] "123456" "12345" "123456789" "password" "iloveyou" "princess"
## [7] "1234567" "rockyou" "12345678" "abc123" "nicole" "daniel"
## [13] "babygirl" "monkey" "lovely" "jessica" "654321" "michael"
## [19] "ashley" "qwerty"
Match the end of a string using $
str_view_all(rockyou20, "u$", match = TRUE)
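The anchors ^ and $ work with any stringr function, not only str_view_all(). A minimal sketch using str_subset() on a hypothetical vector: the same letters match different strings depending on the anchor.

```r
library(stringr)

x <- c("iloveyou", "rockyou", "you123", "qwerty")

# "you" at the end of the string
str_subset(x, "you$")
## [1] "iloveyou" "rockyou"

# "you" at the start of the string
str_subset(x, "^you")
## [1] "you123"
```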
str_detect
Determine whether each string in a character vector matches a pattern.
rockyou20
## [1] "123456" "12345" "123456789" "password" "iloveyou" "princess"
## [7] "1234567" "rockyou" "12345678" "abc123" "nicole" "daniel"
## [13] "babygirl" "monkey" "lovely" "jessica" "654321" "michael"
## [19] "ashley" "qwerty"
str_detect(rockyou20, "a")
## [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
## [13] TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
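Because str_detect() returns a logical vector, sum() counts the matching strings and mean() gives the proportion that match. A minimal sketch with a hypothetical vector.

```r
library(stringr)

passwords <- c("password", "qwerty", "abc123", "monkey")
has_a <- str_detect(passwords, "a")

sum(has_a)   # number of passwords containing "a"
## [1] 2
mean(has_a)  # proportion of passwords containing "a"
## [1] 0.5
```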
str_count
Count how many times a pattern matches in each string.
rockyou20
## [1] "123456" "12345" "123456789" "password" "iloveyou" "princess"
## [7] "1234567" "rockyou" "12345678" "abc123" "nicole" "daniel"
## [13] "babygirl" "monkey" "lovely" "jessica" "654321" "michael"
## [19] "ashley" "qwerty"
str_count(rockyou20, "s")
## [1] 0 0 0 2 0 2 0 0 0 0 0 0 0 0 0 2 0 0 1 0
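The pattern passed to str_count() can be longer than one character, and matches are counted without overlapping. A minimal sketch on hypothetical strings.

```r
library(stringr)

# Four single "s" characters
str_count("mississippi", "s")
## [1] 4

# Two non-overlapping "ss" matches
str_count("mississippi", "ss")
## [1] 2

# "aaaa" contains "aa" twice without overlap, not three times
str_count("aaaa", "aa")
## [1] 2
```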
str_replace_all
Replace every match of a pattern with a new string.
str_replace_all(rockyou20, "s", "-")
## [1] "123456" "12345" "123456789" "pa--word" "iloveyou" "prince--"
## [7] "1234567" "rockyou" "12345678" "abc123" "nicole" "daniel"
## [13] "babygirl" "monkey" "lovely" "je--ica" "654321" "michael"
## [19] "a-hley" "qwerty"
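For comparison, str_replace() (without _all) replaces only the first match in each string. str_replace_all() also accepts a named vector of pattern = replacement pairs, which makes several substitutions in one call. A minimal sketch on a single hypothetical password.

```r
library(stringr)

# Only the first "s" is replaced
first_only <- str_replace("password", "s", "-")
first_only
## [1] "pa-sword"

# Names are patterns, values are their replacements
leet <- str_replace_all("password", c("a" = "@", "o" = "0"))
leet
## [1] "p@ssw0rd"
```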
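Each regular expression below matches a single character drawn from a set of characters.
\d or [[:digit:]]: any digit
\s or [[:space:]]: any whitespace character
[fgh]: f, g, or h
[^fgh]: any character except f, g, or h
[a-z] or [[:lower:]]: any lowercase letter
[A-Z] or [[:upper:]]: any uppercase letter
[A-Za-z] or [[:alpha:]]: any letter (avoid [A-z], which also matches the symbols such as _ and ^ that lie between Z and a)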
Remember that these patterns are written inside R strings, so the backslash itself must be escaped: to match a digit, use "\\d", not "\d".
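A minimal sketch of a few of these character classes on hypothetical strings. The last two calls show why [A-Za-z] is preferable to [A-z]: in ASCII the range A-z also covers the six symbols between Z and a, including _.

```r
library(stringr)

x <- c("abc123", "qwerty", "my pass")

# "\\d" in the R string becomes the regex \d (any digit)
str_detect(x, "\\d")
## [1] TRUE FALSE FALSE

# [[:digit:]] is another way to write the same class
str_count(x, "[[:digit:]]")
## [1] 3 0 0

# "\\s" matches whitespace; here it is removed
str_replace_all(x, "\\s", "")
## [1] "abc123" "qwerty" "mypass"

# The range A-z includes "_"; A-Za-z does not
str_detect("_", "[A-z]")
## [1] TRUE
str_detect("_", "[A-Za-z]")
## [1] FALSE
```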
stringr website: https://stringr.tidyverse.org/
stringr and regex cheat sheet