filter(): pick rows matching criteria
select(): pick columns by name
mutate(): add new variables
slice(): pick rows using indices
arrange(): reorder rows
group_by(): for grouped operations
summarize(): calculate summary statistics
When submitting your lab in Gradescope, connect each question with every page the question appears on. WARNING: failure to do so will result in loss of points.
Find lab03-username at https://github.com/sta199-f21. Copy the SSH URL. Close any open projects in RStudio, go to File > New Project > Version Control, and paste the SSH URL to clone your repository.
Open the lab03.Rmd template and update the YAML header with your name and today’s date. Then, knit the document and make sure the resulting PDF file has the correct date. Stage, commit, and push your changes.
Write your answers in the lab03.Rmd template file. Your assignment should have at least three meaningful commits and all code chunks should have informative names.
We will also load the tidyverse package as usual.
library(tidyverse)
This data was originally collected for a FiveThirtyEight piece. The version of the avengers data we will work with here can be uploaded from the avengers.csv file provided with this lab. This dataset covers the entire Marvel Cinematic Universe (MCU), so some of the names will be familiar if you are a fan of the films, while others might be from the comics. Don’t worry if you aren’t a Marvel fan though; no background knowledge is needed to succeed on this lab!
More information on the variables is available at this link.
Using filter, create a new dataset with only Avengers that 1) were created in 1970 or earlier and 2) were not given probationary status. You should call the new dataset classic_avengers (hint: there should be 27 rows in this dataset).
Now would be a good time to knit, commit, and push.
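As a minimal sketch of filtering on two conditions at once, the example below uses a small made-up table. The table heroes and its columns debut_year and on_probation are hypothetical and are not taken from avengers.csv, so the real column names and values should be checked before adapting it.

```r
library(dplyr)

# Toy data: names, columns, and values are invented for illustration only.
heroes <- tibble(
  name         = c("Ana", "Ben", "Cy", "Dee"),
  debut_year   = c(1965, 1970, 1984, 1968),
  on_probation = c("NO", "NO", "NO", "YES")
)

# Conditions separated by commas inside filter() are combined with AND,
# so a row is kept only if it satisfies both of them.
early_full <- heroes %>%
  filter(debut_year <= 1970, on_probation == "NO")

# A row count is a quick sanity check on the result.
nrow(early_full)
```

Comparing nrow() against a known target, such as the hint given in the exercise, is a cheap way to catch a condition that was written the wrong way around.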
Who are the newest Classic Avengers? Create a new variable called years_served that represents the number of years each Avenger has served as of 2021. (Hint: you can use either the year variable or the years_since_joining variable to do this.) Then arrange the dataset in ascending order of years_served, select the name and years_served variables, and print the first three rows. Report in words the names of the three newest Classic Avengers and how long they have served. Please do this using a single pipeline.
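The shape of such a single pipeline can be sketched on toy data. The table heroes and the columns debut_year and years_active below are hypothetical stand-ins, not the lab's variables, and the reference year is hard-coded only for the illustration.

```r
library(dplyr)

# Toy data: invented for illustration, not taken from avengers.csv.
heroes <- tibble(
  name       = c("Ana", "Ben", "Cy", "Dee"),
  debut_year = c(1965, 1970, 1984, 1968)
)

newest <- heroes %>%
  mutate(years_active = 2021 - debut_year) %>%  # derive a new column
  arrange(years_active) %>%                      # ascending is the default
  select(name, years_active) %>%                 # keep only two columns
  slice(1:3)                                     # first three rows

newest
```

Because each verb takes a data frame and returns a data frame, the steps chain with the pipe and no intermediate objects are needed.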
Has the percentage of female Avengers changed over time? Find the percentage of female Avengers in the classic_avengers dataset (hint: you could create a new numeric variable, e.g. 0 for male, 1 for female). Next, find the percentage of female Avengers in the original, complete dataset. Has the addition of new Avengers since 1970 changed the proportion of female Avengers? Please describe in words what you observe.
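The hint about a 0/1 variable works because the mean of an indicator equals the proportion of ones. A small sketch, assuming a hypothetical gender column coded as "FEMALE" and "MALE" (the coding in the real data may differ):

```r
library(dplyr)

# Toy data: invented for illustration, not taken from avengers.csv.
heroes <- tibble(
  name   = c("Ana", "Ben", "Cy", "Dee", "Eli"),
  gender = c("FEMALE", "MALE", "MALE", "FEMALE", "MALE")
)

share <- heroes %>%
  mutate(is_female = if_else(gender == "FEMALE", 1, 0)) %>%  # 1/0 indicator
  summarize(pct_female = 100 * mean(is_female))              # proportion as a percent

share
```

Here two of the five toy rows are coded 1, so the mean of the indicator is 0.4 and the reported percentage is 40.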
Now might be a good time for another knit, commit, and push.
Arrange the overall avengers dataset in descending order of appearances and return only the columns name, appearances, death1, and return1. Slice the first five Avengers here. What do you notice about these Avengers in terms of deaths and returns? Spoiler alert!
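Sorting from largest to smallest only requires wrapping the sorting column in desc(). A brief sketch on toy data, where heroes, issues, and died are hypothetical names chosen for the illustration:

```r
library(dplyr)

# Toy data: invented for illustration, not taken from avengers.csv.
heroes <- tibble(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  issues = c(120, 3400, 75, 980),
  died   = c("NO", "YES", "NO", "YES")
)

top_two <- heroes %>%
  arrange(desc(issues)) %>%      # largest values first
  select(name, issues, died) %>% # keep only the columns of interest
  slice(1:2)                     # first two rows after sorting

top_two
```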
Using the avengers dataset, use two dplyr commands to examine the mean and median number of appearances for Avengers who have died at least once compared to those who have not died at least once. What do you learn about Marvel movies from your results? Also, does this tell you anything about the mean and median?
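One common pair of commands for a comparison like this is group_by() followed by summarize(); whether that is the intended pair here is an assumption. The sketch below uses a made-up table in which one group contains a single very large value, to show how the mean and median can diverge.

```r
library(dplyr)

# Toy data: invented for illustration, not taken from avengers.csv.
heroes <- tibble(
  died   = c("YES", "YES", "YES", "NO", "NO", "NO"),
  issues = c(100, 200, 3000, 50, 60, 70)
)

by_status <- heroes %>%
  group_by(died) %>%                  # one group per distinct value of died
  summarize(
    mean_issues   = mean(issues),     # sensitive to extreme values
    median_issues = median(issues)    # resistant to extreme values
  )

by_status
```

In the toy "YES" group the single value of 3000 pulls the mean up to 1100 while the median stays at 200, which is the kind of gap that signals a right-skewed distribution.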
One more exercise, but first, knit, commit, and push!
geom_smooth() with the argument se = FALSE. Please include informative axis labels and an informative title. Can you observe any trends in the data? Which years tend to have the Avengers with the most appearances? Briefly explain whether you need to use any caution in interpreting any portions of this graph.
Once you are fully satisfied with your lab, Knit to PDF to create a PDF document.
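For the plotting exercise above, the general pattern of a scatterplot with a smooth curve and no confidence band can be sketched as follows. The data frame toy and its columns year and count are hypothetical, and the labels are placeholders rather than suggested wording.

```r
library(ggplot2)

# Toy data: invented for illustration, not taken from avengers.csv.
toy <- data.frame(
  year  = 1960:1979,
  count = c(40, 55, 70, 90, 120, 150, 140, 130, 110, 95,
            80, 70, 65, 50, 45, 40, 30, 25, 20, 15)
)

p <- ggplot(toy, aes(x = year, y = count)) +
  geom_point() +
  geom_smooth(se = FALSE) +   # se = FALSE hides the shaded uncertainty band
  labs(
    x     = "Year (placeholder label)",
    y     = "Count (placeholder label)",
    title = "Placeholder title"
  )

# Call print(p) or type p at the console to draw the plot.
```

With se = FALSE the curve is drawn without any indication of uncertainty, so regions with few points can look more settled than the data justify.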
Follow the instructions in previous labs to submit your PDF to Gradescope.
When submitting your lab to Gradescope, connect each question with every page the question appears on. WARNING: failure to do so will result in loss of points.
Overall: 50 pts.