In this homework assignment, you will…
hw01-username repo.
We will use the tidyverse package for this assignment, along with the ggridges package for ridge plots. If you wish to use the viridis color palettes, you will need the viridis package as well.
library(tidyverse)
library(ggridges)
library(viridis) #optional
Today, we will be working with National Women’s Soccer League (NWSL) team data from the first three full seasons of NC Courage, located near Duke in Cary, NC. The Courage moved to the Triangle from Western New York in 2017 and had three very successful first seasons, culminating in winning the NWSL championship game that was held at their home stadium in Cary in 2019! (Data for this lab was sourced from the nwslR package, and verified with the NC Courage website by Meredith Brown in a previous semester.)
courage <- read_csv("data/courage.csv")
The variables in the dataset are as follows:
- game_id: an ID for the game that identifies the teams and the date.
- game_date: the date of the game
- game_number: the order of the game in the season (i.e., 1st, 2nd, etc.)
- home_team: the name of the home team
- away_team: the name of the away team
- opponent: the name of the Courage’s opponent
- home_pts: the number of points scored by the home team
- away_pts: the number of points scored by the away team
- result: the outcome of the game from the Courage’s perspective
- season: the season the game took place in (i.e., 2017, 2018, 2019)
As we’ve discussed in lecture, your plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.
In addition, no line of code should exceed 80 characters, so that all the code can be read when you knit to PDF. See the Lab 02 instructions for how to add a margin line at column 80.
Remember that continuing to develop a sound workflow for reproducible data analysis is important as you complete work in this course and beyond. This assignment contains reminders to knit, commit, and push your changes to GitHub. You should have at least 3 commits with meaningful commit messages by the end of the assignment.
How many rows are in the courage dataset? How many columns? Include the code and resulting output used to support your answer.
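As a minimal sketch of how to inspect a data frame's dimensions, the example below uses a small made-up tibble (`toy` and its values are hypothetical, not the real data); the same functions apply to any data frame.

```r
library(tidyverse)

# Toy data standing in for a real dataset (values are made up)
toy <- tibble(
  game_number = 1:4,
  result = c("win", "loss", "win", "tie")
)

nrow(toy)     # number of rows
ncol(toy)     # number of columns
dim(toy)      # rows and columns together
glimpse(toy)  # dimensions plus a preview of each column
```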
Create a bar chart to visualize the distribution of the result of the games with count on the y-axis. Include a clear title and axis labels. What outcome occurred most frequently?
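A bar chart of a categorical variable can be sketched as follows. The tibble `toy` and the labels are made up for illustration; `geom_bar()` counts the rows in each category, so no count column is needed.

```r
library(tidyverse)

# Toy data (made up): one row per game
toy <- tibble(result = c("win", "loss", "win", "tie", "win"))

p_bar <- ggplot(toy, aes(x = result)) +
  geom_bar() +  # counts rows per category and puts the count on the y-axis
  labs(
    title = "Toy example: game results",
    x = "Result",
    y = "Count"
  )
p_bar

# The same counts as a table, most frequent first
count(toy, result, sort = TRUE)
```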
🧶 ✅ ⬆️ Now is a good time to knit, commit, and push.
Now, let’s examine how the Courage performed in each individual season. Create a stacked bar plot, showing one bar for each season, with the number of games on the y-axis going from 0-26, and the fill determined by result. You are encouraged (but not required) to use the viridis color palette. Include a clear title and axis labels. What do you observe from the plot?
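One possible way to build a stacked bar plot with a fixed y-axis range is sketched below on made-up data. It uses `scale_fill_viridis_d()`, which ships with ggplot2; the viridis package offers similar scales. Using `coord_cartesian()` to set the range is a choice made for this sketch, not a requirement of the assignment.

```r
library(tidyverse)

# Toy data (made up): one row per game
toy <- tibble(
  season = c(2017, 2017, 2017, 2018, 2018, 2019),
  result = c("win", "loss", "win", "win", "tie", "win")
)

p_stack <- ggplot(toy, aes(x = factor(season), fill = result)) +
  geom_bar() +                        # bars stack by fill by default
  scale_fill_viridis_d() +            # discrete viridis palette
  coord_cartesian(ylim = c(0, 26)) +  # fix the y range, keep all data
  labs(
    title = "Toy example: results by season",
    x = "Season",
    y = "Number of games",
    fill = "Result"
  )
p_stack
```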
Now let’s consider the distribution of points scored by the Courage in a game for all seasons. Make a histogram of the total number of points scored by Courage in a game. Use the histogram to describe the distribution of points scored by Courage.
To get started, use the code below to create two new columns:
- courage_points: the number of points scored by Courage in a game
- courage_home: whether or not Courage was the home team (you will use this variable later on).
courage <- courage %>%
  mutate(courage_points = if_else(home_team == "NC", home_pts, away_pts),
         courage_home = if_else(home_team == "NC", "Home", "Away"))
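To see what the `if_else()` calls do, here is a small sketch on three made-up rows (the team code "POR" and all scores are hypothetical). When the home team is "NC", the Courage's points come from `home_pts`; otherwise they come from `away_pts`. A histogram with `binwidth = 1` then gives one bar per whole number of points.

```r
library(tidyverse)

# Toy data (made up): three games
toy <- tibble(
  home_team = c("NC", "POR", "NC"),
  home_pts = c(3, 1, 0),
  away_pts = c(1, 2, 0)
) %>%
  mutate(
    courage_points = if_else(home_team == "NC", home_pts, away_pts),
    courage_home = if_else(home_team == "NC", "Home", "Away")
  )

toy$courage_points  # 3 2 0
toy$courage_home    # "Home" "Away" "Home"

p_hist <- ggplot(toy, aes(x = courage_points)) +
  geom_histogram(binwidth = 1) +  # one bin per whole number of points
  labs(title = "Toy example: points per game", x = "Points", y = "Count")
p_hist
```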
🧶 ✅ ⬆️ Now is another good time to knit, commit, and push.
Does Courage have a home field advantage? To explore this question, use geom_density_ridges() to compare the distribution of courage_points when Courage is the home team versus the away team (courage_home).
Each of Courage’s seasons had 26 games, including playoff games. Does the total number of points scored in a game change over the course of a season? For example, does the total number of points decrease, perhaps due to fatigue, or does it increase over a season as teams get into a groove? To explore this question:
- Create a new variable, total_points, for the total number of points scored in a game.
- Use geom_jitter() to create a scatterplot of the total points versus game number. The function geom_jitter() adds some random noise to the points so they don’t overlap each other.
🧶 ✅ ⬆️ Now is another good time to knit, commit, and push.
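The jittered scatterplot described in this exercise might look like the sketch below. The data are made up, and defining `total_points` as `home_pts + away_pts` is an assumption based on the variable descriptions; the `width` and `height` values are illustrative.

```r
library(tidyverse)

# Toy data (made up); several games share the same coordinates
toy <- tibble(
  game_number = c(1, 1, 2, 2, 3),
  home_pts = c(2, 2, 1, 0, 3),
  away_pts = c(1, 1, 0, 1, 0)
) %>%
  mutate(total_points = home_pts + away_pts)  # assumed definition

p_jit <- ggplot(toy, aes(x = game_number, y = total_points)) +
  geom_jitter(width = 0.2, height = 0.2) +  # small random offsets
  labs(
    title = "Toy example: total points by game number",
    x = "Game number",
    y = "Total points"
  )
p_jit
```

Because the noise is random, the plot changes slightly on each run; calling `set.seed()` beforehand makes it reproducible.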
Let’s explore if the observations from the previous exercise differ by season. Create a new plot that builds upon the plot from the previous exercise by coloring the points by season and using geom_smooth() to show the general trend for each season. Include the argument se = FALSE to omit the bands around the smoothed curves.
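A sketch of coloring by a grouping variable and adding one smoothed curve per group is shown below, using simulated data (the Poisson draws are purely illustrative). Wrapping `season` in `factor()` is a choice made here so that each season gets its own discrete color and its own curve.

```r
library(tidyverse)

# Simulated toy data: 3 seasons of 26 games each (values are made up)
set.seed(1)
toy <- tibble(
  season = rep(c(2017, 2018, 2019), each = 26),
  game_number = rep(1:26, times = 3),
  total_points = rpois(78, lambda = 3)
)

p_smooth <- ggplot(
  toy,
  aes(x = game_number, y = total_points, color = factor(season))
) +
  geom_jitter(width = 0.2, height = 0.2) +
  geom_smooth(se = FALSE) +  # one curve per season, no uncertainty band
  labs(
    title = "Toy example: total points over a season",
    x = "Game number",
    y = "Total points",
    color = "Season"
  )
p_smooth
```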
Now, let’s focus just on points scored by Courage. Make a scatter plot to visualize the relationship between game_number and courage_points, faceted by season.
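Faceting splits one plot into a panel per group. The sketch below uses simulated data (all values are made up) and `facet_wrap()`, which is one of ggplot2's faceting functions; `facet_grid()` is an alternative.

```r
library(tidyverse)

# Simulated toy data: 3 seasons of 26 games each (values are made up)
set.seed(2)
toy <- tibble(
  season = rep(c(2017, 2018, 2019), each = 26),
  game_number = rep(1:26, times = 3),
  courage_points = rpois(78, lambda = 2)
)

p_facet <- ggplot(toy, aes(x = game_number, y = courage_points)) +
  geom_point() +
  facet_wrap(~ season) +  # one panel per season
  labs(
    title = "Toy example: points by game number, per season",
    x = "Game number",
    y = "Points scored"
  )
p_facet
```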
🧶 ✅ ⬆️ Now is another good time to knit, commit, and push.
Once you are finished with the assignment, knit, commit, and push one final time, then submit the PDF document produced by that final knit to Gradescope.
Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes. Remember – you must turn in a .pdf file to the Gradescope page by the submission deadline to be considered “on time”.
To submit your assignment:
Component | Points |
---|---|
Ex 1 | 2 |
Ex 2 | 6 |
Ex 3 | 6 |
Ex 4 | 4 |
Ex 5 | 6 |
Ex 6 | 8 |
Ex 7 | 6 |
Ex 8 | 6 |
Workflow & formatting | 6 |
Grading notes: