For this assignment you must have at least three meaningful commits and all of your code chunks must have informative names.
For your first commit, update your author name in the YAML header of the template R Markdown file.
All plots should follow the best visualization practices discussed in lecture, including an informative title, labeled axes, and careful consideration of aesthetic choices.
All code should follow the tidyverse style guidelines, including not exceeding the 80 character limit.
For every join function you should explicitly specify the by argument.
Find your hw02-username repository on GitHub, open a new project in RStudio, and clone the repository.
We will work with the tidyverse package as usual. You may also want to use viridis.
library(tidyverse)
library(viridis)
In 2018, Democrats won a majority in the House of Representatives for the first time since the Tea Party wave in 2010. Yet within the Democratic Party, a wide variety of ideologies and perspectives exist. This assignment works with three related datasets to answer questions about Democratic members of Congress who served in the 116th Congress (2019-2021).
One ideology measure we will work with is the DW-Nominate score. These scores are created using advanced statistical methods. For the purpose of this assignment, we will focus on first-dimension DW-Nominate scores, which generally range from -1 (most liberal) to 1 (most conservative). Since we are working with Democrats, all scores will be negative. If you are interested in learning more about this measure, this article provides a primer about it. (Reading it is not required for this assignment.)
A brief description of the datasets and how they are related to each other is provided below.
The ideologies dataset contains information on Democratic representatives’ ideologies. Observations are uniquely identified by bioname and icpsr.
The variables in this dataset are:
bioname: The name of the representative.
icpsr: The ICPSR code given to the representative.
state_icpsr: The ICPSR number given to the representative’s state.
district_code: A code for the representative’s district.
nominate_dim1: The representative’s first-dimension DW-Nominate score.
The district_info dataset contains information about the representatives’ districts. Observations are uniquely identified by bioname and bioguide_id.
The variables in this dataset are:
bioname: The name of the representative.
bioguide_id: The ID number in the Congressional Biographical Directory.
state_abbrev: The abbreviation of the state that the member represents.
trump16: The percentage of the vote that Donald Trump received in the representative’s district in 2016 (in theory, from 0 to 100).
born: The year the representative was born.
Members of Congress typically join a series of caucuses with representatives who have similar interests, districts, or ideologies. Within the Democratic Party, two prominent caucuses are the Blue Dog Coalition, a group of more moderate Democrats, and the Congressional Progressive Caucus, which is made up of more progressive Democrats.
The caucus dataset contains three variables:
state_icpsr: The ICPSR number given to the representative’s state.
district_code: A code for the representative’s district.
caucus: The caucus the representative is a member of. There are three options for this variable: Blue Dog, Progressive, or Neither.
First, join the district_info data frame to the ideologies data frame. The goal is to keep all of the rows and columns in the ideologies data frame. Call this new data set full_data.
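As a rough illustration of how the choice of join decides which rows survive, the sketch below uses two small made-up tables (members and districts, with hypothetical columns name, score, and trump) rather than the assignment data. It only contrasts the behavior of two dplyr joins.

```r
library(dplyr)

# Hypothetical toy tables; these are not the assignment data
members <- tibble(
  name  = c("A", "B", "C"),
  score = c(-0.5, -0.3, -0.4)
)
districts <- tibble(
  name  = c("A", "B", "D"),
  trump = c(30, 45, 60)
)

# left_join() keeps every row of the left table (members);
# rows without a match get NA in the columns taken from districts
kept <- left_join(members, districts, by = "name")

# inner_join() keeps only rows that have a match in both tables
matched <- inner_join(members, districts, by = "name")

nrow(kept)     # 3
nrow(matched)  # 2
```

Checking nrow() before and after a join is a quick way to confirm that no rows were dropped or duplicated.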
Next, use a join to add columns from the caucus data frame to full_data. Hint: when matches may be ambiguous, you can join by more than one variable, e.g. blank_join(caucus, by = c("variable1", "variable2")).
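To make the hint concrete, here is a minimal sketch with made-up tables (scores and groups, and the hypothetical columns state_code, district, score, and group). District numbers repeat across states, so neither column identifies a row on its own, but the pair does.

```r
library(dplyr)

# Hypothetical toy tables: district 1 exists in both state 1 and state 2
scores <- tibble(
  state_code = c(1, 1, 2),
  district   = c(1, 2, 1),
  score      = c(-0.5, -0.3, -0.4)
)
groups <- tibble(
  state_code = c(1, 1, 2),
  district   = c(1, 2, 1),
  group      = c("X", "Y", "X")
)

# Joining on both keys matches each row to exactly one row in groups
joined <- left_join(scores, groups, by = c("state_code", "district"))

dim(joined)  # 3 rows, 4 columns
```

Joining on district alone would pair the district 1 rows of both states with each other, which adds rows that do not correspond to real districts.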
Lastly, we need to add information to the data about the region a state is located in. We will use the region in later exercises. To do so, we will use two data sets that are already loaded as part of R: state.region and state.abb. Use the code below to create a tibble called states that includes the state abbreviation and the region. Then, use an appropriate join to add the region from states to full_data.
# Pair each state abbreviation with its region (region names stay as a factor)
states <- tibble(state.abb, state.region)
The final full_data data frame should have 238 observations and 11 variables.
We will use full_data for the remainder of the assignment.
nominate_dim1. Show all code and output.
Which two states have the most progressive Democratic delegations? Which have the most moderate?
Are there any concerns you have with using these values to represent the mean ideology for a state’s delegation? Briefly explain.
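A per-group mean of this kind can be sketched as follows. The table toy and its columns state and score are made up for illustration, and the state codes are not real.

```r
library(dplyr)

# Hypothetical toy data; the state codes and scores are invented
toy <- tibble(
  state = c("AA", "AA", "AA", "BB", "CC", "CC"),
  score = c(-0.50, -0.40, -0.30, -0.60, -0.20, -0.30)
)

# One row per state, sorted from the lowest mean score to the highest
state_means <- toy %>%
  group_by(state) %>%
  summarise(mean_score = mean(score)) %>%
  arrange(mean_score)

state_means
```

Replacing arrange(mean_score) with arrange(desc(mean_score)) reverses the order, which is convenient for reading off the other end of the ranking.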
Which 9 states do not have a Democratic representative? Use the states data frame and an appropriate join to help answer this question. Show all code and output, and report the names of the states in your narrative.
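One join that answers "which rows of one table have no match in another" is anti_join(). The sketch below uses made-up tables (all_units and observed) whose key columns have different names, so the by argument maps one name to the other.

```r
library(dplyr)

# Hypothetical toy tables; the key columns are named differently
all_units <- tibble(code = c("AA", "BB", "CC", "DD"))
observed  <- tibble(
  unit_code = c("AA", "CC", "CC"),
  value     = c(1, 2, 3)
)

# Keep the rows of all_units that have no match in observed
missing_units <- anti_join(
  all_units,
  observed,
  by = c("code" = "unit_code")
)

missing_units$code  # "BB" "DD"
```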
Is there a relationship between the percentage of the vote Donald Trump received in a district in 2016 and the DW-Nominate score for the district’s representative? To answer this question:
Find the correlation between these two variables.
Make a visualization showing the relationship between trump_16 and nominate_dim1. Include an informative title and axis labels.
Interpret the plot.
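The two steps above can be sketched on made-up data as follows. The table toy and its columns vote_share and score are hypothetical, and the title and axis labels are placeholders to be replaced with informative ones.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; not the assignment data
toy <- tibble(
  vote_share = c(10, 20, 30, 40, 50),
  score      = c(-0.60, -0.50, -0.45, -0.30, -0.20)
)

# Pearson correlation; use = "complete.obs" skips rows with missing values
r <- cor(toy$vote_share, toy$score, use = "complete.obs")

# Scatterplot with a title and labeled axes
p <- ggplot(toy, aes(x = vote_share, y = score)) +
  geom_point() +
  labs(
    title = "Placeholder title describing the relationship",
    x = "Placeholder x-axis label (with units)",
    y = "Placeholder y-axis label"
  )

p
```

The correlation summarizes only the linear part of the relationship, so it is worth reading it alongside the plot rather than on its own.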
Which caucus is the most progressive?
Which group has the most variability in ideology?
Which is the largest group?
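Questions about the center, spread, and size of groups can all be read from one grouped summary. The sketch below is a minimal example on made-up data, with hypothetical columns group and score.

```r
library(dplyr)

# Hypothetical toy data; group labels and scores are invented
toy <- tibble(
  group = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z", "Z"),
  score = c(-0.50, -0.50, -0.50, -0.20, -0.60,
            -0.30, -0.35, -0.30, -0.35)
)

# Mean, standard deviation, and number of rows for each group
group_summary <- toy %>%
  group_by(group) %>%
  summarise(
    mean_score = mean(score),
    sd_score   = sd(score),
    n          = n()
  )

group_summary
```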
if_else or case_when.
Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Only upload your PDF document to Gradescope. Before you submit the uploaded document, mark where the answer to each exercise is located. If any answer spans multiple pages, mark all of them. Associate the “Overall” section with the first page.
Ex 1: 8 pts.
Ex 2: 6 pts.
Ex 3: 4 pts.
Ex 4: 8 pts.
Ex 5: 6 pts.
Ex 6: 6 pts.
Ex 7: 6 pts.
Workflow and formatting: 6 pts.
This includes having at least three meaningful commits, updating your name in the YAML header, using tidyverse style, and naming all code chunks.