Homework #02: Data Wrangling and Joins

due Thu September 23 at 11:59 PM

Goals

For this assignment you must have at least three meaningful commits and all of your code chunks must have informative names.

For your first commit, update your author name in the YAML header of the template R Markdown file.

All plots should follow the best visualization practices discussed in lecture, including an informative title, labeled axes, and careful consideration of aesthetic choices.

All code should follow the tidyverse style guidelines, including keeping every line within the 80-character limit.

For every join function, you should explicitly specify the by argument.
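As a small illustration of this last requirement, the sketch below joins two made-up tibbles and names the key column explicitly. The objects members and districts and their values are hypothetical and are not the assignment's datasets. It also shows the named-vector form of by, which is useful when the key is spelled differently in the two tables.

```r
library(tidyverse)

# Toy data (hypothetical values, not the assignment's datasets)
members <- tibble(
  bioname = c("Rep A", "Rep B", "Rep C"),
  score = c(-0.45, -0.20, -0.33)
)
districts <- tibble(
  bioname = c("Rep A", "Rep B", "Rep C"),
  state = c("NC", "TX", "CA")
)

# Spell out the key instead of relying on the automatic "Joining" message
joined <- left_join(members, districts, by = "bioname")

# When the key columns have different names, use a named vector
districts_renamed <- rename(districts, member_name = bioname)
joined_named <- left_join(
  members, districts_renamed,
  by = c("bioname" = "member_name")
)
```

Both calls produce the same three-row result with the columns bioname, score, and state.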

Setup

The Class of 2018

We will work with the tidyverse package as usual. You may also want to use viridis.

library(tidyverse)
library(viridis)

In 2018, Democrats won a majority in the House of Representatives for the first time since the Tea Party wave in 2010. Yet within the Democratic Party, a wide variety of ideologies and perspectives exist. This assignment works with three related datasets to answer questions about Democratic members of Congress who served in the 116th Congress (2019-2021).

One ideology measure we will work with is the DW-Nominate score. These scores are created using advanced statistical methods. For the purpose of this assignment, we will focus on 1st Dimension DW-Nominate scores. These scores generally vary from -1 (most liberal) to 1 (most conservative). Since we are working with Democrats, all scores will be negative. If you are interested in learning more about this measure, this article provides a primer about it. (Reading it is not required for this assignment.)

A brief description of the datasets and how they are related to each other is provided below.

The ideologies dataset contains information on Democratic representatives’ ideologies. Observations are uniquely identified by bioname and icpsr.

The variables in this dataset are:

The district_info dataset contains information about each representative’s district. Observations are uniquely identified by bioname and bioguide_id.

The variables in this dataset are:

Members of Congress typically join a series of caucuses with representatives who have similar interests, districts, or ideologies. Within the Democratic Party, two prominent caucuses are the Blue Dog Coalition, a group of more moderate Democrats, and the Congressional Progressive Caucus, which is made up of more progressive Democrats.

The caucus dataset contains three variables:

Looking at Democrats in the 116th Congress

  1. Let’s start by creating an analysis dataset that includes information from all three datasets.

# tibble() keeps region labels; cbind() would reduce the factor to its codes
states <- tibble(state.abb, state.region)

The final full_data data frame should have 238 observations and 11 variables.

We will use full_data for the remainder of the assignment.
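The general pattern of combining several tables that share a key can be sketched as follows. This is only an illustration on made-up miniature tables: the toy_ objects, their values, and the columns state and caucus are assumptions, and the choice of left_join() here is also an assumption rather than the required join for this exercise. Checking dim() afterwards is a quick way to confirm the result has the number of observations and variables you expect.

```r
library(tidyverse)

# Hypothetical miniature versions of the three datasets; column names other
# than bioname, icpsr, bioguide_id, and nominate_dim1 are assumptions
toy_ideologies <- tibble(
  bioname = c("Rep A", "Rep B", "Rep C"),
  icpsr = c(101, 102, 103),
  nominate_dim1 = c(-0.45, -0.20, -0.33)
)
toy_district_info <- tibble(
  bioname = c("Rep A", "Rep B", "Rep C"),
  bioguide_id = c("A0001", "B0002", "C0003"),
  state = c("NC", "TX", "CA")
)
toy_caucus <- tibble(
  bioname = c("Rep A", "Rep B"),
  caucus = c("Progressive", "Blue Dog")
)

# Chain the joins, naming the key explicitly each time
toy_full <- toy_ideologies %>%
  left_join(toy_district_info, by = "bioname") %>%
  left_join(toy_caucus, by = "bioname")

# Confirm the result has the expected shape before moving on
dim(toy_full)
```

In this toy example, "Rep C" has no row in toy_caucus, so a left join keeps that representative with a missing caucus value, whereas an inner join would drop the row. Which behavior is appropriate depends on the question being asked.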

  1. We can see which states have the most progressive and most moderate Democratic delegations. Find the mean ideology by state and display the two most progressive states and the two most moderate states. Ideology is measured by nominate_dim1. Show all code and output.
  1. Which 9 states do not have a Democratic representative? Use the states data frame and an appropriate join to help answer this question. Show all code and output, and report the names of the states in your narrative.
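Two techniques are useful for questions like these: grouped summaries followed by slice_min() and slice_max(), and anti_join() for finding rows in one table that have no match in another. The sketch below demonstrates both on made-up data. The objects toy_full and toy_states and all values are hypothetical, and the column name state is an assumption.

```r
library(tidyverse)

# Hypothetical toy data: one row per representative
toy_full <- tibble(
  state = c("NC", "NC", "TX", "CA", "CA", "NY"),
  nominate_dim1 = c(-0.30, -0.40, -0.20, -0.50, -0.60, -0.45)
)

state_means <- toy_full %>%
  group_by(state) %>%
  summarise(mean_ideology = mean(nominate_dim1), .groups = "drop")

# More negative scores are more liberal, so the smallest means are the
# most progressive delegations and the largest are the most moderate
most_progressive <- slice_min(state_means, mean_ideology, n = 2)
most_moderate <- slice_max(state_means, mean_ideology, n = 2)

# anti_join() keeps rows of the first table with no match in the second
toy_states <- tibble(state.abb = c("CA", "NC", "NY", "TX", "WY"))
unrepresented <- anti_join(
  toy_states, toy_full,
  by = c("state.abb" = "state")
)
```

Here unrepresented contains only "WY", the one abbreviation in toy_states that never appears in toy_full.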

  2. Is there a relationship between the percentage of the vote Donald Trump received in a district in 2016 and the DW-Nominate score for the district’s representative? To answer this question:
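One common way to examine a relationship between two numeric variables is a scatterplot with a fitted line, supplemented by a correlation. The sketch below uses made-up data. The column name trump_pct_2016 is an assumption and may not match the variable name in district_info.

```r
library(tidyverse)

# Hypothetical toy data; `trump_pct_2016` is an assumed column name
toy_full <- tibble(
  trump_pct_2016 = c(12, 25, 33, 41, 48, 55),
  nominate_dim1 = c(-0.62, -0.51, -0.44, -0.35, -0.27, -0.18)
)

# Scatterplot with a least-squares line, a title, and labeled axes
p <- ggplot(toy_full, aes(x = trump_pct_2016, y = nominate_dim1)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Toy example: ideology versus 2016 Trump vote share",
    x = "Trump vote share in district, 2016 (%)",
    y = "DW-Nominate score (1st dimension)"
  )

# A correlation gives a quick numeric summary of the linear association
toy_cor <- cor(toy_full$trump_pct_2016, toy_full$nominate_dim1)
```

Printing p displays the plot. A correlation close to 1 or -1 indicates a strong linear association, but it says nothing about causation.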

  1. Now let’s look at the data by caucus. Calculate the mean and standard deviation of ideology and the number of representatives for each caucus.
  1. Let’s examine how caucus membership differs by region. Create a plot of the number of representatives in each caucus faceted by region. Include an informative title and axis labels.
  1. Are younger Democrats more likely to be in the Progressive Caucus than older Democrats? To answer this question, create a new variable indicating whether the Democrat was born in the 1980s (no Democrat born in the 1990s has yet been elected to Congress). Then, find the percentage of Democrats in each group (pre-1980 and 1980 or later) who are in the Progressive Caucus. Hint: As a step along the way, you will also want to create a variable indicating if they are a Progressive Caucus member using if_else or case_when.
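The building blocks for these exercises (grouped summaries with n(), indicator variables created with if_else(), and faceted bar plots) can be sketched on made-up data as follows. The object toy_full, its values, the caucus labels, and the column names caucus, region, and born are all assumptions and may differ from the names in full_data.

```r
library(tidyverse)

# Hypothetical toy data; column names and caucus labels are assumptions
toy_full <- tibble(
  caucus = c(
    "Progressive", "Progressive", "Blue Dog",
    "Blue Dog", "Progressive", "Blue Dog"
  ),
  region = c("South", "West", "South", "West", "West", "South"),
  nominate_dim1 = c(-0.55, -0.60, -0.15, -0.20, -0.50, -0.25),
  born = c(1985, 1960, 1975, 1982, 1983, 1948)
)

# Mean, standard deviation, and count per caucus
caucus_summary <- toy_full %>%
  group_by(caucus) %>%
  summarise(
    mean_ideology = mean(nominate_dim1),
    sd_ideology = sd(nominate_dim1),
    n_reps = n(),
    .groups = "drop"
  )

# Indicator variables, then the share of progressives in each birth cohort
cohort_summary <- toy_full %>%
  mutate(
    cohort = if_else(born >= 1980, "1980 or later", "pre-1980"),
    progressive = if_else(caucus == "Progressive", 1, 0)
  ) %>%
  group_by(cohort) %>%
  summarise(pct_progressive = 100 * mean(progressive), .groups = "drop")

# Counts per caucus, one panel per region
p <- ggplot(toy_full, aes(x = caucus)) +
  geom_bar() +
  facet_wrap(~region) +
  labs(
    title = "Toy example: caucus membership by region",
    x = "Caucus",
    y = "Number of representatives"
  )
```

Taking the mean of a 0/1 indicator gives the proportion of ones, which is why 100 * mean(progressive) yields a percentage. In real data, representatives who belong to neither caucus may have missing values, so it is worth deciding how to handle them before computing percentages.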

Submission

Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Upload only your PDF document to Gradescope. Before you submit the uploaded document, mark the page where the answer to each exercise is located. If any answer spans multiple pages, mark all of those pages. Associate the “Overall” section with the first page.

Rubric