Goals

Getting started

Go to the course GitHub organization and locate your lab02 repository, which should have the prefix lab02. Copy the URL (under green code > SSH tab) of the repository and clone in RStudio. If you have trouble, see the first lab for step-by-step instructions.

Write your answers in the lab02.Rmd template file. Update the YAML header with your name and today’s date. Then, knit the document and make sure the resulting PDF file has the correct date. Stage, commit, and push your changes.

Packages

We will use the tidyverse and viridis packages to make plots in R. Notice lab02.Rmd begins with a code chunk loading these libraries. Run this first.

Let’s take a trip to the Midwest

The data we will examine is loaded automatically with the tidyverse. It is called midwest and contains demographic information about midwestern counties.

To begin, familiarize yourself with the dataset by reading the documentation. Remember, you can pull up the documentation by running ?midwest in the console.

All plots should follow ``best visualization practices". Plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.

In addition, code and narrative should not exceed the 80 character limit. To help police this, add a vertical line at 80 characters by clicking “Tools” \(\rightarrow\) “Global Options” \(\rightarrow\) “Code” \(\rightarrow\) “Display”, then set “Margin Column” to 80 and click “Apply”.

Note: Your assignment should have at least three meaningful commits.

Exercises

  1. Make a histogram of the population density of counties. Set the bin width to 10. Please label your axes and give the plot a title. Does it appear that there are any outliers? Does the distribution appear to be symmetric or skewed?

Before going to Exercise 2, now would be a good time to do your first knit and then commit and push.

  1. Generate a scatterplot of percentage of people with a college degree (percollege) versus percentage below poverty (percbelowpoverty) with points colored by state. Please label your axes and legend and give the plot a title.

(Hint: if you would like to use the viridis palette, the line scale_color_viridis(discrete=TRUE, option = "C", name = "State") will be useful to your code.)

  1. Describe what you observe in the plot for the previous exercise. Do you observe the same pattern for every state?

  2. Now, examine the relationship between the same two variables, with a separate plot for each state. Hint: use faceting. Please label your axes and add a line of best fit without standard errors (i.e., please set se = FALSE). Which plot do you prefer? Briefly explain your choice.

Now would be a good time to knit, commit, and push, again.

  1. Do some states have counties that tend to be geographically larger than others? Create side-by-side boxplots of area (area) for whether a county is located by state (state). Briefly comment on what you notice. Which state has the single largest county? How do you know based upon the plot?

  2. Do some states have a higher percentage of their counties located in a metropolitan area? Please create a segmented bar chart with one bar per state, each bar going from 0 - 1, with the fill determined by whether the state is located in a metropolitan area. What do you notice?

Note: you will want to begin with the data wrangling code below. We will learn more about this command next week, but I am providing this to help on this lab:

midwest %>%
  mutate(metro = ifelse(inmetro == 1, "Yes", "No")) %>%

Before going to the last exercise, try knitting, committing, and pushing one more time.

  1. Recreate the plot below. This page will be helpful in determining the theme. The size of the points is 0.75. (Please note: the points are colored different shades of blue.)

Submission

Once you are fully satisfied with your lab, Knit to PDF to create a PDF document.

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. we will be checking these to make sure you have been practicing how to commit and push changes.

Remember – you must turn in a PDF file to the Gradescope page, with exercise-page labeling before the submission deadline for full credit.

Once your work is finalized in your GitHub repo, you will submit it to Gradescope. Your assignment must be submitted on Gradescope by the deadline to be considered “on time”.

Grading

Total: 50 pts

  • Exercise 1: 4 pts
  • Exercise 2: 6 pts
  • Exercise 3: 4 pts
  • Exercise 4: 8 pts
  • Exercise 5: 6 pts
  • Exercise 6: 6 pts
  • Exercise 7: 8 pts
  • Overall: 8 pts
    • Overall includes the number of commits made, naming chunks, updating the name on the assignment to your name, and following tidyverse style (see: https://style.tidyverse.org/).