In this lab you will…
A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to the course organization page on GitHub.
You should see a repo with the lab-06 prefix.
Each person on the team should clone the repository and open a new project in RStudio. Do not make any changes to the .Rmd file until the instructions tell you to do so.
You may have seen this already over the course of your collaboration in the past few weeks. When two collaborators make changes to a file and push the file to their repository, git merges the two versions.
If the two versions have conflicting content on the same line, git will produce a merge conflict. Merge conflicts need to be resolved manually, because they require human intervention.
To resolve the merge conflict, decide if you want to keep only your text, the text on GitHub, or incorporate changes from both texts. Delete the conflict markers <<<<<<<, =======, and >>>>>>> and make the changes you want in the final merge.
Assign the numbers 1, 2, 3, and 4 to your team members (if your team has only three members, use 1 through 3). Work carefully through the following steps, which simulate a merge conflict. Completing this exercise will be part of the lab grade.
Step 1 (Everyone): Clone the repo with the prefix merge-conflict and open the .Rmd file.
Member 4 should check the team’s repo on GitHub.com after every step to make sure the other members’ changes have been pushed to GitHub.
Step 2 (Member 1): Change the team name to your team name. Knit, commit, and push.
Step 3 (Member 2): Change the team name to something different (i.e., not your team name). Knit, commit, and push.
You should get an error.
Pull and review the document with the merge conflict. Read the error to your teammates. You can also show them the error by sharing your screen. A merge conflict occurred because you edited the same part of the document as Member 1. Resolve the conflict with whichever name you want to keep, then knit, commit, and push again.
Step 4 (Member 3): Write some narrative in the space provided. Commit and push. You should get an error.
This time, no merge conflicts should occur, since you edited a different part of the document from Members 1 and 2. Read the error to your teammates. You can also show them the error by sharing your screen.
Pull the latest changes from GitHub. Then knit, commit, and push. All errors should be resolved and all documents updated in the GitHub repo.
You do not need to submit anything on Gradescope for the merge conflict activity.
We will use the tidyverse and tidymodels packages in this lab.
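library(tidyverse)   # data import (readr), wrangling (dplyr), plotting (ggplot2)
library(tidymodels)  # includes infer, which provides bootstrap and hypothesis-test tools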
Today’s data come from the City of Durham’s annual Resident Satisfaction Survey for 2020. A full report of results from the survey is also available. In particular, the file durham-survey-2020.csv contains data from over 800 Durham residents on a variety of questions about their experience living in the city. Assume that the data are representative of Durham residents and may be generalized to the wider population of all city residents.
The following variables are used in this analysis:
age: Age category
length_in_durham: How long the respondent has lived in Durham (in years)
mask_public_outdoor: How often the respondent reported wearing a mask at a public outdoor gathering
condition_public_art: How the respondent rated their satisfaction with the condition of public art in Durham (1: lowest to 5: highest; 9: no response provided)
Fill in the code to load the data set.
<- read_csv("_______") ____
Hint: be careful with how missing values are coded in this survey. Also, don’t forget to set the seed specified in the instructions to ensure reproducibility!
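To make the hint concrete, here is a minimal base R sketch on a made-up data frame (the name toy and its values are hypothetical, not taken from the survey). It assumes, as the variable list says, that 9 means “no response” in condition_public_art, and shows why the recode should target that column rather than every column.

```r
# Hypothetical toy data frame, NOT the survey file: in condition_public_art,
# 9 means "no response", but in length_in_durham a 9 is a real value (9 years)
toy <- data.frame(
  length_in_durham = c(9, 12, 3, 20),
  condition_public_art = c(4, 9, 2, 5)
)

# Recode 9 to NA only in the column where 9 is a missing-value code
toy$condition_public_art[toy$condition_public_art == 9] <- NA

# Missing values must now be excluded explicitly when summarizing
mean(toy$condition_public_art, na.rm = TRUE)
```

Inside a tidyverse pipeline, dplyr’s na_if() inside mutate() performs the same column-level recode. Passing 9 through read_csv()’s na argument would instead apply to every column, which could turn genuine values (such as 9 years in length_in_durham) into NA.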
We’ll begin by analyzing the typical number of years current residents have lived in Durham.
Calculate a 95% bootstrap confidence interval for the typical number of years current residents have lived in Durham. The confidence interval should be calculated for the parameter (mean or median) you chose in Exercise 1. Use set.seed(2).
Then interpret the interval in the context of the data.
Next, let’s look at how frequently Durham residents wore masks at public outdoor events in 2020. Hint: You will need to make a new variable.
Calculate the proportion of survey respondents who wore a mask frequently (4) or always (5) at public outdoor events, among those who provided a response to this question.
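A small sketch of the new-variable step, using a hypothetical vector mask of toy responses on the 1 to 5 scale and assuming non-responses have already been recoded to NA.

```r
# Hypothetical toy responses on the mask scale (NA = no response)
mask <- c(5, 4, 2, NA, 5, 1, 3, NA, 4, 5)

# TRUE if the respondent wore a mask frequently (4) or always (5)
freq_or_always <- mask %in% c(4, 5)

# %in% returns FALSE (not NA) for missing answers, so filter them out explicitly
answered <- !is.na(mask)
prop <- mean(freq_or_always[answered])
prop
```

With %in%, missing answers become FALSE rather than NA, so without the answered filter they would silently count as “not frequently or always” and pull the proportion down.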
Calculate a 98% bootstrap confidence interval for the proportion of Durham residents who wore a mask frequently or always at public outdoor events in 2020. Use set.seed(3).
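The main change from a 95% interval is the tail cutoffs: a 98% percentile interval keeps the middle 98% of the bootstrap distribution, cutting 1% from each tail. Here is a base R sketch with a hypothetical logical vector wore_mask (toy data, 62 TRUE out of 100).

```r
set.seed(3)

# Hypothetical indicator: TRUE = wore a mask frequently or always (toy data)
wore_mask <- c(rep(TRUE, 62), rep(FALSE, 38))

# Bootstrap distribution of the sample proportion
boot_props <- replicate(1000, mean(sample(wore_mask, replace = TRUE)))

# A 98% percentile interval leaves 1% in each tail
ci_98 <- quantile(boot_props, probs = c(0.01, 0.99))
ci_98
```

In an infer pipeline, the equivalent change is get_ci(level = 0.98).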
Interpret the confidence interval in the context of the data.
According to data from the United States Census, 46% of US adults are 18-44 years old. Is the proportion of adults in Durham who are 18-44 years old different from the proportion of adults in this age range in the United States?
Create a new variable indicating if a survey respondent is 18 - 44 years old.
Calculate the proportion of survey respondents who are 18-44 years old, among those who provided an age.
Conduct a hypothesis test to answer the question stated in Exercise 4.
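One way to run this test is by simulation: generate sample proportions under the null hypothesis p = 0.46 and see how often they land at least as far from 0.46 as the observed proportion. The sketch below draws from a binomial distribution in base R; the indicator age_18_44 and its values are hypothetical toy data, and the small tolerance in the comparison is an added safeguard against floating-point ties.

```r
set.seed(5)

# Hypothetical toy indicator: TRUE = respondent is 18-44 (NOT the survey data)
age_18_44 <- c(rep(TRUE, 52), rep(FALSE, 48))
p0 <- 0.46                       # null value: the national Census proportion
n <- length(age_18_44)
obs_prop <- mean(age_18_44)      # observed sample proportion

# Simulate 1000 sample proportions from a world where H0 (p = 0.46) is true
null_props <- rbinom(1000, size = n, prob = p0) / n

# Two-sided p-value: share of null proportions at least as far from p0 as the
# observed one (tolerance guards against floating-point ties at the boundary)
obs_dist <- abs(obs_prop - p0)
p_value <- mean(abs(null_props - p0) >= obs_dist - 1e-9)
p_value
```

In infer, the corresponding pipeline uses specify() with a success level, hypothesize(null = "point", p = 0.46), generate(type = "draw"), and calculate(stat = "prop"); visualize() plus shade_p_value(direction = "two-sided") draws the null distribution with the p-value region shaded.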
Use set.seed(5). Then visualize the null distribution and shade the region corresponding to the p-value.
Are Durham residents generally satisfied with the condition of public art in the city? We’ll consider “generally satisfied” as having a mean satisfaction score greater than 3.
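For a one-sided test of the mean against 3, a common simulation approach shifts the sample so its mean equals 3 and then bootstraps it to approximate the null distribution. The sketch below does this in base R; scores is a hypothetical toy vector with non-responses (9) already removed, and the alternative is assumed to be “mean greater than 3.”

```r
set.seed(6)

# Hypothetical toy satisfaction scores (1-5), non-responses already removed
scores <- c(3, 4, 4, 2, 5, 3, 4, 3, 5, 4, 2, 4, 3, 4, 5)
mu0 <- 3
obs_mean <- mean(scores)

# Shift the data so its mean equals mu0, making the null hypothesis true
shifted <- scores - obs_mean + mu0

# Bootstrap the shifted data to approximate the null distribution of the mean
null_means <- replicate(1000, mean(sample(shifted, replace = TRUE)))

# One-sided p-value for H_A: mean > 3
p_value <- mean(null_means >= obs_mean)
p_value
```

infer performs the same shift when you use hypothesize(null = "point", mu = 3) followed by generate(type = "bootstrap"); shade_p_value(direction = "greater") then shades the right tail.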
Use set.seed(6). Then visualize the null distribution and shade the region corresponding to the p-value.
Given your conclusion in Exercise 6, which type of error could you possibly have made? What would making such an error mean in the context of the analysis question?
Go back through your write-up to make sure you followed the coding style guidelines we discussed in class (e.g., no long lines of code).
There should only be one submission per team on Gradescope.
Component | Points |
---|---|
Ex 1 | 6 |
Ex 2 | 6 |
Ex 3 | 8 |
Ex 4 | 4 |
Ex 5 | 6 |
Ex 6 | 6 |
Ex 7 | 5 |
Merge conflict activity | 4 |
Workflow & formatting | 5 |