In this lab you will…
A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to the sta199-f21 course organization on GitHub.
You should see a repo with the **lab08** prefix.
Each person on the team should clone the repository and open a new project in RStudio. Do not make any changes to the .Rmd file until the instructions tell you to do so.
We will use the tidyverse and tidymodels packages in this assignment.
The goal of today’s lab is to use CLT-based inference to evaluate the synergy of burritos.
Today’s dataset has been adapted from Scott Cole’s Burritos of San Diego project. The goal of the project was to identify the best and worst burritos in San Diego, characterize variance in burrito quality, and generate predictive models for what makes a burrito great.
As part of this project, 71 participants reviewed burritos from 79 different taco shops. Reviewers captured objective measures of the burrito (such as whether it contains certain ingredients) and reviewed it on a number of metrics (such as quality of the tortilla, the temperature, quality of meat, etc.). For the purposes of this lab, you may consider each of these observations to be an independent and representative sample of all burritos.
The subjective ratings in the dataset are as follows. Each variable is rated on a 0 to 5 point scale, with 0 being the worst and 5 being the best.
- tortilla: quality of the tortilla
- temp: temperature of the burrito
- meat: quality of the meat
- fillings: quality of non-meat fillings
- salsa: quality of the salsa
- mfr: meat-to-filling ratio
- uniformity: whether each bite contains a uniform slew of ingredients (e.g., a bite entirely composed of tortilla and sour cream would probably be terrible)
- synergy: how well it all comes together
In addition, the reviewers noted the presence of the following burrito components. Each of the following variables is a binary variable taking on values present or none:
- guac: guacamole
- cheese: cheese
- fries: fries (it’s a thing, look it up.)
- sourcream: sour cream
- rice: rice
- beans: beans
The data are available in burritos.csv.
The goal of this analysis is to use inference based on the Central Limit Theorem to analyze the mean synergy rating of burritos.
We’ll start by examining the distribution of synergy, a rating indicating how well all the ingredients in the burrito come together.
Visualize the distribution of synergy using a histogram with a binwidth of 0.5.
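A minimal sketch of a histogram with a fixed bin width, using ggplot2 (part of the tidyverse). The `toy` data frame and its ratings are made up for illustration and stand in for the real burritos data; the axis labels are also just suggestions.

```r
library(ggplot2)

# Hypothetical ratings on the 0 to 5 scale; NOT the real burritos data
toy <- data.frame(
  synergy = c(2.0, 3.5, 4.0, 4.5, 3.0, 5.0, 4.0, 3.5, 2.5, 4.5)
)

# Histogram whose bins are each 0.5 rating points wide
synergy_hist <- ggplot(toy, aes(x = synergy)) +
  geom_histogram(binwidth = 0.5) +
  labs(
    x = "Synergy rating",
    y = "Number of burritos",
    title = "Distribution of synergy ratings (toy data)"
  )

synergy_hist
```

Setting `binwidth` directly, rather than a number of bins, keeps the bins aligned with the half-point steps of the rating scale.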
Calculate the following summary statistics: the mean synergy, the standard deviation of synergy, and the sample size. Save the summary statistics as summary_stats. Then display summary_stats.
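One way to compute several summary statistics at once is `summarise()` from dplyr. The sketch below assumes a made-up data frame `toy` with a numeric `synergy` column; the column names in the result (`mean_synergy`, `sd_synergy`, `n`) are illustrative choices, not required names.

```r
library(dplyr)

# Hypothetical ratings; NOT the real burritos data
toy <- tibble(synergy = c(2, 3, 4, 5, 4, 3, 4, 3))

# One row holding the mean, standard deviation, and number of observations
toy_stats <- toy %>%
  summarise(
    mean_synergy = mean(synergy),
    sd_synergy = sd(synergy),
    n = n()
  )

toy_stats
```

Keeping the three values together in one object makes them easy to reuse in later calculations, for example `toy_stats$mean_synergy`.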
The goal of this analysis is to use CLT-based inference to understand the true mean synergy rating of all burritos. The idea is that if the CLT holds, we can assume the distribution of the sample mean is normal and thus easily generate a normal null distribution to test hypotheses.
Based on the data, what is your “best guess” for the mean synergy rating of burritos?
Before conducting inference, we need to check the conditions to make sure the Central Limit Theorem can be applied in this analysis. For each condition, indicate whether it is satisfied and provide a brief explanation supporting your response.
- Independence?
- Sample size / distribution?
State the null and alternative hypotheses to evaluate the question posed in the previous exercise. Write the hypotheses in words and in statistical notation.
Let \(\bar{x}\) be the mean synergy score in a sample of 330 randomly selected burritos. Given the Central Limit Theorem and the hypotheses from the previous exercise
\[T = \frac{\bar{x}- \mu_{0}}{s/\sqrt{n}}\] where \(\bar{x}\) is the sample mean, \(\mu_0\) is the mean under the null, \(s\) is the sample s.d. and \(n\) is the sample size.
Explain what this value means in the context of this analysis. Refer to the slides from the preparation for last class if necessary.
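The test statistic formula can be evaluated step by step in R. The sketch below uses hypothetical summary values, chosen only to make the arithmetic easy to follow; they are not taken from the burritos data.

```r
# Hypothetical summary values (placeholders, not the burritos results)
x_bar <- 10   # sample mean
mu_0  <- 9    # mean under the null hypothesis
s     <- 4    # sample standard deviation
n     <- 16   # sample size

# Standard error of the sample mean
se <- s / sqrt(n)

# Test statistic: how many standard errors x_bar lies from mu_0
t_stat <- (x_bar - mu_0) / se
t_stat
```

With these placeholder numbers the standard error is 1, so the sample mean lies exactly one standard error above the null value.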
What is the distribution of the test statistic, \(T\)? Be specific. Hint: It is ___ distributed with ___ degrees of ____.
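As a sketch of how `pt()` converts a test statistic into a tail probability, the example below assumes a t distribution with n - 1 degrees of freedom and uses hypothetical values for the test statistic and sample size. Which tail (or tails) to use depends on the alternative hypothesis, so both an upper-tail and a two-sided version are shown.

```r
# Hypothetical values (placeholders, not the burritos results)
t_stat <- 1    # test statistic
n      <- 16   # sample size

# Area above t_stat under a t distribution with n - 1 degrees of freedom
p_upper <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Two-sided version: area beyond |t_stat| in both tails
p_two_sided <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

c(upper = p_upper, two_sided = p_two_sided)
```

By default `pt()` returns the area to the left of its first argument; `lower.tail = FALSE` asks for the area to the right instead.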
Use the pt() function to calculate the p-value.
\[\bar{x} \pm t^*_{n-1} \times \frac{s}{\sqrt{n}}\]
We already know \(\bar{x}\) and \(\frac{s}{\sqrt{n}}\), so let’s focus on calculating \(t^*_{n-1}\). We will use the qt() function to calculate the critical value \(t^*_{n-1}\).
Here is an example: If we want to calculate a 95% confidence interval for the mean, we will use qt(0.975, n-1), where 0.975 is the cumulative probability at the upper bound of the 95% confidence interval (recall we used this value to find the upper bound when calculating bootstrap confidence intervals), and n-1 is the degrees of freedom.
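The 95% example can be written out in R as follows. The sample size is a hypothetical placeholder, and the variable names (`conf_level`, `upper_prob`, `t_star`) are illustrative.

```r
# Hypothetical sample size (placeholder, not the burritos sample size)
n <- 16

# For a 95% interval, the remaining 5% is split evenly between the two tails,
# so the upper bound sits at cumulative probability 0.975
conf_level <- 0.95
upper_prob <- 1 - (1 - conf_level) / 2

# Critical value from a t distribution with n - 1 degrees of freedom
t_star <- qt(upper_prob, df = n - 1)
t_star
```

Writing the cumulative probability in terms of `conf_level` shows where 0.975 comes from, rather than treating it as a number to memorize.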
Calculate the critical value, \(t^*_{n-1}\), of the 90% confidence interval for the mean synergy rating of all burritos.
Use R as a “calculator” to calculate the 90% confidence interval.
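A sketch of the "calculator" approach, assembling the interval from its pieces. All numbers are hypothetical placeholders (including the 95% level, which matches the earlier worked example rather than this exercise).

```r
# Hypothetical summary values (placeholders, not the burritos results)
x_bar <- 10   # sample mean
s     <- 4    # sample standard deviation
n     <- 16   # sample size
conf_level <- 0.95

# Critical value and margin of error
t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)
margin <- t_star * s / sqrt(n)

# Lower and upper bounds: x_bar minus and plus the margin of error
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
ci
```

Storing the margin of error separately makes it easy to see how the interval widens or narrows when the confidence level or the sample size changes.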
Interpret the interval in the context of the data.
Use infer for the calculations in CLT-based inference with the t_test() function. The results should be the same as the calculations you did in the previous exercises.
burritos %>%
  t_test(response = _____,
         alternative = "______",
         mu = ______,
         conf_int = FALSE)
The results should be the same as the calculations from Exercise 8.
burritos %>%
  t_test(response = _____,
         conf_int = TRUE,
         conf_level = _____) %>%
  select(lower_ci, upper_ci)
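The agreement between the hand calculation and a packaged t-test can be checked on toy data. Note that this sketch uses base R's `t.test()` rather than infer's `t_test()`, so that it runs without extra packages; the ratings and the null value are hypothetical.

```r
# Hypothetical ratings and null value; NOT the real burritos data
synergy_toy <- c(2, 3, 4, 5, 4, 3, 4, 3)
mu_0 <- 3

# Manual calculation: test statistic and 90% confidence interval
n <- length(synergy_toy)
se <- sd(synergy_toy) / sqrt(n)
t_manual <- (mean(synergy_toy) - mu_0) / se
ci_manual <- mean(synergy_toy) + c(-1, 1) * qt(0.95, df = n - 1) * se

# Same quantities from base R's t.test() (two-sided by default)
tt <- t.test(synergy_toy, mu = mu_0, conf.level = 0.90)

# Both comparisons should return TRUE
all.equal(unname(tt$statistic), t_manual)
all.equal(as.numeric(tt$conf.int), ci_manual)
```

If either comparison fails on your own numbers, the usual culprits are a mismatched confidence level or using n instead of n - 1 for the degrees of freedom.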
Knit to PDF to create a PDF document. Knit and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Please only upload your PDF document to Gradescope. Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.
Component | Points |
---|---|
Ex 1 | 5 |
Ex 2 | 2 |
Ex 3 | 4 |
Ex 4 | 4 |
Ex 5 | 4 |
Ex 6 | 5 |
Ex 7 | 6 |
Ex 8 | 8 |
Ex 9 | 4 |
Ex 10 | 4 |
Workflow & formatting | 4 |