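# Load packages: tidyverse for wrangling and plotting, tidymodels for infer
library(tidyverse)
library(tidymodels)
# Read in the Mythbusters yawn data
yawn <- read_csv("data/yawn.csv")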
Use and understand simulation-based methods to …
First, let’s watch the experiment from Mythbusters.
Let \(t\) be the treatment group who saw a person yawn, \(c\) be the control group who did not see anyone yawn, and \(p\) be the proportion of people who yawned.
We want to use simulation-based inference to assess whether or not yawning and seeing someone yawn are independent.
State the null and alternative hypotheses in words:
Select the appropriate null and alternative hypotheses written in mathematical notation:
Type I and Type II errors
| Truth | Reject \(H_0\) | Fail to reject \(H_0\) |
|---|---|---|
| \(H_0\) is true | Type I error (false positive) | Good decision |
| \(H_0\) is false | Good decision | Type II error (false negative) |
What would a Type I error look like in this case?
What would a Type II error look like?
Is a Type I or a Type II error more worrisome here?
How might we mitigate the more dangerous error?
Before using R to construct the null distribution, let’s generate it by hand using playing cards! See AE-17 for the simulation instructions; you can also find them in the README of this application exercise.
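The card activity can also be mimicked in base R, which may help connect the physical shuffling to the code used later. This is a minimal sketch, not the official AE-17 procedure: the deck composition (14 yawners among 50 participants, 34 of them in the treatment group) is an assumption taken from the OpenIntro version of the Mythbusters data, and the helper shuffle_once() is hypothetical.

```r
# Hypothetical deck for the card activity: the counts (14 yawners among
# 50 participants, 34 of them in the treatment group) are assumptions
# borrowed from the OpenIntro version of the Mythbusters data
deck <- c(rep("yawn", 14), rep("not yawn", 36))
n_trmt <- 34

# One round of the activity (hypothetical helper): shuffle the deck, deal
# the first n_trmt cards to the treatment pile and the rest to the control
# pile, then record the difference in the proportion of "yawn" cards
# (treatment minus control)
shuffle_once <- function(deck, n_trmt) {
  shuffled <- sample(deck)
  trmt <- shuffled[seq_len(n_trmt)]
  ctrl <- shuffled[-seq_len(n_trmt)]
  mean(trmt == "yawn") - mean(ctrl == "yawn")
}

# Repeat the shuffle many times to build a simulated null distribution
set.seed(101921)
diff_in_prop <- replicate(1000, shuffle_once(deck, n_trmt))

# Center and spread of the simulated differences
mean(diff_in_prop)
sd(diff_in_prop)
```

Shuffling the whole deck before dealing is what breaks any link between group and outcome, which is exactly the situation the null hypothesis of independence describes.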
Uncomment the code below to read in the class's simulation data and visualize the null distribution.
# Read in the class's simulated differences in proportions
#sim_data <- read_csv("https://sta199-f21-001.netlify.app/appex/data/yawn-sim.csv")
# Plot the simulated null distribution as a histogram
#ggplot(data = sim_data, mapping = aes(x = diff_in_prop)) +
#  geom_histogram(binwidth = 0.05) +
#  labs(title = "Your Results: Difference in Proportion of Yawners")
What is the approximate center of the distribution? Is this what you expected? Why or why not?
The observed difference in proportions from the Mythbusters episode is 0.0441. Based on your simulated distribution, do yawning and seeing someone yawn appear to be dependent?
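For reference, the observed statistic is the sample proportion of yawners in the treatment group minus that in the control group, \(\hat{p}_t - \hat{p}_c\). The counts below are an assumption, taken from the OpenIntro version of the Mythbusters data (10 of 34 treatment participants and 4 of 16 control participants yawned); they reproduce the 0.0441 quoted above:

\[
\hat{p}_t - \hat{p}_c = \frac{10}{34} - \frac{4}{16} \approx 0.2941 - 0.2500 = 0.0441
\]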
Let’s use the data from the Mythbusters episode and simulation-based inference in R to test this claim. Based on their experiment, do yawning and seeing someone yawn appear to be dependent?
Evaluate this question using a simulation-based approach. We will use the same null and alternative hypotheses as before. The Mythbusters data are available in the yawn data frame.
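# Set a seed so the simulation is reproducible
set.seed(101921)
# Fill in the blanks, then uncomment to build the null distribution
#null_dist <- yawn %>%
#  specify(response = ____, explanatory = _____, success = "yawn") %>%
#  hypothesize(null = "______") %>%
#  generate(reps = 100, type = "permute") %>%
#  calculate(stat = "______",
#            order = c("trmt", "ctrl"))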
# add code
# add code
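A minimal end-to-end sketch of the infer workflow, from observed statistic to p-value, is below. Because the columns of data/yawn.csv are not shown here, it runs on a hypothetical stand-in, yawn_demo, whose column names (group, result) and counts (assumed from the OpenIntro version of the Mythbusters data) are assumptions. It also assumes a one-sided alternative in which seeing a yawn raises the chance of yawning, hence direction = "greater".

```r
library(dplyr)
library(infer)

# Hypothetical stand-in for the yawn data frame: the column names `group`
# and `result` and the counts (10/34 treatment yawners, 4/16 control
# yawners) are assumptions, not read from data/yawn.csv
yawn_demo <- tibble(
  group  = c(rep("trmt", 34), rep("ctrl", 16)),
  result = c(rep("yawn", 10), rep("not yawn", 24),
             rep("yawn", 4),  rep("not yawn", 12))
)

# Observed difference in proportions (treatment minus control)
obs_diff <- yawn_demo %>%
  specify(response = result, explanatory = group, success = "yawn") %>%
  calculate(stat = "diff in props", order = c("trmt", "ctrl"))

# Null distribution: permute the group labels to break any association
set.seed(101921)
null_dist_demo <- yawn_demo %>%
  specify(response = result, explanatory = group, success = "yawn") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("trmt", "ctrl"))

# One-sided p-value (assumed alternative): the share of permuted
# differences at least as large as the observed one
p_val <- null_dist_demo %>%
  get_p_value(obs_stat = obs_diff, direction = "greater")
p_val
```

With more repetitions than the 100 in the scaffold, the simulated null distribution is smoother and the p-value estimate less noisy. infer's visualize() and shade_p_value() can then plot the null distribution with the observed statistic marked, e.g. visualize(null_dist_demo) + shade_p_value(obs_stat = obs_diff, direction = "greater").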
Do you believe the conclusions from this experiment? Why or why not?
Simulation activity from Data Science in a Box.