library(tidyverse)
library(knitr)
library(infer)

Bulletin

Midterm grade calculation

Your midterm grade is computed from the assignment grades reported on Sakai, using the weights described in the syllabus.

For purposes of the midterm grade, no lowest assignment scores were dropped.

mean ae score (as %) * 2.5
+ mean quiz score (as %) * 5
+ mean lab score (as %) * 15
+ exam 01 score (as %) * 17.5
+ hw score (as %) * 25

all divided by 65.
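As a worked illustration of the formula above, here is a minimal sketch in R. The scores are made-up placeholders, not real grades, and the names `scores`, `weights`, and `midterm_grade` are introduced only for this example.

```r
# Hypothetical scores, each expressed as a percentage (0 to 100)
scores <- c(ae = 90, quiz = 80, lab = 95, exam_01 = 85, hw = 88)

# Weights taken from the formula above
weights <- c(ae = 2.5, quiz = 5, lab = 15, exam_01 = 17.5, hw = 25)

# Weighted sum of the scores, divided by the total weight (65)
midterm_grade <- sum(scores * weights[names(scores)]) / sum(weights)
midterm_grade
```

Dividing by `sum(weights)` rather than a hard-coded 65 keeps the sketch consistent if the weights are edited.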

Learning goals

Vocabulary activity

What’s the difference between a parameter and a statistic?

Is a parameter “random” or “fixed”? Is it typically “unknown” or “known”? Why?

Bootstrapping activity: Rent in Manhattan

On a given day in 2018, twenty one-bedroom apartments were randomly selected on Craigslist Manhattan from apartments listed as “by owner”. The data are in the manhattan data frame. We will use this sample to conduct inference on the typical rent of one-bedroom apartments in Manhattan.

manhattan <- read_csv("data/manhattan.csv")

Exercise 1

Visualize the distribution of rent. Is the mean or the median a better measure of typical rent of one-bedroom apartments in Manhattan?

Exercise 2

What is a point estimate of the typical rent?

Exercise 3

Let’s bootstrap!

  • To bootstrap we will sample with replacement by drawing a value from the bowl.
  • How many draws do we need for our bootstrap sample?
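Drawing from the bowl can be mimicked in R with `sample()`. The sketch below uses a small made-up vector, `toy_rents`, rather than the class data. It assumes the usual convention that a bootstrap sample has the same number of values as the original sample, and the seed is arbitrary.

```r
# Made-up rents (in dollars) standing in for the slips of paper in the bowl
toy_rents <- c(2100, 2450, 2800, 3200, 1950)

set.seed(1)  # arbitrary seed so the draw can be reproduced

# Sample with replacement: each value goes back in the bowl before the next draw,
# so the same rent can appear more than once
toy_bootstrap <- sample(toy_rents, size = length(toy_rents), replace = TRUE)
toy_bootstrap
```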

Fill in the values from the bootstrap sample conducted in class. Once the values are filled in, uncomment the code.

# class_bootstrap <- c()

Exercise 4

# add code

Does this statistic align with your expectations?


Here we’ve taken one bootstrap sample, but in practice we will need about 10,000 to 15,000 of them! In the next lecture we will discuss how we can generate bootstrap samples using the infer package in R.

Sneak peek!

boot_dist <- manhattan %>%
  # specify the variable of interest
  specify(response = rent) %>% 
  # generate 15000 bootstrap samples
  generate(reps = 15000, type = "bootstrap") %>% 
  # calculate the statistic of each bootstrap sample
  calculate(stat = "mean")

boot_dist %>%
  ggplot(aes(x = stat)) +
  geom_histogram() + 
  labs(x = "mean rent")
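To see what the pipeline above is doing conceptually, the same resample-then-summarize idea can be sketched in base R. This is an illustration on a made-up vector (`toy_rents`), not the infer implementation, and it uses 1,000 repetitions (`n_reps`, a name chosen for this example) only to keep it quick.

```r
# Made-up rents (in dollars); not the manhattan data
toy_rents <- c(2100, 2450, 2800, 3200, 1950)

set.seed(2)     # arbitrary seed for reproducibility
n_reps <- 1000  # number of bootstrap samples in this sketch

# For each repetition: resample with replacement, then record the sample mean
toy_boot_means <- replicate(n_reps, mean(sample(toy_rents, replace = TRUE)))

# Numerical summary of the resulting bootstrap distribution of the mean
summary(toy_boot_means)
```

Each entry of `toy_boot_means` plays the role of one row of `stat` in `boot_dist`.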