Learning goals

By the end of today, you will be able to…

use and understand the statistic
conduct a hypothesis test about a particular
model interactions between variables

To begin, we’ll work with a dataset on iris flowers contained with base R. This dataset comes from an old statisical analysis paper aptly titled The use of multiple measurements in taxonomic problems

data(iris)
glimpse(iris)

## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…

See ?iris for more information.

discussion

Exercise 1

In your own words, what is ?

[write here]

Exercise 2

Conceptualize

Let’s focus on the second term to build intuition.

The numerator “sum of squared error” is the amount of variability not explained by the model.
The denominator is proportional to the variance in the data, i.e. the amount of variability in the data.
Together, this second term represents the proportion of variability not explained by the model.

If the proportion not explained is 0, the model explains all variability and .

If the proportion not explained is , i.e. the model does not explain any variability, then .

Revisit your explanation in exercise 1, would you update your description in any way? If so, do so here:

[updated explanation]

Example:

Suppose that we want to go out and collect more data on irises, but measuring several parts of each iris flower is time consuming. To save on the time it takes to collect future data, let’s see if one of the observations (petal length) could be predicted from just measuring a flower’s sepal. (This might make sense since the sepal is the part of the flower below the bud that contains the petals before it blooms)

To do this, we will setup three linear models:

predict petal length from sepal length
predict petal length from sepal width
predict petal length from both sepal width and sepal length

Exercise 3.

Why is it useful to investigate each of these models?

[answer here]

Before writing any code, how do you think will vary between models 1, 2 and 3? Why?

Exercise 4

Fit each linear regression model above and compare .

# code here

Do your results match your expectations? Which model offers the best fit? Why?

[answer here]

Now compare ‘adjusted’ between models. Which model offers the best fit? Why? See here for details on finding adjusted .

# code here

Hypothesis testing in a regression framework

Exercise 5

Returning to previous joint model,

Does our data offer sufficient evidence that Sepal.Width is actually associated with (and therefore might help us predict) Petal.Length?

iris %>%
  ggplot(aes(x = Sepal.Width, y = Petal.Length)) + 
  geom_point() +
  theme_minimal()

Let’s conduct a hypothesis test in a regression framework to find out.

If Sepal.Width does not help explain Petal.Length, , this is our null hypothesis.

What is the alternative?

For OLS regression, our test statistic is

We want to see if our observed statistic,

, falls far in the tail under the null.

R takes care of much of this behind the scenes with the tidy output and reports a p-value for each by default.

Fit the regression model and display the tidy output below.

# code here

Is significant?

Interactions

Exercise 6

iris %>%
  ggplot(aes(x = Sepal.Length, y = Petal.Length, color = Species)) + 
  geom_point() +
  scale_color_viridis_d()

In the plot above, it appears the relationship between sepal length and petal length, i.e. the slope of Petal.Length ~ Sepal.Length varies drastically from one species of iris to another.

Fit a model with predictors Sepal.Length, Species and an interaction effect between Sepal.Length and Species. See here for example from the prep.

# code here

Write the fitted linear model below (using , notation) but replacing s with their fitted values.

[write here]

AE 22: Multiple regression pt 2

YOUR NAME GOES HERE

2021-11-11

Bulletin

Learning goals

discussion

Exercise 1

Exercise 2

Example:

Exercise 3.

Exercise 4

Hypothesis testing in a regression framework

Exercise 5

Interactions

Exercise 6

AE 22: Multiple regression pt 2

YOUR NAME GOES HERE

2021-11-11

Bulletin

Learning goals

R2 discussion

Exercise 1

Exercise 2

Example: R2

Exercise 3.

Exercise 4

Hypothesis testing in a regression framework

Exercise 5

Interactions

Exercise 6

discussion

Example: