library(tidyverse)
library(tidymodels)
library(scatterplot3d)
Goal: use tidymodels to make inference under a linear regression model.
To begin, let’s load the data. Again, we’ll work with the pokemon data from last week.
pokemon <- read_csv("data/pokemon.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## pokedex_number = col_double(),
## name = col_character(),
## generation = col_double(),
## leg_status = col_character(),
## type_1 = col_character(),
## type_2 = col_character(),
## height_m = col_double(),
## weight_kg = col_double(),
## bst = col_double(),
## hp = col_double(),
## atk = col_double(),
## def = col_double(),
## spa = col_double(),
## spd = col_double(),
## spe = col_double()
## )
What is a linear model?
A linear model is a simple way to mathematically model the relationship between two or more observed phenomena.
Is there a relationship between a pokemon’s size and their hit points (hp)? Specifically, does a pokemon’s height/weight tell us something about how many hit points they have?
Create two scatterplots, one for hp vs height and another illustrating hp vs weight.
# code here
Describe the relationship you see here.
Label the following:
Response variable:
Explanatory variable(s):
For now, let’s focus on just two of these variables, namely hp and height.
Write down a model (using \(x\), \(y\), \(\beta\) notation) that describes a linear relationship between hp and height. Define each variable.
[write here]
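As a point of comparison, one common way to write a simple linear regression model is sketched below. The subscripts on \(\beta\) and the error term \(\epsilon\) are notational assumptions, not something the exercise prescribes.

\[
y = \beta_0 + \beta_1 x + \epsilon
\]

Here \(y\) is the response (hp), \(x\) is the explanatory variable (height in meters), \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the error: the part of hp that the line does not explain.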
Click here to interact with an ordinary least squares (OLS) linear regression model.
Select I and move the data points around.
Describe what you see.
Now let’s fit an ordinary least squares linear model (lm).
Use functions from the preparation video/slides to fit a linear model to hp and height as described in Exercise 2. Save the model in a variable named hp_height_fit.
# code here
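The general shape of a tidymodels fit can be sketched as follows. This is a minimal illustration on a toy data frame whose values are made up (they are not taken from pokemon.csv), and the names toy, toy_fit, and toy_coefs are hypothetical. It assumes the preparation materials use the linear_reg() / set_engine() / fit() pattern.

```r
library(tidymodels)

# Toy data for illustration only; these values are made up,
# not taken from pokemon.csv
toy <- tibble(
  height_m = c(0.4, 0.7, 1.0, 1.5, 2.0),
  hp       = c(35, 45, 60, 70, 95)
)

# Specify an OLS linear regression, then fit hp as a function of height
toy_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(hp ~ height_m, data = toy)

# One row per coefficient: the intercept and the slope for height_m
toy_coefs <- tidy(toy_fit)
toy_coefs
```

The estimate column of the tidy() output holds the intercept and slope, which are the numbers needed to write out the fitted equation.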
Write out the equation of the fitted model.
Uncomment and fill in blanks below to visualize the linear model on top of the scatterplot from Exercise 1.
# hp_height_fit_aug <- augment(hp_height_fit$fit) # need $fit for plotting
# hp_height_fit_aug %>%
#   ggplot() +
#   geom_point(aes(___, ___)) +
#   # linewidth replaces size for lines in ggplot2 >= 3.4.0
#   geom_line(aes(x = ___, y = .fitted), linewidth = 0.75, color = "darkred") +
#   theme_minimal()
Use the equation of the fitted model to predict the hp of an unknown pokemon with a height of 1.1 meters.
# code here
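A prediction from a fitted line can be computed either by plugging into the equation or with predict(). The sketch below does both on a toy data frame; the values are made up (not taken from pokemon.csv), and the names toy, toy_fit, b, by_hand, and pred are hypothetical.

```r
library(tidymodels)

# Toy data for illustration only; these values are made up,
# not taken from pokemon.csv
toy <- tibble(
  height_m = c(0.4, 0.7, 1.0, 1.5, 2.0),
  hp       = c(35, 45, 60, 70, 95)
)

toy_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(hp ~ height_m, data = toy)

# By hand: intercept + slope * height
b <- tidy(toy_fit)$estimate
by_hand <- b[1] + b[2] * 1.1

# With predict(): new_data must have the same column name used in the formula
pred <- predict(toy_fit, new_data = tibble(height_m = 1.1))

by_hand
pred
```

Both approaches should give the same number; predict() returns it in a tibble column named .pred.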
If you were asked to predict the hp of a pokemon that is 2.5 meters tall, is this model appropriate? Why or why not?
What about a linear model with multiple explanatory variables?
scatterplot3d(pokemon[, c("weight_kg", "height_m", "hp")], pch = 19, color = "steelblue")
Describe what you see.
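With two explanatory variables, the simple model extends naturally. The sketch below reuses the \(x\), \(y\), \(\beta\) notation; the subscripts and the error term \(\epsilon\) are notational assumptions.

\[
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon
\]

Here \(y\) is hp, \(x_1\) is weight (kg), and \(x_2\) is height (m). Geometrically, the fitted model is no longer a line but a plane through the three-dimensional point cloud shown in the scatterplot.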