library(tidyverse)
library(tidymodels)
library(scatterplot3d)

Bulletin

Learning goals

To begin, let’s load the data. Again, we’ll work with the pokemon data from last week.

pokemon <- read_csv("data/pokemon.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   pokedex_number = col_double(),
##   name = col_character(),
##   generation = col_double(),
##   leg_status = col_character(),
##   type_1 = col_character(),
##   type_2 = col_character(),
##   height_m = col_double(),
##   weight_kg = col_double(),
##   bst = col_double(),
##   hp = col_double(),
##   atk = col_double(),
##   def = col_double(),
##   spa = col_double(),
##   spd = col_double(),
##   spe = col_double()
## )

The language of linear modeling

What is a linear model?

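A linear model is a simple way to describe mathematically how a response variable changes with one or more explanatory variables: the response is modeled as an intercept plus a constant multiple of each explanatory variable, plus random error.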
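To make the idea concrete, here is a minimal sketch in base R that uses simulated data rather than the pokemon data. The intercept (2), slope (3), noise level, and the names `x`, `y`, and `sim_fit` are all made up for illustration; `lm()` is base R's ordinary least squares fitting function.

```r
# Simulated example: every number here is an assumption chosen for illustration.
set.seed(1)
n <- 200
x <- runif(n, min = 0, max = 10)     # explanatory variable
y <- 2 + 3 * x + rnorm(n, sd = 1)    # response = intercept + slope * x + noise

sim_fit <- lm(y ~ x)                 # ordinary least squares fit
coef(sim_fit)                        # estimates should land near 2 and 3
```

Because the data were generated from a straight line plus noise, the fitted coefficients come out close to, but not exactly equal to, the values used in the simulation.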

Exercise 1)

Is there a relationship between a pokemon’s size and its hit points (hp)? Specifically, does a pokemon’s height or weight tell us something about how many hit points it has?

Create two scatterplots, one of hp vs. height (height_m) and another of hp vs. weight (weight_kg).
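As a reminder of the plotting syntax, here is a sketch of a scatterplot on the built-in `mtcars` data, used as a stand-in so the exercise is left for you. By convention, "y vs. x" puts the response on the vertical axis. The object name `wt_mpg_plot` and the axis labels are illustrative choices.

```r
library(ggplot2)

# mtcars is a stand-in dataset; swap in pokemon and its columns for the exercise.
wt_mpg_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

wt_mpg_plot
```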

# code here

Describe the relationship you see here.

Label the following:

Response variable:

Explanatory variable(s):

Exercise 2)

For now, let’s focus on just two of these variables, namely hp and height.

Write down a model (using \(x\), \(y\), \(\beta\) notation) that describes a linear relationship between hp and height. Define each variable.

[write here]

Exercise 3)

Click here to interact with an ordinary least squares (OLS) linear regression model.

Select I and move the data points around.

Describe what you see.

Exercise 4)

Now let’s fit an ordinary least squares linear model.

Use functions from the preparation video/slides to fit a linear model to hp and height as described in Exercise 2. Save the model in a variable named hp_height_fit.
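If the syntax is hazy, the sketch below shows one common tidymodels pattern on the built-in `mtcars` data. It assumes the preparation material used `linear_reg()`, `set_engine()`, and `fit()` from parsnip; the dataset, formula, and the name `mpg_wt_fit` are stand-ins, not the answer to this exercise.

```r
library(tidymodels)

# Stand-in example: mpg modeled by wt in the built-in mtcars data.
mpg_wt_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt, data = mtcars)

tidy(mpg_wt_fit)   # one row per coefficient: the intercept and the slope on wt
```

The `estimate` column of the tidied output holds the numbers you would plug into the equation of the fitted line.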

# code here

Write out the equation of the fitted model.

Uncomment and fill in blanks below to visualize the linear model on top of the scatterplot from Exercise 1.

# hp_height_fit_aug <- augment(hp_height_fit$fit) # $fit extracts the underlying lm object

# hp_height_fit_aug %>%
#   ggplot() +
#   geom_point(aes(___, ___)) +
#   geom_line(aes(x = ___, y = .fitted), size = 0.75, color = "darkred") +
#   theme_minimal()

Exercise 5)

Use the equation of the fitted model to predict the hp of an unknown pokemon with a height of 1.1 meters.
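A prediction can be computed by hand from the coefficients or with `predict()`. The sketch below does both on the stand-in `mtcars` model (not the pokemon model), assuming a parsnip fit like the one `augment(hp_height_fit$fit)` implies. The new observation `wt = 3` and the names `new_car`, `pred`, and `by_hand` are hypothetical.

```r
library(tidymodels)

mpg_wt_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt, data = mtcars)

# Hypothetical new observation: a car weighing 3 (thousand lbs).
new_car <- tibble(wt = 3)
pred <- predict(mpg_wt_fit, new_data = new_car)   # tibble with a .pred column

# The same number by hand: intercept + slope * 3.
b <- coef(mpg_wt_fit$fit)
by_hand <- b[["(Intercept)"]] + b[["wt"]] * 3

range(mtcars$wt)   # predictions far outside this range are extrapolation
```

Comparing the new value against the range of the explanatory variable in the data is a quick way to judge whether a prediction is an extrapolation.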

# code here

If you were asked to predict the hp of a pokemon that is 2.5 meters tall, is this model appropriate? Why or why not?

Exercise 6)

What about a linear model with multiple explanatory variables?

scatterplot3d(pokemon[, c("weight_kg", "height_m", "hp")], pch = 19, color = "steelblue")

Describe what you see.
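With two explanatory variables, a linear model fits a plane through the cloud of points instead of a line. As a sketch, again on the stand-in `mtcars` data and assuming the same parsnip workflow, adding a second variable only changes the formula. The choice of `wt` and `disp` and the name `mpg_multi_fit` are illustrative.

```r
library(tidymodels)

# Stand-in example with two explanatory variables (wt and disp) from mtcars.
mpg_multi_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + disp, data = mtcars)

tidy(mpg_multi_fit)   # intercept plus one slope per explanatory variable
```

Each slope describes the expected change in the response for a one-unit change in that variable while the other explanatory variable is held fixed.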