Bulletin

Main Ideas

Hot Keys

Task / function          Windows & Linux     macOS
Insert R chunk           Ctrl+Alt+I          Command+Option+I
Knit document            Ctrl+Shift+K        Command+Shift+K
Run current line         Ctrl+Enter          Command+Enter
Run current chunk        Ctrl+Shift+Enter    Command+Shift+Enter
Run all chunks above     Ctrl+Alt+P          Command+Option+P
<-                       Alt+-               Option+-
%>%                      Ctrl+Shift+M        Command+Shift+M
Comment code (online)    Ctrl+Shift+C        Command+Shift+C
Comment code (local)     Ctrl+/              Command+/

Lecture Notes and Exercises

library(tidyverse)
library(sf)

Spatial data is different.

Our typical “tidy” dataframe.

mpg

A new simple feature object.

nc <- st_read("data/nc_regvoters.shp", quiet = TRUE)
nc

Question: What differences do you observe when comparing a typical tidy data frame to the new simple feature object?

Simple features

A simple feature is a standard, formal way to describe how real-world spatial objects (country, building, tree, road, etc.) can be represented by a computer.

The package sf implements simple features and other spatial functionality using tidy principles. Simple features have a geometry type. Common choices are shown in the slides associated with today’s lecture.

Simple features are stored in a data frame, with the geographic information in a column called geometry. Simple features can contain both spatial and non-spatial data.
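To make the "data frame plus geometry column" idea concrete, here is a minimal sketch that builds a small simple feature object by hand. The city coordinates are approximate and purely illustrative; they are not part of the course dataset.

```r
library(sf)

# Hypothetical example data: three North Carolina cities with
# approximate longitude/latitude coordinates (not from the course dataset).
cities <- data.frame(
  city = c("Raleigh", "Durham", "Charlotte"),
  lon  = c(-78.64, -78.90, -80.84),
  lat  = c(35.78, 35.99, 35.23)
)

# st_as_sf() turns the coordinate columns into a geometry column.
# crs = 4326 declares that the coordinates are WGS 84 longitude/latitude.
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

class(cities_sf)             # includes both "sf" and "data.frame"
names(cities_sf)             # the non-spatial column plus "geometry"
st_geometry_type(cities_sf)  # geometry type of each row (POINT here)
```

Printing cities_sf shows the same kind of header (geometry type, bounding box, CRS) that appears when printing nc.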

Most functions in the sf package helpfully begin with the prefix st_.

sf and ggplot

To read simple features from a file or database use the function st_read().

nc <- st_read("data/nc_regvoters.shp", quiet = TRUE)

Notice that nc contains both spatial and non-spatial information.
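If the course file is not at hand, the same steps can be tried on the example shapefile that ships with the sf package. This is only a stand-in: it also covers North Carolina counties, but its columns (such as NAME and BIR74) differ from those in nc_regvoters.shp, and the object name nc_demo is chosen here for illustration.

```r
library(sf)

# Assumption: this uses the example shapefile bundled with sf,
# not the course file data/nc_regvoters.shp, so its columns differ.
path <- system.file("shape/nc.shp", package = "sf")
nc_demo <- st_read(path, quiet = TRUE)

nrow(nc_demo)                 # one row per county
st_geometry_type(nc_demo)[1]  # geometry type of the first county

# The non-spatial columns, viewed as an ordinary data frame
head(st_drop_geometry(nc_demo)[, c("NAME", "BIR74")])
```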

We can build up a visualization layer-by-layer beginning with ggplot. Let’s start by making a basic plot of North Carolina counties.

ggplot(nc) +
  geom_sf() +
  labs(title = "North Carolina counties")

Now adjust the theme with theme_bw().

ggplot(nc) +
  geom_sf() +
  labs(title = "North Carolina counties with theme") + 
  theme_bw()

Now adjust color in geom_sf to change the color of the county borders.

ggplot(nc) +
  geom_sf(color = "darkgreen") +
  labs(title = "North Carolina counties with theme and aesthetics") + 
  theme_bw() 

Then increase the width of the county borders using size. (In ggplot2 3.4.0 and later, border width is controlled by linewidth instead.)

ggplot(nc) +
  geom_sf(color = "darkgreen", size = 1.5) +
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()

Fill the counties by specifying a fill argument.

ggplot(nc) +
  geom_sf(color = "darkgreen", size = 1.5, fill = "orange") +
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()

Finally, adjust the transparency using alpha.

ggplot(nc) +
  geom_sf(color = "darkgreen", size = 1.5, fill = "orange", alpha = 0.50) +
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()

Our current map is a bit much. Adjust color, size, fill, and alpha until you have a map that effectively displays the counties of North Carolina.

North Carolina Registered Voters

The nc data was obtained from the North Carolina State Board of Elections (NCSBE) website and contains statistics on NC registered voters as of September 4, 2021.

The dataset contains the following variables on all North Carolina counties, categories provided by the NCSBE:

Let’s use the NCSBE data to generate a choropleth map of the number of registered voters by county.

ggplot(nc) +
  geom_sf(aes(fill = total)) + 
  labs(title = "Number of Registered Voters by County",
       fill = "# voters") + 
  theme_bw() 

It is sometimes helpful to pick a custom color scheme; colorbrewer2 can help.

One way to set fill colors is with scale_fill_gradient().

ggplot(nc) +
  geom_sf(aes(fill = total)) +
  scale_fill_gradient(low = "#fee8c8", high = "#7f0000") +
  labs(title = "The Triangle and Charlotte have the Most Voters",
       fill = "# voters") + 
  theme_bw() 
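As an alternative to typing hex codes, ggplot2 can look up a ColorBrewer palette by name with scale_fill_distiller(). The sketch below assumes the demo shapefile bundled with sf as a stand-in for the course data, so the column BIR74 (a birth count) replaces total.

```r
library(sf)
library(ggplot2)

# Assumption: sf's bundled demo shapefile stands in for the course data;
# BIR74 is a birth-count column in that demo file, not in nc_regvoters.
nc_demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

p <- ggplot(nc_demo) +
  geom_sf(aes(fill = BIR74)) +
  # "OrRd" is a sequential ColorBrewer palette; direction = 1 maps
  # low values to light colors and high values to dark colors.
  scale_fill_distiller(palette = "OrRd", direction = 1) +
  labs(title = "Births by county (sf demo data)", fill = "# births") +
  theme_bw()

p
```

A sequential palette like this one suits counts that run from low to high; a diverging palette is better reserved for values with a meaningful midpoint.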

Challenges

  1. Different types of data exist (raster and vector).

  2. The coordinate reference system (CRS) matters.

  3. Manipulating spatial data objects is similar, but not identical, to manipulating data frames.
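The second challenge can be illustrated with a single point. The sketch below assumes an approximate longitude/latitude for Raleigh and reprojects it to EPSG:32119 (NAD83 / North Carolina, in meters); both the point and the choice of target CRS are illustrative.

```r
library(sf)

# Hypothetical point: approximate longitude/latitude of Raleigh, NC.
# crs = 4326 declares WGS 84, with coordinates in degrees.
raleigh <- st_sfc(st_point(c(-78.64, 35.78)), crs = 4326)

# Reproject to EPSG:32119 (NAD83 / North Carolina), a projected CRS in meters.
raleigh_m <- st_transform(raleigh, 32119)

st_crs(raleigh)$epsg       # EPSG code of the original CRS
st_crs(raleigh_m)$epsg     # EPSG code after reprojection
st_coordinates(raleigh)    # coordinates in degrees
st_coordinates(raleigh_m)  # same location, now expressed in meters
```

The location is unchanged, but the numbers describing it are not, which is why layers with different CRSs need to be transformed to a common one before they are combined or measured.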

dplyr

The sf package plays nicely with our earlier data wrangling functions from dplyr.

select()

Maybe you are interested in the partisan breakdown of a county.

nc %>% 
  select(county, dem, gop, total)

mutate()

Maybe you are interested in the percentage of registered Democrats in a county.

nc %>% 
  mutate(pct_dem = dem/total)

filter()

You could keep only the counties where registered Democrats make up more than 50% of voters (a majority).

nc %>% 
  mutate(pct_dem = dem/total) %>%
  filter(pct_dem > 0.5)

summarize()

We can also calculate summary statistics for our new variable.

nc %>% 
  mutate(pct_dem = dem/total) %>%
  summarize(mean_pct_dem = mean(pct_dem),
            min_pct_dem = min(pct_dem),
            max_pct_dem = max(pct_dem))
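One place where sf objects behave differently from plain data frames is summarize(): the result still has a geometry column, holding the union of the input geometries. A small sketch with two made-up square "counties" (the helper sq() and all values are hypothetical):

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0).
sq <- function(x0) st_polygon(list(rbind(
  c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
)))

# Hypothetical data: two adjacent squares with made-up voter counts.
toy <- st_sf(
  county   = c("A", "B"),
  total    = c(100, 300),
  geometry = st_sfc(sq(0), sq(1))
)

# summarize() returns one row, and the geometry column holds the union
# of the input geometries rather than being dropped.
toy_sum <- toy %>%
  summarize(mean_total = mean(total))

toy_sum
st_area(toy_sum)  # area of the merged shape
```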

Geometries are “sticky”. They are kept until deliberately dropped using st_drop_geometry().

nc %>% 
  select(county, total) %>% 
  st_drop_geometry()
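The stickiness is easy to check on a toy object. In this sketch the two points and their counts are made up for illustration:

```r
library(sf)
library(dplyr)

# Hypothetical data: two points with made-up voter counts.
toy <- st_sf(
  county   = c("A", "B"),
  total    = c(100, 300),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# select() keeps the geometry column even though it was not requested.
kept <- toy %>% select(county)
names(kept)

# st_drop_geometry() returns an ordinary data frame without it.
dropped <- toy %>% select(county) %>% st_drop_geometry()
names(dropped)
class(dropped)
```

Dropping the geometry is useful when only a summary table is needed, since the result prints and exports like any other data frame.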

Practice

  1. Construct an effective visualization investigating the percentage of all voters in NC that are Native American. Use #f7fbff as “low” on the color gradient and #08306b as “high”. Which county has the highest percentage of Native American voters? (You might want to use Google here.)
nc %>%
  # ntv_a / total is a proportion; multiply by 100 to report a percentage
  mutate(pct_ntv_a = 100 * ntv_a / total) %>%
  ggplot() +
  geom_sf(aes(fill = pct_ntv_a)) +
  scale_fill_gradient(low = "#f7fbff", high = "#08306b") +
  labs(title = "Percent Native American by County",
       fill = "Percent Native American")

  2. Write a brief research question that you could answer with this dataset and then investigate it here.

  3. What are limitations of your visualizations above?