Main Ideas
- Spatial data is important for:
  - exploratory data analysis
  - detecting spatial patterns and trends
  - understanding relationships in spatial data
- Analysis of spatial data should reflect its spatial structure
RStudio keyboard shortcuts

Task / function | Windows & Linux | macOS |
---|---|---|
Insert R chunk | Ctrl+Alt+I | Command+Option+I |
Knit document | Ctrl+Shift+K | Command+Shift+K |
Run current line | Ctrl+Enter | Command+Enter |
Run current chunk | Ctrl+Shift+Enter | Command+Shift+Enter |
Run all chunks above | Ctrl+Alt+P | Command+Option+P |
`<-` (assignment) | Alt+- | Option+- |
`%>%` (pipe) | Ctrl+Shift+M | Command+Shift+M |
Comment code (online) | Ctrl+Shift+C | Command+Shift+C |
Comment code (local) | Ctrl+/ | Command+/ |
```r
library(tidyverse)  # ggplot2, dplyr, and related packages
library(sf)         # simple features for spatial data
```
Spatial data is different.
Our typical “tidy” data frame.
```r
mpg
```
A new simple feature object.
```r
nc <- st_read("data/nc_regvoters.shp", quiet = TRUE)
nc
```
Question: What differences do you observe when comparing a typical tidy data frame to the new simple feature object?
A simple feature is a standard, formal way to describe how real-world spatial objects (country, building, tree, road, etc.) can be represented by a computer.
The `sf` package implements simple features and other spatial functionality using tidy principles. Every simple feature has a geometry type; common types include POINT, LINESTRING, POLYGON, and MULTIPOLYGON, and more are shown in the slides associated with today’s lecture.
Simple features are stored in a data frame, with the geographic information in a special column, usually called `geometry`. Simple features can contain both spatial and non-spatial data.
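To make that structure concrete, here is a minimal sketch that turns an ordinary data frame into a simple feature object with `st_as_sf()`. The three cities and their rounded longitude/latitude values are illustrative assumptions, not part of the lab data, and EPSG:4326 (WGS 84 longitude/latitude) is assumed as the coordinate reference system.

```r
library(sf)

# Illustrative data: approximate longitude/latitude of three NC cities
cities <- data.frame(
  city = c("Raleigh", "Durham", "Charlotte"),
  lon  = c(-78.64, -78.90, -80.84),
  lat  = c(35.78, 35.99, 35.23)
)

# Convert the coordinate columns into a POINT geometry column;
# crs = 4326 declares the coordinates as WGS 84 longitude/latitude
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

cities_sf                    # non-spatial column `city` plus `geometry`
st_geometry_type(cities_sf)  # one geometry type per row
```

By default `st_as_sf()` removes the `lon` and `lat` columns once they have been used to build the geometry column, so the result holds one non-spatial column and one spatial column.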
Functions in the `sf` package that operate on spatial data helpfully begin with `st_`, which stands for spatial type.
`sf` and `ggplot`
To read simple features from a file or database, use the function `st_read()`.
```r
nc <- st_read("data/nc_regvoters.shp", quiet = TRUE)
```
Notice that `nc` contains both spatial and nonspatial information.
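One way to see the two kinds of information separately is to pull out the geometry column with `st_geometry()` and the attribute table with `st_drop_geometry()`. So that the sketch runs without the lab's `data/nc_regvoters.shp`, it assumes instead the North Carolina county shapefile that ships with the sf package (a different dataset with 1970s birth counts, used here only for illustration).

```r
library(sf)

# Sample NC county shapefile bundled with sf (not the lab's voter data)
nc_demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

geom  <- st_geometry(nc_demo)       # spatial part: an sfc geometry column
attrs <- st_drop_geometry(nc_demo)  # non-spatial part: a plain data frame

class(geom)                   # "sfc_MULTIPOLYGON" "sfc"
class(attrs)                  # "data.frame"
st_geometry_type(nc_demo)[1]  # geometry type of the first county
st_crs(nc_demo)               # coordinate reference system of the file
```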
We can build up a visualization layer by layer, beginning with `ggplot()`. Let’s start by making a basic plot of North Carolina counties.
```r
# Basic map: one polygon per county
ggplot(nc) +
  geom_sf() +
  labs(title = "North Carolina counties")
```
Now adjust the theme with `theme_bw()`.
```r
ggplot(nc) +
  geom_sf() +
  labs(title = "North Carolina counties with theme") +
  theme_bw()  # white background with a light grid
```
Now adjust `color` in `geom_sf()` to change the color of the county borders.
```r
ggplot(nc) +
  geom_sf(color = "darkgreen") +  # border color
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()
```
Then increase the width of the county borders using `linewidth` (ggplot2 versions before 3.4.0 used `size` for this).
```r
ggplot(nc) +
  geom_sf(color = "darkgreen", linewidth = 1.5) +  # thicker borders
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()
```
Fill the counties by specifying a `fill` argument.
```r
ggplot(nc) +
  geom_sf(color = "darkgreen", linewidth = 1.5, fill = "orange") +
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()
```
Finally, adjust the transparency using `alpha`.
```r
ggplot(nc) +
  # alpha ranges from 0 (fully transparent) to 1 (fully opaque)
  geom_sf(color = "darkgreen", linewidth = 1.5, fill = "orange", alpha = 0.50) +
  labs(title = "North Carolina counties with theme and aesthetics") +
  theme_bw()
```
Our current map is a bit much. Adjust `color`, `linewidth`, `fill`, and `alpha` until you have a map that effectively displays the counties of North Carolina.
The `nc` data were obtained from the North Carolina State Board of Elections (NCSBE) website and contain statistics on NC registered voters as of September 4, 2021.
The dataset contains the following variables for every North Carolina county (categories as provided by the NCSBE):
- `county`: county name
- `dem`: total number of voters who are registered Democrats
- `gop`: total number of voters who are registered Republicans
- `lib`: total number of voters who are registered Libertarians
- `unaf`: total number of voters who are unaffiliated
- `white`: total number of voters who are white
- `black`: total number of voters who are Black
- `ntv_a`: total number of voters who are Native American
- `ntv_h`: total number of voters who are Native Hawaiian
- `other`: total number of voters who are classified as “other” for race
- `hispanic`: total number of voters who are Hispanic
- `male`: total number of voters who identify as male
- `female`: total number of voters who identify as female
- `total`: total number of registered voters in that county
- `geometry`: geographic coordinates of the county boundary

Let’s use the NCSBE data to generate a choropleth map (a map in which each region is shaded according to the value of a variable) of the number of registered voters by county.
```r
# Choropleth: fill each county by its number of registered voters
ggplot(nc) +
  geom_sf(aes(fill = total)) +
  labs(title = "Number of Registered Voters by County",
       fill = "# voters") +
  theme_bw()
```
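It is sometimes helpful to choose colors deliberately, and ColorBrewer (colorbrewer2.org) can help: sequential palettes suit counts like these, while diverging palettes suit variables with a meaningful midpoint.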
One way to set fill colors is with `scale_fill_gradient()`.
```r
ggplot(nc) +
  geom_sf(aes(fill = total)) +
  # Sequential palette: light for few voters, dark red for many
  scale_fill_gradient(low = "#fee8c8", high = "#7f0000") +
  labs(title = "The Triangle and Charlotte have the Most Voters",
       fill = "# voters") +
  theme_bw()
```
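When a variable has a meaningful midpoint, a diverging scale such as ggplot2's `scale_fill_gradient2()` colors values below and above that midpoint with different hues. The sketch below is an illustration, not part of the lab: it uses the NC county shapefile bundled with sf, and `birth_change` (relative change in births from 1974 to 1979) is a hypothetical example variable chosen because 0 is a natural midpoint. The same pattern could apply to the lab's share of registered Democrats with `midpoint = 0.5`.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Sample NC county shapefile bundled with sf (not the lab's voter data)
nc_demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

p <- nc_demo %>%
  # Hypothetical example variable: relative change in births, 1974 to 1979
  mutate(birth_change = (BIR79 - BIR74) / BIR74) %>%
  ggplot() +
  geom_sf(aes(fill = birth_change)) +
  # Diverging scale: brown below 0, white at 0, teal above 0
  scale_fill_gradient2(low = "#8c510a", mid = "white", high = "#01665e",
                       midpoint = 0, labels = scales::label_percent()) +
  labs(title = "Change in Births, 1974 to 1979",
       fill = "% change") +
  theme_bw()

p
```

Setting `midpoint` explicitly matters: without it the white band sits at 0 by default, which suits a change variable but not a proportion centered on 0.5.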
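- Spatial data come in different types: vector data (points, lines, and polygons, like the counties in `nc`) and raster data (grids of cells).
- The coordinate reference system (CRS) matters.
- Manipulating spatial data objects is similar to, but not identical to, manipulating data frames.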
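To see why the CRS matters, the sketch below checks the CRS of the NC shapefile bundled with sf (stored as longitude/latitude) and reprojects it with `st_transform()`. EPSG:32119 (NAD83 / North Carolina, in meters) is an assumption chosen as a reasonable projected CRS for the state; it does not come from the lab.

```r
library(sf)

# Sample NC county shapefile bundled with sf (not the lab's voter data)
nc_demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

st_is_longlat(nc_demo)                         # TRUE: degrees of lon/lat
nc_proj <- st_transform(nc_demo, crs = 32119)  # NAD83 / North Carolina (m)
st_is_longlat(nc_proj)                         # FALSE: planar coordinates

# Same counties, very different coordinate values
st_bbox(nc_demo)
st_bbox(nc_proj)

# Areas come back with units attached
head(st_area(nc_proj), 3)
```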
`dplyr`
The `sf` package plays nicely with the data wrangling functions from `dplyr` that we used earlier.
select()
Maybe you are interested in the partisan breakdown of a county.
```r
nc %>%
  select(county, dem, gop, total)
```
mutate()
Maybe you are interested in the percentage of registered Democrats in a county.
```r
nc %>%
  mutate(pct_dem = dem / total)
```
filter()
You could filter for counties where Democrats make up more than 50% of registered voters (a majority).
```r
nc %>%
  mutate(pct_dem = dem / total) %>%
  filter(pct_dem > 0.5)
```
summarize()
We can also calculate summary statistics for our new variable.
```r
nc %>%
  mutate(pct_dem = dem / total) %>%
  # Unweighted across counties: each county counts equally
  summarize(mean_pct_dem = mean(pct_dem),
            min_pct_dem = min(pct_dem),
            max_pct_dem = max(pct_dem))
```
Geometries are “sticky”: `dplyr` verbs keep the geometry column until you deliberately drop it with `st_drop_geometry()`.
```r
nc %>%
  select(county, total) %>%  # geometry is kept even though it was not selected
  st_drop_geometry()         # now an ordinary data frame
```
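Stickiness has a practical consequence for `summarize()`: on an sf object the geometry column comes along, and by default sf's `summarise()` method unions the geometries of all rows into one, which can be slow for detailed boundaries. The sketch below compares the two approaches using the NC shapefile bundled with sf rather than the lab data; `sids_rate` is a hypothetical example variable.

```r
library(sf)
library(dplyr)

# Sample NC county shapefile bundled with sf (not the lab's voter data)
nc_demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Geometry kept: the result is still an sf object, and the 100 county
# polygons are unioned into a single geometry
with_geom <- nc_demo %>%
  mutate(sids_rate = SID74 / BIR74) %>%
  summarize(mean_sids_rate = mean(sids_rate))

# Geometry dropped first: the result is an ordinary data frame
no_geom <- nc_demo %>%
  mutate(sids_rate = SID74 / BIR74) %>%
  st_drop_geometry() %>%
  summarize(mean_sids_rate = mean(sids_rate))

class(with_geom)
class(no_geom)
```

Both versions compute the same summary statistic; dropping the geometry first simply skips the union step when only the numbers are needed.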
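Putting these pieces together, we can map the percentage of registered voters in each county who are Native American.
```r
nc %>%
  mutate(pct_ntv_a = ntv_a / total) %>%
  ggplot() +
  geom_sf(aes(fill = pct_ntv_a)) +
  # pct_ntv_a is a proportion, so label the legend as percentages
  scale_fill_gradient(low = "#f7fbff", high = "#08306b",
                      labels = scales::label_percent()) +
  labs(title = "Percent Native American by County",
       fill = "Percent Native American") +
  theme_bw()
```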
Write a brief research question that you could answer with this dataset and then investigate it here.
What are limitations of your visualizations above?