Geospatial analysis is one of the fastest growing areas of R. In particular, the sf
package provides a tidy and intuitive way to analyze and visualize spatial data.
sf
: For managing and analyzing spatial dataframestigris
: For downloading in Census geographiesggplot2
: For making publication ready static mapsurbnmapr
: For automatically adding Urban styling to static mapsmapview
: For making exploratory interactive mapssf
packagelibrary(tidyverse)
library(sf)
library(urbnthemes)
library(urbnmapr)
library(tigris)
library(mapview)
The st_read()
function allows you to read in any spatial file format, such as shapefiles (.shp
) or GeoJSONs (.geojson
), as an sf
dataframe. Once read in, you can reproject the coordinate reference system for your data. When mapping and performing geospatial operations, it is important that all data are in the same projection.
list.files("shapefiles")
## [1] "2015 MSA Boundary.cpg" "2015 MSA Boundary.dbf"
## [3] "2015 MSA Boundary.prj" "2015 MSA Boundary.sbn"
## [5] "2015 MSA Boundary.sbx" "2015 MSA Boundary.shp"
## [7] "2015 MSA Boundary.shp.xml" "2015 MSA Boundary.shx"
msa_boundary <- st_read(dsn = "shapefiles/2015 MSA Boundary.shp")
## Reading layer `2015 MSA Boundary' from data source `/home/ajjit123/Documents/GitHub/geospatial-training/shapefiles/2015 MSA Boundary.shp' using driver `ESRI Shapefile'
## Simple feature collection with 24 features and 18 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -78.38668 ymin: 37.99093 xmax: -76.32191 ymax: 39.72004
## geographic CRS: NAD83
msa_boundary <- st_transform(msa_boundary, crs = 4326)
If you have a CSV or other dataset that contains latitudes and longitudes, you can easily convert your data to a sf
point dataframe. It is important to know what coordinate reference system the coordinates are based on, so you can properly assign the CRS or projection. Note that st_set_crs
is used to assign a projection; st_transform
is used to alter a projection that already exists.
banks <- read_csv("data/fdic-sod-dc_2017.csv")
banks_sf <- banks %>%
# Filter out 2 rows without lat or lons
filter(!is.na(sims_longitude) | !is.na(sims_latitude)) %>%
# Convert regular dataframe to an sf dataframe. Need to specify lon, lat (in that order!)
st_as_sf(coords = c("sims_longitude", "sims_latitude")) %>%
st_set_crs(4326)
There are many packages that make it easy to read in Census boundaries. The tigris
package for example allows you to pull states, tracts, CBSAs, and even roads and military bases. For a full list of supported Census geographies see here.
options(tigris_class = "sf", use_tigris_cache = FALSE)
dc_tracts <- tracts(state = "DC",
cb = TRUE,
year = 2010) %>%
st_transform(crs = 4326)
The urbnmapr
package contains geographic data that can be used to easily make state and county level maps, with AL and HI displayed as insets. This package also supports US territories.
dc <- get_urbn_map(map = "states", sf = TRUE) %>%
filter(state_fips == "11") %>%
st_transform(crs = 4326)
geom_sf
from ggplot2
allows you to map point, line, polygon, and multipolygon sf
dataframes. The syntax follows that of other geom_*
functions- provide a data set and a set of aesthetics. If you are using an sf
dataframe (which contains the coordinates saved in a column called geometry
), then you are not required to provide any aesthetics. But you can if you say want to color in states by a particular variable.
You can also layer geoms on top of each other; they will appear from bottom to top.
ggplot() +
geom_sf(data = msa_boundary,
mapping = aes()) +
geom_sf(data = banks_sf,
mapping = aes(),
alpha = .2,
color = palette_urbn_cyan[5],
stroke = FALSE) +
theme_urbn_map() +
labs(title = "FDIC Insured Bank Branches in the Washington DC MSA")
For examples of choropleth maps, see the urbnmapr
vignette.. For examples of other types of maps, see the Urban Institute mapping guide.
st_crop
will crop your data to the bounding box of another spatial data set.
banks_crop <- st_crop(banks_sf, dc)
ggplot() +
geom_sf(data = dc,
mapping = aes(),
fill = "grey") +
geom_sf(data = banks_crop,
mapping = aes(),
color = palette_urbn_cyan[5]) +
theme_urbn_map()
st_intersection
will provide an exact clip.
banks_int <- st_intersection(banks_sf, dc)
ggplot() +
geom_sf(data = dc,
mapping = aes(),
fill = "grey") +
geom_sf(data = banks_int,
mapping = aes(),
color = palette_urbn_cyan[5]) +
theme_urbn_map()
Say, for instance, you want to map out census tracts in DC, colored by the number of FDIC-insured bank branches they have. Doing this would involve:
The function st_join
has many different options to spatially join geometries. The sf
cheatsheet is a great resource for remembering and determining which function to use. When using st_join
, the the geometry of the first object is retained, and the geometry of the second object is lost.
# Perform spatial join between banks (point) and tracts (polygons)
banks_with_tract_appended <- st_join(banks_int, dc_tracts, join = st_intersects)
# Count number of banks by tract
bank_count_by_tract = banks_with_tract_appended %>%
# GEO_ID = unique id for a tract
count(GEO_ID,
name = "num_banks") %>%
st_drop_geometry()
# Append count of number of banks to dc_tracts dataframe
dc_tracts = dc_tracts %>%
left_join(bank_count_by_tract,
by = "GEO_ID") %>%
mutate(num_banks = ifelse(is.na(num_banks), 0, num_banks))
# Create map of banks by Census tract
dc_tracts %>%
ggplot() +
geom_sf(mapping = aes(fill = num_banks)) +
scale_fill_gradientn() +
theme_urbn_map() +
labs(fill = "Number of banks per census tract")
The function st_buffer
can be used to create buffers around geometries. It is important to reproject your data out of long/lat based projectsion to calculate accurate buffers.
Here, I project the bank coordinates into Marlyand State plane in feet in order to calculate a half mile buffer. library(units)
is helpful for setting and converting between units.
banks_proj <- banks_int %>%
st_transform(crs = 2248)
banks_buffer <- banks_proj %>%
st_buffer(units::set_units(.5, mi) %>% units::set_units(ft)) %>%
st_transform(crs = 4326)
ggplot() +
geom_sf(dc, mapping = aes(),
fill = "grey") +
geom_sf(banks_buffer, mapping = aes(),
fill = "#1696d2", alpha = .3, color = NA) +
theme_urbn_map()
The mapview
package makes it very easy to create interactive maps with sf
dataframes. All you have to do is run mapview(your_sf_object)
. You can layer multiple sf
dataframes on top of each other with mapview(sf1) + mapview(sf2)
mapview(dc_tracts)
mapview(dc_tracts) + mapview(banks_int, col.regions = palette_urbn_green[6])
For more information on the options available on the mapview
libary, see Day 4 of our Mapping 101 training.