Premise
In certain methods of spatial cross-validation, you need to set an inclusion radius and a buffer distance for the folds into which you divide your data. I am working with census tracts, and I think it makes sense to base both distances on the typical size of a neighborhood.
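To make the two distances concrete, here is a minimal base R sketch of one way they can interact. It assumes a simple reading of the idea: points within the inclusion radius of a seed point are held out for testing, and any remaining points within the buffer distance of a held-out point are dropped from training. The coordinates, the seed point, and the `role` column are all made up for illustration, and this is not any particular package's implementation.

```r
# Hypothetical projected coordinates in meters (not the Chicago data)
pts <- data.frame(
  id = 1:6,
  x = c(0, 500, 1200, 2000, 2600, 5000),
  y = c(0, 0, 0, 0, 0, 0)
)

radius <- 1300  # inclusion radius in meters (assumed value)
buffer <- 1300  # buffer distance in meters (assumed value)

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(pts[, c("x", "y")]))

seed <- 1  # index of the point that seeds the held-out fold

# Test set: every point within `radius` of the seed point
in_test <- d[seed, ] <= radius

# Buffer: non-test points within `buffer` of any test point
near_test <- apply(d[, in_test, drop = FALSE], 1, min) <= buffer
in_buffer <- near_test & !in_test

# Everything else is available for training
pts$role <- unname(ifelse(in_test, "test", ifelse(in_buffer, "excluded", "train")))
pts
```

With these made-up values, the first three points land in the test set, the fourth falls in the buffer and is excluded, and the last two remain available for training.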
Set-Up
Load the data wrangling and mapping libraries, then import Chicago neighborhoods:
library(tidyverse)
library(sf)
library(tmap)
# Source: Chicago Data Portal - Boundaries - Neighborhoods
# URL: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Neighborhoods/bbvz-uum9
neighborhoods <- st_read(here::here("data", "boundaries_neighborhoods_chicago.kml"))
Reading layer `Layer0' from data source
`/home/riggins/blog/data/boundaries_neighborhoods_chicago.kml'
using driver `LIBKML'
Simple feature collection with 98 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
Geodetic CRS: WGS 84
Calculations
Use the S2 backend of the sf library to calculate the area of each neighborhood
Calculate a rough radius for each neighborhood by treating it as if it were a circle
Calculate the average rough radius
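The "treat it as a circle" step inverts the circle area formula A = πr², so the rough radius is sqrt(A / π). As a small check of that arithmetic, the sketch below applies it to a made-up 2 km by 2 km square. The square and the projected CRS (EPSG:26916, NAD83 / UTM zone 16N) are assumptions chosen for illustration, not part of the Chicago data.

```r
library(sf)

# Hypothetical 2 km x 2 km square, in meters, in a projected CRS
x0 <- 440000
y0 <- 4630000
square <- st_sfc(
  st_polygon(list(rbind(
    c(x0, y0),
    c(x0 + 2000, y0),
    c(x0 + 2000, y0 + 2000),
    c(x0, y0 + 2000),
    c(x0, y0)
  ))),
  crs = 26916
)

area <- st_area(square)          # 4,000,000 m^2, carried as a units object
rough_radius <- sqrt(area / pi)  # radius of the circle with the same area
rough_radius                     # about 1128 m
```

The square root of an area in square meters comes back in meters, which is why the result can be read directly as a distance.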
neighborhoods |>
  mutate(
    # Area of each neighborhood (square meters, via the S2 backend)
    area = st_area(geometry),
    # Radius of a circle with the same area
    rough_radius = sqrt(area / pi)
  ) |>
  summarize(avg_rough_radius = mean(rough_radius))
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
Geodetic CRS: WGS 84
avg_rough_radius geometry
1 1273.473 [m] MULTIPOLYGON (((-87.94001 4...
The average rough radius is about 1273 meters.
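Because `summarize()` on an sf object also unions the geometries, the result above carries a MULTIPOLYGON column alongside the number. If only the number is wanted, one option is to drop the geometry first. The sketch below shows that variation, assuming the North Carolina counties file that ships with sf as stand-in data (not the Chicago file); the minimum and maximum are added only to show how much the radii vary.

```r
library(sf)
library(dplyr)

# Stand-in data: North Carolina counties bundled with sf (not the Chicago file)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

radius_summary <- nc |>
  mutate(
    area = st_area(geometry),
    rough_radius = sqrt(area / pi)
  ) |>
  # Drop the geometry so summarize() returns a plain data frame
  st_drop_geometry() |>
  summarize(
    avg_rough_radius = mean(rough_radius),
    min_rough_radius = min(rough_radius),
    max_rough_radius = max(rough_radius)
  )

radius_summary
```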
Visualize
Rounding the average rough radius up to 1300 meters, visualize what it looks like to add a 1300-meter buffer zone around each neighborhood:
buffers <- st_buffer(neighborhoods, 1300)
tmap_mode("view")
tm_shape(buffers) +
  tm_polygons(col = "pri_neigh", alpha = 0.5, border.alpha = 0) +
  tm_shape(neighborhoods) +
  tm_polygons(alpha = 0)
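As a quick sanity check on what a 1300 meter buffer means, the sketch below buffers a single made-up point and compares the result with the area of an ideal circle. The point's coordinates and the choice of projected CRS (EPSG:26916, NAD83 / UTM zone 16N) are assumptions for illustration. The neighborhoods above were buffered directly in WGS 84, so this is a side check, not a reproduction of that step.

```r
library(sf)

# Hypothetical point in downtown Chicago, given as longitude/latitude (WGS 84)
pt <- st_sfc(st_point(c(-87.63, 41.88)), crs = 4326)

# Project to a CRS whose units are meters (assumed: EPSG:26916)
pt_utm <- st_transform(pt, 26916)

# Buffer by 1300 m; the result is a polygon approximating a circle
buf <- st_buffer(pt_utm, dist = 1300)

# Compare the buffer's area with the ideal circle area
buffer_area <- as.numeric(st_area(buf))
circle_area <- pi * 1300^2
buffer_area / circle_area
```

The ratio should come out very close to 1, slightly below it because the buffer is a many-sided polygon and not a true circle.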