Hall Riggins Blog - Exploratory Data Analysis of SARS-CoV-2 in Cook County Wastewater

Rationale

I am working on a project to support the Cook County Department of Public Health with wastewater surveillance of SARS-CoV-2. We want to detect surges in viral copies in wastewater and see whether those surges can act as early warnings for surges in hospital cases of COVID-19. In this post, I perform some preliminary exploratory analysis of the data.

Dataset and Prep

The data come from the CDC’s National Wastewater Surveillance System. See this git commit for the data preparation pipeline, which I built with the {targets} package.

Here is a glimpse of the data:


library(tidyverse)
library(timetk)
library(tmap)

ww_data <- arrow::read_parquet(here::here("data", "ww_cook_county.parquet")) 

glimpse(ww_data)

Rows: 629
Columns: 14
$ date                              <date> 2021-11-01, 2021-11-01, 2021-11-03,…
$ day_num                           <dbl> 610, 610, 612, 612, 617, 624, 624, 6…
$ sample_collect_date_time          <dttm> 2021-11-01 03:00:00, 2021-11-01 03:…
$ wwtp_name                         <chr> "mwrdgc calumet wrp", "mwrdgc o'brie…
$ short_name                        <chr> "calumet", "obrien", "calumet", "obr…
$ display_name                      <chr> "Calumet, South Suburbs and Chicago"…
$ population_served                 <dbl> 1134897, 1263110, 1134897, 1263110, …
$ longitude                         <dbl> -87.68732, -87.88357, -87.68732, -87…
$ latitude                          <dbl> 41.81193, 42.05788, 41.81193, 42.057…
$ pcr_target_avg_conc               <dbl> 9910.641, 58259.228, 2040.000, 19864…
$ pcr_target_units                  <chr> "copies/l wastewater", "copies/l was…
$ rec_eff_percent                   <dbl> 0.000, 0.015, 0.000, 6.455, 3.055, 6…
$ flow_rate                         <dbl> 354, 333, 354, 333, 354, 333, 354, 3…
$ M_viral_copies_per_day_per_person <dbl> 0.000000000, 0.008721118, 0.00000000…

Laboratories report results as the concentration of viral copies recovered per liter of wastewater at each sampling site. To make samples comparable across sampling sites, the CDC recommends standardizing by:

Efficiency of viral recovery during the sampling process (variable rec_eff_percent). This is estimated by spiking wastewater with a known quantity of a different virus and seeing what proportion is recovered.
Flow rate of wastewater at the sampling site (variable flow_rate in millions of gallons per day).
Number of people supplying waste to the sewershed (variable population_served).

After standardizing by all of these variables, the measurement units become millions of viral copies per day per person (variable M_viral_copies_per_day_per_person).
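To make the unit conversion concrete, here is a minimal sketch of the flow and population steps in base R. The helper names are hypothetical, and the sketch assumes flow_rate is in millions of US gallons per day. It leaves out the recovery-efficiency adjustment, because the exact form used in the pipeline is not shown in this post, so its output is not expected to match M_viral_copies_per_day_per_person.

```r
# Exact definition of the US liquid gallon in liters
# (assumption: flow_rate is reported in US gallons)
liters_per_gallon <- 3.785411784

# Hypothetical helper: flow- and population-normalized viral load,
# in millions of copies per day per person (no recovery adjustment).
m_copies_per_day_per_person <- function(conc_copies_per_l, flow_mgd, population) {
  liters_per_day <- flow_mgd * 1e6 * liters_per_gallon
  copies_per_day <- conc_copies_per_l * liters_per_day
  copies_per_day / population / 1e6
}

# Hypothetical helper: recovery efficiency as described above, i.e. the
# percentage of the spiked control virus that the lab recovers.
recovery_percent <- function(recovered_copies, spiked_copies) {
  100 * recovered_copies / spiked_copies
}

# Inputs taken from the second row of the glimpse output (O'Brien, 2021-11-01)
load_obrien <- m_copies_per_day_per_person(
  conc_copies_per_l = 58259.228,
  flow_mgd = 333,
  population = 1263110
)
load_obrien
```

For that row, the flow- and population-normalized load comes out at roughly 58 million copies per day per person before any recovery adjustment.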

SARS-CoV-2 is sampled at 8 locations across 7 wastewater treatment plants in Cook County; Stickney is listed twice, as Stickney (1) and Stickney (2):


ww_data_locations <-
    ww_data |>  
    select(display_name) |> 
    distinct() |> 
    arrange(display_name)

ww_data_locations

# A tibble: 8 × 1
  display_name                          
  <chr>                                 
1 Calumet, South Suburbs and Chicago    
2 Egan, Far Northwest Suburbs           
3 Hanover Park, Far Northwest Suburbs   
4 Kirie, Mid Northwest Suburbs          
5 Lemont, Far Southwest Suburbs         
6 O'Brien, Northeast Suburbs and Chicago
7 Stickney (1), West Suburbs and Chicago
8 Stickney (2), West Suburbs and Chicago

Here are the locations on the map:


ww_data_locations <- 
    ww_data |>  
    select(display_name) |> 
    distinct() |> 
    arrange(display_name) |> 
    # CDC-provided long/lats are not accurate, so set them by hand.
    # Values must follow the alphabetical order of display_name from arrange().
    mutate(
        longitude = c(-87.606416, -88.037690, -88.138314, -87.936888, -87.998061, -87.717100, -87.766175, -87.766175),
        latitude = c(41.662910, 42.019824, 41.999846, 42.021035, 41.678252, 42.020932, 41.817061, 41.817061)
    ) |> 
    sf::st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
    
tmap_mode("view")

# ww_data_locations already holds one row per display_name plus its geometry
ww_data_locations |> 
    tm_shape() + 
    tm_markers()
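The hand-entered coordinates above are matched to sites by position, so they depend on the row order that arrange() produces. A more defensive variant, sketched here in base R so it runs without extra packages, keys each coordinate pair by display_name and joins on that key. The coordinate values are the ones from the code above; the `sites` data frame is a hypothetical stand-in for the distinct display names in ww_data.

```r
# Coordinates keyed by display_name (same values as the positional vectors above)
site_coords <- data.frame(
  display_name = c(
    "Calumet, South Suburbs and Chicago",
    "Egan, Far Northwest Suburbs",
    "Hanover Park, Far Northwest Suburbs",
    "Kirie, Mid Northwest Suburbs",
    "Lemont, Far Southwest Suburbs",
    "O'Brien, Northeast Suburbs and Chicago",
    "Stickney (1), West Suburbs and Chicago",
    "Stickney (2), West Suburbs and Chicago"
  ),
  longitude = c(-87.606416, -88.037690, -88.138314, -87.936888,
                -87.998061, -87.717100, -87.766175, -87.766175),
  latitude = c(41.662910, 42.019824, 41.999846, 42.021035,
               41.678252, 42.020932, 41.817061, 41.817061)
)

# Hypothetical stand-in for the distinct site names, deliberately out of order
sites <- data.frame(display_name = rev(site_coords$display_name))

# Join on the name, keeping every site even if it has no coordinates
located <- merge(sites, site_coords, by = "display_name", all.x = TRUE)

# Fail loudly if a site is missing from the lookup table
stopifnot(!anyNA(located$longitude), !anyNA(located$latitude))

located
```

With the join keyed on the name, adding or renaming a site produces a visible failure instead of silently shifting every marker by one row. The same idea works with dplyr::left_join() before the call to sf::st_as_sf().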