Spatial Data Science for Public Health

Best Practices and Case Studies

Dr. Arun Mitra Peddireddy
SCTIMST, Trivandrum

Spatial Epidemiology Series | Webinar No. 3 | 04 Nov 2023

Outline

Introduction
Foundations of Spatial Data Science
Best Practices in Spatial Data Science
Case Studies from India
Challenges and Future Directions
Q&A Session

Introduction

GIS and Public Health

Provides a fresh outlook on public health.
Enables overlaying health data on its spatial representation.
Supports better planning and decision-making.
Sits at the convergence of several sub-disciplines:
- medical geography
- public health informatics
- data science

Map of the plague in the province of Bari, Naples, 1690-1692

The map shows areas most affected and the boundaries of a military quarantine imposed to prevent its spread to neighboring towns and to other provinces.

Applications of GIS in Public Health

disease surveillance
environmental health
infectious diseases
- mathematical modelling
- agent-based modelling
population genetics
medical imaging
cancer biology

Traditional uses of GIS in healthcare remain relevant, but newer methods and advancing technology could transform public health research.

What is Spatial Data Science?

Definition

Spatial data science (SDS) is a subset of Data Science that focuses on the unique characteristics of spatial data, moving beyond simply looking at where things happen to understand why they happen there.

CARTO - https://carto.com/what-is-spatial-data-science

Like data science, spatial data science seems to be a field that arises bottom-up in and from many existing scientific disciplines and industrial activities concerned with application of spatial data, rather than being a sub-discipline of an existing scientific discipline.

Edzer Pebesma, Roger Bivand - Spatial Data Science With Applications in R

How is it different from Data Science?

Why Spatial Data Science for Public Health?

Wealth of Spatial Data
An often-quoted estimate is that about 70% of all data generated has spatial attributes
Routine health data can be geo-referenced
SDS gives researchers and practitioners a gateway to examine the role of location in health and to harness it
Coupled with the emerging field of spatial statistics, the analysis of this location-based data is opening new directions for public health.

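Foundational Concepts

Core Concepts related to GIS

First Law of Geography

“Everything is related to everything else, but near things are more related than distant things.”

– Waldo Tobler, 1970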

Spatial Dependence and Complete Spatial Randomness

Spatial dependence is “the propensity for nearby locations to influence each other and to possess similar attributes”.

This means natural phenomena are not distributed at random in space, for example:

temperature
rainfall
population density
socio-economic conditions

It can be measured with indices of spatial autocorrelation.

Spatial Autocorrelation

Refers to the presence of systematic spatial variation in a mapped variable.

The terms spatial association and spatial dependence are often used to refer to spatial autocorrelation as well.

Indices to measure Spatial Dependence

Covariance Functions and Variograms
Global Spatial Autocorrelation Measures
- Moran’s I index
- General G-Statistic
- Geary’s C index
Local Indicators of Spatial Association (LISA)
- Local Moran’s I index
- Getis-Ord Gi and Gi* statistics
Space-Time Correlation Analysis
- Bivariate Moran’s I for STC
- Differential Moran’s I
- Emerging Hot Spot Analysis (EHSA)
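To make Global Moran's I concrete, here is a minimal base-R sketch (not the spdep implementation) of I = (n / S0) · Σi Σj wij zi zj / Σi zi², where zi = xi − x̄ and S0 is the sum of all weights. It assumes binary rook-contiguity weights on a small regular grid; for real areal data, the spdep package provides `moran.test()` and `localmoran()`.

```r
# Minimal sketch of Global Moran's I on a regular grid (base R only).
# Assumption: binary rook contiguity (cells sharing an edge are neighbours).

# n x n weights matrix for an nr x nc grid; cell (rr, cc) sits at index
# (cc - 1) * nr + rr, i.e. column-major order like as.vector(matrix)
rook_weights <- function(nr, nc) {
  n <- nr * nc
  W <- matrix(0, n, n)
  idx <- function(rr, cc) (cc - 1) * nr + rr
  for (rr in seq_len(nr)) {
    for (cc in seq_len(nc)) {
      i <- idx(rr, cc)
      if (rr > 1)  W[i, idx(rr - 1, cc)] <- 1
      if (rr < nr) W[i, idx(rr + 1, cc)] <- 1
      if (cc > 1)  W[i, idx(rr, cc - 1)] <- 1
      if (cc < nc) W[i, idx(rr, cc + 1)] <- 1
    }
  }
  W
}

# Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
morans_i <- function(x, W) {
  z  <- x - mean(x)   # deviations from the mean
  n  <- length(x)
  S0 <- sum(W)        # sum of all weights
  (n / S0) * sum(W * outer(z, z)) / sum(z^2)
}

W <- rook_weights(4, 4)

# Perfect checkerboard: every neighbour has the opposite value
checker  <- as.vector(outer(1:4, 1:4, function(rr, cc) (rr + cc) %% 2))
# Smooth gradient: neighbours have similar values
gradient <- as.vector(outer(1:4, 1:4, "+"))

morans_i(checker, W)   # -1 (perfect negative autocorrelation)
morans_i(gradient, W)  # about 0.667 (positive autocorrelation)
```

Values near +1 indicate clustering of similar values, values near -1 indicate dispersion, and values near the expected value -1/(n - 1) are consistent with complete spatial randomness.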

Map Projections & coordinate reference system (CRS)

Why is the CRS Important?

The Mercator projection, for example, is conformal: it preserves local angles, so it is used where angular relationships matter, but areas are distorted, increasingly so towards the poles.

The Mollweide projection, an equal-area pseudocylindrical projection, ensures that all mapped areas keep the same proportional relationship to the corresponding areas on the Earth.

The Plate Carrée (equidistant cylindrical) projection, for example, keeps distances true to scale along every meridian; no flat map preserves all distances.

The Robinson projection is a compromise: it preserves neither area, angles nor distances exactly, but keeps all three distortions at acceptable levels.

The United Nations logo uses the azimuthal equidistant projection.

What four commonly used projections do, as shown on the human head
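A small sketch of why the CRS matters in practice, using two population centres copied from the `kl_pop_centers` table shown later. The choice of UTM zone 43N (EPSG:32643) is an assumption that suits Kerala (the zone spans 72°E to 78°E); any suitable projected CRS would do.

```r
library(sf)

# Two population centres from the kl_pop_centers table (lon, lat in WGS 84)
towns <- st_sfc(
  st_point(c(76.48236, 10.591936)),  # VADAKKANCHERI
  st_point(c(76.68601, 9.227245)),   # TOMALLUR
  crs = 4326
)

# Geographic CRS: sf computes a distance on the globe, returned in metres
d_geographic <- st_distance(towns)[1, 2]

# Project to UTM zone 43N: coordinates become planar metres
towns_utm <- st_transform(towns, 32643)
st_is_longlat(towns_utm)   # FALSE
d_projected <- st_distance(towns_utm)[1, 2]

# Treating raw degrees as planar x/y gives a number with no physical unit
xy <- st_coordinates(towns)
d_degrees <- sqrt(sum((xy[1, ] - xy[2, ])^2))

c(geographic_m = as.numeric(d_geographic),
  projected_m  = as.numeric(d_projected),
  raw_degrees  = d_degrees)
```

The two metre-based distances agree closely, while the degree-based value is meaningless as a length; buffers, areas and distances should be computed either on the globe or in an appropriate projected CRS.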

CRS in Action

Data Science as a methodological approach

Note

The key word in data science is not data, it is science.

– Jeff Leek, JHU Data Science Lab

Reproducible Research

There are four key elements of reproducible research:

data documentation
data publication
code publication
output publication
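A minimal base-R sketch of small habits that support these elements. The output folder, the seed value and the data dictionary entries are hypothetical (the column descriptions are assumed meanings of `kl_pop_centers` fields, not documented ones).

```r
# Hypothetical output folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "analysis_outputs")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Fix the random seed so any sampling or simulation can be re-run exactly
set.seed(2023)
draws <- runif(5)

# Data documentation: a small data dictionary saved alongside the outputs
# (descriptions below are assumed meanings, for illustration only)
dictionary <- data.frame(
  variable    = c("pop_2020", "lon", "lat"),
  description = c("Population, 2020", "Longitude (WGS 84)", "Latitude (WGS 84)")
)
write.csv(dictionary, file.path(out_dir, "data_dictionary.csv"), row.names = FALSE)

# Output publication: record R and package versions used to produce results
writeLines(capture.output(sessionInfo()), file.path(out_dir, "session_info.txt"))
```

Publishing the code itself (for example in a public repository) together with these files covers the remaining elements.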

Tools for Spatial Data Science

GIS related
Data Science related
Spatial Data Science related

R is arguably the best spatial data science tool available for public health!

R provides a range of powerful packages for geospatial analysis, enabling advanced computations and analytics.

R Spatial Analysis Ecosystem

CRAN Task View - Spatial Analysis

R Spatial Learning Resources

Wealth of Resource material
Powerful tools/packages
seamlessly handle vector and raster data
interactive visualization
end-to-end solution

Newest addition: Spatial Data Science: With Applications in R

The `sf` package

install.packages("sf")

The sf package is an R implementation of Simple Features.

This package incorporates:

a new spatial data class system in R
functions for reading and writing data
tools for spatial operations on vectors

Geometry Types in `sf`
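The most common simple feature geometry types can be built directly from coordinates. The coordinates below are illustrative only, roughly around Thiruvananthapuram.

```r
library(sf)

pt   <- st_point(c(76.95, 8.52))                              # POINT
ln   <- st_linestring(rbind(c(76.90, 8.50), c(77.00, 8.60)))  # LINESTRING
poly <- st_polygon(list(rbind(c(76.90, 8.40), c(77.10, 8.40),
                              c(77.10, 8.60), c(76.90, 8.60),
                              c(76.90, 8.40))))               # POLYGON: ring must close
mpt  <- st_multipoint(rbind(c(76.90, 8.50), c(77.00, 8.55)))  # MULTIPOINT

# Combine into a geometry column (sfc) with a CRS attached
geoms <- st_sfc(pt, ln, poly, mpt, crs = 4326)
st_geometry_type(geoms)
```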

Loading `sf` package

library(sf)
library(here)  # builds project-relative file paths with here()

fs::dir_tree(here("spatial_files", "kl_pop_centers"))

C:/Users/Arun/Dropbox/Research/Spatial_Data_Science_Talk/spatial_files/kl_pop_centers
├── kl_pop_centers.dbf
├── kl_pop_centers.prj
├── kl_pop_centers.shp
└── kl_pop_centers.shx

Load spatial data into R

shape_file <- here("spatial_files", "kl_pop_centers", "kl_pop_centers.shp")

kl_pop_centers <- st_read(shape_file)

Reading layer `kl_pop_centers' from data source 
  `C:\Users\Arun\Dropbox\Research\Spatial_Data_Science_Talk\spatial_files\kl_pop_centers\kl_pop_centers.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 170 features and 14 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 74.95388 ymin: 8.35761 xmax: 77.28071 ymax: 12.60804
Geodetic CRS:  WGS 84

View the `sf` object

kl_pop_centers

Simple feature collection with 170 features and 14 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 74.95388 ymin: 8.35761 xmax: 77.28071 ymax: 12.60804
Geodetic CRS:  WGS 84
First 10 features:
   Rotation Scale    name_of_to      district  state ELEVATION     District_1
1         0     0 VADAKKANCHERI      PALAKKAD KERALA         0       P>LAKK>D
2         0     0      ANGAMALI     ERNAKULAM KERALA         0      ERN>KULAM
3         0     0    MALAYATLUR     ERNAKULAM KERALA         0      ERN>KULAM
4         0     0        KALADI     ERNAKULAM KERALA         0      ERN>KULAM
5         0     0      TOMALLUR PATTANAMTITTA KERALA         0 PATTANAMTHITTA
6         0     0     GURUVAYUR      THRISSUR KERALA         0        TRISS@R
7         0     0 TRIMBRANALLUR      THRISSUR KERALA         0        TRISS@R
8         0     0      KADIKKAD      THRISSUR KERALA         0        TRISS@R
9         0     0     CHALAKUDI      THRISSUR KERALA         0        TRISS@R
10        0     0          MALA      THRISSUR KERALA         0        TRISS@R
   STATE_1     TEHSIL Shape_Leng Shape_Area pop_2020      lon       lat
1   KERALA    >LATT@R   150475.2  578919605   456575 76.48236 10.591936
2   KERALA      >LUVA   155736.4  550828742   615928 76.38866 10.200267
3   KERALA      >LUVA   155736.4  550828742   615928 76.51483 10.197496
4   KERALA      >LUVA   155736.4  550828742   615928 76.43405 10.167068
5   KERALA       AD@R   110539.2  270281878   187760 76.68601  9.227245
6   KERALA CH>LAKKUDI   376808.1 1270626860  1153806 76.04651 10.598107
7   KERALA CH>LAKKUDI   376808.1 1270626860  1153806 76.10642 10.523958
8   KERALA CH>LAKKUDI   376808.1 1270626860  1153806 75.96130 10.681153
9   KERALA CH>LAKKUDI   376808.1 1270626860  1153806 76.34039 10.308017
10  KERALA CH>LAKKUDI   376808.1 1270626860  1153806 76.26185 10.250355
                    geometry
1  POINT (76.48236 10.59194)
2  POINT (76.38866 10.20027)
3   POINT (76.51483 10.1975)
4  POINT (76.43405 10.16707)
5  POINT (76.68601 9.227245)
6  POINT (76.04651 10.59811)
7  POINT (76.10642 10.52396)
8   POINT (75.9613 10.68115)
9  POINT (76.34039 10.30802)
10 POINT (76.26185 10.25036)

Plot the `sf` object

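library(ggplot2)   # ggplot() and geom_sf()
library(magrittr)  # the %>% pipe (also re-exported by dplyr)

kl_pop_centers %>% 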
  ggplot() +
  geom_sf()

Plot the `sf` object

kl_pop_centers %>% 
  ggplot() +
  geom_sf(aes(color = district))

Concept of the `sf` package

Dependencies of the `sf` package

Methods in `sf`

methods(class="sf")

  [1] $<-                          [                           
  [3] [[<-                         aggregate                   
  [5] anti_join                    arrange                     
  [7] as.data.frame                cbind                       
  [9] coerce                       dbDataType                  
 [11] dbWriteTable                 distinct                    
 [13] dplyr_reconstruct            drop_na                     
 [15] filter                       full_join                   
 [17] gather                       group_by                    
 [19] group_split                  identify                    
 [21] initialize                   inner_join                  
 [23] left_join                    merge                       
 [25] mutate                       nest                        
 [27] pivot_longer                 pivot_wider                 
 [29] plot                         print                       
 [31] rbind                        rename                      
 [33] right_join                   rowwise                     
 [35] sample_frac                  sample_n                    
 [37] select                       semi_join                   
 [39] separate                     separate_rows               
 [41] show                         slice                       
 [43] slotsFromS3                  spread                      
 [45] st_agr                       st_agr<-                    
 [47] st_area                      st_as_s2                    
 [49] st_as_sf                     st_as_sfc                   
 [51] st_bbox                      st_boundary                 
 [53] st_break_antimeridian        st_buffer                   
 [55] st_cast                      st_centroid                 
 [57] st_collection_extract        st_concave_hull             
 [59] st_convex_hull               st_coordinates              
 [61] st_crop                      st_crs                      
 [63] st_crs<-                     st_difference               
 [65] st_drop_geometry             st_filter                   
 [67] st_geometry                  st_geometry<-               
 [69] st_inscribed_circle          st_interpolate_aw           
 [71] st_intersection              st_intersects               
 [73] st_is                        st_is_valid                 
 [75] st_join                      st_line_merge               
 [77] st_m_range                   st_make_valid               
 [79] st_minimum_rotated_rectangle st_nearest_points           
 [81] st_node                      st_normalize                
 [83] st_point_on_surface          st_polygonize               
 [85] st_precision                 st_reverse                  
 [87] st_sample                    st_segmentize               
 [89] st_set_precision             st_shift_longitude          
 [91] st_simplify                  st_snap                     
 [93] st_sym_difference            st_transform                
 [95] st_triangulate               st_triangulate_constrained  
 [97] st_union                     st_voronoi                  
 [99] st_wrap_dateline             st_write                    
[101] st_z_range                   st_zm                       
[103] summarise                    transform                   
[105] transmute                    ungroup                     
[107] unite                        unnest                      
see '?methods' for accessing help and source code
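Several of the methods listed above chain naturally into a small vector workflow. This sketch again uses the bundled North Carolina data; the 10 km buffer distance and the choice of EPSG:32119 (NAD83 / North Carolina, in metres) are illustrative assumptions.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# st_transform: lon/lat -> a projected CRS in metres
nc_m <- st_transform(nc, 32119)

# st_point_on_surface: one point guaranteed to fall inside each county
pts <- st_point_on_surface(st_geometry(nc_m))

# st_buffer: 10 km zones around those points (distance in CRS units, metres)
zones <- st_buffer(pts, dist = 10000)

# st_intersects + lengths(): how many counties each 10 km zone touches
n_counties <- lengths(st_intersects(zones, nc_m))

# st_join: attach the county name to each point (spatial left join)
pts_named <- st_join(st_sf(geometry = pts), nc_m["NAME"])
head(pts_named)
```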

Interactive `sf`

Lightweight
Interactive
Cross Platform

Where to look for help?

https://posit.co/wp-content/uploads/2022/10/sf.pdf

Best Practices

Data Related
- Data Acquisition
- Data Cleaning
- Data Curation
Analysis Related

Exploratory Data Analysis

EDA is the critical first step.
EDA is a state of mind.
EDA is exploring your ideas.
EDA has no strict rules.
EDA helps understand your data.
EDA is an iterative cycle.
EDA is a creative process.

What is EDA?

It is mostly a philosophy of data analysis where the researcher examines the data without any pre-conceived ideas in order to discover what the data can tell him or her about the phenomena being studied.

“Exploratory data analysis is detective work – numerical detective work – or counting detective work – or graphical detective work.”

– John Tukey, Exploratory Data Analysis (1977), p. 1

Questions to ask in EDA

The easiest way to do EDA is to use questions as tools to guide your investigation. EDA is an important part of any data analysis, even if the questions are known already.

“There are no routine statistical questions, only questionable statistical routines.”

– Sir David Cox

“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.”

– John Tukey

Asking the right questions

The key to asking quality questions is to generate a large quantity of them.

It is difficult to ask revealing questions at the start of the analysis.

But each new question will expose a new aspect and increase your chance of making a discovery.

6 W’s of Spatial EDA / ESDA

What?
Where?
When?
Who?
Why?
How?

Questions to ask:

What type of variation occurs within your variables?
What type of covariation occurs between your variables?
Does your data meet your expectations?
Is the quality of your data robust?
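These questions translate directly into a few lines of code. A sketch with the bundled North Carolina SIDS data (births and SIDS deaths, 1974-78); the choice of variables and of a Spearman correlation is illustrative.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Variation within a variable: SIDS deaths per 1,000 live births, 1974-78
nc$sids_rate <- 1000 * nc$SID74 / nc$BIR74
summary(nc$sids_rate)
hist(nc$sids_rate, main = "SIDS rate per 1,000 births", xlab = "rate")

# Covariation between variables: rate vs proportion of non-white births
nc$nwbir_prop <- nc$NWBIR74 / nc$BIR74
cor(nc$sids_rate, nc$nwbir_prop, method = "spearman")

# Expectations and quality: missing values, impossible values, geometry validity
colSums(is.na(st_drop_geometry(nc)))
any(nc$SID74 > nc$BIR74)   # deaths should never exceed births
table(st_is_valid(nc))
```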

The process of EDA

It is an iterative process

Import
Tidy
Explore, as a repeated cycle:
- Transform
- Visualize
- Transform again, visualize again, and so on

Steps for any good data analysis project

Preparing Tidy Data

Data Cleaning
Data Wrangling

Data Exploration

Data Transformation
Data Visualization

Statistical Analysis

Prepare Results

Draw Inferences

Report Findings

Spatial Data Visualization

Cartographic Principles

The Geography and Geospatial Science Working Group (GeoSWG) recognised the need for best practices in cartography.
- Visual contrast
- Legibility
- Figure-Ground Orientation
- Hierarchical Organization
- Balance

These guidelines help researchers develop high-quality, consistent map products.

Cartographic Guidelines for Public Health

CDC, Atlanta
Some important aspects:
- Map Elements
  - Title and Borders
  - North Arrow / Graticule / Scale
  - Inset Maps
  - Labels and Legend
- Other Elements
  - Data Sources
  - Dates
  - Projection

https://www.cdc.gov/dhdsp/maps/gisx/resources/cartographic_guidelines.pdf
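A sketch of a choropleth that carries most of these elements (title, legend, scale bar, north arrow, data source, dates, projection), using ggplot2 with the ggspatial package for the scale bar and north arrow. ggspatial being installed is an assumption, inset maps are omitted, and the data are the North Carolina SIDS counts bundled with sf.

```r
library(sf)
library(ggplot2)
library(ggspatial)  # assumed installed: annotation_scale(), annotation_north_arrow()

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc$sids_rate <- 1000 * nc$SID74 / nc$BIR74

p <- ggplot(nc) +
  geom_sf(aes(fill = sids_rate), colour = "grey40") +
  scale_fill_viridis_c(name = "SIDS per\n1,000 births") +       # legend
  annotation_scale(location = "bl") +                           # scale bar
  annotation_north_arrow(location = "tr", which_north = "true",
                         style = north_arrow_fancy_orienteering) +
  coord_sf(crs = st_crs(32119)) +                               # projection
  labs(title   = "Sudden infant death syndrome, North Carolina, 1974-78",
       caption = paste("Data: nc.shp bundled with the sf package |",
                       "Projection: NAD83 / North Carolina (EPSG:32119)")) +
  theme_minimal()

p
```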

Case Studies

Point Pattern Data
Areal Data
Raster Data
Network Data
Spatio-temporal Models
Machine Learning Methods
Big Data

Challenges and Future Directions

New Requirements for Spatial Analysis

Immediate: The time from action to insight is shrinking dramatically
Fresh: Primary data needs to be days or months old, not years old
Multi-source: Competitive alternative sources for completeness or validation
Continuous: Analysis can no longer be a point in time
Automated: Possibility to continuously replicate and connect to decision tools

Digital Twins

Thank you!
