Data Science for Public Health

Concepts, Applications and Case Studies



Dr. Arun Mitra Peddireddy
Dept of Community and Family Medicine,
All India Institute of Medical Sciences (AIIMS), Hyderabad

dr.arunmitra@gmail.com

Concepts

  • Essential concepts
  • Mainly to get the big picture
  • Spark interest rather than teaching

What is Public Health?


  • Public health is the science and practice of protecting and improving the health of a population.

  • Core functions include disease prevention, health promotion, and surveillance.


Public Health

Public Health is multi-faceted and interdisciplinary, involving experts from various fields, including epidemiology, biostatistics, health services research, and behavioral sciences, to address the complex determinants of health.

Why is Public Health Important?

Public Health vs. Clinical Medicine

  • Prevents disease, prolongs life, promotes health through organized efforts.
  • Focuses on population-level health issues, preventive measures rather than curative.
  • Public health prioritizes common good over individual interests.
  • It promotes structural actions for disease prevention in society.

Grand Challenges in Healthcare


To measure health & disease status accurately and economically in poor countries

  • GC 13: Develop technologies that permit quantitative assessment of population health status.
  • GC 14: Develop technologies that allow assessment of individuals for multiple conditions or pathogens at point-of-care.

Public health Challenges for the 21st century


  • poverty and economic disparities
  • climate change
  • infectious disease epidemics
  • living environments (urban living)
  • safe water and healthy food
  • non-communicable diseases including mental health
  • inadequate human resources in health
  • poor health systems financing


Towards Digital Transformation

Digitization \(\rightarrow\) Digitalization \(\rightarrow\) Digital Transformation


  • Paper Based Systems to Digital Systems
  • Rise of Health Information Systems
  • Threshold of exponential growth, fueled by rapid digital transformation.
  • Healthcare accounts for about 20% of the world’s data
  • One billion digital health IDs by 2030
  • 10x growth in the digital healthcare industry by 2030

National Push for a Digital Health Ecosystem

Recognising the need to create a unified Digital Health Ecosystem



2017: National Health Policy

“…attainment of the highest possible level of health and wellbeing for all at all ages”

\(\rightarrow\)

2019: National Digital Health Blueprint

“…to create a framework for the National Health Stack and to create an action plan for the comprehensive and holistic implementation of Digital Health”

National Digital Health Blueprint

Domain Principles

Think Big, Start Small, Scale Fast

  • Educate
  • Empower
  • Accountability
  • Portability
  • Secure by Design

Technology Principles

Single Source of Truth

  • Building Blocks
  • Interoperable
  • Leverage Legacy
  • Open APIs
  • Minimalist by Design

Federated Architecture of the NDHB

The Last Mile Problem

Policy Intention - Health Outcomes: Gaps


  • Gap between health policy intentions and their practical implementation

  • Challenges within the existing health information systems

  • Need for Research on how to transform these technology adoption policies into actionable strategies that improve health outcomes

Policy Implementation Science


Learning Health Systems

Changing Public Health Landscape

  • Public health changes are slow, but crucial for societal well-being.
  • Public Health: Science vs. Advocacy
  • The need for participatory decision making in public health
  • Growing volume and complexity of public health data
  • Urgent need to generate evidence for policy making
  • Reproducibility and transparency in public health research

Applications

Doing Data Science

Data Science as a methodological approach

Note

The key word in data science is not data, it is science.

– Jeff Leek, JHU Data Science Lab

Reproducible Research



There are four key elements of reproducible research:

  • data documentation
  • data publication
  • code publication
  • output publication.

GIS and Public Health

  • Spatial epidemiology studies health patterns in relation to geographic location.
  • Crucial for public health (e.g., disease outbreak investigations).
  • Extremely useful in providing a fresh outlook to public health.

  • Provides opportunity to enable overlaying data with its spatial representation

  • Supports better planning and decision-making.

  • The convergence of many new sub-disciplines:

    • medical geography
    • public health informatics
    • data science

Map of the plague in the province of Bari, Naples, 1690-1692

The map shows areas most affected and the boundaries of a military quarantine imposed to prevent its spread to neighboring towns and to other provinces.

Applications of GIS in Public Health

  • disease surveillance
  • environmental health
  • infectious diseases
    • mathematical modelling
    • agent based modelling
  • population genetics
  • medical imaging
  • cancer biology

While traditional uses of GIS in healthcare remain relevant, newer methods and advancing technology could prove monumental for public health research.

What is Spatial Data Science?

Definition


Spatial data science (SDS) is a subset of Data Science that focuses on the unique characteristics of spatial data, moving beyond simply looking at where things happen to understand why they happen there.

CARTO - https://carto.com/what-is-spatial-data-science

Like data science, spatial data science seems to be a field that arises bottom-up in and from many existing scientific disciplines and industrial activities concerned with application of spatial data, rather than being a sub-discipline of an existing scientific discipline.

Edzer Pebesma, Roger Bivand - Spatial Data Science With Applications in R

How is it different from Data Science?


Why Spatial Data Science for Public Health?




  • Wealth of Spatial Data

  • An estimated 70% of all generated data has spatial attributes

  • Routine health data can be geo-referenced

  • Provide a gateway for researchers and practitioners to examine the role and harness the power of SDS in public health

  • Coupled with the emerging field of spatial statistics, the analysis of this location-based data is developing new and novel directions for public health.

Spatial Dependence and Complete Spatial Randomness


Spatial dependence is “the propensity for nearby locations to influence each other and to possess similar attributes”.

This means natural phenomena are not distributed at random in space. Examples include:

  • temperature,
  • rainfall,
  • population density,
  • socio-economic conditions etc.

It can be measured by the indices of Spatial Autocorrelation.

Spatial Autocorrelation

Refers to the presence of systematic spatial variation in a mapped variable.

The terms spatial association and spatial dependence are often used to reflect spatial autocorrelation as well.
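To make the idea concrete, here is a minimal sketch of global Moran's I computed by hand in base R, together with a permutation test against complete spatial randomness. The six rates, the line-shaped neighbourhood layout, and the helper name `morans_i` are all made up for illustration; in practice a package such as spdep (for example its `moran.test` function) would normally be used with real neighbour lists.

```r
# Global Moran's I for a toy set of six areas arranged in a line (A-B-C-D-E-F).
# All values are hypothetical and chosen so that similar rates sit next to each other.
x <- c(10, 12, 11, 30, 32, 31)
n <- length(x)

# Binary contiguity weights: two areas are neighbours if they are adjacent in the line
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

# Moran's I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i^2,
# where z are deviations from the mean and S0 is the sum of all weights
morans_i <- function(x, W) {
  z  <- x - mean(x)
  s0 <- sum(W)
  (length(x) / s0) * sum(W * outer(z, z)) / sum(z^2)
}

I_obs      <- morans_i(x, W)
I_expected <- -1 / (n - 1)   # expected value under complete spatial randomness

# Permutation test: shuffle the values across areas and recompute I each time
set.seed(42)
n_perm  <- 999
I_perm  <- replicate(n_perm, morans_i(sample(x), W))
p_value <- (sum(I_perm >= I_obs) + 1) / (n_perm + 1)

cat("Observed I:", round(I_obs, 3), " Expected under CSR:", I_expected, "\n")
cat("One-sided permutation p-value:", p_value, "\n")
```

An observed I well above the expected value suggests that neighbouring areas carry similar values, which is what spatial dependence predicts. With only six areas the permutation p-value is coarse, so it should be read as a demonstration of the mechanics rather than as evidence.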

Indices to measure Spatial Dependence

  • Covariance Functions and Variograms

  • Global Spatial Autocorrelation Measures

    • Moran’s I index
    • General G-Statistic
    • Geary’s C index
  • Local Indicators of Spatial Association (LISA)

    • Local Moran’s I index
    • Getis-Ord Gi and Gi* statistics
  • Space-Time Correlation Analysis

    • Bivariate Moran’s I for STC
    • Differential Moran’s I
    • Emerging Hot Spot Analysis (EHSA)
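As one example from the list above, the following base R sketch computes Local Moran's I (a LISA statistic) and labels each area by its cluster quadrant. The data, the line-shaped neighbourhood structure, and all variable names are hypothetical; significance testing is left out, and a package such as spdep (its `localmoran` function) would usually handle that step.

```r
# Local Moran's I for six hypothetical areas arranged in a line.
x <- c(10, 12, 11, 30, 32, 31)
n <- length(x)

# Binary contiguity: each area neighbours the one before and after it
W <- matrix(0, n, n)
idx <- cbind(seq_len(n - 1), seq_len(n - 1) + 1)
W[idx] <- 1
W <- W + t(W)

# Row-standardise so each area's weights sum to 1
Wr <- W / rowSums(W)

z     <- x - mean(x)               # deviation of each area from the mean
m2    <- sum(z^2) / n              # second moment of the deviations
lag_z <- as.vector(Wr %*% z)       # average deviation among each area's neighbours

# Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j
local_i <- z * lag_z / m2

# Quadrant labels as used in a Moran scatterplot
quadrant <- ifelse(z > 0 & lag_z > 0, "High-High",
            ifelse(z < 0 & lag_z < 0, "Low-Low",
            ifelse(z > 0, "High-Low", "Low-High")))

print(data.frame(area = LETTERS[seq_len(n)], x, local_i = round(local_i, 3), quadrant))
```

Positive local values mark areas that resemble their neighbours (High-High or Low-Low clusters), while negative values flag spatial outliers. With row-standardised weights the local values sum to n times the global Moran's I, which offers a quick consistency check.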

Tools for Spatial Data Science

  • GIS related
  • Data Science related
  • Spatial Data Science related

R is the best spatial data science tool available for public health !!!


R provides a range of powerful packages for geospatial analysis, enabling advanced computations and analytics.

R Spatial Analysis Ecosystem

R Spatial Learning Resources

  • Wealth of Resource material

  • Powerful tools/packages

  • seamlessly handle vector and raster data

  • interactive visualization

  • end-to-end solution


Newest addition: Spatial Data Science: With Applications in R

The sf package

install.packages("sf")

The sf package is an R implementation of Simple Features.

This package incorporates:

  • a new spatial data class system in R

  • functions for reading and writing data

  • tools for spatial operations on vectors
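The three capabilities listed above can be sketched in a few lines. This is a minimal illustration that assumes the sf package is installed; the facility names, coordinates, buffer distance, and output file name are hypothetical and not taken from any dataset discussed here.

```r
library(sf)

# Three hypothetical health facilities (longitude, latitude in degrees)
facilities <- data.frame(
  name = c("Facility A", "Facility B", "Facility C"),
  lon  = c(78.47, 78.50, 78.55),
  lat  = c(17.38, 17.42, 17.45)
)

# 1. Spatial data class: an sf object is a data frame with a geometry column
pts <- st_as_sf(facilities, coords = c("lon", "lat"), crs = 4326)  # WGS 84

# 2. Spatial operation on vectors: project to UTM zone 44N (EPSG:32644) so that
#    distances are in metres, then draw a 5 km buffer around each facility
pts_utm <- st_transform(pts, 32644)
buffers <- st_buffer(pts_utm, dist = 5000)

# 3. Reading and writing: save to a GeoPackage in a temporary folder and read it back
out_file <- file.path(tempdir(), "facilities.gpkg")
st_write(pts, out_file, append = FALSE, quiet = TRUE)
pts_back <- st_read(out_file, quiet = TRUE)

print(st_geometry_type(pts))      # POINT geometries
print(st_geometry_type(buffers))  # POLYGON geometries
```

Because an sf object is still a data frame, the usual tools for filtering, joining, and summarising attribute data continue to work alongside the geometry column.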

Geometry Types in sf

Case Studies

Case Study 1: COVID-19 in India

AMCHSS COVID-19 Dashboard

District Level Reports

# Specify the state
state <- "Kerala"

# Specify the district
district <- "Wayanad"

ICMR National COVID-19 Testing Database Project

Transforming COVID-19 testing data into actionable evidence for public health decision-making using epidemiological, spatiotemporal, and data-science methods

  • Dr. Biju Soman
  • Dr. Rakhal Gaitonde
  • Dr. Srikant Ambatipudi
  • Dr. Gurpreet Singh

Case Study 2: Caesarean Sections in India

Rising Caesarean Sections in India




  1. To study the patterns of caesarean section at the state and district level.
  2. To investigate spatial clustering of caesarean sections across districts.

Case Study 3: Access to Stroke Care in India

Stroke Access in India

  • We sought to evaluate the spatial distribution of and geographic accessibility to stroke centers in India.

  • Data Science Approach

  • Driving Distance and Travel Time estimations

  • Population Coverage estimations
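The population coverage idea can be sketched in base R. Note the loud assumption: this example uses straight-line (great-circle) distance as a stand-in for driving distance, which would normally come from a routing engine and a road network. The centres, settlements, populations, and the 100 km threshold are all hypothetical and are not results from the study.

```r
# Great-circle distance in km between points given in decimal degrees (haversine formula)
haversine_km <- function(lon1, lat1, lon2, lat2) {
  r      <- 6371            # mean Earth radius in km
  to_rad <- pi / 180
  dlat   <- (lat2 - lat1) * to_rad
  dlon   <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical stroke centres
centres <- data.frame(
  id  = c("C1", "C2"),
  lon = c(78.0, 80.0),
  lat = c(17.0, 13.0)
)

# Hypothetical settlements with population counts
settlements <- data.frame(
  id  = c("S1", "S2", "S3", "S4"),
  lon = c(78.0, 78.5, 75.0, 80.0),
  lat = c(17.0, 17.0, 20.0, 13.5),
  pop = c(1000, 2000, 3000, 4000)
)

# Distance matrix: rows are settlements, columns are centres
d <- outer(
  seq_len(nrow(settlements)), seq_len(nrow(centres)),
  function(i, j) haversine_km(settlements$lon[i], settlements$lat[i],
                              centres$lon[j], centres$lat[j])
)

# Distance from each settlement to its nearest centre
nearest_km <- apply(d, 1, min)

# Share of the total population living within the threshold of any centre
threshold_km <- 100
covered      <- nearest_km <= threshold_km
coverage     <- sum(settlements$pop[covered]) / sum(settlements$pop)

print(data.frame(settlements, nearest_km = round(nearest_km, 1), covered))
cat("Population coverage within", threshold_km, "km:", coverage, "\n")
```

Straight-line distance tends to understate real travel effort, so a coverage figure computed this way is best read as an optimistic upper bound. Replacing `d` with a matrix of driving times would leave the rest of the calculation unchanged.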

The Way Forward

Data for Better Lives

Trust and Data

Where are we now?

Learning Health Systems

References

Additional Resources:

Thank You