C:/Users/Arun/Dropbox/Research/Spatial_Data_Science_Talk/spatial_files/kl_pop_centers
├── kl_pop_centers.dbf
├── kl_pop_centers.prj
├── kl_pop_centers.shp
└── kl_pop_centers.shx
Spatial Data Science
for Public Health
Best Practices and Case Studies
Dr. Arun Mitra Peddireddy
SCTIMST, Trivandrum
Spatial Epidemiology Series | Webinar No. 3 | 04 Nov 2023
Extremely useful in providing a fresh outlook to public health.
Provides opportunity to enable overlaying data with its spatial representation
Supports better planning and decision-making.
The convergence of many new sub-disciplines:
Map of the plague in the province of Bari, Naples, 1690-1692
The map shows areas most affected and the boundaries of a military quarantine imposed to prevent its spread to neighboring towns and to other provinces.
Koch T. Mapping the miasma: air, health, and place in early medical mapping. Cartographic Perspectives. 2005 Sep 1(52):4-27.
While traditional uses of GIS in healthcare are still relevant, newer methods and advancing technology could be transformative for public health research.
Definition
Like data science, spatial data science seems to be a field that arises bottom-up in and from many existing scientific disciplines and industrial activities concerned with application of spatial data, rather than being a sub-discipline of an existing scientific discipline.
Edzer Pebesma, Roger Bivand - Spatial Data Science With Applications in R
Wealth of Spatial Data
70% of all data that is generated has spatial attributes
Routine health data can be geo-referenced
Provide a gateway for researchers and practitioners to examine the role and harness the power of SDS in public health
Coupled with the emerging field of spatial statistics, the analysis of this location-based data is opening new directions for public health.
Spatial data are fundamental to many geographical analyses and spatial data science draws strongly from key geographical concepts
Tobler’s First Law
“Everything is related to everything else, but near things are more related than distant things”
Waldo Tobler, 1970
Spatial dependence is “the propensity for nearby locations to influence each other and to possess similar attributes”.
This means natural phenomena are not spatially distributed at random.
Spatial dependence can be measured by indices of spatial autocorrelation.
Spatial autocorrelation refers to the presence of systematic spatial variation in a mapped variable.
The terms spatial association and spatial dependence are often used to mean spatial autocorrelation as well.
Covariance Functions and Variograms
Global Spatial Autocorrelation Measures
Local Indicators of Spatial Association (LISA)
Space-Time Correlation Analysis
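Of the measures listed above, a global spatial autocorrelation index is the easiest to sketch by hand. The base R example below computes global Moran's I, one common such index, for a toy layout. The `morans_i` helper, the five-area line layout, and the two value vectors are hypothetical and only for illustration; in practice a dedicated package would normally be used.

```r
# Global Moran's I for values x and a spatial weights matrix w (zero diagonal).
morans_i <- function(x, w) {
  n <- length(x)
  z <- x - mean(x)   # deviations from the mean
  s0 <- sum(w)       # sum of all weights
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Hypothetical layout: 5 areas in a row, each a neighbour of the next one.
w <- matrix(0, 5, 5)
for (i in 1:4) {
  w[i, i + 1] <- 1
  w[i + 1, i] <- 1
}

clustered   <- c(1, 2, 3, 4, 5)  # similar values sit next to each other
alternating <- c(1, 5, 1, 5, 1)  # neighbours are dissimilar

i_clustered   <- morans_i(clustered, w)    # positive: near things are alike
i_alternating <- morans_i(alternating, w)  # negative: near things differ
```

A positive value is what Tobler's First Law leads us to expect for most health and environmental variables; a value near zero would suggest no spatial pattern under this neighbour definition.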
Map projections try to transform the earth from its spherical shape (3D) to a planar shape (2D).
A CRS then defines how the two-dimensional, projected map in your GIS relates to real places on the earth.
The decision of which map projection and CRS to use depends on the extent of the area, the type of analysis, and often on the availability of data.
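As a minimal sketch of what changing the CRS looks like in practice, the example below uses the sf package (introduced later in this deck) to re-project three points from geographic coordinates to a projected CRS. The coordinates are taken from the kl_pop_centers output shown later; the choice of UTM zone 43N (EPSG:32643) is an assumption made for illustration, not a recommendation from the talk.

```r
library(sf)

# Three population centres in lon/lat (WGS 84, EPSG:4326).
centers <- data.frame(
  name = c("VADAKKANCHERI", "ANGAMALI", "MALAYATLUR"),
  lon  = c(76.48236, 76.38866, 76.51483),
  lat  = c(10.591936, 10.200267, 10.197496)
)
centers_wgs84 <- st_as_sf(centers, coords = c("lon", "lat"), crs = 4326)

# Assumption: UTM zone 43N (EPSG:32643) as the projected CRS.
centers_utm <- st_transform(centers_wgs84, 32643)

st_is_longlat(centers_wgs84)  # TRUE: coordinates are in degrees
st_is_longlat(centers_utm)    # FALSE: coordinates are in metres

# Pairwise distances between the projected points, in metres.
dist_m <- st_distance(centers_utm)
```

Working in a projected CRS keeps distances and areas in linear units, which is usually what buffer and distance-based analyses over a small region need.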
Note
The key word in data science is not data, it is science.
– Jeff Leek, JHU Data Science Lab
There are four key elements of reproducible research:
R is the best spatial data science tool available for public health!
R provides a range of powerful packages for geospatial analysis, enabling advanced computations and analytics.
Wealth of Resource material
Powerful tools/packages
seamlessly handle vector and raster data
interactive visualization
end-to-end solution
Newest addition: Spatial Data Science: With Applications in R
sf package
The sf package is an R implementation of Simple Features.
This package incorporates:
a new spatial data class system in R
functions for reading and writing data
tools for spatial operations on vectors
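The three points above can be illustrated with a short sketch. It builds an sf object from a plain data frame, writes it to a temporary GeoPackage and reads it back, then runs one simple spatial operation. The rows reuse values from the kl_pop_centers output shown later in this deck; the temporary file and the choice of GeoPackage as the format are assumptions for illustration.

```r
library(sf)

centers <- data.frame(
  name_of_to = c("VADAKKANCHERI", "TOMALLUR", "GURUVAYUR"),
  pop_2020   = c(456575, 187760, 1153806),
  lon        = c(76.48236, 76.68601, 76.04651),
  lat        = c(10.591936, 9.227245, 10.598107)
)

# Class system: a data frame plus a geometry column and a CRS.
centers_sf <- st_as_sf(centers, coords = c("lon", "lat"), crs = 4326)
class(centers_sf)

# Reading and writing: round trip through a temporary GeoPackage file.
gpkg <- tempfile(fileext = ".gpkg")
st_write(centers_sf, gpkg, quiet = TRUE)
centers_back <- st_read(gpkg, quiet = TRUE)

# Spatial operation on vectors: bounding box of all points.
bbox <- st_bbox(centers_sf)
```

Because an sf object is still a data frame, the attribute columns such as `pop_2020` stay available for ordinary tabular work alongside the geometry.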
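sf package
library(here)  # builds file paths relative to the project root
library(sf)    # provides st_read()
shape_file <- here("spatial_files", "kl_pop_centers", "kl_pop_centers.shp")
kl_pop_centers <- st_read(shape_file)  # read the shapefile as an sf object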
Reading layer `kl_pop_centers' from data source
`C:\Users\Arun\Dropbox\Research\Spatial_Data_Science_Talk\spatial_files\kl_pop_centers\kl_pop_centers.shp'
using driver `ESRI Shapefile'
Simple feature collection with 170 features and 14 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 74.95388 ymin: 8.35761 xmax: 77.28071 ymax: 12.60804
Geodetic CRS: WGS 84
sf object
Simple feature collection with 170 features and 14 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 74.95388 ymin: 8.35761 xmax: 77.28071 ymax: 12.60804
Geodetic CRS: WGS 84
First 10 features:
Rotation Scale name_of_to district state ELEVATION District_1
1 0 0 VADAKKANCHERI PALAKKAD KERALA 0 P>LAKK>D
2 0 0 ANGAMALI ERNAKULAM KERALA 0 ERN>KULAM
3 0 0 MALAYATLUR ERNAKULAM KERALA 0 ERN>KULAM
4 0 0 KALADI ERNAKULAM KERALA 0 ERN>KULAM
5 0 0 TOMALLUR PATTANAMTITTA KERALA 0 PATTANAMTHITTA
6 0 0 GURUVAYUR THRISSUR KERALA 0 TRISS@R
7 0 0 TRIMBRANALLUR THRISSUR KERALA 0 TRISS@R
8 0 0 KADIKKAD THRISSUR KERALA 0 TRISS@R
9 0 0 CHALAKUDI THRISSUR KERALA 0 TRISS@R
10 0 0 MALA THRISSUR KERALA 0 TRISS@R
STATE_1 TEHSIL Shape_Leng Shape_Area pop_2020 lon lat
1 KERALA >LATT@R 150475.2 578919605 456575 76.48236 10.591936
2 KERALA >LUVA 155736.4 550828742 615928 76.38866 10.200267
3 KERALA >LUVA 155736.4 550828742 615928 76.51483 10.197496
4 KERALA >LUVA 155736.4 550828742 615928 76.43405 10.167068
5 KERALA AD@R 110539.2 270281878 187760 76.68601 9.227245
6 KERALA CH>LAKKUDI 376808.1 1270626860 1153806 76.04651 10.598107
7 KERALA CH>LAKKUDI 376808.1 1270626860 1153806 76.10642 10.523958
8 KERALA CH>LAKKUDI 376808.1 1270626860 1153806 75.96130 10.681153
9 KERALA CH>LAKKUDI 376808.1 1270626860 1153806 76.34039 10.308017
10 KERALA CH>LAKKUDI 376808.1 1270626860 1153806 76.26185 10.250355
geometry
1 POINT (76.48236 10.59194)
2 POINT (76.38866 10.20027)
3 POINT (76.51483 10.1975)
4 POINT (76.43405 10.16707)
5 POINT (76.68601 9.227245)
6 POINT (76.04651 10.59811)
7 POINT (76.10642 10.52396)
8 POINT (75.9613 10.68115)
9 POINT (76.34039 10.30802)
10 POINT (76.26185 10.25036)
sf object
sf package
sf
[1] $<- [
[3] [[<- aggregate
[5] anti_join arrange
[7] as.data.frame cbind
[9] coerce dbDataType
[11] dbWriteTable distinct
[13] dplyr_reconstruct drop_na
[15] filter full_join
[17] gather group_by
[19] group_split identify
[21] initialize inner_join
[23] left_join merge
[25] mutate nest
[27] pivot_longer pivot_wider
[29] plot print
[31] rbind rename
[33] right_join rowwise
[35] sample_frac sample_n
[37] select semi_join
[39] separate separate_rows
[41] show slice
[43] slotsFromS3 spread
[45] st_agr st_agr<-
[47] st_area st_as_s2
[49] st_as_sf st_as_sfc
[51] st_bbox st_boundary
[53] st_break_antimeridian st_buffer
[55] st_cast st_centroid
[57] st_collection_extract st_concave_hull
[59] st_convex_hull st_coordinates
[61] st_crop st_crs
[63] st_crs<- st_difference
[65] st_drop_geometry st_filter
[67] st_geometry st_geometry<-
[69] st_inscribed_circle st_interpolate_aw
[71] st_intersection st_intersects
[73] st_is st_is_valid
[75] st_join st_line_merge
[77] st_m_range st_make_valid
[79] st_minimum_rotated_rectangle st_nearest_points
[81] st_node st_normalize
[83] st_point_on_surface st_polygonize
[85] st_precision st_reverse
[87] st_sample st_segmentize
[89] st_set_precision st_shift_longitude
[91] st_simplify st_snap
[93] st_sym_difference st_transform
[95] st_triangulate st_triangulate_constrained
[97] st_union st_voronoi
[99] st_wrap_dateline st_write
[101] st_z_range st_zm
[103] summarise transform
[105] transmute ungroup
[107] unite unnest
see '?methods' for accessing help and source code
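The listing above has the shape of the output of `methods(class = "sf")`; that this exact call produced it is an assumption, and the list may vary with the sf version and with which other packages are loaded. The sketch below shows that call together with a few of the listed methods applied to a tiny sf object whose coordinates are taken from the kl_pop_centers output earlier in this deck.

```r
library(sf)

pts <- st_as_sf(
  data.frame(
    name = c("KALADI", "MALA"),
    lon  = c(76.43405, 76.26185),
    lat  = c(10.167068, 10.250355)
  ),
  coords = c("lon", "lat"), crs = 4326
)

# List the methods registered for objects of class "sf".
sf_methods <- methods(class = "sf")

coords <- st_coordinates(pts)    # matrix of X and Y coordinates
bbox   <- st_bbox(pts)           # xmin, ymin, xmax, ymax
attrs  <- st_drop_geometry(pts)  # plain data frame without the geometry
```

Help for any individual method is available in the usual way, for example `?st_coordinates`.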
sf
https://posit.co/wp-content/uploads/2022/10/sf.pdf
EDA is the critical first step.
EDA is a state of mind.
EDA is exploring your ideas.
EDA has no strict rules.
EDA helps understand your data.
EDA is an iterative cycle.
EDA is a creative process.
It is mostly a philosophy of data analysis where the researcher examines the data without any pre-conceived ideas in order to discover what the data can tell him or her about the phenomena being studied.
“Exploratory data analysis is detective work – numerical detective work – or counting detective work – or graphical detective work”
– Tukey, 1977, Page 1, Exploratory Data Analysis
The easiest way to do EDA is to use questions as tools to guide your investigation. EDA is an important part of any data analysis, even if the questions are known already.
“There are no routine statistical questions, only questionable statistical routines.”
– Sir David Cox
“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.”
– John Tukey
Key to asking quality questions is to generate a large quantity of questions.
It is difficult to ask revealing questions at the start of the analysis.
But, each new question will expose a new aspect and increase your chance of making a discovery.
What?
Where?
When?
Who?
Why?
How?
What type of variation occurs within your variables?
What type of covariation occurs between your variables?
Whether your data meets your expectations or not.
Whether the quality of your data is robust or not.
It is an iterative process
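The questions above about variation, covariation, and data quality map onto a few lines of base R. The sketch below uses the `Shape_Area` and `pop_2020` values of the four distinct tehsils that appear in the first ten rows of the kl_pop_centers output earlier in this deck. The `tehsils` data frame and the specific checks are hypothetical choices for illustration, and four rows are far too few for real conclusions.

```r
# Four distinct tehsils from the first rows of the sample layer.
tehsils <- data.frame(
  shape_area = c(578919605, 550828742, 270281878, 1270626860),
  pop_2020   = c(456575, 615928, 187760, 1153806)
)

# Variation within a variable.
summary(tehsils$pop_2020)
pop_range <- range(tehsils$pop_2020)

# Covariation between two variables.
area_pop_cor <- cor(tehsils$shape_area, tehsils$pop_2020)

# Does the data meet expectations? Populations should be present and positive.
pop_ok <- all(!is.na(tehsils$pop_2020)) && all(tehsils$pop_2020 > 0)
```

Each result tends to raise the next question (for example, whether population density rather than raw population is the better variable), which is the iterative cycle the slides describe.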
The Geography and Geospatial Science Working Group (GeoSWG) recognised the need for best practices in cartography.
These guidelines help researchers develop high-quality, consistent map products.
Point Pattern Data
Areal Data
Raster Data
Network Data
Spatio-temporal Models
Machine Learning Methods
Big Data
Immediate: The time from action to insight is reducing dramatically
Fresh: Primary data needs to be days or months old, not years old
Multi-source: Competitive alternative sources for completeness or validation
Continuous: Analysis can no longer be a point in time
Automated: Possibility to continuously replicate and connect to decision tools
Thank you!