Introduction to Spatial Data Science
Acknowledge
- Dr. Elisabetta Pietrostefani &
- Dr. Carmen Cabrera-Arnau
A course in Geographic Data Science
SCTIMST, Trivandrum
- Sree Chitra Tirunal Institute for Medical Sciences & Technology, Trivandrum
- An Institution of National Importance established by the Act of the Indian Parliament (Act No.52, 1980)
Dept. of Science & Technology
Three major focus areas
- Bio-Medical Technology Wing
- Super specialty Hospital
- Public Health (AMCHSS)
Healthcare Technology development
inter-disciplinary initiatives
Running MPH program since 1997, PhD programs since 2003
Introduction
- Public Health - Science vs. Advocacy
- The need for participatory decision making in public health
- The transparency of open data science approach
- The beauty of computational reports, presentations, etc.
Work plan
Lectures
- Essential concepts
- Mainly to get the big picture
- Enthusing interest rather than teaching
- Welcome to the open data science initiative!
Lab work
- Do it yourself
- Get skilled in the process
- Come out of your comfort zones and collaborate!
- Use data for dialogue!
What information does GIS use?
Data that defines geographical features like roads, rivers
Soil types, land use, elevation
Demographics, socioeconomic attributes
Environmental, climate, air-quality
Annotations that label features and places
What is Spatial Data Science?
Spatial Data Science
- Analyse and extract insights from geospatial data
- Work with real-world data on a number of domains and problems
- Acquire key data science skills and important tools to answer spatial questions
It is especially true in public health
GIS
Layers - Image - Data
GIS world vs. Real World
Skills for public health data science
Hard Skills - Programming Language - Transparency and Reproducibility - Version control
Soft Skills - Communication - Storytelling - Geospatial analytics acumen - Ethical skills
R software for Spatial Data Science (SDS)
Graphical User Interfaces (GUIs)
- QGIS and GRASS has revolutionized Open source Spatial Information Systems (GIS).
- However, the reproducibility aspect has many challenges
Command Line Interfaces (CLIs)
Command Line Interface (CLIs) of R software is a good way to bring in reproducible algorithms for GIS/SDS
The Spatial Data ‘Revolution’
Advanced Hardware: High-performance computer hardware and efficient algorithms allow us to process vast data sets quickly.
Scalable Software: Scalable solutions with the R environment help us to sift through the data deluge, and extract valuable insights from the noise.
Spatial Databases: The advent of spatial databases empowers us to store and manipulate manageable subsets within the vast ocean of spatial data.
SDS in Public Health
- Data Science:“gathering data messaging it into a tractable form, making it tell its story and presenting that story to others”
Loukides (2011) What is Data Science?
Traditional datasets in healthcare
- Collected for the purpose (carefully designed)
- Detailed and informative (“rich profile and portraits of the country”)
- High quality
Traditional health and allied sector data
- Massive enterprises (very costly)
- Coarse in resolution (to preserve privacy they need to be aggregated)
- Slow - the more detailed, the less frequent they are available
Examples
- Decennial census (census geographies)
- Longitudinal surveys
- Custom collected surveys, interviews etc.
- Economic or well-being indicators
New Forms of spatial data
Tied into the geo-data revolution
Accidental : created for different purposes but available for analysis as a side effect
Very diverse in nature: resolution and quality but, potentially much more detailed in both space and time
Challenges (Arribas-Bel, 2014)
- Bias
- Technical barriers
- Methodological “mismatch”
Part 2
(Geo)visualisation
By encoding information visually, they allow to present large amounts of numbers in a meaningful way.
A map for everyone
A real public health tool
Maps can fulfill several needs, looking very different depending on the end-goal.
MacEachren & Kraak (1997) identify three main dimensions:
- Knowledge of what is being plotted
- Target audience
- Degree of interactivity
MacEachren & Kraak (1997)
DiBiase’s (1990) “Swoopy”
Translating numbers into a (visual) language that the human brain “speaks better”
Exploratory Visualization
“forces us to notice what we never expected to see” (Tukey 1977: vi)
Mostly for ourselves in the course of the research process.
Many, quick and dirty, and rather unattractive graphs.
Explanatory Visualization
“forces readers to see the information the designer wanted to convey” (Kosslyn 1994: 271)
Mostly for others after the research is completed.
Few, carefully crafted, and attractive graphs.
Choropleths
Thematic map in which values of a variable are encoded using a color gradient of some sort
Counterpart of the histogram
Both allows us to gage the distribution of a variable
Part 3
Spatial Weights
For a statistical method to be explicitly spatial, it needs to contain some representation of the geography, or spatial context. One of the most common ways is through Spatial Weights Matrices
- (Geo)Visualization: translating numbers into a (visual) language (colors) that the human brain can interpret.
- Spatial Weights Matrices: translating geography into a (numerical) language that a computer can interpret.
Spatial Weight Matrices
Spatial Weights Matrices are building block for spatial analysis and statistics.
They are used to assign a weighted average or sum of neighbouring data values to an observation, or other point in space.
Relates to concepts of spatial ‘smoothing’ and interpolating data
They can be used to see how one’s characteristics or outcomes is correlated with their neighbours: e.g. education, criminality, disease risk factors,…
Core element in several spatial analysis techniques
- Spatial autocorrelation
- Spatial clustering/geo-demographics
- Spatial regression
Spatial Weights
Spatial Weights represented by \[W\] N x N positive matrix that contains spatial relations that are translated into values
- If you are not a neighbour, \(value = 0\)
- If you are a neighbour, \(value<0\)
Website
https://drarunmitra.github.io/GIS4Epidemiology/