A course in Geographic Data Science
Dept. of Science & Technology
Three major focus areas
Healthcare Technology development
inter-disciplinary initiatives
Running MPH program since 1997, PhD programs since 2003
Data that defines geographical features like roads, rivers
Soil types, land use, elevation
Demographics, socioeconomic attributes
Environmental, climate, air-quality
Annotations that label features and places
It is especially true in public health
Hard Skills: programming language, transparency and reproducibility, version control
Soft Skills: communication, storytelling, geospatial analytics acumen, ethical skills
Graphical User Interfaces (GUIs)
Command Line Interfaces (CLIs)
The Command Line Interface (CLI) of R is a good way to build reproducible workflows for GIS/SDS
Advanced Hardware: High-performance computer hardware and efficient algorithms allow us to process vast data sets quickly.
Scalable Software: Scalable solutions with the R environment help us to sift through the data deluge, and extract valuable insights from the noise.
Spatial Databases: The advent of spatial databases empowers us to store and manipulate manageable subsets within the vast ocean of spatial data.
Loukides (2011) What is Data Science?
Tied into the geo-data revolution
Accidental: created for different purposes but available for analysis as a side effect
Very diverse in resolution and quality, but potentially much more detailed in both space and time
By encoding information visually, maps allow us to present large amounts of numbers in a meaningful way.
A real public health tool
Maps can fulfill several needs, looking very different depending on the end-goal.
MacEachren & Kraak (1997) identify three main dimensions:
Translating numbers into a (visual) language that the human brain “speaks better”
“forces us to notice what we never expected to see” (Tukey 1977: vi)
Mostly for ourselves in the course of the research process.
Many, quick and dirty, and rather unattractive graphs.
“forces readers to see the information the designer wanted to convey” (Kosslyn 1994: 271)
Mostly for others after the research is completed.
Few, carefully crafted, and attractive graphs.
Thematic map in which values of a variable are encoded using a color gradient of some sort
Counterpart of the histogram
Both allow us to gauge the distribution of a variable
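A minimal sketch of the idea, assuming a quantile classification (one of several possible schemes) and made-up district rates: each value is assigned to a class, each class gets a colour from a light-to-dark ramp, and counting the members of each class gives the histogram the map corresponds to. The names `quantile_breaks`, `classify`, `rates` and `palette` are hypothetical and used only for illustration.

```python
from collections import Counter


def quantile_breaks(values, k):
    """Return the k-1 interior break points that split values into k
    classes with roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def classify(value, breaks):
    """Return the class index (0 = lowest) of a value, given ascending breaks."""
    return sum(value >= b for b in breaks)


# Hypothetical rates per district (made-up numbers, for illustration only)
rates = {"A": 2.0, "B": 3.5, "C": 5.0, "D": 7.5,
         "E": 9.0, "F": 12.0, "G": 15.0, "H": 20.0}

# Assumed light-to-dark colour ramp, one colour per class
palette = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]

breaks = quantile_breaks(rates.values(), k=len(palette))

# Colour used to fill each district on the map
colours = {name: palette[classify(v, breaks)] for name, v in rates.items()}

# Number of districts per class: the histogram behind the map
class_counts = Counter(classify(v, breaks) for v in rates.values())
```

With quantile classes every colour covers about the same number of areas; other schemes (equal intervals, for example) would produce a different map from the same data.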
For a statistical method to be explicitly spatial, it needs to contain some representation of the geography, or spatial context. One of the most common ways is through Spatial Weights Matrices.
Spatial Weights Matrices are a building block for spatial analysis and statistics.
They are used to assign a weighted average or sum of neighbouring data values to an observation, or other point in space.
Relates to concepts of spatial ‘smoothing’ and interpolating data
They can be used to see how one’s characteristics or outcomes are correlated with those of their neighbours: e.g. education, criminality, disease risk factors,…
Spatial weights are represented by W, an N x N matrix of non-negative values that encode the spatial relations between observations
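The points above can be sketched in a few lines, under stated assumptions: observations are the cells of a small regular grid, neighbours are cells sharing an edge (the "rook" rule), and the weights are row-standardised so that each row of W sums to one. The weighted average of neighbouring values is then the spatial lag, and Moran's I is used here as one common summary of how values correlate with those of their neighbours. All function names and the toy data are hypothetical.

```python
def rook_weights(nrows, ncols):
    """Binary N x N contiguity matrix for the cells of a regular grid.
    Two cells are neighbours when they share an edge (rook rule)."""
    n = nrows * ncols
    W = [[0.0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[i][rr * ncols + cc] = 1.0
    return W


def row_standardise(W):
    """Rescale each row to sum to one (rows without neighbours stay zero)."""
    out = []
    for row in W:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def spatial_lag(W, y):
    """Weighted sum of neighbouring values for every observation.
    With row-standardised W this is the average of the neighbours."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


def morans_i(W, y):
    """Moran's I: global correlation between values and their neighbours."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den


# Toy data: a 2 x 2 grid with a checkerboard pattern (made-up values)
y = [1.0, 0.0, 0.0, 1.0]
W = row_standardise(rook_weights(2, 2))

lag = spatial_lag(W, y)   # [0.0, 1.0, 1.0, 0.0]
moran = morans_i(W, y)    # -1.0: every cell differs from its neighbours
```

On this checkerboard each cell is the opposite of its neighbours, so the lag flips every value and Moran's I reaches -1; clustered values would push it towards +1.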
https://drarunmitra.github.io/GIS4Epidemiology/