# Introduction to GIS {#introGIS}
```{r echo = FALSE, message = FALSE}
#source('libs/Common.R')
```
## What is a GIS?
A Geographic Information System is a multi-component environment used to create, manage, visualize and analyze data and its spatial counterpart. It's important to note that most datasets you will encounter in your lifetime can be assigned a spatial location, whether on the earth's surface or within some arbitrary coordinate system (such as a soccer field or a gridded petri dish). So in essence, any dataset can be represented in a GIS: the question then becomes "does it need to be analyzed in a GIS environment?" The answer to this question depends on the purpose of the analysis. If, for example, we are interested in identifying the ten African countries with the highest conflict index scores for the 1966-78 period, a simple table listing those scores by country is all that is needed.
<style type="text/css">
table {
width: 600px !important;
}
td{
font-size: 10px;
padding-top: 0px !important;
padding-bottom: 0px !important;
padding-right: 45px !important;
padding-left: 5px !important;
}
</style>
```{r table-conflict, fig.cap="Index of total African conflict for the 1966–78 period [@Anselin1992a].", fig.env="table", fig.align='center', out.width='80%', echo = FALSE}
library(xtable)
library(spdep)
library(kableExtra)
data(afcon)
x <- data.frame(Country = afcon$name, Conflicts = afcon$totcon)
# Sort countries by decreasing conflict score
x <- x[order(-x$Conflicts), ]
# Split the sorted data frame into two halves (assumes an even number of rows)
half <- floor(nrow(x) / 2)
x2 <- split(x, rep(1:2, each = half))
x3 <- as.data.frame(x2)
colnames(x3) <- rep(c("Country", "Conflicts"),2)
rownames(x3) <-NULL
# Print output
knitr::kable(x3, booktabs=TRUE, caption="Index of total African conflict for the 1966-78 period [@Anselin1992a].")
```
*Data source: Anselin, L. and John O'Loughlin. 1992. Geography of international conflict and cooperation: spatial dependence and regional context in Africa. In The New Geopolitics, ed. M. Ward, pp. 39-75.*
A simple sort on the `Conflicts` column reveals that `r as.vector(x[1:10,]$Country)` are the top ten countries.
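The sort described above takes only a couple of lines of base R. The following is a minimal sketch using a small made-up data frame; the object names (`scores`, `ranked`, `top3`), the country labels and the values are placeholders for illustration, not the actual conflict index data.

```r
# Hypothetical scores: placeholder names and values, not the afcon data
scores <- data.frame(
  Country   = c("Country A", "Country B", "Country C", "Country D", "Country E"),
  Conflicts = c(120, 845, 310, 1040, 560),
  stringsAsFactors = FALSE
)

# Sort rows by decreasing score, then keep the three highest-scoring countries
ranked <- scores[order(-scores$Conflicts), ]
top3   <- head(ranked$Country, 3)
top3
```

No spatial information is needed for this kind of question: the ranking depends only on the attribute values.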
What if we are interested in knowing whether countries with a high conflict index score are geographically clustered? Does the above table provide us with enough information to help answer this question? The answer, of course, is no. We need additional data pertaining to the geographic location and shape of each country. A map of the countries would be helpful.
```{r map-africa, echo=FALSE, fig.cap = "Choropleth representation of African conflict index scores. Countries for which a score was not available are not mapped.", fig.width=5.5, fig.height=5.5}
knitr::include_graphics("img/Africa_conflicts.png", dpi=300)
```
This example demonstrates how spatial data can uncover spatial patterns that are invisible in tabular formats. It highlights the importance of integrating location into data analysis. Maps emphasize spatial relationships, while tables emphasize numerical comparisons. Understanding this distinction helps in choosing the right tool for the question.
Maps are ubiquitous: available online and in various print media. But we seldom ask how the boundaries of the map features are encoded in a computing environment. After all, if we expect software to assist us in the analysis, the spatial elements of our data should be readily accessible in a digital form. Spending a few minutes thinking through this question will make you realize that simple tables or spreadsheets are not up to this task. A more complex data storage mechanism is required. This is the core of a GIS environment: a spatial database that facilitates the storage and retrieval of data that define the spatial boundaries, lines or points of the entities we are studying. This may seem trivial, but without a spatial database, most spatial data exploration and analysis would not be possible!
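To make the idea of digitally encoded boundaries more concrete, here is a minimal base R sketch. It is not how a production GIS stores its data. It assumes a hypothetical rectangular region whose boundary is stored as an ordered ring of vertex coordinates, kept alongside an attribute table, and it uses the shoelace formula to derive the region's area from that geometry. All object names (`boundary`, `attr_table`, `features`, `ring_area`) are made up for illustration.

```r
# A boundary stored as a closed ring of (x, y) vertices in arbitrary units;
# the first vertex is repeated at the end to close the ring
boundary <- matrix(
  c(0, 0,
    4, 0,
    4, 3,
    0, 3,
    0, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("x", "y"))
)

# Attribute table: one row per feature, linked to its geometry by an id
attr_table <- data.frame(id = 1, name = "Region A", score = 10,
                         stringsAsFactors = FALSE)

# Pair the attribute table with a list of geometries keyed by feature id
features <- list(attributes = attr_table, geometry = list(`1` = boundary))

# Shoelace formula: area of a simple polygon from its ordered vertices
ring_area <- function(ring) {
  n <- nrow(ring)
  x <- ring[, "x"]
  y <- ring[, "y"]
  abs(sum(x[-n] * y[-1] - x[-1] * y[-n])) / 2
}

area <- ring_area(features$geometry[["1"]])
area
```

The point of the sketch is that a quantity such as area can only be computed because the vertex coordinates are stored. An attribute table alone could not provide it.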
### GIS software
Many GIS software applications are available--both commercial and open source. Two popular applications are **ArcGIS Pro** and **QGIS**.
#### ArcGIS
A popular commercial desktop GIS software is [**ArcGIS Pro**](https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview) developed by [Esri](https://en.wikipedia.org/wiki/Esri) (pronounced *ez-ree*). Esri was once a small land-use consulting firm which did not start developing GIS software until the mid-1970s. ArcGIS Pro comes in different licensing levels and can be purchased with additional *add-on* packages. As such, a single license can range from a few thousand dollars to well over ten thousand dollars. In addition to software licensing costs, ArcGIS Pro is only available for Windows operating systems--so, if your workplace is a Mac-only environment, the purchase of a Windows PC would add to the expense.
#### QGIS
A very capable open source (free) GIS software is [**QGIS**](http://qgis.org). It encompasses most of the functionality included in ArcGIS Pro. If you are looking for a GIS application for your Mac or Linux environment, QGIS is a wonderful choice given its multi-platform support. Built into the current versions of QGIS are functions from another open source software: **GRASS**. GRASS has been around since the 1980s and has many advanced GIS data manipulation functions; however, its use is not as intuitive as that of QGIS or ArcGIS Pro (hence the preferred QGIS alternative).
## What is Spatial Analysis?
A distinction is made in this course between **GIS** and **spatial analysis**.
In mainstream GIS software, the term *analysis* typically refers to operations such as data manipulation and querying. In contrast, *spatial analysis* focuses on the statistical examination of spatial patterns and the processes that may have generated them. More broadly, spatial analysis seeks to answer questions like: *"What could have caused the observed spatial pattern?"* It is an **exploratory process** in which we quantify spatial patterns and investigate the underlying mechanisms that may explain their distribution.
For example, imagine you record the location of each tree within a well-defined study area. Mapping these locations is a typical **GIS task**. Once the trees are mapped, you may begin to draw inferences about the spatial pattern: *Are the trees clustered or dispersed? Is tree density consistent across the study area? Could environmental factors such as soil type or slope have influenced the observed distribution?* These are the kinds of questions addressed through **spatial analysis**, using quantitative and statistical techniques to explore and explain spatial patterns.
```{r f01-ppp, echo=FALSE, fig.cap = "Distribution of Maple trees in a 1,000 x 1,000 ft study area.", fig.width=2, fig.height=2}
library(spatstat)
OP <- par( mar=c(0,0,0,0) )
plot(split(lansing)$maple, pch=16, cex=0.5, cols="#222222",main="")
par(OP)
```
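One way to start quantifying whether points are clustered or dispersed is the average nearest neighbor (ANN) distance. The sketch below is a base R illustration on simulated points, not on the maple data shown above. It assumes a unit-square study area and ignores edge effects. Under complete spatial randomness (CSR), the expected ANN distance is approximately $1 / (2\sqrt{\lambda})$, where $\lambda$ is the point density. The names `ann`, `random_pts` and `clustered_pts` are made up for this example.

```r
set.seed(42)
n <- 200

# Simulated random pattern in a unit square (area = 1)
random_pts <- cbind(runif(n), runif(n))

# Simulated clustered pattern: points scattered tightly around 5 centers
centers       <- cbind(runif(5, 0.2, 0.8), runif(5, 0.2, 0.8))
idx           <- sample(1:5, n, replace = TRUE)
clustered_pts <- centers[idx, ] + matrix(rnorm(2 * n, sd = 0.02), ncol = 2)

# Average nearest neighbor distance for a two-column coordinate matrix
ann <- function(pts) {
  d <- as.matrix(dist(pts))
  diag(d) <- Inf            # ignore each point's zero distance to itself
  mean(apply(d, 1, min))
}

# Expected ANN distance under CSR for n points in a unit-area region
expected_csr <- 1 / (2 * sqrt(n / 1))

c(random = ann(random_pts), clustered = ann(clustered_pts), csr = expected_csr)
```

An observed ANN distance well below the CSR expectation hints at clustering, while a value well above it hints at dispersion. This comparison is only descriptive and does not replace a formal test.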
In this course, you’ll learn that while popular GIS software like **ArcGIS Pro** excels at creating and manipulating spatial data, it is limited when it comes to analyzing the patterns and processes that may have produced those data. To move beyond basic data handling and explore deeper spatial relationships, we turn to more robust quantitative tools. One such tool is **R**--a free, open-source data analysis environment.
R offers one of the richest collections of spatial data analysis and statistical packages available today. Learning to work in the **R** programming environment will be highly beneficial, as many of the skills you acquire are transferable to a wide range of quantitative analysis tasks, both spatial and non-spatial.
[R](http://www.r-project.org/) can be installed on Windows, Mac and Linux operating systems. Another related piece of software that you might find useful is [RStudio](https://posit.co/download/rstudio-desktop/), which offers a nice interface to R. To learn more about data analysis in R, visit the [ES218 course website](http://mgimond.github.io/ES218/).
## What's in an Acronym?
GIS is a ubiquitous technology. Many of you are taking this course in part because you have seen GIS listed as a "desirable" or "required" skill in job postings. Often, GIS is thought of primarily as a “map-making” tool, a perception shared by many casual users in the workforce. While visualizing data is indeed a key feature of GIS, it is equally important to consider *what* data is being visualized and *why*.
O'Sullivan and Unwin [@Unwin1] use the term **accidental geographer** to describe individuals *"whose understanding of geographic science is based on the operations made possible by GIS software"*. Building on this idea, we introduce the term **accidental data analyst**--someone whose grasp of data and its analysis is limited to the point-and-click interfaces of popular software such as spreadsheets, statistical packages, and GIS platforms. The aggressive marketing of GIS technology has, at times, placed *technology* ahead of *purpose* and *theory*. This concern is not unique to GIS; similar issues arose decades ago when personal computers made it easier to graph non-spatial data and perform statistical procedures.
The different purposes of mapping spatial data closely parallel the goals of graphing non-spatial data. **John Tukey** [@Tukey1972] identified three broad categories of graphical displays:
* "*Graphs from which numbers are to be read off--substitutes for tables.*
* *Graphs intended to show the reader what has already been learned (by some other technique)--these we shall sometimes impolitely call propaganda graphs.*
* *Graphs intended to let us see what may be happening over and above what we have already described--these are the analytical graphs that are our main topic.*"
A GIS-based analogy to Tukey’s categories might be:
* **Reference maps** (USGS maps, hiking maps, road maps): used to navigate landscapes or identify locations of interest.
* **Presentation maps**: designed to convey a specific narrative. While we avoid Tukey’s term “propaganda,” it’s worth noting that maps can be used to persuade.
* **Statistical maps**: created to manipulate raw data in ways that reveal patterns not immediately visible. These often require multiple data transformations and may benefit from being explored both within and outside a spatial context.
This course emphasizes the last two categories of spatial data visualization, with a particular focus on **statistical maps**.
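As a small illustration of the kind of data transformation a statistical map may involve, the base R sketch below bins a numeric variable into quantile classes, a common step before shading a choropleth map. The values are simulated, and the names (`rate`, `breaks`, `rate_class`) are assumptions made for this example.

```r
set.seed(1)

# Simulated, right-skewed values standing in for a mapped variable
rate <- rlnorm(50, meanlog = 2, sdlog = 0.8)

# Quantile breaks for four classes
breaks <- quantile(rate, probs = seq(0, 1, by = 0.25))

# Assign each value to a class; include.lowest keeps the minimum in class 1
rate_class <- cut(rate, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Number of observations falling in each class
table(rate_class)
```

Quantile classes place roughly the same number of observations in each bin, which is one of several classification schemes that can change how a mapped pattern is perceived.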
## Course Roadmap
This course is divided into two main parts, each focusing on distinct aspects of spatial data science.
### Part 1: Working with Spatial Data
This section introduces foundational GIS concepts and tools for data manipulation and visualization.
1. **Introduction to GIS & Spatial Analysis**
- What is GIS?
- What is spatial analysis?
- GIS software overview
2. **Feature Representation**
- Vector vs. Raster
- Object vs. Field views
- Scale and attribute tables
3. **GIS Data Management**
- File formats and project organization
- Managing data in ArcGIS
4. **Symbolizing Features**
- Color theory and classification
- Choropleth mapping techniques
5. **Statistical Maps**
- Mapping distributions and uncertainty
- Classification intervals and outlier detection
6. **Pitfalls to Avoid**
- MAUP, ecological fallacy, unstable rates
7. **Good Map Making Tips**
- Map elements, layout, and typography
8. **Spatial Operations and Vector Overlays**
- Selection, overlays, and spatial queries
9. **Coordinate Systems**
- Geographic vs. projected systems
- Spatial properties and geodesic geometries
10. **Map Algebra**
- Local, focal, zonal, and global raster operations
### Part 2: Exploratory Spatial Data Analysis
This section focuses on statistical analysis of spatial patterns.
12. **Spatial Trends**
- First order analysis of field variables
- Fitting polynomial models
13. **Spatial Autocorrelation**
- Second order property of field variables
- Global and local Moran’s I
14. **Point Pattern Analysis: First order analysis**
- Density and distance-based methods
- Testing for CSR processes
15. **Point Pattern Analysis: Second order analysis**
- ANN analysis
- K and L functions
- Pair correlation function
16. **Spatial Interpolation**
- Deterministic (IDW, Thiessen) and statistical (Kriging) methods