---
output: html_document
editor_options:
  chunk_output_type: console
---
# Statistical maps

```{r, message=FALSE,warning=FALSE,echo=FALSE}
library(sf)

load(url("https://github.com/mgimond/Spatial/raw/main/Data/moransI.RData"))

#z <- gzcon(url("https://github.com/mgimond/Spatial/raw/main/Data//ma2.rds"))
#ma <- unwrap(readRDS(z))
ma <- readRDS("Data/ma2.rds")
```


## Introduction

In the previous chapter, we explored how visual variables--particularly color--can be used to symbolize spatial features. We examined the perceptual dimensions of hue, lightness, and saturation, and how different color schemes (qualitative, sequential, and divergent) can be matched to the nature of the data being mapped. We also saw how classification intervals influence the appearance and interpretability of choropleth maps.

This chapter builds on those foundations by shifting focus from the aesthetics and perception of map design to the **statistical logic** behind classification schemes. Rather than choosing breaks arbitrarily or purely for visual balance, we explore how statistical principles can guide the discretization of continuous spatial data. This includes methods such as equal intervals, quantiles, boxplots, and standard deviation units, each offering a different lens through which to interpret spatial distributions.

We also extend the discussion to mapping uncertainty, a critical but often overlooked aspect of spatial analysis. Many datasets--especially those derived from surveys--carry margins of error or standard errors that affect how confidently we can interpret mapped patterns. This chapter introduces techniques for visualizing uncertainty and simulating its impact on spatial rankings and statistical relationships.


## Statistical distribution maps

Many spatial datasets contain continuous variables, meaning that each geographic unit--such as a polygon in a data layer--can have a unique value. A thematic map in which each polygon is shaded according to its attribute value is known as a **choropleth map**. When the values are mapped with a one-to-one color assignment, so that every distinct value receives its own shade, the result is an *unclassed* choropleth map. For example, a map of Massachusetts showing median household income with a unique color for each tract would produce a visually complex and potentially overwhelming display.

```{r continuous, fig.width = 6.5, fig.height=2.5, echo=FALSE, fig.cap = "Example of a continuous color scheme applied to a choropleth map.", fig.align='center'}
library(ggplot2)
library(gridExtra)
library(classInt)
ma$per_vac <- ma$vacant / ma$units * 100


ggplot(ma, aes(fill=house_inc)) + geom_sf() + theme_void() +
  scale_fill_gradientn(colours = c("lightgreen", "darkgreen"), name = "Income ($)")


```

While technically accurate, such maps often obscure broader patterns and make interpretation difficult. To address this, we turn to statistical classification methods that group continuous values into meaningful categories, allowing for clearer visual communication and more effective spatial analysis.

```{r equalint, fig.width = 6.5, fig.height=2.5, echo=FALSE, fig.cap = "An equal interval choropleth map using 10 bins.", fig.align='center'}
clint <- classIntervals(ma$house_inc, style = "equal", n = 10)$brks

p1 <- ggplot(ma, aes(fill=house_inc)) + geom_sf() + theme_void() +
  scale_fill_stepsn(colors = c("#D9EF8B",  "darkgreen") ,
                    breaks = clint[2:(length(clint)-1) ],
                    values = scales::rescale(clint[2:(length(clint)-1) ], c(0,1)),
                    guide = guide_coloursteps(even.steps = FALSE,
                                              show.limits = TRUE,
                                              title = NULL,
                                              barheight = unit(2.2, "in"),
                                              barwidth = unit(0.15, "in"),
                                              label.position = "left"))

p2<- ggplot(ma, aes(house_inc)) + geom_histogram(bins = 10) +
  scale_x_continuous(breaks = c(0,125000.5, 250001),position = "top") +
  coord_flip() + scale_y_continuous(labels = NULL, breaks = NULL) +
  ylab(NULL) +xlab(NULL) +
   theme(plot.margin = margin(0.1,0,0,0.1, "in"),
         axis.text = element_text(colour = "grey"))
grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
```

The histogram accompanying the map is rotated vertically to align each bin with its corresponding color swatch. The length of each gray bar in the histogram reflects the number of polygons assigned to each color category, offering a quick sense of how values are distributed across the map.

An equal interval classification scheme divides the full range of data values into intervals of equal width. This approach ensures that each color swatch represents the same span of values, making it easier to compare differences between categories. Because this method does not assume a central reference point, a sequential color scheme--typically progressing from light to dark--is used to convey increasing magnitude.
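
The equal interval rule reduces to simple arithmetic: with *k* classes, each interval spans (max - min) / *k*. The following minimal base R sketch applies that rule to a handful of hypothetical values. The chapter's figures compute their breaks with `classInt::classIntervals(..., style = "equal")` instead; this sketch only illustrates the arithmetic.

```r
# Minimal sketch of equal interval breaks (base R only; hypothetical toy data)
x <- c(12, 18, 25, 31, 40, 44, 52, 60, 75, 140)
k <- 4                                   # number of classes

width <- (max(x) - min(x)) / k           # (140 - 12) / 4 = 32
brks  <- min(x) + (0:k) * width          # 12, 44, 76, 108, 140

# Assign each value to a class; include.lowest keeps the minimum in class 1
cls <- cut(x, breaks = brks, include.lowest = TRUE)
table(cls)                               # number of values per class
```

With these skewed toy values, the first class holds six of the ten values and the third class is empty, the kind of imbalance the quantile scheme is designed to avoid.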

### Quantile map

While equal interval classification offers intuitive comparisons by assigning each color swatch an equal range of values, it can be misleading when the data are unevenly distributed. In such cases, many polygons may cluster within a few intervals, leaving others sparsely populated. This imbalance can obscure meaningful spatial patterns.

Quantile classification addresses this issue by dividing the data into intervals that each contain an equal number of observations. For example, a map using six quantiles ensures that each color swatch is applied to approximately the same number of polygons. This approach enhances the map's exploratory power and can help reveal spatial clusters that might be hidden in an equal interval map.
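
Quantile breaks are simply the sample quantiles taken at evenly spaced probabilities. A minimal base R sketch, assuming simulated right-skewed values (not the Massachusetts data) and R's default quantile definition (`type = 7`):

```r
# Minimal sketch of quantile breaks (base R only; simulated data)
set.seed(1)
x <- rlnorm(300, meanlog = 11, sdlog = 0.5)   # hypothetical right-skewed values
k <- 6                                        # six quantile classes

# Breaks at the 0, 1/6, 2/6, ..., 1 quantiles
brks <- quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)
cls  <- cut(x, breaks = brks, include.lowest = TRUE)

table(cls)    # each class holds the same number of values
diff(brks)    # but the class widths differ
```

Every class receives 50 of the 300 values, while the last interval is by far the widest, which is why the darkest swatch in a quantile map's color bar tends to be the longest for right-skewed data.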

```{r quantile, fig.width = 6.5, fig.height=2.5, echo=FALSE, fig.cap = "Example of a quantile map.", fig.align='center'}
clint <- classIntervals(ma$house_inc, style = "quantile", n = 6)$brks

p1 <- ggplot(ma, aes(fill=house_inc)) + geom_sf() + theme_void() +
  scale_fill_stepsn(colors = c("#D9EF8B",  "darkgreen") ,
                    breaks = clint[2:(length(clint)) ],
                    values = scales::rescale(clint[1:(length(clint)) ], c(0,1)),
                    guide = guide_coloursteps(even.steps = FALSE,
                                              show.limits = FALSE,
                                              title = NULL,
                                              barheight = unit(2.0, "in"),
                                              barwidth = unit(0.15, "in"),
                                              label.position = "left"))

p2<-  ggplot(ma, aes(house_inc)) + geom_histogram(breaks = clint, col = "white") +
  scale_x_continuous(breaks = c(0,125000.5, 250001),position = "top") +
  coord_flip() + scale_y_continuous(labels = NULL, breaks = NULL) +
  ylab(NULL) +xlab(NULL) +
  theme(plot.margin = margin(0.17,0,0.1,0.1, "in"),
        axis.text = element_text(colour = "grey"))

grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
options(scipen = 9999)
```

Note the differing color swatch *lengths* in the color bar: each length reflects the range of values covered by that swatch. For example, the darkest color swatch covers the largest range of values, [`r clint[6:7]`], yet it is applied to the same number of polygons as most of the other color swatches in this classification scheme.

### Boxplot map

Another approach to classifying continuous spatial data is to use **summary statistics** that describe the distribution’s central tendency and spread. The **boxplot**, a common statistical visualization, provides five summary statistics: the median, the lower and upper quartiles (the span between them, which contains the middle 50% of the data, is known as the interquartile range, or IQR), and the ends of the lower and upper "whiskers," which extend to the most extreme observations lying within 1.5 times the IQR of the quartiles. The boxplot may also display "outliers"--data points beyond the whiskers that may be deemed unusual or not characteristic of the bulk of the data.

In the context of mapping, these summary statistics can be used to define classification breaks. A **boxplot map** applies color swatches to polygons based on where their values fall within the boxplot-defined intervals.

This method is particularly useful when the goal is to highlight the shape of the distribution--whether it is symmetrical, skewed, or contains outliers. Because the boxplot includes a measure of centrality (the median), a divergent color scheme is often appropriate. This allows the map to visually emphasize deviations from the center, helping to identify regions that are unusually high or low relative to the bulk of the data.
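
The boxplot breaks can be sketched in base R: compute the quartiles, place fences 1.5 IQRs beyond them, and bracket everything with the data's minimum and maximum. `box_breaks()` below is a hypothetical helper, not a package function, and the data are simulated. The chapter's maps use `classInt::classIntervals(..., style = "box")`, whose quartile calculation may differ slightly.

```r
# Minimal sketch of boxplot-style class breaks (base R only; simulated data).
# Assumption: quartiles use quantile()'s default (type 7).
box_breaks <- function(x, mult = 1.5) {
  q     <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  iqr   <- q[3] - q[1]
  lower <- q[1] - mult * iqr          # lower fence
  upper <- q[3] + mult * iqr          # upper fence
  # Clamp the fences to the data range. When no value lies beyond a fence,
  # the clamped fence duplicates min(x) or max(x) and unique() drops it.
  unique(c(min(x), max(lower, min(x)), q, min(upper, max(x)), max(x)))
}

set.seed(2)
x    <- rlnorm(200, meanlog = 11, sdlog = 0.4)   # hypothetical skewed values
brks <- box_breaks(x)
cls  <- cut(x, breaks = brks, include.lowest = TRUE)

# Up to six classes: low outliers, lower whisker, Q1 to median,
# median to Q3, upper whisker, high outliers
table(cls)
```

Using the fences rather than the whisker ends gives the same class membership, since no observation lies between a whisker end and its fence.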

```{r boxmap, fig.width = 6.5, fig.height = 2.56, echo = FALSE, fig.cap = "Example of a boxplot map.", fig.align='center'}
clint <- classIntervals(ma$house_inc, style = "box")$brks

p1 <- ggplot(ma, aes(fill=house_inc)) + geom_sf() + theme_void() +
  scale_fill_stepsn(colors = c("#1A9850", "#91CF60", "#D9EF8B",
                               "#FEE08B", "#FC8D59", "#D73027") ,
                    breaks = clint[2:(length(clint)) ],
                    values = scales::rescale(clint[2:(length(clint)) ], c(0,1)),
                    guide = guide_coloursteps(even.steps = FALSE,
                                              show.limits = FALSE,
                                              title = NULL,
                                              barheight = unit(2.0, "in"),
                                              barwidth = unit(0.15, "in"),
                                              label.position = "left"))

p2<- ggplot(ma, aes(house_inc)) + geom_boxplot() +
  scale_x_continuous(breaks = clint[1:length(clint)],
                     labels = clint[1:length(clint)],
                     position = "top") +
                                 coord_flip() +
  scale_y_continuous(labels = NULL, breaks = NULL) + xlab("") +
   theme(plot.margin = margin(0.22, 0.0, 0.15, 0.1, "in"),
         axis.text = element_text(colour = "grey"))
grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
```

Boxplot maps strike a balance between statistical rigor and visual interpretability. They are especially effective when the data distribution is not normal and when understanding the spread and extremes is as important as identifying central values.

### IQR map

The **interquartile range (IQR) map** is a simplified version of the boxplot map that focuses on the middle 50% of the data. Instead of dividing the distribution into multiple segments, the IQR map reduces the classification to just three categories: values within the IQR, values below the IQR, and values above the IQR.

This approach is particularly useful when the goal is to highlight the "core" of the distribution--those observations that are neither exceptionally high nor low. By emphasizing the middle range, the IQR map can reveal spatial patterns that are less influenced by outliers or extreme values. For example, while the previous maps suggest an east-west income gradient across Massachusetts, the IQR map suggests that areas with mid-range household incomes are more evenly distributed across the state.
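
Reducing the scheme to three IQR classes requires only the two quartiles. A minimal base R sketch on hypothetical toy values:

```r
# Minimal sketch of an IQR classification (base R only; hypothetical toy data)
x <- c(12, 18, 25, 31, 40, 44, 52, 60, 75, 140)

# Lower and upper quartiles (quantile()'s default, type 7)
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)   # 26.5 and 58

iqr_class <- cut(x, breaks = c(-Inf, q[1], q[2], Inf),
                 labels = c("Below IQR", "Within IQR", "Above IQR"))
table(iqr_class)
```

Here three values fall below the IQR, four within it, and three above it.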

```{r iqrmap, fig.width = 6.5, fig.height = 2.56, echo = FALSE, fig.cap = "Example of an IQR map.", fig.align='center'}
clint <- classIntervals(ma$house_inc, style = "box")$brks
clint <- clint[c(1,3,5,7)]

p1 <- ggplot(ma, aes(fill=house_inc)) + geom_sf(col="grey70") + theme_void() +
  scale_fill_stepsn(colors = c("#FFA50000", "black", "#00660000") ,
                    breaks = clint[1:(length(clint)) ],
                    values = scales::rescale(clint[1:(length(clint)) ], c(0,1)),
                    guide = guide_coloursteps(even.steps = FALSE,
                                              show.limits = FALSE,
                                              title = NULL,
                                              barheight = unit(2.0, "in"),
                                              barwidth = unit(0.15, "in"),
                                              label.position = "left"))

p2<- ggplot(ma, aes(house_inc)) + geom_boxplot() +
  scale_x_continuous(breaks = clint[1:length(clint)],
                     labels = clint[1:length(clint)],
                     position = "top") +
                                 coord_flip() +
  scale_y_continuous(labels = NULL, breaks = NULL) + xlab("") +
   theme(plot.margin = margin(0.22, 0.0, 0.15, 0.1, "in"),
         axis.text = element_text(colour = "grey"))
grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
```

Visually, the IQR category benefits from a dark fill (black in this example) that distinguishes it from the polygons below and above the IQR, which are left unfilled here. This design choice draws attention to the central portion of the data while leaving the remaining regions visible for context.

### Standard deviation map

When a dataset approximates a normal (bell-shaped) distribution, classification based on standard deviation units can be a powerful way to highlight how values deviate from the mean. In a **standard deviation map**, class breaks are defined at regular intervals above and below the **mean**--typically at ±1, ±2, and ±3 standard deviations. This creates a symmetrical classification scheme centered on the average value.

Each class represents a specific range of deviation from the mean, making it easy to identify which regions fall within the expected range and which stand out as unusually high or low. For example, areas within one standard deviation of the mean might be considered typical, while those beyond two or three standard deviations may be flagged as exceptional.

This method is particularly useful when the data are approximately normally distributed and when the goal is to emphasize variation relative to the average. A divergent color scheme is typically used, with a neutral color at the mean and increasingly intense hues in opposite directions to represent values above and below the mean.
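
Standard deviation classes are easiest to compute on z-scores, that is, each value's distance from the mean expressed in standard deviation units. A minimal base R sketch on simulated, roughly normal values (an assumption; the income data in this chapter are skewed):

```r
# Minimal sketch of standard deviation classes (base R only; simulated data)
set.seed(3)
x <- rnorm(2000, mean = 70000, sd = 15000)   # hypothetical, roughly normal values

z <- (x - mean(x)) / sd(x)                   # distance from the mean in SD units
# Breaks at -3, -2, ..., 3 SD; the open-ended outer classes catch |z| > 3
sd_class <- cut(z, breaks = c(-Inf, -3:3, Inf))

table(sd_class)
mean(abs(z) <= 1)   # share within 1 SD of the mean; near 0.68 for normal data
```

Open-ended outer classes are used here so that values beyond ±3 standard deviations are not dropped; the chapter's figure instead sets its color limits at ±3 standard deviations.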

```{r sdmap, fig.width = 6.5, fig.height = 2.56, echo = FALSE, fig.cap = "Example of a standard deviation map.", fig.align='center'}
inc_sd <- sd(ma$house_inc)
inc_mean <- mean(ma$house_inc)
clint <- c(inc_mean - 3 * inc_sd,  inc_mean - 2 * inc_sd, inc_mean - inc_sd, inc_mean,
           inc_mean + inc_sd, inc_mean + 2 * inc_sd, inc_mean + 3 * inc_sd )

p1 <- ggplot(ma, aes(fill=house_inc)) + geom_sf() + theme_void() +
  scale_fill_stepsn(colors = c("#1A9850", "#91CF60", "#D9EF8B",
                               "#FEE08B", "#FC8D59", "#D73027") ,
                    breaks = clint[1:(length(clint)) ],
                    values = scales::rescale(clint[1:(length(clint)-1) ], c(0,1)),
                    limits = range(clint),
                    guide = guide_coloursteps(even.steps = FALSE,
                                              show.limits = FALSE,
                                              title = NULL,
                                              barheight = unit(2.0, "in"),
                                              barwidth = unit(0.15, "in"),
                                              label.position = "left"))

p2<- ggplot(ma, aes(house_inc)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "grey70") +
  coord_flip() +
  stat_function(fun = dnorm, args = list(mean = inc_mean, sd = inc_sd),
                colour = "blue") +
  scale_x_continuous(breaks = clint,
                     labels = c("-3SD", "-2SD" , "-1SD", "Mean", "1SD", "2SD", "3SD"),
                     position = "top", limits = range(clint)) +
  scale_y_continuous(labels = NULL, breaks = NULL) +
  ylab(NULL) + xlab(NULL) +
  theme(plot.margin = margin(0.24, 0, 0.18, 0.1, "in"),
        panel.grid.minor=element_blank(),
        axis.text = element_text(colour = "grey", size=9))

grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
```

However, caution is warranted when applying this method to skewed data (as appears to be the case in the working example). If the distribution is not symmetrical, the resulting map may misrepresent the data by assigning more polygons to one side of the mean than to the other. In such cases, the visual balance of the map may not reflect the actual distribution of values.

### Outlier maps

While previous classification schemes aim to represent the full range or central tendencies of a dataset, **outlier maps** focus specifically on identifying and emphasizing extreme values--those that fall significantly above or below the bulk of the distribution. These maps are particularly useful when the goal is to highlight regions that deviate sharply from expected norms, such as areas with unusually high income, low population density, or elevated disease rates.

Outliers can be defined in several ways, depending on the statistical framework used. Examples of a **boxplot outlier map**, a **standard deviation outlier map**, and a **quantile outlier map** follow.

#### Boxplot outlier map

A boxplot outlier map identifies values that fall beyond the whiskers of a boxplot, that is, more than 1.5 times the interquartile range (IQR) below the lower quartile or above the upper quartile. These regions are often assigned darker hues, while all other values are grouped into a single lighter category to draw attention to the extremes.
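
Flagging boxplot outliers only requires the two fences. A minimal base R sketch with hypothetical toy values. Note that `boxplot()` itself computes its hinges with `fivenum()`, so in small samples its flagged set can differ slightly from this quartile-based version.

```r
# Minimal sketch of boxplot outlier flags (base R only; hypothetical toy data)
x <- c(31, 35, 38, 40, 41, 43, 45, 47, 50, 52, 120, 2)

q     <- quantile(x, probs = c(0.25, 0.75), names = FALSE)   # 37.25 and 47.75
fence <- c(q[1] - 1.5 * diff(q), q[2] + 1.5 * diff(q))       # 21.5 and 63.5

# Values strictly beyond a fence are outliers; everything else is typical
out_class <- ifelse(x < fence[1], "Low outlier",
             ifelse(x > fence[2], "High outlier", "Typical"))
x[out_class != "Typical"]
```

Only the values 120 and 2 lie beyond the fences, so they are the only ones that would receive a saturated color.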

```{r outlier1, fig.width = 6.5, fig.height = 2.56, echo = FALSE, fig.cap = "Example of a boxplot outlier choropleth map.", fig.align='center'}
clint <- classIntervals(ma$house_inc, style = "box")$brks

p1 <- ggplot(ma, aes(fill=house_inc)) + geom_sf(colour = "grey70") + theme_void() +
  scale_fill_stepsn(colors = c("#1A9850", "#f7f7f7"  , "#D73027") ,
                    breaks = clint[ c(2, length(clint) -1, length(clint)) ],
                    values = scales::rescale(clint[ c(1,2, length(clint) -1, length(clint)) ], c(0,1)),
                    guide = guide_coloursteps(even.steps = FALSE,
                                              show.limits = FALSE,
                                              title = NULL,
                                              barheight = unit(2.0, "in"),
                                              barwidth = unit(0.15, "in"),
                                              label.position = "left"))

p2<- ggplot(ma, aes(house_inc)) + geom_boxplot() +
  scale_x_continuous(breaks = clint[c(1, 2, length(clint) -1,  length(clint))],
                     labels = clint[c(1, 2, length(clint) -1,  length(clint))],
                     position = "top") +
                                 coord_flip() +
  scale_y_continuous(labels = NULL, breaks = NULL) + xlab("") +
   theme(plot.margin = margin(0.22, 0.0, 0.15, 0.1, "in"),
         axis.text = element_text(colour = "grey"))
grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
```

Note the asymmetrical distribution of outliers: a little over a dozen regions have unusually high income values, while just one region has an unusually low income value.

#### Standard deviation outlier map

A standard deviation outlier map flags values beyond ±2 standard deviations from the mean. If the data follow a normal distribution, this corresponds to roughly the top and bottom 2.3 percent of observations (about 4.6 percent in total). These outliers are visually emphasized using a divergent color scheme, often with neutral tones for typical values and saturated colors for extremes.
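
Under a normal distribution, the share of observations beyond 2 standard deviations in each tail is `pnorm(-2)`, and the flagging rule itself is a two-sided cut on z-scores. A minimal base R sketch, assuming simulated normal data:

```r
# Share of a normal distribution beyond 2 SD
pnorm(-2)       # one tail: about 0.023
2 * pnorm(-2)   # both tails combined: about 0.046

# Minimal sketch of +/- 2 SD outlier flags (base R only; simulated data)
set.seed(4)
x <- rnorm(2000, mean = 70000, sd = 15000)   # hypothetical normal values
z <- (x - mean(x)) / sd(x)

sd_out <- ifelse(z < -2, "Low outlier",
          ifelse(z >  2, "High outlier", "Typical"))
prop.table(table(sd_out))   # roughly 2% low, 2% high, the rest typical
```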

```{r outlier2, fig.width = 6.5, fig.height = 2.56, echo = FALSE, fig.cap = "Example of a standard deviation outlier choropleth map.", fig.align='center'}
inc_sd <- sd(ma$house_inc)
inc_mean <- mean(ma$house_inc)
clint <- c(inc_mean - 3 * inc_sd, inc_mean - 2 * inc_sd, inc_mean + 2 * inc_sd, inc_mean + 3 * inc_sd)

p1 <- ggplot(ma, aes(fill=house_inc)) + geom_sf() + theme_void() +
  scale_fill_stepsn(colors = c("#1A9850", "#f7f7f7"  , "#D73027") ,
                    breaks = clint[1:(length(clint)) ],
                    values = scales::rescale(clint[1:(length(clint)) ], c(0,1)),
                    limits = range(clint),
                    guide = guide_coloursteps(even.steps = FALSE,
                                              show.limits = FALSE,
                                              title = NULL,
                                              barheight = unit(2.0, "in"),
                                              barwidth = unit(0.15, "in"),
                                              label.position = "left"))

p2<- ggplot(ma, aes(house_inc)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "grey70") +
  coord_flip() +
  stat_function(fun = dnorm, args = list(mean = inc_mean, sd = inc_sd),
                colour = "blue") +
  scale_x_continuous(breaks = clint,
                     labels = c("-Inf", "-2SD" , "2SD", "Inf"),
                     position = "top", limits = range(clint)) +
  scale_y_continuous(labels = NULL, breaks = NULL) +
  ylab(NULL) + xlab(NULL) +
  theme(plot.margin = margin(0.24, 0, 0.18, 0.1, "in"),
        panel.grid.minor=element_blank(),
        axis.text = element_text(colour = "grey", size=9))

grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
```

#### Quantile outlier map

A quantile outlier map defines outliers based on percentile thresholds. For instance, the top and bottom 2.5% of values can be isolated by dividing the data into 40 quantiles and mapping only the outermost ones. This method is especially useful when the data distribution is skewed or non-normal.
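
The 2.5th and 97.5th percentiles are the first and last interior breaks of a 40-quantile scheme, so the outlier classes can be computed directly. A minimal base R sketch with simulated skewed values:

```r
# Minimal sketch of quantile outlier flags (base R only; simulated data)
set.seed(5)
x <- rlnorm(400, meanlog = 11, sdlog = 0.5)   # hypothetical skewed values

cut_pts <- quantile(x, probs = c(0.025, 0.975), names = FALSE)

# Values at or below the 2.5th percentile, or above the 97.5th, are flagged
q_out <- ifelse(x <= cut_pts[1], "Bottom 2.5%",
         ifelse(x >  cut_pts[2], "Top 2.5%", "Typical"))
table(q_out)
```

By construction, this rule flags the same share of units (here 10 of 400 in each tail) no matter how the values are distributed.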

```{r outlier3, fig.width = 7.3, fig.height = 2.56, echo = FALSE, fig.cap = "Example of a quantile outlier choropleth map where the top and bottom 2.5% regions are characterized as outliers.", fig.align='center'}
clint <- classIntervals(ma$house_inc, style = "quantile", n = 40)$brks
clint <- clint[c(1,2,40,41)]
p1 <- ggplot(ma, aes(fill=house_inc)) + geom_sf() + theme_void() +
  scale_fill_stepsn(colors = c("#1A9850", "#f7f7f7"  , "#D73027") ,
                    breaks = clint[1:(length(clint)) ],
                    values = scales::rescale(clint[1:(length(clint)) ], c(0,1)),
                    guide = guide_coloursteps(even.steps = FALSE,
                                              show.limits = FALSE,
                                              title = NULL,
                                              barheight = unit(2.0, "in"),
                                              barwidth = unit(0.15, "in"),
                                              label.position = "left"))

# p2<- ggplot(ma, aes(house_inc)) + stat_ecdf() + coord_flip() +
#   scale_x_continuous(breaks = clint, labels = clint, position = "top") +
#   scale_y_continuous(breaks = c(0.05,0.95), labels = sprintf("%1.0f%%", c(0.025,0.975)*100)) +
#   ylab(NULL) + xlab(NULL) +
#   theme(plot.margin = margin(0.23, 0, 0 ,0.1, "in"),
#         panel.grid.minor=element_blank(),
#         axis.text = element_text(colour = "grey", size=8))

p2<-  ggplot(ma, aes(house_inc)) + geom_histogram(breaks = c(0, clint[2], clint[3], 250001), col = "white") +
  scale_x_continuous(breaks = clint[2:3], position = "top",
                      labels = sprintf("%1.1f%%", c(0.025,0.975)*100)) +
  coord_flip() + scale_y_continuous(labels = NULL, breaks = NULL) +
  ylab(NULL) +xlab(NULL) +
  theme(plot.margin = margin(0.17,0,0.1,0.1, "in"),
        axis.text = element_text(colour = "grey"))

#grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,1,1,1,2,2,2)))
grid.arrange(p1,p2, layout_matrix=rbind(c(1,1,1,2)))
```

## Mapping uncertainty

Many spatial datasets--particularly those derived from surveys like the U.S. Census Bureau’s American Community Survey (ACS)--are not direct measurements but estimates accompanied by a measure of uncertainty. This uncertainty is often expressed as a **margin of error (MoE)** or a **standard error (SE)**, which reflects how much confidence we can place in the reported values. For example, ACS margins of error are published at the 90% confidence level: the estimate plus or minus its MoE defines an interval that, over repeated sampling, would contain the true value about 90% of the time.
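
A 90 percent confidence level corresponds to a z-value of about 1.645, so converting between MoE and SE is a single multiplication or division. A minimal base R sketch; the estimate and MoE are hypothetical, and `moe_to_se()` is a helper defined here, not a package function:

```r
qnorm(0.95)   # z-value for a two-sided 90% interval: about 1.645

# Hypothetical helper: convert a 90% margin of error to a standard error
moe_to_se <- function(moe, z = 1.645) moe / z

est <- 52000                  # hypothetical income estimate
moe <- 4935                   # hypothetical 90% margin of error
se  <- moe_to_se(moe)         # 4935 / 1.645 = 3000

# Lower and upper bounds of the 90% confidence interval
c(lower = est - 1.645 * se, upper = est + 1.645 * se)
```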

Mapping such data presents a challenge: how do we visualize both the estimate and its uncertainty in a way that supports meaningful spatial interpretation?

One common approach is to display side-by-side maps: one showing the estimated values and another showing the associated SE or MoE.

```{r f07-map1, message=FALSE,warning=FALSE,echo=FALSE,fig.width = 7, fig.height = 3, fig.fullwidth = TRUE, fig.cap = "Maps of income estimates (left) and associated standard errors (right).", fig.align='center'}
library(RColorBrewer)
brks1 <- quantile(s1$Income, seq(0,1,0.2))
brks1[length(brks1)] <- brks1[length(brks1)] + 1
brks2 <- quantile(s1$IncomeSE, seq(0,1,0.2))
brks2[length(brks2)] <- brks2[length(brks2)] + 1
P1 <- raster::spplot(s1, "Income", at=brks1, col.regions=brewer.pal(7,"Greens"))
P2 <- raster::spplot(s1, "IncomeSE", at=brks2, col.regions=brewer.pal(7,"Reds"))

print(P1, split=c(1, 1, 2, 1), more=TRUE)
print(P2, split=c(2, 1, 2, 1), more=FALSE)
```

An alternative is to overlay uncertainty directly onto the estimate map using textures or hatch marks. For example, a map of income estimates might use shades of green to represent income levels, with different hatch patterns indicating the degree of uncertainty. This approach allows viewers to assess both value and reliability simultaneously.

```{r f07-map2, echo=FALSE, fig.cap = "Map of estimated income (in shades of green) superimposed with different hatch marks representing the ranges of income SE.", out.width=400, fig.align='center'}

knitr::include_graphics("img/Income_and_uncertainty.jpg")
```

Another technique involves mapping the upper and lower bounds of the confidence interval as separate maps. This can help visualize the full range of plausible values, but it may still suffer from the main drawback of side-by-side maps: the reader must compare two separate maps to judge how much the spatial pattern changes.

```{r f07-map-MoE, message=FALSE,warning=FALSE,echo=FALSE,fig.width = 7, fig.height = 3, fig.fullwidth = TRUE, fig.cap = "Maps of the upper end (left) and lower end (right) of the 90 percent confidence interval for income.", fig.align='center'}
library(RColorBrewer)
s1$IncMax <- s1$Income + 1.645 * s1$IncomeSE
s1$IncMin <- s1$Income - 1.645 * s1$IncomeSE
brks <- quantile(c(s1$IncMax, s1$IncMin), seq(0,1,0.2))
brks[length(brks)] <- brks[length(brks)] + 1
P1 <- raster::spplot(s1, "IncMax", at=brks, col.regions=brewer.pal(7,"Greens"))
P2 <- raster::spplot(s1, "IncMin", at=brks, col.regions=brewer.pal(7,"Greens"))

print(P1, split=c(1, 1, 2, 1), more=TRUE)
print(P2, split=c(2, 1, 2, 1), more=FALSE)
```

### Assessing confidence in spatial patterns

While the maps presented earlier in this chapter offer ways to visualize uncertainty, such as margins of error or standard errors, they do not fully address a key reason we map data in the first place: to compare values across space. In spatial analysis, we are often interested in identifying patterns of high or low values and ranking regions accordingly. However, these comparisons assume that the observed estimates are stable and that their relative order would persist across samples. When estimates carry sizable sampling error, this assumption is problematic.

To illustrate this, we begin by plotting each polygon's estimate along with its confidence interval. The plot reveals that many regions have overlapping intervals, meaning that their true values could plausibly fall above or below those of neighboring regions. For example, a county that appears to have a lower income than its neighbor could, in fact, show the higher income estimate if a different sample were taken. This overlap undermines the reliability of spatial rankings and calls into question the robustness of apparent patterns.
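
Checking which intervals overlap is a pairwise comparison of their end points. A minimal base R sketch with four hypothetical counties (the estimates and SEs are invented for illustration), using the same ±1.645 SE bounds as the plot:

```r
# Minimal sketch: which pairs of 90% confidence intervals overlap?
# Hypothetical estimates and standard errors, not the Maine data.
est <- c(A = 21000, B = 22500, C = 23000, D = 27000)
se  <- c(A =   900, B =   700, C =  1200, D =   800)

lo <- est - 1.645 * se
hi <- est + 1.645 * se

# Intervals i and j overlap when each one starts before the other ends
overlap <- outer(lo, hi, "<=") & t(outer(lo, hi, "<="))
diag(overlap) <- NA      # ignore each interval's overlap with itself
overlap
```

Counties A, B, and C overlap one another, so their relative order is uncertain, while D's interval is clear of all three. Overlap is an informal check: non-overlapping 90 percent intervals indicate a difference, but overlapping intervals do not by themselves rule one out.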

```{r f07-MoE-plot1, message=FALSE,warning=FALSE,echo=FALSE,fig.width = 5, fig.height = 3.0, fig.fullwidth = FALSE, fig.cap = "Income estimates by county with 90 percent confidence interval. Note that many counties have overlapping estimate ranges.", fig.align='center'}
library(gplots)
# Sort data by INCOME
Y   = s1$Income[order(s1$Income)]  # per capita income
YSE = s1$IncomeSE[order(s1$Income)] # SE
lbs = s1$NAME[order(s1$Income)]

# Plot the estimate along with the MoE
OP <- par(mar=c(3,7,0,1))
plotCI(Y,1:16, ui= Y+(1.645*YSE), li=(Y-1.645*YSE),pch=16, lwd=1, barcol="red",sfrac=.005, err="x", col="grey50",
       ylab = "", xlab="", axes=FALSE, gap=0.5)
axis(1, cex.axis=0.8)
axis(2, at=1:16, labels=lbs,las=2,cex.axis=0.8)
mtext("Income ($)", side= 1,line=2)
par(OP)
```

Consider, for instance, Piscataquis County, whose income estimate (represented by the gray point in the plot) appears lower than that of neighboring Oxford County. At first glance, this suggests a clear ranking between the two. However, when we examine their confidence intervals, we see substantial overlap, indicating that this apparent difference may not be statistically reliable. If a new sample were drawn from each county, the resulting estimates could easily shift, potentially placing Piscataquis above Oxford in income rankings. The following example illustrates how such reversals can occur when uncertainty is taken into account.
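
How likely is such a reversal? If the two estimates are treated as independent normal draws centered on their current values (an assumption), the probability that a new pair of estimates swaps order has a closed form, which a quick simulation can confirm. The values below are hypothetical, not those of Piscataquis and Oxford counties:

```r
# Minimal sketch: probability that a new sample reverses the order of two
# estimates, assuming independent, normally distributed sampling errors.
est_low  <- 22000; se_low  <- 1000   # county that currently ranks lower
est_high <- 23000; se_high <- 1200   # county that currently ranks higher

# Analytic answer: the difference of two independent normals is normal
se_diff    <- sqrt(se_low^2 + se_high^2)
p_analytic <- pnorm(-(est_high - est_low) / se_diff)

# Simulation check
set.seed(6)
n <- 100000
draw_low  <- rnorm(n, est_low,  se_low)
draw_high <- rnorm(n, est_high, se_high)
p_sim <- mean(draw_low > draw_high)

c(analytic = p_analytic, simulated = p_sim)
```

With a gap of 1,000 and these standard errors, the order flips in roughly one sample out of four.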


```{r f07-sim-values, message=FALSE,warning=FALSE,echo=FALSE,fig.width = 5, fig.height = 3.0, fig.fullwidth = FALSE, fig.cap = "Example of income estimates one could expect to sample based on the 90 percent confidence interval shown in the previous plot.", fig.align='center'}

# Draw one random value per element from a normal distribution centered on
# the estimate x with standard deviation se, truncated (by rejection) to
# the range [max(0, x - numse * se), x + numse * se]
rnorml <- function(x,se,numse) {  # numse is the number of SEs
  rx = rep(-1,length(x))          # Initialize rx outside the limits
  ri = rx < 0 | rx < x - (numse * se) | rx > x + (numse * se)
  # Redraw every rx value that falls outside the limits until none remain
  while (any(ri)) {
    rx[ri] = rnorm(sum(ri), x[ri], se[ri])
    ri = rx < 0 | rx < x - (numse * se) | rx > x + (numse * se)
  }
  return(rx)
}

library(gplots)
# Draw one simulated sample within each county's 90% confidence interval
set.seed(31)
Yrnd = rnorml(Y, YSE, 1.645)

# Plot the estimate along with the MoE
OP <- par(mar=c(3,6,0,1))
plotCI(Yrnd,1:16, ui= Yrnd+(1.645*YSE), li=(Yrnd-1.645*YSE),pch=16, lwd=2, barcol="white",sfrac=.005, err="x", col="grey50",
       ylab = "", xlab="", axes=FALSE, gap=0.5)
axis(1, cex.axis=0.8)
axis(2, at=1:16, labels=lbs,las=2, cex.axis=0.8)
mtext("Income ($)", side= 1,line=2)
par(OP)
```

In one such simulated sample, Oxford County’s income estimate drops below those of both Piscataquis and Franklin counties, reversing the original ranking. A similar shift occurs for Sagadahoc County, which falls behind Hancock and Lincoln counties. These changes show that uncertainty can affect not just individual estimates but also the broader spatial hierarchy we infer from mapped data. What appears to be a clear pattern in the original map may, in fact, be a fragile construct shaped by sampling variability.
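The reversal described above can also be quantified by repeating the simulation many times and counting how often the ranking flips. The sketch below is illustrative only: counties `A` and `B`, their estimates, and their standard errors are hypothetical values rather than the Maine data, and `rtrunc()` is a hypothetical helper that samples from a truncated normal distribution using the inverse-CDF method instead of the rejection loop in `rnorml()`. Like `rnorml()`, it keeps each draw within ±1.645 standard errors of the estimate; unlike `rnorml()`, it does not also enforce a floor of zero, which makes no practical difference for incomes this far from zero.

```r
set.seed(1)

# Hypothetical estimates and standard errors (illustrative values only);
# county A currently ranks above county B
est <- c(A = 24200, B = 23500)
se  <- c(A = 500,   B = 600)
z   <- 1.645   # multiplier for a 90% confidence interval

# Do the two 90% intervals overlap?
lower   <- est - z * se
upper   <- est + z * se
overlap <- lower[["A"]] <= upper[["B"]] && lower[["B"]] <= upper[["A"]]

# Hypothetical helper: draw n values from a normal distribution with mean m and
# standard deviation s, truncated to [m - k*s, m + k*s] (inverse-CDF method)
rtrunc <- function(n, m, s, k) {
  u <- runif(n, pnorm(-k), pnorm(k))  # uniform over the central part of the CDF
  m + s * qnorm(u)                    # map back to the income scale
}

n_sim  <- 10000
draw_A <- rtrunc(n_sim, est[["A"]], se[["A"]], z)
draw_B <- rtrunc(n_sim, est[["B"]], se[["B"]], z)

# Share of simulated samples in which B's income exceeds A's (a rank reversal)
p_reversal <- mean(draw_B > draw_A)
overlap
p_reversal
```

A reversal share well above zero signals that the ranking of the two counties should not be read as firm, which is consistent with the overlap of their intervals.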

How do the spatial patterns in the **original** estimated income map hold up when uncertainty is introduced? Comparing it with a **simulated** income map, generated from values sampled within each county’s confidence interval, lets us begin to assess the stability of observed rankings and the reliability of apparent spatial patterns.

```{r f07-sim-map, message=FALSE,warning=FALSE,echo=FALSE,fig.width = 5, fig.height = 3.0, fig.fullwidth = TRUE, fig.cap = "Original income estimate (left) and realization of a simulated sample (right).", fig.align='center'}
library(RColorBrewer)
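# Yrnd follows the income-sorted order of Y, so place each value back on its county's row
s1$R1 <- NA
s1$R1[order(s1$Income)] <- Yrnd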
brks <- quantile(c(s1$Income, s1$R1), seq(0,1,0.2))
brks[length(brks)] <- brks[length(brks)] + 1
P1 <- raster::spplot(s1, "Income", at=brks, col.regions=brewer.pal(7,"Greens"))
P2 <- raster::spplot(s1, "R1", at=brks, col.regions=brewer.pal(7,"Greens"))

print(P1, split=c(1, 1, 2, 1), more=TRUE)
print(P2, split=c(2, 1, 2, 1), more=FALSE)
```
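Comparing two maps is only meaningful when both use the same classes, which is why the figure above uses a single set of quintile breaks computed from the original and simulated values pooled together. A minimal sketch of that step, extended to count how many areas change class between the two maps, follows; the eight income values are hypothetical and merely stand in for `s1$Income` and `s1$R1`.

```r
# Hypothetical original estimates and one simulated realization (illustrative only)
original  <- c(21000, 22500, 23800, 24600, 25900, 27400, 29800, 31500)
simulated <- c(21900, 22100, 24500, 23700, 26300, 26900, 30400, 30900)

# One set of quintile breaks from the pooled values, so both maps share classes;
# nudge the top break up so the maximum value falls inside the last class
q_brks <- quantile(c(original, simulated), probs = seq(0, 1, 0.2))
q_brks[length(q_brks)] <- q_brks[length(q_brks)] + 1

class_orig <- findInterval(original,  q_brks)
class_sim  <- findInterval(simulated, q_brks)

# Areas whose class differs between the original and simulated maps
changed <- class_orig != class_sim
sum(changed)
```

Counting class changes over many realizations, rather than a single one, gives a rough sense of how stable each area's class assignment is.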

A few more simulated samples (using the 90% confidence interval) are shown below:

```{r f07-5sim-maps, message=FALSE,warning=FALSE,echo=FALSE,fig.width = 10, fig.height = 3.1, fig.fullwidth = TRUE, fig.cap = "Five realizations (R1 through R5) of simulated samples drawn within each county's 90 percent confidence interval. R1 is the simulated sample mapped in the previous figure.", fig.align='center'}

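# rnorml() returns values in the income-sorted order of Y, so map them back to s1's rows
ord <- order(s1$Income)
set.seed(421);  s1$R2 <- NA; s1$R2[ord] <- rnorml(Y, YSE, 1.645)
set.seed(1231); s1$R3 <- NA; s1$R3[ord] <- rnorml(Y, YSE, 1.645)
set.seed(326);  s1$R4 <- NA; s1$R4[ord] <- rnorml(Y, YSE, 1.645)
set.seed(5441); s1$R5 <- NA; s1$R5[ord] <- rnorml(Y, YSE, 1.645)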
brks <- quantile(c(s1$R1,s1$R2,s1$R3,s1$R4,s1$R5), seq(0,1,0.2))
brks[length(brks)] <- brks[length(brks)] + 1
raster::spplot(s1, c("R1","R2","R3","R4","R5"),at=brks, col.regions=brewer.pal(7,"Greens"))
```
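Five realizations already hint at which rankings are stable. A natural extension, sketched here under illustrative assumptions, is to generate many realizations and record the range of ranks each area occupies. The five areas and their values are hypothetical, and `draw_once()` is a hypothetical helper that, like `rnorml()`, keeps each draw within ±1.645 standard errors of its estimate (it uses the inverse-CDF method and omits the zero floor).

```r
set.seed(42)

# Hypothetical estimates and standard errors for five areas (illustrative only)
est <- c(A = 21500, B = 22300, C = 22800, D = 24900, E = 27600)
se  <- c(A = 700,   B = 450,   C = 900,   D = 600,   E = 800)
z   <- 1.645   # 90% confidence multiplier

# Hypothetical helper: one realization in which every estimate is redrawn from a
# normal distribution truncated to estimate +/- z * se (inverse-CDF method)
draw_once <- function() {
  u <- runif(length(est), pnorm(-z), pnorm(z))
  est + se * qnorm(u)
}

# 1,000 realizations; within each one, rank 1 is the lowest income
ranks <- replicate(1000, rank(draw_once()))   # 5 x 1000 matrix, rows named A to E

# Lowest and highest rank each area takes across all realizations
rank_range <- t(apply(ranks, 1, range))
colnames(rank_range) <- c("min_rank", "max_rank")
rank_range
```

An area whose minimum and maximum ranks coincide (area `E` here) holds its position in every realization, while a wide rank range flags a position that the data cannot pin down.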

### Class comparison maps

```{r echo=FALSE, fig.align='center'}
brks <- c(0,20600, 22800,25000,27000,34000)
```

Effectively conveying both estimates and their associated uncertainty in a single map remains a challenge. As Sun and Wong [@DataQuality2010] note, the appropriate strategy often depends on the context and purpose of the analysis. One useful approach is the **class comparison method**, which evaluates whether a polygon’s margin of error (MoE) extends beyond the boundaries of its assigned class. In this method, the map displays not only the estimated value but also whether the confidence interval surrounding that estimate crosses into adjacent classes.

For example, if we adopt the classification breaks [`r paste(brks, collapse = ", ")`], we find that many counties have MoEs that extend across class boundaries.

```{r compInt, message=FALSE, warning=FALSE, echo=FALSE, fig.width = 5.5, fig.height = 3.0, fig.fullwidth = FALSE, fig.cap = "Income estimates by county with 90 percent confidence interval. Note that many of the counties' MoEs extend into an adjacent class.", fig.align='center'}

# Plot the estimate along with the MoE
OP <- par(mar=c(3,6,0,1))
plotCI(Y,1:16, ui= Y+(1.645*YSE), li=(Y-1.645*YSE),pch=16, lwd=1, barcol="red",
       sfrac=.005, err="x", col="grey50", ylab = "", xlab="", axes=FALSE, gap=0.5,
       xlim = c(19000 , 34100))
axis(1, cex.axis=0.8, at = brks)
axis(2, at=1:16, labels=lbs,las=2,cex.axis=0.8)
mtext("Income ($)", side= 1,line=2)
abline(v=brks, col=rgb(0,.6,0), lty = 3)
par(OP)
```

Take Piscataquis County, for example. Its estimate falls in the second class (`r sprintf("%i", brks[2])` to `r sprintf("%i", brks[3])`), yet the lower limit of its confidence interval extends into the first class. We therefore cannot be 90% confident that the county has been assigned to the correct class (i.e., its true value could fall in the first class). Other counties, such as Cumberland and Penobscot, do not have this problem because their 90% confidence intervals fall entirely within their assigned classes.
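The check applied to Piscataquis County can be written with base R's `findInterval()`: find the class of the estimate and of each end of its 90 percent interval, then compare them. The sketch below uses the same class breaks as the text, but the four areas and their estimates and standard errors are hypothetical.

```r
# Class breaks from the text; estimates and SEs below are hypothetical
brks <- c(0, 20600, 22800, 25000, 27000, 34000)
est  <- c(A = 21200, B = 23900, C = 26100, D = 29500)
se   <- c(A = 500,   B = 400,   C = 900,   D = 300)
z    <- 1.645   # 90% confidence multiplier

est_class  <- findInterval(est, brks)            # class holding each estimate
low_class  <- findInterval(est - z * se, brks)   # class of the lower MoE end
high_class <- findInterval(est + z * se, brks)   # class of the upper MoE end

comparison <- data.frame(
  area          = names(est),
  est_class     = est_class,
  crosses_lower = low_class  < est_class,   # lower end falls in a lower class
  crosses_upper = high_class > est_class    # upper end falls in a higher class
)
comparison
```

Area `A` mirrors the Piscataquis case (its lower end drops into the first class), `C` crosses in both directions, and `B` and `D` stay within their classes, like Cumberland and Penobscot.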

This information can be mapped as a hatch-mark overlay. For example, income can be shown in varying shades of green, with hatching that indicates whether the lower end of the interval crosses into a lower class (135&deg; hatch), the upper end crosses into a higher class (45&deg; hatch), both ends cross into different classes (90&deg;, vertical hatch), or both ends remain inside the estimate's class (no hatch).

```{r ComPlot, message=FALSE, warning=FALSE, echo=FALSE, fig.height = 3.5, fig.margin = TRUE, fig.cap = "Plot of income with class comparison hatches.", fig.align='center'}

IncInt <- findInterval(s1$Income, brks)
LowInt <- findInterval(s1$Income - 1.645 * s1$IncomeSE, brks )
HiInt <- findInterval(s1$Income + 1.645 * s1$IncomeSE, brks )
s1$Comp <- 1 # Both MoE ends are in the same class as the estimate
s1$Comp[IncInt > LowInt] <- 2 # Lower MoE end is in a class below that of the estimate
s1$Comp[IncInt < HiInt]  <- 4 # Upper MoE end is in a class above that of the estimate
# Both MoE ends fall outside the estimate's class (assigned last so it is not overwritten)
s1$Comp[IncInt > LowInt & IncInt < HiInt] <- 3

color <- brewer.pal(7,"Greens")
# Hatch angle for Comp codes 1-4: none, 135 deg (lower end crosses),
# 90 deg (both ends cross), 45 deg (upper end crosses)
ang <- c(0, 135, 90, 45)
dens <- c(0,10,10, 10)
OP <- par(mar=c(0,0,0,0))
sp::plot(s1, col = color[findInterval(s1$Income, brks)])
sp::plot(s1, density = dens[s1$Comp], angle = ang[s1$Comp], add=TRUE)
par(OP)
```
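The hatch rules described before the map reduce to a small lookup from two flags (does the lower end cross? does the upper end cross?) to the `angle` and `density` values used when drawing the overlay. A sketch of that lookup follows, using hypothetical flags for four areas. In base R graphics, a `density` of 0 draws neither shading lines nor fill, which lets the green fill underneath show through on unhatched areas.

```r
# Hypothetical flags for four areas: does the lower / upper end of the
# 90% interval fall outside the class of the estimate?
crosses_lower <- c(TRUE,  FALSE, TRUE,  FALSE)
crosses_upper <- c(FALSE, TRUE,  TRUE,  FALSE)

# Hatch angle (degrees) following the convention in the text:
# lower end only -> 135, upper end only -> 45, both -> 90 (vertical), neither -> 0
angle <- ifelse(crosses_lower & crosses_upper, 90,
         ifelse(crosses_lower, 135,
         ifelse(crosses_upper, 45, 0)))

# Shading-line density: 0 means no hatching for areas whose interval stays in class
density <- ifelse(crosses_lower | crosses_upper, 10, 0)

data.frame(crosses_lower, crosses_upper, angle, density)
```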

## Summary

This chapter explores statistical approaches to mapping continuous spatial data, focusing on how classification schemes influence the interpretation of choropleth maps. Building on the visual principles introduced in the previous chapter, it presents methods such as equal interval, quantile, boxplot, interquartile range (IQR), and standard deviation classifications. Each technique offers a different lens for revealing spatial patterns and understanding data distributions.

The chapter also introduces outlier maps, which emphasize extreme values using statistical definitions derived from boxplots, standard deviations, or quantiles. These maps are particularly useful for identifying regions that deviate sharply from the norm.

Mapping uncertainty (especially in datasets derived from surveys) is also addressed. Techniques such as confidence interval plots, simulated sample maps, and class comparison overlays are used to assess the reliability of spatial rankings and classifications. These methods highlight the fragility of apparent patterns and promote more cautious, statistically informed interpretations of mapped data.