Chapter 25 Area Data IV

NOTE: The source files for this book are available with companion package {isdas}. The source files are in Rmarkdown format and packed as templates. These files allow you execute code within the notebook, so that you can work interactively with the notes.

25.1 Learning objectives

In the previous practice/session, you learned about the concept of spatial autocorrelation, and how it can be used to evaluate statistical maps when searching for patterns. We also introduced Moran’s \(I\) coefficient, one of the most widely used tools to measure spatial autocorrelation.

In this practice, you will learn about:

  1. Decomposing Moran’s \(I\).
  2. Local Moran’s \(I\) and mapping.
  3. A concentration approach for local analysis of spatial association.
  4. A short note on hypothesis testing.
  5. Detection of hot and cold spots.

25.2 Suggested readings

  • Bailey TC and Gatrell AC (1995) Interactive Spatial Data Analysis, Chapter 7. Longman: Essex.
  • Bivand RS, Pebesma E, and Gomez-Rubio V (2008) Applied Spatial Data Analysis with R, Chapter 9. Springer: New York.
  • Brunsdon C and Comber L (2015) An Introduction to R for Spatial Analysis and Mapping, Chapter 7. Sage: Los Angeles.
  • O’Sullivan D and Unwin D (2010) Geographic Information Analysis, 2nd Edition, Chapter 7. John Wiley & Sons: New Jersey.

25.3 Preliminaries

As usual, it is good practice to clear the working space to make sure that you do not have extraneous items there when you begin your work. The command in R to clear the workspace is rm (for “remove”), followed by a list of items to be removed. To clear the workspace from all objects, do the following:

rm(list = ls())

Note that ls() lists all objects currently on the workspace.

Load the libraries you will use in this activity:

library(crosstalk)
library(isdas)
library(plotly)
library(sf)
library(spdep)
library(tidyverse)

Load the datasets:

data("df1_simulated")
data("df2_simulated")

These two dataframes are simulated landscapes, one completely random and another stochastic with a strong systematic pattern. Note that the descriptive statistics of both variables are identical.:

summary(df1_simulated)
##        x               y               z        
##  Min.   : 1.00   Min.   : 1.00   Min.   :24.40  
##  1st Qu.:27.00   1st Qu.:19.00   1st Qu.:27.89  
##  Median :46.50   Median :33.00   Median :30.33  
##  Mean   :45.61   Mean   :31.63   Mean   :34.38  
##  3rd Qu.:66.00   3rd Qu.:45.00   3rd Qu.:38.25  
##  Max.   :87.00   Max.   :61.00   Max.   :69.59
summary(df2_simulated)
##        x               y               z        
##  Min.   : 1.00   Min.   : 1.00   Min.   :24.40  
##  1st Qu.:27.00   1st Qu.:19.00   1st Qu.:27.89  
##  Median :46.50   Median :33.00   Median :30.33  
##  Mean   :45.61   Mean   :31.63   Mean   :34.38  
##  3rd Qu.:66.00   3rd Qu.:45.00   3rd Qu.:38.25  
##  Max.   :87.00   Max.   :61.00   Max.   :69.59

The third dataset is an object of class sf (simple feature) with the census tracts of Hamilton CMA and some selected population variables from the 2011 Census of Canada:

data(Hamilton_CT)

25.4 Decomposing Moran’s \(I\)

Here we will revisit Moran’s \(I\) coefficient to see how its utility for the exploration of spatial patterns can be extended. Recall from the preceding reading and activity that this coefficient of spatial autocorrelation was derived based on the idea of aggregating the products of a (mean-centered) variable by its spatial moving average, and then dividing by the variance: \[ I = \frac{\sum_{i=1}^n{z_i\sum_{j=1}^n{w_{ij}^{st}z_j}}}{\sum_{i=1}^{n}{z_i^2}} \]

Also, remember that when plotting Moran’s scatterplot using moran.plot() some observations were highlighted. To see this, we will recreate the plot, for which we need a set of spatial weights:

Hamilton_CT.w <- nb2listw(poly2nb(pl = Hamilton_CT))

And here is the scatterplot of population density again:

# We can use the arguments xlab and ylab in `moran.plot()` to change the labels for the two axes of the plot
mp <- moran.plot(Hamilton_CT$POP_DENSITY, Hamilton_CT.w, xlab = "Population Density", ylab = "Lagged Population Density")

The reason some observations are highlighted is because they have been identified as “influential”, meaning that they make a particularly large contribution to the calculation of \(I\). It turns out that the relative contribution of each observation to the calculation of Moran’s \(I\) is informative in and of itself, and its analysis can provide more focused information about the spatial pattern.

To explore this, we will recreate the scatterplot manually to have better control of its aspect. To do this, we first create a dataframe with the mean-centered and scaled variable \(z_i=(x_i-\overline{x})/\sum z_i^2\), and its spatial moving average. We will also create a factor variable (call it Type) to identify the type of spatial relationship (Low & Low, if both \(z_i\) and its spatial moving average are negative, High & High, if both \(z_i\) and its spatial moving average are positive, and Low & High/High & Low otherwise). This is information is useful for mapping the results:

Hamilton_CT <- Hamilton_CT %>% # Use the pipe operator to pass the dataframe as an argument to `mutate()`, which is used to create new variables.
  mutate(Z = (POP_DENSITY - mean(POP_DENSITY)) / var(POP_DENSITY), # Create a mean-centered variable that is standardized by the variance.
         SMA = lag.listw(Hamilton_CT.w, Z), # Calculate the spatial moving average of variable `Z`.
         # The function `case_when()` is used to evaluate several logical conditions and respond to them. 
         Type = case_when(Z < 0 & SMA < 0 ~ "LL",
                          Z > 0 & SMA > 0 ~ "HH",
                          TRUE ~ "HL/LH"))

Next, we will create the scatterplot and a choropleth map of the population density. The package plotly is used to create interactive plots. Read more about how to visualize geospatial information with plotly here. The package crosstalk allows us to link two plots for brushing (brushing is a visualization technique that links several plots in a dynamic way to highlight some elements of interest).

To create an interactive plot for linking and brushing we first, create a SharedData object to link two plots:

# Create a shared data object for brushing.
df_msc.sd <- SharedData$new(Hamilton_CT)

The function bscols() (for bootstrap columns) is used to array two plotly objects; the first of these is a scatterplot, and the second is a choropleth map of population density.

bscols(
  # The first plot is Moran's scatterplot
  plot_ly(df_msc.sd) %>% # Create a `plotly` object using the dataframe as an input. The pipe operator passes this object to the function `add_markers()`; this function is similar to the `geom_point()` function in `ggplot2` and it draws objects on the blank plot created by `plot_ly()`
    add_markers(x = ~Z, y = ~SMA, color = ~POP_DENSITY, size = ~(Z * SMA), colors = "YlOrRd") %>%
    hide_colorbar() %>%     # Remove the colorbar from the plot.
    highlight("plotly_selected"), # Highlight observations when selected.
  # The second plot is a choropleth map
  plot_ly(df_msc.sd) %>% # Create a `plotly` object using the dataframe as an input. The pipe operator passes this object to the function `add_sf()`; this function is similar to the `geom_sf()` functions in `ggplot2` and it draws a simple features object on the blank plot created by `plot_ly()`
    add_sf(split = ~TRACT, color = ~POP_DENSITY, colors = "YlOrRd", showlegend = FALSE) %>%
    hide_colorbar() %>% # Remove colorbar from the plot.
    highlight(dynamic = TRUE) # Highlight observations when selected.
)