Chapter 37 Spatially Continuous Data IV

NOTE: The source files for this book are available with companion package {isdas}. The source files are in Rmarkdown format and packed as templates. These files allow you execute code within the notebook, so that you can work interactively with the notes.

37.1 Learning objectives

In the previous practice you were introduced to the concept of variographic analysis for fields/spatially continuous data. In this practice, we will learn:

  1. How to use residual spatial pattern to estimate prediction errors.
  2. Kriging: a method for optimal predictions.

37.2 Suggested reading

  • Bailey TC and Gatrell AC (1995) Interactive Spatial Data Analysis, Chapters 5 and 6. Longman: Essex.
  • Bivand RS, Pebesma E, and Gomez-Rubio V (2008) Applied Spatial Data Analysis with R, Chapter 8. Springer: New York.
  • Brunsdon C and Comber L (2015) An Introduction to R for Spatial Analysis and Mapping, Chapter 6, Sections 6.7 and 6.8. Sage: Los Angeles.
  • Isaaks EH and Srivastava RM (1989) An Introduction to Applied Geostatistics, Chapter 12. Oxford University Press: Oxford.
  • O’Sullivan D and Unwin D (2010) Geographic Information Analysis, 2nd Edition, Chapters 9 and 10. John Wiley & Sons: New Jersey.

37.3 Preliminaries

As usual, it is good practice to clear the working space to make sure that you do not have extraneous items there when you begin your work. The command in R to clear the workspace is rm (for “remove”), followed by a list of items to be removed. To clear the workspace from all objects, do the following:

rm(list = ls())

Note that ls() lists all objects currently on the workspace.

Load the libraries you will use in this activity:

library(isdas)
library(gstat)
library(plotly)
library(spdep)
library(tidyverse)
library(stars)

Begin by loading the data file:

data("Walker_Lake")

You can verify the contents of the dataframe:

summary(Walker_Lake)
##       ID                  X               Y               V         
##  Length:470         Min.   :  8.0   Min.   :  8.0   Min.   :   0.0  
##  Class :character   1st Qu.: 51.0   1st Qu.: 80.0   1st Qu.: 182.0  
##  Mode  :character   Median : 89.0   Median :139.5   Median : 425.2  
##                     Mean   :111.1   Mean   :141.3   Mean   : 435.4  
##                     3rd Qu.:170.0   3rd Qu.:208.0   3rd Qu.: 644.4  
##                     Max.   :251.0   Max.   :291.0   Max.   :1528.1  
##                                                                     
##        U           T      
##  Min.   :   0.00   1: 45  
##  1st Qu.:  83.95   2:425  
##  Median : 335.00          
##  Mean   : 613.27          
##  3rd Qu.: 883.20          
##  Max.   :5190.10          
##  NA's   :195

37.4 Using residual spatial pattern to estimate prediction errors

Previously, in Chapter @ref{spatially-continuous-data-ii} we discussed how to interpolate a field using trend surface analysis; we also saw how that method may lead to residuals that are not spatially independent.

The implication of non-random residuals is that there is systematic residual pattern that the model did not capture; This, in turn, means that there is at least some information that can still be extracted from the residuals. Again, we will use the case of Walker Lake to explore one way to do this.

As before, we first calculate the polynomial terms of the coordinates to fit a trend surface to the data:

Walker_Lake <- mutate(Walker_Lake,
                      X3 = X^3, X2Y = X^2 * Y, X2 = X^2, 
                      XY = X * Y,
                      Y2 = Y^2, XY2 = X * Y^2, Y3 = Y^3)

Given the polynomial expansion, we can proceed to estimate the following cubic trend surface model, which we already know provided the best fit to the data:

WL.trend3 <- lm(formula = V ~ X3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3, 
                data = Walker_Lake)
summary(WL.trend3)
## 
## Call:
## lm(formula = V ~ X3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3, 
##     data = Walker_Lake)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -564.19 -197.41    7.91  194.25  929.72 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.620e+00  1.227e+02  -0.070 0.944035    
## X3           1.533e-04  4.806e-05   3.190 0.001522 ** 
## X2Y          6.139e-05  3.909e-05   1.570 0.117000    
## X2          -6.651e-02  1.838e-02  -3.618 0.000330 ***
## X            9.172e+00  2.386e+00   3.844 0.000138 ***
## XY          -4.420e-02  1.430e-02  -3.092 0.002110 ** 
## Y            4.794e+00  2.040e+00   2.350 0.019220 *  
## Y2          -1.806e-03  1.327e-02  -0.136 0.891822    
## XY2          7.679e-05  2.956e-05   2.598 0.009669 ** 
## Y3          -4.170e-05  2.819e-05  -1.479 0.139759    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 276.7 on 460 degrees of freedom
## Multiple R-squared:  0.1719, Adjusted R-squared:  0.1557 
## F-statistic: 10.61 on 9 and 460 DF,  p-value: 5.381e-15

We can next visualize the residuals which, as you can see, do not appear to be random

plot_ly(x = ~Walker_Lake$X, 
        y = ~Walker_Lake$Y, 
        z = ~WL.trend3$residuals, 
        color = ~WL.trend3$residuals < 0, 
        colors = c("blue", "red"), 
        type = "scatter3d")
## No scatter3d mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode

Now we will create an interpolation grid:

# The function `sequence()` create a sequence of values from - to 
# using by as the step increment. In this case, we generate a grid
# with points that are 2.5 m apart.
X.p <- seq(from = 0.1, to = 255.1, by = 2.5)
Y.p <- seq(from = 0.1, to = 295.1, by = 2.5)
df.p <- expand.grid(X = X.p, Y = Y.p)

WE can add the polynomial terms to this grid. Since our trend surface model was estimated using the cubic polynomial, we add those terms to the dataframe:

df.p <- mutate(df.p, X3 = X^3, X2Y = X^2 * Y, X2 = X^2, 
               XY = X * Y, 
               Y2 = Y^2, XY2 = X * Y^2, Y3 = Y^3)

The interpolated cubic surface is obtained by using the model and the interpolation grid as newdata:

# The function `predict()` is used to make predictions given a model 
# and a possibly new dataset, different from the one used for estimation 
# of the model.
WL.preds3 <- predict(WL.trend3, 
                     newdata = df.p, 
                     se.fit = TRUE, 
                     interval = "prediction", 
                     level = 0.95)

The surface is converted into a matrix for 3D plotting:

z.p3 <- matrix(data = WL.preds3$fit[,1], 
               nrow = length(Y.p), 
               ncol = length(X.p), 
               byrow = TRUE)

And plot:

WL.plot3 <- plot_ly(x = ~X.p, 
                    y = ~Y.p, 
                    z = ~z.p3, 
                    type = "surface", 
                    colors = "YlOrRd") %>% 
  layout(scene = list(aspectmode = "manual",
                      aspectratio = list(x = 1, 
                                         y = 1,
                                         
                                         z = 1)))
WL.plot3

The trend surface provides a smooth estimate of the field. However, it is not sufficient to capture all systematic variation, and fails to produce random residuals.

A possible way of enhancing this approach to interpolation is to exploit the information that remains in the residuals, for instance by the use of \(k\)-point means.

We can illustrate this as follows. To interpolate the residuals, we first need the set of target points (the points for the interpolation), as well as the source (the observations):

# We will use the prediction grid we used above to interpolate the residuals 
target_xy = expand.grid(x = X.p, 
                        y = Y.p) %>%
  st_as_sf(coords = c("x", "y"))

# Convert the `Walker_Lake` dataframe to a simple features object using as follows:
Walker_Lake.sf <- Walker_Lake %>% 
  st_as_sf(coords = c("X", "Y"))

# Append the residuals to the table
Walker_Lake.sf$residuals <- WL.trend3$residuals

It is possible now to use the kpointmean function to interpolate the residuals, for instance using \(k=5\) neighbors:

kpoint.5 <- kpointmean(source_xy = Walker_Lake.sf, 
                       target_xy = target_xy, 
                       z = residuals, 
                       k = 5)
## projected points

Given the interpolated residuals, we can join them to the cubic trend surface, as follows:

z.p3 <- matrix(data = WL.preds3$fit[,1] + kpoint.5$z,
               nrow = length(Y.p), 
               ncol = length(X.p), 
               byrow = TRUE)

This is now the interpolated field that combines the trend surface and the estimated residuals:

WL.plot3 <- plot_ly(x = ~X.p, 
                    y = ~Y.p, 
                    z = ~z.p3,
                    type = "surface", 
                    colors = "YlOrRd") %>% 
  layout(scene = list(aspectmode = "manual",
                      aspectratio = list(x = 1, 
                                         y = 1, 
                                         z = 1)))
WL.plot3

Of all the approaches that we have seen so far, this is the first that provides a genuine estimate of the following: \[ \hat{z}_p + \hat{\epsilon}_p \]

With trend surface analysis providing a smooth estimator of the underlying field: \[ \hat{z}_p = f(x_p, y_p) \]

And \(k\)-point means providing an estimator of: \[ \hat{\epsilon}_p \]

A question is how to decide the number of neighbors to use in the calculation of the \(k\)-point means. As previously discussed, \(k\)=1 becomes identical to Voronoi polygons, and \(k = n\) becomes the global mean.

A second question concerns the way the average is calculated. As variographic analysis demonstrates, it is possible to estimate the way in which spatial dependence weakens with distance. Why should more distant points be weighted equally? The answer is, there is no reason why they should, and in fact, variographic analysis elegantly solves this, as well the question of how many points to use: all of them, with varying weights.

Next, we will introduce kriging, a method for optimal prediction that is based on the use of variographic analysis.

37.5 Kriging: a method for optimal prediction.

To introduce the method known as kriging, we will begin by positing a situation as follows:

\[ \hat{z}_p + \hat{\epsilon}_p = \hat{f}(x_p, y_p) + \hat{\epsilon}_p \]

where \(\hat{f}(x_p, y_p)\) is a smooth estimator of an underlying field.

We aim to predict \(\hat{\epsilon}_p\) based on the observed residuals. We use an expression similar to the one used for IDW and \(k\)-point means in Chapter @ref{spatially-continuous-data-i} (we will use \(\lambda\) for the weights to avoid confusing the the weights in variographic analysis):

\[ \hat{\epsilon}_p = \sum_{i=1}^n {\lambda_{pi}\epsilon_i} \]

That is, \(\hat{\epsilon}_p\) is a linear combination of the prediction residuals from the trend: \[ \epsilon_i = z_i - \hat{f}(x_i, y_i) \]

It is possible to define the following expected mean squared error, or prediction variance: \[ \sigma_{\epsilon}^2 = E[(\hat{\epsilon}_p - \epsilon_i)^2] \]

The prediction variance measures how close, on average, the prediction error is to the residuals.

The prediction variance can be decomposed as follows: \[ \sigma_{\epsilon}^2 = E[\hat{\epsilon}_p] + E[\epsilon_i] - 2E[\hat{\epsilon_i\epsilon}_p] \]

It turns out (we will not show the detailed derivation, but it can be consulted here), that the expression for the prediction variance depends on the weights: \[ \sigma_{\epsilon}^2 = \sum_{i=1}^n \sum_{j=1}^n{\lambda_{ip}\lambda_{jp}C_{ij}} + \sigma^2 + 2\sum_{i=1}^{n}{\lambda_{ip}C_{ip}} \] where \(C_{ij}\) is the autocovariance between observations at \(i\) and \(j\), and \(C_{ip}\) is the autocovariance between the observation at \(i\) and prediction location \(p\).

Fortunately for us, the semivariogram and the autocovariance is straightforward: \[ C_{z}(h) =\sigma^2 - \hat{\gamma}_{z}(h) \]

This means that, given the distance \(h\) between \(i\) and \(j\), and \(i\) and \(p\), we can use a semivariogram to obtain the autocovariances needed to calculate the prediction variance. We are still missing, however, the weights \(\lambda\), which are not known a priori.

These weights can be obtained if we use the following rules:

The expectation of the prediction errors is zero (unbiassedness) Find the weights \(lambda\) that minimize the prediction variance (optimal estimator).

This makes sense, since we would like our predictions to be unbiased (i.e., accurate) and as precise as possible, that is, to have the smallest variance (recall the discussion about accuracy and precision in Chapter @ref{spatially-continuous-data-iii}).

Again, solving the minimization problem is beyond the scope of our presentation, but it suffices to say that the result is as follows:

\[ \mathbf{\lambda}_p = \mathbf{C}^{-1}\mathit{c}_{p} \]

where \(\mathbf{C}\) is the covariance matrix, and \(\mathit{c}_{p}\) is the covariance vector for location \(p\).

In summary, kriging is a method to optimally estimate the value of a variable at \(p\) as a weighted sum of the observations of the same variable at locations \(i\). This method is known to have the properties of Best (in the sense that it minimizes the variance) Linear (because predictions are a linear combination of weights) Unbiased (since the estimators of the prediction errors are zero) Predictor, or BLUP.

Kriging is implemented in the package gstat as follows.

To put kriging to work we must first conduct variographic analysis of the residuals. The function variogram uses as an argument a simple features object that we can create as follows:

Walker_Lake.sf <- Walker_Lake %>%
  st_as_sf(coords = c("X", "Y"),
           # Remove set to false to retain the X and Y coordinates 
           # in the dataframe after they are converted to simple features
           remove = FALSE) 

The variogram of the residuals can be obtained by specifying a trend surface in the formula:

variogram_v <- variogram(V ~ X3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3, 
                         data = Walker_Lake.sf)

# Plot 
ggplot(data = variogram_v, 
       aes(x = dist, 
           y = gamma)) +
  geom_point() + 
  geom_text(aes(label = np), 
            # Nudge the labels away from the points
            nudge_y = -1500) +
  xlab("Distance") +
  ylab("Semivariance")

You can verify that the semivariogram above corresponds to the residuals by repeating the analysis directly on the residuals. First join the residuals to the SpatialPointsDataFrame:

Walker_Lake.sf$e <- WL.trend3$residuals

And then calculate the semivariogram and plot:

variogram_e <- variogram(e ~ 1, 
                         data = Walker_Lake.sf)

# Plot 
ggplot(data = variogram_e, 
       aes(x = dist, 
           y = gamma)) +
  geom_point() + 
  geom_text(aes(label = np), 
            nudge_y = -1500) +
  xlab("Distance") + 
  ylab("Semivariance")

The empirical semivariogram is used to estimate a semivariogram function:

variogram_v.t <- fit.variogram(variogram_v, model = vgm("Exp", "Sph", "Gau"))
variogram_v.t
##   model   psill    range
## 1   Nug     0.0 0.000000
## 2   Exp 85554.4 9.910429

The variogram function plots as follows:

gamma.t <- variogramLine(variogram_v.t, maxdist = 130)

# Plot
ggplot(data = variogram_v,
       aes(x = dist, 
           y = gamma)) +
  geom_point(size = 3) + 
  geom_line(data = gamma.t,
            aes(x = dist, 
                y = gamma)) +
  xlab("Distance") + 
  ylab("Semivariance")

We will convert the prediction grid to a simple features object:

df.sf <- df.p %>%
  st_as_sf(coords = c("X", "Y"),
           remove = FALSE)

Then, we can krige the field as follows (ensure that packages sf and stars are installed):

V.kriged <- krige(V ~ X3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3,
                  Walker_Lake.sf, 
                  df.sf, 
                  variogram_v.t)
## [using universal kriging]

Extract the predictions and prediction variance from the object V.kriged:

V.km <- matrix(data = V.kriged$var1.pred,
               nrow = 119,
               ncol = 103, 
               byrow = TRUE)
V.sm <- matrix(data = V.kriged$var1.var,
               nrow = 119,
               ncol = 103,
               byrow = TRUE)

We can now plot the interpolated field:

V.km.plot <- plot_ly(x = ~X.p,
                     y = ~Y.p, 
                     z = ~V.km, 
                     type = "surface", 
                     colors = "YlOrRd") %>% 
  layout(scene = list(aspectmode = "manual", 
                      aspectratio = list(x = 1,
                                         y = 1,
                                         z = 1)))
V.km.plot

Also, we can plot the kriging standard errors (the square root of the prediction variance). This gives an estimate of the uncertainty in the predictions:

V.sm.plot <- plot_ly(x = ~X.p, 
                     y = ~Y.p,
                     z = ~sqrt(V.sm), 
                     type = "surface", 
                     colors = "YlOrRd") %>% 
  layout(scene = list(aspectmode = "manual",
                      aspectratio = list(x = 1, 
                                         y = 1, 
                                         z = 1)))
V.sm.plot

Where are predictions more/less precise?

References

Bailey, T. C., and A. C. Gatrell. 1995. Interactive Spatial Data Analysis. Book. Essex: Addison Wesley Longman.
Bivand, R. S., E. J. Pebesma, and V. Gómez-Rubio. 2008. Applied Spatial Data Analysis with r. Book. New York: Springer Science+Business Media.
Brunsdon, Chris, and Lex Comber. 2015. An Introduction to r for Spatial Analysis and Mapping. Book. Sage.
Isaaks, E. H., and R. M. Srivastava. 1989. Applied Geostatistics. Book. New York: Oxford University Press.
O’Sullivan, David, and David Unwin. 2010. Geographic Information Analysis. Book. 2nd. Edition. Hoboken, New Jersey: John Wiley & Sons.