Spatial GLMs with glmmfields

Here we will use the glmmfields package to fit a spatial GLM with a predictor. While glmmfields was designed to fit spatiotemporal GLMs with the possibility of extreme events, it can also be used to fit regular spatial GLMs without a time element and without extreme events. Currently it can fit Gaussian (link = identity), Gamma (link = log), Poisson (link = log), negative binomial (link = log), binomial (link = logit), and lognormal (link = log) models.
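
For reference, the first four of those are specified with base R family objects; negative binomial and lognormal models use family functions that are not in base R (check ?glmmfields in your installed version for the exact helpers):

# base R family objects corresponding to the families listed above
gaussian(link = "identity")
Gamma(link = "log")
poisson(link = "log")
binomial(link = "logit")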

Let’s load the necessary packages:

library(glmmfields)
library(ggplot2)
library(dplyr)

Set up parallel processing (this has no effect in this example because we run only one chain, but it is good practice for multi-chain models):

options(mc.cores = parallel::detectCores())

First, let’s simulate some data. We will use the built-in function sim_glmmfields(), but normally you would start with your own data. We will simulate 200 data points with some (fake) temperature data as a predictor, an underlying random field spatial pattern, and some observation error. In this example we will fit a Gamma GLM with a log link.

The underlying intercept is 0.5 and the slope between temperature and our observed variable (say biomass or density of individuals in a quadrat) is 0.2.

set.seed(1)
N <- 200 # number of data points
temperature <- rnorm(N, 0, 1) # simulated temperature data
X <- cbind(1, temperature) # our design matrix
s <- sim_glmmfields(
  n_draws = 1, gp_theta = 1.5, n_data_points = N,
  gp_sigma = 0.2, sd_obs = 0.2, n_knots = 12, obs_error = "gamma",
  covariance = "squared-exponential", X = X,
  B = c(0.5, 0.2) # B represents our intercept and slope
)
d <- s$dat
d$temperature <- temperature
ggplot(d, aes(lon, lat, colour = y)) +
  viridis::scale_colour_viridis() +
  geom_point(size = 3)

If we fit a regular (non-spatial) GLM, we can see that there is spatial autocorrelation in the residuals:

m_glm <- glm(y ~ temperature, data = d, family = Gamma(link = "log"))
m_glm
## 
## Call:  glm(formula = y ~ temperature, family = Gamma(link = "log"), 
##     data = d)
## 
## Coefficients:
## (Intercept)  temperature  
##      0.5369       0.1967  
## 
## Degrees of Freedom: 199 Total (i.e. Null);  198 Residual
## Null Deviance:       23.27 
## Residual Deviance: 16.72     AIC: 280.9
confint(m_glm)
##                 2.5 %    97.5 %
## (Intercept) 0.4964271 0.5780115
## temperature 0.1522553 0.2413277
d$m_glm_residuals <- residuals(m_glm)
ggplot(d, aes(lon, lat, colour = m_glm_residuals)) +
  scale_color_gradient2() +
  geom_point(size = 3)
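
One way to see this more formally is a quick empirical semivariogram of the residuals, sketched here with base R and dplyr only (a formal test such as Moran's I would require an additional package). If the residuals were spatially independent, the curve would be roughly flat rather than increasing with distance.

# pairwise distances and semivariances of the GLM residuals (a rough sketch)
dist_mat <- as.matrix(dist(d[, c("lon", "lat")]))
pairs_ij <- which(upper.tri(dist_mat), arr.ind = TRUE)
vario <- data.frame(
  distance = dist_mat[pairs_ij],
  semivar = 0.5 * (d$m_glm_residuals[pairs_ij[, 1]] -
    d$m_glm_residuals[pairs_ij[, 2]])^2
)
vario %>%
  mutate(bin = cut(distance, breaks = 10)) %>%
  group_by(bin) %>%
  summarise(distance = mean(distance), semivar = mean(semivar)) %>%
  ggplot(aes(distance, semivar)) +
  geom_point() +
  geom_line()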

Let’s instead fit a spatial GLM with random fields. Note that we are only using 1 chain and 500 iterations here so the vignette builds quickly on CRAN. For final inference, you should likely use 4 or more chains and 2000 or more iterations.

m_spatial <- glmmfields(y ~ temperature,
  data = d, family = Gamma(link = "log"),
  lat = "lat", lon = "lon", nknots = 12, iter = 500, chains = 1,
  prior_intercept = student_t(3, 0, 10),
  prior_beta = student_t(3, 0, 3),
  prior_sigma = half_t(3, 0, 3),
  prior_gp_theta = half_t(3, 0, 10),
  prior_gp_sigma = half_t(3, 0, 3),
  seed = 123 # passed to rstan::sampling()
)
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
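
The warnings above suggest we would want more iterations for real inference. A minimal sketch of further MCMC checks, assuming the underlying stanfit object is stored in the model element of the fitted object (check str(m_spatial) in your installed version):

# HMC diagnostics and traceplots via rstan (m_spatial$model is assumed to
# be the underlying stanfit object)
rstan::check_hmc_diagnostics(m_spatial$model)
rstan::stan_trace(m_spatial$model, pars = c("B", "gp_sigma", "gp_theta"))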

Let’s look at the model output:

m_spatial
## Inference for Stan model: glmmfields.
## 1 chains, each with iter=500; warmup=250; thin=1; 
## post-warmup draws per chain=250, total post-warmup draws=250.
## 
##            mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
## gp_sigma   0.27    0.01 0.06   0.17   0.22   0.26   0.30   0.41   150    1
## gp_theta   1.66    0.02 0.25   1.31   1.48   1.63   1.81   2.12   160    1
## B[1]       0.49    0.01 0.08   0.35   0.44   0.49   0.54   0.64    57    1
## B[2]       0.19    0.00 0.02   0.16   0.18   0.19   0.20   0.22   176    1
## CV[1]      0.21    0.00 0.01   0.19   0.20   0.21   0.22   0.23   151    1
## lp__     -63.13    0.25 3.01 -70.15 -64.96 -62.96 -60.94 -58.19   140    1
## 
## Samples were drawn using NUTS(diag_e) at Sat Oct 26 04:43:27 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

We can see that the 95% credible interval is considerably wider for the intercept and somewhat narrower for the slope coefficient in the spatial GLM than in the model that ignored the spatial autocorrelation.
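
Here is a small sketch that puts those interval widths side by side, using confint() on the GLM and the tidy() method demonstrated further below:

# compare 95% interval widths for the intercept and slope
glm_ci <- confint(m_glm)
spatial_ci <- tidy(m_spatial, conf.int = TRUE, conf.method = "HPDinterval")
spatial_ci <- filter(spatial_ci, term %in% c("B[1]", "B[2]"))
data.frame(
  term = c("intercept", "slope"),
  glm_width = as.numeric(glm_ci[, 2] - glm_ci[, 1]),
  spatial_width = spatial_ci$conf.high - spatial_ci$conf.low
)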

Let’s look at the residuals in space this time:

plot(m_spatial, type = "spatial-residual", link = TRUE) +
  geom_point(size = 3)

That looks better.

We can inspect the residuals versus fitted values:

plot(m_spatial, type = "residual-vs-fitted")

And the predictions from the model itself:

plot(m_spatial, type = "prediction", link = FALSE) +
  viridis::scale_colour_viridis() +
  geom_point(size = 3)

Compare this with the plot of the simulated data above. Note that the original data also include observation error with a CV of 0.2.
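
As a rough visual check, we can also plot the observed values against the model predictions on the response scale (predict() is covered in the next section):

# observed vs. predicted on the response scale; the dashed line is 1:1
d$predicted <- predict(m_spatial, type = "response")$estimate
ggplot(d, aes(predicted, y)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = 2)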

We can also extract the predictions themselves with credible intervals:

# link scale:
p <- predict(m_spatial)
head(p)
## # A tibble: 6 × 3
##   estimate conf_low conf_high
##      <dbl>    <dbl>     <dbl>
## 1    0.730   0.619      0.824
## 2    0.561   0.468      0.680
## 3    0.182   0.0682     0.325
## 4    0.927   0.825      1.06 
## 5    0.626   0.519      0.715
## 6    0.503   0.394      0.603
# response scale:
p <- predict(m_spatial, type = "response")
head(p)
## # A tibble: 6 × 3
##   estimate conf_low conf_high
##      <dbl>    <dbl>     <dbl>
## 1     2.08     1.86      2.28
## 2     1.75     1.60      1.97
## 3     1.20     1.07      1.38
## 4     2.53     2.28      2.88
## 5     1.87     1.68      2.05
## 6     1.65     1.48      1.83
# prediction intervals on new observations (include observation error):
p <- predict(m_spatial, type = "response", interval = "prediction")
head(p)
## # A tibble: 6 × 3
##   estimate conf_low conf_high
##      <dbl>    <dbl>     <dbl>
## 1     2.08    1.38       3.01
## 2     1.75    1.00       2.83
## 3     1.20    0.770      1.79
## 4     2.53    1.56       3.78
## 5     1.87    1.19       2.66
## 6     1.65    1.00       2.47
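
As a quick sanity check on the prediction intervals, we can ask what fraction of the observed values they contain (with the default 95% level, we would expect roughly 0.95):

# rough empirical coverage of the prediction intervals
mean(d$y >= p$conf_low & d$y <= p$conf_high)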

Or use the tidy method to get our parameter estimates as a data frame:

head(tidy(m_spatial, conf.int = TRUE, conf.method = "HPDinterval"))
## # A tibble: 6 × 5
##   term                     estimate std.error conf.low conf.high
##   <chr>                       <dbl>     <dbl>    <dbl>     <dbl>
## 1 gp_sigma                    0.266    0.0622    0.158     0.392
## 2 gp_theta                    1.66     0.247     1.30      2.13 
## 3 B[1]                        0.488    0.0816    0.353     0.643
## 4 B[2]                        0.193    0.0169    0.165     0.225
## 5 CV[1]                       0.210    0.0104    0.193     0.230
## 6 spatialEffectsKnots[1,1]    0.322    0.0963    0.147     0.496

Or make predictions on a fine-scale spatial grid for a constant value of the predictor:

pred_grid <- expand.grid(
  lat = seq(min(d$lat), max(d$lat), length.out = 30),
  lon = seq(min(d$lon), max(d$lon), length.out = 30)
)
pred_grid$temperature <- mean(d$temperature)
pred_grid$prediction <- predict(
  m_spatial,
  newdata = pred_grid,
  type = "response"
)$estimate
ggplot(pred_grid, aes(lon, lat, fill = prediction)) +
  geom_raster() +
  viridis::scale_fill_viridis()
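
We could similarly map the uncertainty in those predictions, for example the width of the credible interval across the same grid (this sketch assumes the newdata predictions return the same conf_low and conf_high columns shown earlier):

# width of the credible interval across the prediction grid
grid_pred <- predict(m_spatial, newdata = pred_grid, type = "response")
pred_grid$ci_width <- grid_pred$conf_high - grid_pred$conf_low
ggplot(pred_grid, aes(lon, lat, fill = ci_width)) +
  geom_raster() +
  viridis::scale_fill_viridis()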