Global and Local Measures of Spatial Autocorrelation
1. Quick Recap
In previous sections we have seen how to
extract geospatial and aspatial datasets,
convert one projection to another,
perform exploratory data analysis,
create choropleth maps using tmap’s elements and leverage its aesthetics,
compute spatial weights (QUEEN, ROOK) and plot them,
compute distance-based weight matrices.
2. Introduction
In this section, let’s see how to compute Global and Local Measures of Spatial Autocorrelation (GLSA). In spatial policy, one of the main development objectives of the local government and planners is to ensure equal distribution of development in the province. Our task is to apply appropriate spatial statistical methods to discover whether development is evenly distributed geographically.
3. Glimpse of Steps
Some of the important steps performed in this study are as follows:
importing geospatial data using appropriate function(s) of the sf package,
importing the csv file using the appropriate function of the readr package,
performing a relational join using the appropriate join function of the dplyr package,
computing Global Spatial Autocorrelation (GSA) statistics using appropriate functions of the spdep package,
plotting the Moran scatterplot,
computing and plotting the spatial correlogram using the appropriate function of the spdep package,
computing Local Indicator of Spatial Association (LISA) statistics for detecting clusters and outliers using appropriate functions of the spdep package,
computing Getis-Ord’s Gi statistics for detecting hot spot and/or cold spot areas using appropriate functions of the spdep package, and
visualising the analysis output using the tmap package.
4. Data
The following two data sets are used:
Hunan province administrative boundary layer at county level. This is a geospatial data set in ESRI shapefile format.
Hunan_2012.csv: This csv file contains selected Hunan’s local development indicators in 2012.
5. Deep Dive into Map Analysis
5.1 Installing libraries and Importing files
The p_load() function of the pacman package is used to install and load the sf, tidyverse and tmap packages into the R environment. First, let us import the Hunan shapefile into R using st_read() of the sf package. The imported shapefile will be a simple features object of sf. Next, let us import Hunan_2012.csv into R by using read_csv() of the readr package. The output is an R data frame.
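A sketch of this import step is shown below; the data folder paths and the shapefile layer name are assumptions and should be adjusted to the actual project structure. spdep is loaded here as well because it is used throughout the later sections.
# Install and load the packages used in this section
pacman::p_load(sf, spdep, tmap, tidyverse)

# Import the Hunan county boundary shapefile (assumed path and layer name)
hunan <- st_read(dsn = "data/geospatial", layer = "Hunan")

# Import the 2012 local development indicators (assumed path)
hunan2012 <- read_csv("data/aspatial/Hunan_2012.csv")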
Rows: 88 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): County, City
dbl (27): avg_wage, deposite, FAI, Gov_Rev, Gov_Exp, GDP, GDPPC, GIO, Loan, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
5.2 Data Wrangling
The code chunk below will be used to update the attribute table of the hunan sf data frame with the attribute fields of the hunan2012 data frame. This is performed by using left_join() of the dplyr package.
hunan <- left_join(hunan, hunan2012)
Joining, by = "County"
5.3 Visualising Regional Development Indicator
Now, let us prepare a basemap and a choropleth map showing the distribution of GDPPC 2012 by using qtm() of tmap package.
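As a sketch (the object names and titles are illustrative), the two classification schemes compared below can be placed side by side with tmap’s tm_* elements rather than qtm():
# Equal interval classification (illustrative)
equal <- tm_shape(hunan) +
  tm_fill("GDPPC", n = 5, style = "equal") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "Equal interval classification")

# Equal quantile classification (illustrative)
quantile <- tm_shape(hunan) +
  tm_fill("GDPPC", n = 5, style = "quantile") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "Equal quantile classification")

tmap_arrange(equal, quantile, asp = 1, ncol = 2)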
It is seen that map classification with quantile intervals is better than map classification with equal intervals, as the former clearly spreads out the observations rather than clustering most of them into one class. Hence, the quantile style will be used in further analysis.
5.4 Global Spatial Autocorrelation
5.4.1 Computing Contiguity Spatial Weights
Before we can compute the global spatial autocorrelation statistics, we need to construct the spatial weights of the study area. The spatial weights are used to define the neighbourhood relationships between the geographical units (i.e. counties) in the study area. The code chunk below is used to compute the Queen contiguity weight matrix.
wm_q <- poly2nb(hunan, queen=TRUE)
summary(wm_q)
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Link number distribution:
1 2 3 4 5 6 7 8 9 11
2 2 12 16 24 14 11 4 2 1
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 11 links
5.4.2 Row-standardised weights matrix
Next, let us assign weights to each neighboring polygon. In our case, each neighboring polygon will be assigned equal weight (style=“W”).
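The row-standardised weights object can be created with nb2listw() as sketched below; rswm_q is the object name referenced by the tests later in this section, and zero.policy = TRUE is a defensive assumption in case any county has no neighbours.
# Row-standardised weights matrix (style = "W")
rswm_q <- nb2listw(wm_q, style = "W", zero.policy = TRUE)
rswm_q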
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 88 7744 88 37.86334 365.9147
5.4.3 Global Spatial Autocorrelation: Moran’s I
The code chunk below performs Moran’s I statistical testing using moran.test() of spdep.
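A sketch of the test call is shown below; the zero.policy and na.action arguments are defensive assumptions.
# Moran's I test under randomisation using the row-standardised weights
moran.test(hunan$GDPPC, listw = rswm_q,
           zero.policy = TRUE, na.action = na.omit)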
Moran I test under randomisation
data: hunan$GDPPC
weights: rswm_q
Moran I statistic standard deviate = 4.7351, p-value = 1.095e-06
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.300749970 -0.011494253 0.004348351
Interpretation:
H1 (Alternative Hypothesis) - GDPPC distribution is spatially dependent.
H0 (Null Hypothesis) - GDPPC distribution is a random phenomenon.
At the 95% confidence level, the p-value is less than alpha (0.05). Hence the result is significant, the null hypothesis can be rejected, and we can confirm that GDPPC is not randomly distributed: the spatial distribution of high and/or low values is more spatially clustered than expected and neighbouring observations tend to be similar.
Computing Monte Carlo Moran’s I
The code chunk below performs a permutation test for the Moran’s I statistic by using moran.mc() of spdep. A total of 1000 simulations will be performed.
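A sketch of the permutation test is given below; the seed value is an assumption added for reproducibility, and nsim = 999 matches the “number of simulations + 1: 1000” reported in the output.
set.seed(1234)   # assumed seed, for reproducibility
bperm <- moran.mc(hunan$GDPPC, listw = rswm_q, nsim = 999,
                  zero.policy = TRUE, na.action = na.omit)
bperm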
Monte-Carlo simulation of Moran I
data: hunan$GDPPC
weights: rswm_q
number of simulations + 1: 1000
statistic = 0.30075, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Similarly, even after performing 1000 permutations, at the 95% confidence level the p-value is significant (less than 0.05) and the observed Moran’s I statistic is positive. So the null hypothesis can be rejected and we can confirm that GDPPC is not randomly distributed: the spatial distribution of high and/or low values is more spatially clustered and observations tend to be similar.
Let us examine the simulated Moran’s I test statistics in detail by plotting the distribution of the statistical values as a histogram by using the code chunk below.
mean(bperm$res[1:999])
[1] -0.01504572
var(bperm$res[1:999])
[1] 0.004371574
summary(bperm$res[1:999])
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.18339 -0.06168 -0.02125 -0.01505 0.02611 0.27593
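The histogram itself can be produced with base graphics roughly as sketched below (the number of bins is an assumption):
# Histogram of the simulated Moran's I values with a reference line at 0
hist(bperm$res, freq = TRUE, breaks = 20, xlab = "Simulated Moran's I")
abline(v = 0, col = "red")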
Simulated values are both positive and negative. The positive values indicate that the values are clustered, observations tend to be similar; whereas the negative values on the left of 0 indicate that the values are dispersed and observations tend to be dissimilar.
Challenge: Plotting using ggplot
Now the same visualisation is plotted using ggplot instead of base graph.
res values are converted to dataframe and passed as ggplot data
geom_histogram() is used on top of ggplot to plot the histogram
vertical line is plotted at value 0 using geom_vline()
p1 <- ggplot(data=as.data.frame(bperm[7]), aes(x = res)) +
  geom_histogram(bins=20, color="black", fill="light blue") +
  geom_vline(aes(xintercept=0), color="red", linetype="dashed", size=1) +
  labs(title = "Moran's I test statistic distribution",
       x = "Moran's I",
       y = "Frequency")
p1
Interpretation:
The interpretation is the same as for the base-graph histogram above: positive simulated values indicate clustering (neighbouring observations tend to be similar), while the negative values to the left of 0 indicate dispersion (neighbouring observations tend to be dissimilar).
5.4.5 Global Spatial Autocorrelation: Geary’s C
The code chunk below performs Geary’s C test for spatial autocorrelation by using geary.test() of spdep.
geary.test(hunan$GDPPC, listw=rswm_q)
Geary C test under randomisation
data: hunan$GDPPC
weights: rswm_q
Geary C statistic standard deviate = 3.6108, p-value = 0.0001526
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic Expectation Variance
0.6907223 1.0000000 0.0073364
Analysis:
H0 : The attribute is randomly distributed among the features in Hunan
H1: The attribute is not randomly distributed among the features in Hunan
As the p-value is significant, the null hypothesis can be rejected, and we can conclude that the attribute is not randomly distributed: its spatial distribution is more clustered than would be expected under spatial randomness.
Computing Monte Carlo Geary’s C
The code chunk below performs a permutation test for the Geary’s C statistic by using geary.mc() of spdep.
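A sketch of the permutation test is shown below; the seed is an assumption and nsim = 999 matches the reported “number of simulations + 1: 1000”. Note that this overwrites the earlier bperm object, which is why the summary statistics that follow refer to Geary’s C.
set.seed(1234)   # assumed seed, for reproducibility
bperm <- geary.mc(hunan$GDPPC, listw = rswm_q, nsim = 999)
bperm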
Monte-Carlo simulation of Geary C
data: hunan$GDPPC
weights: rswm_q
number of simulations + 1: 1000
statistic = 0.69072, observed rank = 1, p-value = 0.001
alternative hypothesis: greater
Interpretation:
Here, at the 95% confidence level, the p-value is less than 0.05 and the Geary’s C statistic is less than 1. So we can reject the null hypothesis and confirm that the values are clustered and neighbouring observations tend to be similar.
mean(bperm$res[1:999])
[1] 1.004402
var(bperm$res[1:999])
[1] 0.007436493
summary(bperm$res[1:999])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7142 0.9502 1.0052 1.0044 1.0595 1.2722
Next, let us plot a histogram to reveal the distribution of the simulated values by using the code chunk below.
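A base-graph sketch of this histogram, with a reference line at C = 1 (the value expected under spatial randomness), might look like this:
# Histogram of the simulated Geary's C values
hist(bperm$res, freq = TRUE, breaks = 20, xlab = "Simulated Geary's C")
abline(v = 1, col = "red")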
At C = 1, observations are arranged randomly over space. Here roughly half of the simulated values are below 1 and half are above 1. Values greater than 1 indicate dispersion (neighbouring observations tend to be dissimilar), while values below 1 indicate clustering (neighbouring observations tend to be similar).
5.5 Spatial Correlogram
Spatial correlograms are great for examining patterns of spatial autocorrelation in your data or model residuals. They show how correlated pairs of spatial observations are as the distance (lag) between them increases - they are plots of some index of autocorrelation (Moran’s I or Geary’s C) against distance.
5.5.1 Compute Moran’s I correlogram
In the code chunk below, sp.correlogram() of the spdep package is used to compute 6-lag spatial correlograms of GDPPC, using Moran’s I and Geary’s C as the measures of global spatial autocorrelation. The plot() function of base graphics is then used to plot the output, and the two plots are arranged side by side using par(mfrow=c(1,2)), i.e. one row and two columns.
par(mfrow=c(1,2))
MI_corr <- sp.correlogram(wm_q, hunan$GDPPC, order=6, method="I", style="W")
plot(MI_corr, main = "Moran's I error plot")
GC_corr <- sp.correlogram(wm_q, hunan$GDPPC, order=6, method="C", style="W")
plot(GC_corr, main = "Geary's C error plot")
Interpretation:
The plots reveal that the Moran’s I and Geary’s C test statistics move inversely: as the lag order increases, Moran’s I decreases towards -1 while Geary’s C increases towards 2, indicating that values at larger lags tend to be dissimilar from one another.
Plotting the output alone might not allow us to provide a complete interpretation, because not all autocorrelation values are statistically significant. Hence, it is important for us to examine the full analysis report by printing out the analysis results, as in the code chunk below.
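Printing the correlogram objects returns the estimate, expectation, variance and p-value for each lag, as sketched below.
# Full analysis reports for both correlograms
print(MI_corr)
print(GC_corr)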
Both Moran’s I and Geary’s C test results show that, at the 90% confidence level (alpha = 0.1), most of the autocorrelation values are significant. Hence we can reject the null hypothesis that the spatial pattern is random and, with statistical evidence, we can confirm the following:
Observed spatial pattern of GDPPC is spatially dependent and not equally likely as any other spatial pattern.
GDPPC at one location depends on values at other (neighbouring) locations.
5.6 Cluster and Outlier Analysis
Local Indicators of Spatial Association or LISA are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable.
5.6.1 Computing local Moran’s I
The code chunks below are used to compute local Moran’s I of GDPPC2012 at the county level.
fips <- order(hunan$County)
localMI <- localmoran(hunan$GDPPC, rswm_q)
head(localMI)
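Before the values and p-values can be mapped in the sub-sections below, the local Moran’s I matrix has to be appended to the hunan sf data frame as hunan.localMI and the Ii values mapped. The sketch below shows one way to do this; the mangled p-value column name produced by cbind() depends on the installed spdep version, so the rename() call is an assumption and should be checked against colnames(hunan.localMI).
# Append the local Moran's I matrix to the hunan sf data frame
hunan.localMI <- cbind(hunan, localMI) %>%
  rename(Pr.Ii = Pr.z....E.Ii..)   # assumed mangled name; verify with colnames()

# Map the local Moran's I (Ii) values
tm_shape(hunan.localMI) +
  tm_fill(col = "Ii",
          style = "pretty",
          title = "local Moran statistics") +
  tm_borders(alpha = 0.5)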
Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Analysis:
From the above map, we can understand the following
- Cluster - positive Ii values indicate regions whose neighbouring features have similarly high or low attribute values
- Outlier - negative Ii values indicate regions whose neighbouring features have dissimilar values
5.6.4 Mapping local Moran’s I p-values
The code chunks below produce a choropleth map of Moran’s I p-values by using functions of tmap package.
tm_shape(hunan.localMI) +
  tm_fill(col = "Pr.Ii",
          breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette = "-Blues",
          title = "local Moran's I p-values") +
  tm_borders(alpha = 0.5) +
  tm_compass(type = "8star", position = c("right", "top")) +
  tm_layout(main.title = "Local Moran p-values spatial distribution",
            main.title.size = 1.5,
            main.title.position = "center",
            main.title.fontface = "bold")
Analysis:
From the above map, we can understand the following
- For every corresponding Ii value, we can see whether the region is statistically significant or not.
- The darker the region, the higher the significance (i.e. the smaller the p-value).
5.6.5 Mapping both local Moran’s I values and p-values
Let us now plot both the local Moran’s I values map and its corresponding p-values map next to each other for effective interpretation.
localMI.map <- tm_shape(hunan.localMI) +
  tm_fill(col = "Ii",
          style = "pretty",
          title = "local moran statistics") +
  tm_borders(alpha = 0.5) +
  tm_compass(type = "8star", position = c("right", "top")) +
  tm_layout(main.title = "Local Moran I spatial distribution",
            main.title.size = 1.5,
            main.title.position = "center",
            main.title.fontface = "bold")

pvalue.map <- tm_shape(hunan.localMI) +
  tm_fill(col = "Pr.Ii",
          breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
          palette = "-Blues",
          title = "local Moran's I p-values") +
  tm_borders(alpha = 0.5) +
  tm_compass(type = "8star", position = c("right", "top")) +
  tm_layout(main.title = "Local Moran p value spatial distribution",
            main.title.size = 1.5,
            main.title.position = "center",
            main.title.fontface = "bold")

tmap_arrange(localMI.map, pvalue.map, asp = 1, ncol = 2)
Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Analysis:
From the left map, we can understand the following
- Cluster - positive Ii values indicate regions whose neighbouring features have similarly high or low attribute values
- Outlier - negative Ii values indicate regions whose neighbouring features have dissimilar values
From the right map, we can understand the following
- For every corresponding Ii value, we can see whether the region is statistically significant or not.
- The darker the region, the higher the significance (i.e. the smaller the p-value).
5.7 Creating a LISA Cluster Map
The LISA Cluster Map shows the significant locations color coded by type of spatial autocorrelation. The first step before we can generate the LISA cluster map is to plot the Moran scatterplot.
5.7.1 Plotting Moran scatterplot
The code chunk below plots the Moran scatterplot of GDPPC 2012 by using moran.plot() of spdep.
nci <- moran.plot(hunan$GDPPC, rswm_q,
                  labels = as.character(hunan$County),
                  xlab = "GDPPC 2012",
                  ylab = "Spatially Lagged GDPPC 2012")
5.7.2 Plotting Moran scatterplot with standardised variable
First, let us use scale() to centre and scale the variable. Centering is done by subtracting the mean (omitting NAs) from the corresponding column, and scaling is done by dividing the (centered) variable by its standard deviation.
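A sketch of the standardisation and the corresponding scatterplot is shown below; the column name Z.GDPPC is an assumption.
# Standardise GDPPC and store it as a plain vector column
hunan$Z.GDPPC <- scale(hunan$GDPPC) %>% as.vector()

# Moran scatterplot of the standardised variable
nci2 <- moran.plot(hunan$Z.GDPPC, rswm_q,
                   labels = as.character(hunan$County),
                   xlab = "z-GDPPC 2012",
                   ylab = "Spatially Lagged z-GDPPC 2012")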
We can see that GDPPC spatial distribution is positively autocorrelated in the northern region i.e. those are associated with relatively high values of the surrounding locations.
Let us now plot both the GDPPC values map and its corresponding LISA cluster map next to each other for effective interpretation.
In the left map, with the normal choropleth, the GDPPC distribution is spread all over the place and we cannot identify any significant regions, whereas in the right map we can see that the spatial distribution of GDPPC is positively autocorrelated in the eastern region, i.e. those locations are associated with relatively high values in the surrounding locations.
5.8 Hot Spot and Cold Spot Area Analysis
Besides detecting clusters and outliers, localised spatial statistics can also be used to detect hot spot and/or cold spot areas.
5.8.1 Deriving the centroid
To get our longitude values we map the st_centroid() function over the geometry column of hunan. This allows us to get only the longitude, which is the first value in each centroid. We do the same for latitude, with one key difference: we access the second value of each centroid. Now that we have latitude and longitude, we use cbind() to put them into the same object.
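A sketch of this step is shown below; map_dbl() comes from purrr, which is loaded with tidyverse.
# Extract the x (longitude) and y (latitude) values of each county centroid
longitude <- map_dbl(hunan$geometry, ~st_centroid(.x)[[1]])
latitude <- map_dbl(hunan$geometry, ~st_centroid(.x)[[2]])

# Bind them into a single coordinates matrix
coords <- cbind(longitude, latitude)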
Firstly, we need to determine the upper limit for the distance band by using the steps below (a code sketch follows the list):
Return a matrix with the indices of points belonging to the set of the k nearest neighbours of each other by using knearneigh() of spdep.
Convert the knn object returned by knearneigh() into a neighbours list of class nb with a list of integer vectors containing neighbour region number ids by using knn2nb().
Return the length of neighbour relationship edges by using nbdists() of spdep. The function returns in the units of the coordinates if the coordinates are projected, in km otherwise.
Remove the list structure of the returned object by using unlist().
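The four steps above can be sketched as follows; longlat = TRUE is used so that nbdists() returns the distances in kilometres, on the assumption that the coordinates are in decimal degrees.
# First nearest neighbour of each county and the corresponding distances
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)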
Min. 1st Qu. Median Mean 3rd Qu. Max.
24.79 32.57 38.01 39.07 44.52 61.79
The summary report shows that the largest first nearest neighbour distance is 61.79 km, so using this as the upper threshold gives certainty that all units will have at least one neighbour.
5.8.3 Computing fixed distance weight matrix
Now, let us compute the distance weight matrix by using dnearneigh() as shown in the code chunk below.
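A sketch of this step is shown below; the 62 km cut-off is an assumption obtained by rounding up the 61.79 km maximum first nearest neighbour distance reported above, and wm_d62 is an illustrative object name (wm62_lw is the name used by the fixed-distance Gi chunk later on).
# Fixed distance neighbour list and binary weights
wm_d62 <- dnearneigh(coords, 0, 62, longlat = TRUE)
wm62_lw <- nb2listw(wm_d62, style = "B")
summary(wm62_lw)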
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88
Number of nonzero links: 324
Percentage nonzero weights: 4.183884
Average number of links: 3.681818
Link number distribution:
1 2 3 4 5 6
6 15 14 26 20 7
6 least connected regions:
6 15 30 32 56 65 with 1 link
7 most connected regions:
21 28 35 45 50 52 82 with 6 links
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 88 7744 324 648 5440
5.8.4 Computing adaptive distance weight matrix
It is possible to control the numbers of neighbours directly using k-nearest neighbours, either accepting asymmetric neighbours or imposing symmetry as shown in the code chunk below.
knn <- knn2nb(knearneigh(coords, k=8))
knn
Neighbour list object:
Number of regions: 88
Number of nonzero links: 704
Percentage nonzero weights: 9.090909
Average number of links: 8
Non-symmetric neighbours list
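The k-nearest-neighbours list is then converted into spatial weights, as sketched below; knn_lw is the object name used by the adaptive-distance Gi chunk later in this section.
# Binary weights based on the adaptive (k = 8) neighbour list
knn_lw <- nb2listw(knn, style = "B")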
The Gi statistic is represented as a Z-score. Greater values represent a greater intensity of clustering, and the direction (positive or negative) indicates high-value or low-value clusters.
fips <- order(hunan$County)
gi.fixed <- localG(hunan$GDPPC, wm62_lw)
gi.fixed
[1] 0.436075843 -0.265505650 -0.073033665 0.413017033 0.273070579
[6] -0.377510776 2.863898821 2.794350420 5.216125401 0.228236603
[11] 0.951035346 -0.536334231 0.176761556 1.195564020 -0.033020610
[16] 1.378081093 -0.585756761 -0.419680565 0.258805141 0.012056111
[21] -0.145716531 -0.027158687 -0.318615290 -0.748946051 -0.961700582
[26] -0.796851342 -1.033949773 -0.460979158 -0.885240161 -0.266671512
[31] -0.886168613 -0.855476971 -0.922143185 -1.162328599 0.735582222
[36] -0.003358489 -0.967459309 -1.259299080 -1.452256513 -1.540671121
[41] -1.395011407 -1.681505286 -1.314110709 -0.767944457 -0.192889342
[46] 2.720804542 1.809191360 -1.218469473 -0.511984469 -0.834546363
[51] -0.908179070 -1.541081516 -1.192199867 -1.075080164 -1.631075961
[56] -0.743472246 0.418842387 0.832943753 -0.710289083 -0.449718820
[61] -0.493238743 -1.083386776 0.042979051 0.008596093 0.136337469
[66] 2.203411744 2.690329952 4.453703219 -0.340842743 -0.129318589
[71] 0.737806634 -1.246912658 0.666667559 1.088613505 -0.985792573
[76] 1.233609606 -0.487196415 1.626174042 -1.060416797 0.425361422
[81] -0.837897118 -0.314565243 0.371456331 4.424392623 -0.109566928
[86] 1.364597995 -1.029658605 -0.718000620
attr(,"cluster")
[1] Low Low High High High High High High High Low Low High Low Low Low
[16] High High High High Low High High Low Low High Low Low Low Low Low
[31] Low Low Low High Low Low Low Low Low Low High Low Low Low Low
[46] High High Low Low Low Low High Low Low Low Low Low High Low Low
[61] Low Low Low High High High Low High Low Low High Low High High Low
[76] High Low Low Low Low Low Low High High Low High Low Low
Levels: Low High
attr(,"gstari")
[1] FALSE
attr(,"call")
localG(x = hunan$GDPPC, listw = wm62_lw)
attr(,"class")
[1] "localG"
Variable(s) "gstat_fixed" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Analysis:
Plotting the Gi map helps us to see that GDPPC is associated with relatively high values in the surrounding locations (clustered hot spot areas) in the eastern region of Hunan, and with relatively low values in the surrounding locations (cold spot areas) in the northern and southern regions of Hunan.
The code chunk below is used to compute the Gi values for GDPPC 2012 by using an adaptive distance weight matrix (i.e. knn_lw).
fips <- order(hunan$County)
gi.adaptive <- localG(hunan$GDPPC, knn_lw)
hunan.gi <- cbind(hunan, as.matrix(gi.adaptive)) %>%
  rename(gstat_adaptive = as.matrix.gi.adaptive.)
5.9.4 Mapping Gi values with adaptive distance weights
The code chunk below shows the functions used to map the Gi values derived using the adaptive distance weight matrix.
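A sketch of the mapping chunk is shown below; the palette and title choices are illustrative.
# Map the adaptive-distance Gi values
tm_shape(hunan.gi) +
  tm_fill(col = "gstat_adaptive",
          style = "pretty",
          palette = "-RdBu",
          title = "local Gi") +
  tm_borders(alpha = 0.5)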
Variable(s) "gstat_adaptive" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Analysis:
Similarly, the Gi map with adaptive distance weights shows that GDPPC is associated with relatively high values in the surrounding locations (clustered hot spot areas) in the eastern region of Hunan, and with relatively low values in the surrounding locations (cold spot areas) in the northern and southern regions of Hunan.
6. Conclusion & Key Takeaways
In this exercise, we have seen in greater detail how to compute Global and Local Measures of Spatial Autocorrelation (GLSA), how to perform cluster and outlier analysis to group the attributes, and finally how to carry out hot spot and cold spot area analysis for easier understanding. In the upcoming section, let us see how to perform geographical segmentation with spatially constrained clustering techniques.