### The Null Hypothesis

Many of the statistics in the spatial statistics toolbox are inferential spatial pattern analysis techniques (for example, Global Moran's I, Local Moran's I, and Gi*). Inferential statistics are grounded in probability theory. Probability is a measure of chance, and underlying all statistical tests (either directly or indirectly) are probability calculations that assess the role of chance in the outcome of your analysis. Typically, with traditional (nonspatial) statistics, you work with a random sample and try to determine the probability that your sample data is a good representation (is reflective) of the population at large. As an example, you might ask: "What are the chances that the results from my exit poll (showing, perhaps, that candidate A will beat candidate B by a slim margin) will reflect the final election results?" With many spatial statistics, however, including the spatial autocorrelation statistics listed above, you are very often dealing with *all* available data for the study area (all crimes, all disease cases, attributes for every census block, and so on). When you compute a statistic (the mean, for example) for the *entire* population, you no longer have an estimate at all; you have a *fact*. Consequently, it makes no sense to talk about likelihood or probabilities anymore. So what can you do when you have all the data values for a study area? You can only assess probabilities by postulating, via the null hypothesis, that your spatial data are, in fact, part of some larger population.

Where appropriate, the tools in the spatial statistics toolbox use the randomization null hypothesis as the basis for statistical significance testing. The randomization null hypothesis postulates that the observed spatial pattern of your data represents one of many (n!) possible spatial arrangements. If you could pick up your data values and throw them down onto the features in your study area, you would have one possible spatial arrangement. The randomization null hypothesis states that if you could repeat this exercise (pick them up, throw them down) an infinite number of times, most of the time you would produce a pattern not markedly different from the observed pattern (your real data). Once in a while you might accidentally throw all of the highest values into the same corner of your study area, but the probability of doing so is small. The randomization null hypothesis states that your data is one of many, many, many possible versions of complete spatial randomness. The data values are fixed; only their spatial arrangement could vary.

A common alternative null hypothesis, not implemented for the spatial statistics toolbox, is the normalization null hypothesis. The normalization null hypothesis postulates that the observed values are derived from an infinitely large, normally distributed population of values through some random sampling process. With a different sample you would get different values, but you would still expect those values to be representative of the larger distribution. The normalization null hypothesis states that the values represent one of many possible samples of values. If you could fit your observed data to a normal curve and then randomly select values from it to toss onto your study area, most of the time you would produce a pattern and distribution of values not markedly different from the observed pattern and distribution (your real data). The normalization null hypothesis states that your data and their arrangement are one of many, many, many possible random samples. Neither the data values nor their spatial arrangement are fixed. The normalization null hypothesis is appropriate only when the data values are normally distributed.
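The randomization idea can be sketched as a simple permutation (Monte Carlo) test. Everything below is illustrative: the 3×3 grid, the toy "sum of neighboring products" statistic, and all names are assumptions for this sketch, not the toolbox's actual implementation.

```python
# Sketch of the randomization null hypothesis as a permutation test.
# The data values are fixed; only their arrangement over the features
# varies. Grid, statistic, and names are illustrative assumptions.
import random

# A 3x3 grid of features (row-major order 0..8) with rook neighbors.
neighbors = [(0, 1), (1, 2), (3, 4), (4, 5), (6, 7), (7, 8),   # horizontal
             (0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8)]   # vertical

def join_statistic(values, pairs):
    """Sum of products of values at neighboring features: a crude
    stand-in for a spatial autocorrelation statistic. It is high when
    large values sit next to large values."""
    return sum(values[i] * values[j] for i, j in pairs)

# Observed data: high values in the top row, low values in the bottom.
observed = [9, 8, 7, 6, 5, 4, 3, 2, 1]
obs_stat = join_statistic(observed, neighbors)

# "Pick them up, throw them down": shuffle the fixed values many times
# and see how often chance alone yields a pattern at least this strong.
random.seed(42)
n_perms = 999
count_ge = 0
vals = list(observed)
for _ in range(n_perms):
    random.shuffle(vals)
    if join_statistic(vals, neighbors) >= obs_stat:
        count_ge += 1

# Pseudo p-value: share of arrangements at least as clustered as observed.
pseudo_p = (count_ge + 1) / (n_perms + 1)
print(f"observed statistic = {obs_stat}, pseudo p = {pseudo_p:.3f}")
```

A small pseudo p-value says that few of the random throws reproduce a pattern as clustered as the observed one, which is grounds for rejecting the randomization null hypothesis.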

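By contrast, the normalization null hypothesis can be sketched by redrawing the values themselves from a normal curve fitted to the observed data. Again, the data and names are illustrative assumptions; this hypothesis is not implemented in the toolbox.

```python
# Sketch of the normalization null hypothesis: each simulated dataset
# redraws the values themselves from a normal distribution fitted to
# the observed data, so neither the values nor their arrangement are
# fixed. Purely illustrative, with made-up data.
import random
from statistics import mean, stdev

observed = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
mu, sigma = mean(observed), stdev(observed)  # fit the normal curve

random.seed(7)
# One simulated sample under the normalization hypothesis: a fresh
# draw of values, one per feature, tossed onto the study area.
simulated = [random.gauss(mu, sigma) for _ in observed]

# In a full test you would recompute your spatial statistic for many
# such simulated samples and compare the observed value against that
# reference distribution.
print(f"fitted normal: mean = {mu}, stdev = {sigma:.3f}")
```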
### Additional Resources:

- Ebdon, David. *Statistics in Geography*. Blackwell, 1985.
- Mitchell, Andy. *The ESRI Guide to GIS Analysis, Volume 2*. ESRI Press, 2005.
- Goodchild, M. F. *Spatial Autocorrelation*. CATMOG 47, Geo Books, 1986.