Statistical analysis is often used to explore your data—for example, to examine the distribution of values for a particular attribute or to spot outliers (extreme high or low values). Having this information is useful when defining classes and ranges on a map, when reclassifying data, or when looking for data errors.
In the example below, statistics have been calculated for the distribution of senior citizens by census tract in this region (percentage age 65 and over in each tract), including the mean and standard deviation, as well as a histogram showing the distribution of values. Most tracts have a lower percentage of seniors than the mean, but a few tracts have a very high percentage.
Another use of statistical analysis is to summarize data. Often this is done for categories, such as calculating the total area in each land use category. You can also create spatial summaries, such as calculating the average elevation for each watershed. Summary data is useful for gaining a better understanding of conditions in a study area.
In the example below, summary statistics have been calculated for each land use class, showing the number of parcels in that class, the size of the smallest and largest parcels, the average parcel size, and the total area in the class.
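The per-class summary described above can be sketched in plain Python. This is only an illustration: the parcel records, class names, and the `summarize_by_class` helper are hypothetical and are not taken from the original example or from any ArcGIS tool.

```python
from collections import defaultdict

# Hypothetical parcels: (land use class, area in acres). Illustrative values only.
parcels = [
    ("Residential", 0.25),
    ("Residential", 0.40),
    ("Commercial", 1.50),
    ("Residential", 0.30),
    ("Commercial", 2.10),
    ("Agricultural", 40.0),
]

def summarize_by_class(records):
    """Group areas by class and return count, min, max, mean, and sum per class."""
    groups = defaultdict(list)
    for land_use, area in records:
        groups[land_use].append(area)
    summary = {}
    for land_use, areas in groups.items():
        total = sum(areas)
        summary[land_use] = {
            "count": len(areas),
            "min": min(areas),
            "max": max(areas),
            "mean": total / len(areas),
            "sum": total,
        }
    return summary

summary = summarize_by_class(parcels)
for land_use, stats in sorted(summary.items()):
    print(land_use, stats)
```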
Statistical analysis is also used to identify and confirm spatial patterns, such as the center of a group of features, the directional trend, or whether features form clusters. While patterns may be apparent on a map, trying to draw conclusions from a map can be difficult, because how you classify and symbolize the data can obscure or overemphasize patterns. Statistical functions analyze the underlying data and give you a measure that can be used to confirm the existence and strength of the pattern.
Below is an example of analyses that show the mean center of a set of burglaries, and the standard deviation ellipse for a set of moose sightings (showing the directional trend).
Below is an example of an analysis that shows statistically significant clusters of census tracts with many senior citizens (orange) or few (blue).

Statistical analysis functions in ArcGIS Desktop are either nonspatial (tabular) or spatial (containing location). Nonspatial statistics are used to analyze attribute values associated with features. The values are accessed directly from a layer's feature attribute table. Examples of nonspatial statistics include the mean and standard deviation.
In this example, the Summary Statistics tool was used to calculate the number of vacant parcels for a set of census tracts, including the total, the mean, and the standard deviation.
Charts and graphs, such as a histogram or Q-Q plots, are another way of analyzing nonspatial data. In all cases, only the values are analyzed. The locations of the features with which the values are associated—and any spatial relationships between the features—are not considered.
In this example, the histogram shows the distribution of vacant parcels (the number of vacant parcels along the x-axis and the number of tracts in each range along the y-axis).
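The nonspatial measures described above (total, mean, standard deviation, and a histogram) can be sketched with Python's standard library. The tract counts and the `histogram` helper are hypothetical, and the use of the population standard deviation is an assumption; the ArcGIS tools may follow a different convention.

```python
import statistics

# Hypothetical vacant-parcel counts, one value per census tract.
vacant = [2, 5, 7, 3, 12, 4, 6, 30, 5, 8]

count = len(vacant)
total = sum(vacant)
mean = statistics.mean(vacant)
# Population standard deviation (an assumption; a tool may use the sample form).
std_dev = statistics.pstdev(vacant)

def histogram(values, bin_width):
    """Count values per bin; a bin starting at s covers [s, s + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

bins = histogram(vacant, 5)
print(f"count={count} total={total} mean={mean:.1f} std_dev={std_dev:.2f}")
for start, n in bins.items():
    # Vacant-parcel range on the left, number of tracts as a bar on the right.
    print(f"{start:>2}-{start + 4:>2}: {'#' * n}")
```

Only the values enter the calculation; the tract locations play no part, which is what makes these statistics nonspatial.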
A Normal Q-Q Plot is used to assess the similarity of the distribution of a set of values to that of a standard normal distribution (the typical bell curve, when shown on a histogram). The line on the Normal Q-Q plot shows expected values for a normal distribution: the closer the values are to the line, the closer the distribution is to normal. In this example, the concentration of the element phosphorus for a set of soil samples is close to normally distributed.
The Normal QQ Plot tool is one of the data exploration tools available with the Geostatistical Analyst extension.
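As a rough sketch of the idea behind a normal Q-Q plot, the points can be computed by pairing each sorted, standardized sample value with the matching quantile of a standard normal distribution. The sample values and the `normal_qq_points` helper are hypothetical, and the plotting position `(i - 0.5) / n` is one common convention among several; this is not a description of how the Geostatistical Analyst tool computes its plot.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical phosphorus concentrations for a set of soil samples.
samples = [4.1, 5.3, 4.8, 6.0, 5.1, 4.5, 5.6, 5.0, 4.9, 5.4]

def normal_qq_points(values):
    """Return (theoretical quantile, standardized sample value) pairs."""
    n = len(values)
    m, s = mean(values), stdev(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, v in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position (assumed convention)
        points.append((std_normal.inv_cdf(p), (v - m) / s))
    return points

points = normal_qq_points(samples)
for theoretical, observed in points:
    print(f"{theoretical:6.2f} {observed:6.2f}")
```

If the two columns track each other closely, the points would fall near the reference line on the plot.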
Spatial statistics, on the other hand, focus on the spatial relationships between features—how compact or dispersed the features are, whether they're oriented in a particular direction, and whether they form clusters. The spatial relationship is usually defined as distance (how far apart features are) but can also be other forms of interaction between features.
In the example below, the output of the Standard Distance tool (displayed graphically as a circle) is calculated using the distance of each wildlife sighting from the calculated center of the sightings.
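A minimal sketch of the calculation, assuming the usual definition of standard distance as the root of the mean squared distance of each feature from the mean center. The coordinates and function names are hypothetical.

```python
import math

# Hypothetical wildlife sighting coordinates (x, y) in map units.
sightings = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

def mean_center(points):
    """Average x and average y of the points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)

center = mean_center(sightings)
radius = standard_distance(sightings)
print(center, round(radius, 3))
```

The resulting value is the radius of the circle drawn around the mean center; a smaller radius indicates a more compact set of features.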
Some spatial statistics consider both the spatial relationships of features and the values of an attribute associated with the features. These are known as weighted statistics—the spatial relationship is influenced by the values. Weighted spatial statistics are used to find out if features having similar values occur together—if, for example, schools with similarly high or low test scores form clusters.
In the example below, the center of parks is weighted by the number of visitors at each park (represented by the size of the green circles).
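A weighted mean center can be sketched as a weighted average of the coordinates, so that features with larger attribute values pull the center toward them. The park coordinates, visitor counts, and the `weighted_mean_center` helper are hypothetical.

```python
# Hypothetical parks: (x, y, annual visitors).
parks = [(0.0, 0.0, 100), (10.0, 0.0, 300), (0.0, 10.0, 100)]

def weighted_mean_center(features):
    """Average of the coordinates, each weighted by the feature's attribute value."""
    total_w = sum(w for _, _, w in features)
    x = sum(x * w for x, _, w in features) / total_w
    y = sum(y * w for _, y, w in features) / total_w
    return (x, y)

def unweighted_mean_center(features):
    """Plain average of the coordinates, ignoring the attribute value."""
    n = len(features)
    return (sum(x for x, _, _ in features) / n, sum(y for _, y, _ in features) / n)

weighted = weighted_mean_center(parks)
unweighted = unweighted_mean_center(parks)
print("weighted:", weighted, "unweighted:", unweighted)
```

In this made-up data, the weighted center shifts toward the park with the most visitors.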
Statistical functions can also be classified by whether they're descriptive or inferential. Descriptive statistics summarize some characteristic of the values or features you're analyzing—the mean value, the frequency distribution of values, or the directional trend of a group of features. Descriptive statistics are often useful for comparing two sets of features for the same area.
The example below compares the distribution of senior citizens (top) to that of children under 5 (bottom) for the same set of census tracts.
In the example below, the standard distance circles for the American Indian and African American population show that the distribution of the African American population in this area is much more compact.
Inferential statistics use probability theory either to predict the likely occurrence of values (using a set of known values) or to assess the likelihood that any pattern or trend you see in the data is not due to chance. The function provides a measure of the pattern or relationship. You then perform a statistical test on this measure to determine whether it is significant at some level of confidence. If the statistical analysis indicates burglaries occur in clusters, you'd then run a test to find out the probability that the clusters arose by chance. You might find, for example, that there's a 90% likelihood that the clusters didn't occur by chance, indicating the burglaries may be linked in some way. Essentially, to determine the probability, the test compares the measure you get for the existing features to the measure you'd expect to get for the same number of features spread over the same area, but distributed randomly.
In the example below, the map on the left shows clusters of census tracts having a high number of senior citizens (orange) or a low number (blue), at a 90% level of probability; the right map shows clusters at a 99% level of probability.
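The comparison against a random distribution can be illustrated with a simple Monte Carlo simulation. This is a swapped-in technique chosen for clarity, not the method the ArcGIS tools use to report significance. It assumes a rectangular study area, uses the mean nearest neighbor distance as the measure, and works on hypothetical burglary locations; all function names are illustrative.

```python
import math
import random

def mean_nearest_neighbor(points):
    """Average distance from each point to its closest other point."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points) if j != i
        )
    return total / len(points)

def clustering_p_value(points, width, height, trials=999, seed=0):
    """Share of random layouts at least as tightly packed as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean_nearest_neighbor(points)
    as_extreme = 0
    for _ in range(trials):
        # Same number of features, same area, but placed at random.
        simulated = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in points]
        if mean_nearest_neighbor(simulated) <= observed:
            as_extreme += 1
    # Count the observed layout itself so the estimate is never exactly zero.
    return (as_extreme + 1) / (trials + 1)

# Hypothetical burglary locations packed into one corner of a 100 x 100 study area.
burglaries = [(10 + i % 4, 10 + i // 4) for i in range(12)]
p = clustering_p_value(burglaries, 100, 100, trials=199)
print(f"p-value: {p:.3f}")
```

A small p-value means few random layouts were as tightly packed as the observed one, so the clustering is unlikely to be due to chance.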

The statistical functions in ArcGIS Desktop are located in ArcMap, ArcCatalog, and ArcToolbox, as well as within two extensions: Spatial Analyst and Geostatistical Analyst.

To summarize a field by one or more other fields (for example, to count the number of parcels in each land use class, sum the area in each land use class, or find the average parcel size in each class), use the Summarize option on the ArcMap table window, or the Frequency tool in the Statistics toolset in the Analysis toolbox in ArcToolbox.

### Table statistics

A core set of descriptive statistics that summarize the values for a single field is available from several locations in ArcGIS Desktop: the table window in ArcMap, the table preview tab in ArcCatalog, and the Statistics toolset (within the Analysis toolbox) in ArcToolbox.

Function | Location | Statistics | Output |
---|---|---|---|
Statistics menu option | ArcMap table window or ArcCatalog table preview tab | Count, Minimum, Maximum, Sum, Mean, Standard Deviation, Frequency histogram | Results are displayed in a window |
Summary Statistics tool | Analysis Toolbox / Statistics Toolset | Minimum, Maximum, Sum, Mean, Standard Deviation, Range, First, Last | Results are written to a new table |

Function | Location | Statistics | Output |
---|---|---|---|
Summarize menu option | ArcMap table window (right-click field name) | Minimum, Maximum, Average (mean), Sum, Standard Deviation, Variance | Results are written to a new table |
Frequency tool | Analysis Toolbox / Statistics Toolset | Count, Sum | Results are written to a new table |

### Spatial Statistics

The Spatial Statistics toolbox in ArcToolbox contains a number of statistical routines for analyzing the distribution of a set of features, analyzing patterns, and identifying clusters.

Functional Area | Toolset | Tools |
---|---|---|
Geographic distribution measurements | Measuring Geographic Distributions | Mean Center, Central Feature, Standard Distance, Directional Distribution (Standard Deviational Ellipse), Linear Directional Mean |
Geographic pattern analysis | Analyzing Patterns | Average Nearest Neighbor, Spatial Autocorrelation (Moran's I), High/Low Clustering (Getis-Ord General G) |
Geographic cluster analysis | Mapping Clusters | Cluster and Outlier Analysis (Anselin Local Moran's I), Hot Spot Analysis (Getis-Ord Gi*) |
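
To give a sense of what a pattern measure such as Moran's I captures, here is a minimal sketch of the textbook global Moran's I formula with a binary neighbor weights matrix. The tract values, the weights, and the `morans_i` function are hypothetical; the sketch makes no claim about how the Spatial Autocorrelation tool conceptualizes spatial relationships or tests significance.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)

# Hypothetical chain of five tracts; a weight of 1 marks adjacent tracts.
weights = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
clustered = [1, 2, 3, 4, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1]  # high and low values alternate

print(morans_i(clustered, weights), morans_i(alternating, weights))
```

Positive values suggest that similar values occur near each other, while negative values suggest that neighboring values tend to differ.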

### Raster statistics

The Spatial Analyst extension includes several statistical functions that can be used to analyze rasters, primarily to summarize attribute values and assign the summary statistics to cells in a new raster layer. These are located in several different toolsets within the Spatial Analyst toolbox.

Tool | Location | Input | Output | What it does |
---|---|---|---|---|
Cell Statistics | Local Toolset | Multiple rasters | Raster | Calculates the specified statistic for each cell based on multiple inputs |
Focal Statistics | Neighborhood Toolset | Raster | Raster | Summarizes the values for a raster within a defined neighborhood around each cell, and assigns the value to that cell in the output raster |
Point Statistics | Neighborhood Toolset | Point features | Raster | Summarizes values for point feature attributes within a defined neighborhood, and assigns values to cells in the output raster |
Line Statistics | Neighborhood Toolset | Line features | Raster | Summarizes values for line feature attributes within a defined neighborhood, and assigns values to cells in the output raster |
Zonal Statistics | Zonal Toolset | Raster or polygon features | Raster or summary table | Summarizes values of a raster surface by categories or classes (zones) of the input raster or polygon dataset |
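
The difference between cell, focal, and zonal summaries can be sketched with small nested lists standing in for rasters. The functions below are hypothetical illustrations that compute a mean only, assume a 3 x 3 neighborhood clipped at the raster edge, and do not reproduce the Spatial Analyst tools' options or NoData handling.

```python
def cell_mean(rasters):
    """Cell-by-cell mean: average the same cell position across several rasters."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[sum(r[i][j] for r in rasters) / len(rasters) for j in range(cols)]
            for i in range(rows)]

def focal_mean(raster):
    """Neighborhood mean: average the 3 x 3 window around each cell."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = [raster[a][b]
                      for a in range(max(0, i - 1), min(rows, i + 2))
                      for b in range(max(0, j - 1), min(cols, j + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

def zonal_mean(values, zones):
    """Zone mean: average the value cells that fall inside each zone ID."""
    totals, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for v, z in zip(value_row, zone_row):
            totals[z] = totals.get(z, 0) + v
            counts[z] = counts.get(z, 0) + 1
    return {z: totals[z] / counts[z] for z in totals}

# Hypothetical 3 x 3 elevation raster and a zone raster of watershed IDs.
elevation = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
watersheds = [[1, 1, 2], [1, 2, 2], [3, 3, 3]]

print(cell_mean([elevation, elevation]))
print(focal_mean(elevation))
print(zonal_mean(elevation, watersheds))
```

The zonal case corresponds to the spatial summary mentioned earlier, such as the average elevation for each watershed.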