Kriging is similar to IDW in that it weights the surrounding measured values to derive a prediction for an unmeasured location. The general formula for both interpolators is formed as a weighted sum of the data:

$$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i)$$

where:
- Z(si) = the measured value at the ith location.
- λi = an unknown weight for the measured value at the ith location.
- s0 = the prediction location.
- N = the number of measured values.

In IDW, the weight, λi, depends solely on the distance to the prediction location. In Kriging, however, the weights are based not only on the distance between the measured points and the prediction location, but also on the overall spatial arrangement among the measured points. To use the spatial arrangement in the weights, the spatial autocorrelation must be quantified. Thus, in ordinary Kriging, the weight, λi, depends on a model fitted to the measured points, the distance to the prediction location, and the spatial relationships among the measured values around the prediction location. The following sections discuss how the general Kriging formula is used to create a map of the prediction surface and a map of the accuracy of the predictions.
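The weighted sum can be illustrated for the IDW case, where each weight depends only on distance. The following is a minimal sketch, not the ArcGIS implementation: the function name `idw_predict`, the `power` parameter, and the sample coordinates and values are all assumptions made for illustration.

```python
import math

def idw_predict(points, values, s0, power=2.0):
    """Predict at s0 as sum(lambda_i * Z(s_i)) using inverse-distance weights."""
    dists = [math.dist(p, s0) for p in points]
    # If s0 coincides with a measured location, return that measurement.
    for d, z in zip(dists, values):
        if d == 0.0:
            return z
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    weights = [w / total for w in raw]  # lambda_i, normalized to sum to 1
    return sum(w * z for w, z in zip(weights, values))

# Hypothetical measured locations and values.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]
values = [10.0, 20.0, 30.0]
prediction = idw_predict(points, values, (1.0, 0.0))
```

The two nearby points dominate the result because their weights are much larger than that of the distant third point. Kriging replaces these distance-only weights with weights derived from a fitted semivariogram.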
Creating a prediction surface map with Kriging
To make a prediction with the Kriging interpolation method, two tasks are necessary:
- Uncover the dependency rules.
- Make the predictions.
To realize these two tasks, Kriging goes through a two-step process:
- It creates the variograms and covariance functions to estimate the statistical dependence (called spatial autocorrelation) values, which depend on the model of autocorrelation (fitting a model).
- It predicts the unknown values (making a prediction).
Variography
Fitting a model, or spatial modeling, is also known as structural analysis, or variography. When spatially modeling the structure of measured points, you begin with a graph of the empirical semivariogram, computed as:

$$\text{Semivariogram}(\text{distance } h) = 0.5 \times \text{average}\left[(\text{value at location } i - \text{value at location } j)^2\right]$$

for all pairs of locations separated by distance h. The formula involves calculating the squared difference between the values of the paired locations. The image below shows the pairing of one point (the red point) with all other measured locations. This process continues for each measured point.

Each pair of locations has a unique distance, and there are often many pairs of points. Plotting all pairs quickly becomes unmanageable. Instead of plotting each pair, the pairs are grouped into lag bins. For example, compute the average semivariance for all pairs of points that are more than 40 meters but less than 50 meters apart. The empirical semivariogram is a graph of the averaged semivariogram values on the y-axis and the distance (or lag) on the x-axis (see the following illustration).

Spatial autocorrelation quantifies a basic principle of geography: things that are closer are more alike than things farther apart. Thus, pairs of locations that are closer (far left on the x-axis of the semivariogram cloud) should have more similar values (low on the y-axis of the semivariogram cloud). As pairs of locations become farther apart (moving to the right on the x-axis of the semivariogram cloud), they should become more dissimilar and have a higher squared difference (move up on the y-axis of the semivariogram cloud).
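The pairing and lag binning described in this section can be sketched in a few lines. This is an illustrative example only: the function name `empirical_semivariogram`, the equal-width bins, and the sample transect are assumptions, not part of the ArcGIS tool.

```python
import math
from itertools import combinations

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return (bin center, semivariance, pair count) for each non-empty lag bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(points)), 2):
        h = math.dist(points[i], points[j])   # separation distance of the pair
        k = int(h // lag_width)               # index of the lag bin
        if k < n_lags:
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    # Semivariance = 0.5 * average squared difference within the bin.
    return [((k + 0.5) * lag_width, 0.5 * sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical transect: four measured points spaced 12 units apart.
points = [(0.0, 0.0), (12.0, 0.0), (24.0, 0.0), (36.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
bins = empirical_semivariogram(points, values, lag_width=10.0, n_lags=4)
```

In this sample the semivariance rises with the lag, which is the pattern expected when nearby values are more alike than distant ones.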
Fitting a model to the empirical semivariogram
The next step is to fit a model to the points forming the empirical semivariogram. Semivariogram modeling is a key step between spatial description and spatial prediction. The main application of Kriging is the prediction of attribute values at unsampled locations. You have seen how the empirical semivariogram provides information on the spatial autocorrelation of datasets. However, it does not provide information for all possible directions and distances. For this reason, and to ensure that Kriging predictions have positive Kriging variances, it is necessary to fit a model (that is, a continuous function or curve) to the empirical semivariogram. Abstractly, this is similar to regression analysis, where a continuous line or curve is fitted to the data points.
To fit a model to the empirical semivariogram, select some function that serves as your model, for example, a spherical type that rises at first and then levels off for larger distances beyond a certain range (see below). There are deviations of the points on the empirical semivariogram from the model; some points are above the model curve, and some points are below. But if you add the distance each point is above the line and add the distance each point is below the line, the two values should be similar. There are many semivariogram models to choose from.
Semivariogram models
ArcGIS Spatial Analyst provides the following functions from which to choose to model the empirical semivariogram:
- Circular
- Spherical
- Exponential
- Gaussian
- Linear
The spherical model
This model shows a progressive decrease of spatial autocorrelation (equivalently, an increase of semivariance) until some distance, beyond which autocorrelation is zero. The spherical model is one of the most commonly used models. Together with the exponential model, it is one of two common models that illustrate how the functions differ.
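The shape of the spherical model can be made concrete with a small function. The text above does not give the formula, so the sketch below assumes the standard textbook form of the spherical semivariogram; the function and parameter names (`spherical`, `partial_sill`, `rng`, `nugget`) are chosen for illustration.

```python
def spherical(h, partial_sill, rng, nugget=0.0):
    """Spherical semivariogram value at separation distance h (h >= 0)."""
    if h == 0:
        return 0.0                      # zero semivariance at zero separation
    if h >= rng:
        return nugget + partial_sill    # flat beyond the range: no autocorrelation
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

# Evaluate a model with partial sill 10 and range 100 at a few distances.
curve = [(h, spherical(h, partial_sill=10.0, rng=100.0)) for h in (0, 50, 100, 150)]
```

The curve rises from zero, reaches its maximum exactly at the range, and stays flat for all larger distances.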
The exponential model
This model is applied when spatial autocorrelation decreases exponentially with increasing distance. Here the autocorrelation disappears completely only at an infinite distance. The exponential model is also a commonly used model. The choice of which model to use is based on the spatial autocorrelation of the data and on prior knowledge of the phenomenon.
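The exponential model can be sketched in the same way. The formula is not stated in the text; the version below assumes one common parameterization in which the factor 3 makes the curve reach about 95 percent of the partial sill at the stated range. Other software may define the range parameter differently, and the function and parameter names are illustrative.

```python
import math

def exponential(h, partial_sill, rng, nugget=0.0):
    """Exponential semivariogram value at separation distance h (h >= 0)."""
    if h == 0:
        return 0.0
    # Approaches nugget + partial_sill asymptotically and never quite reaches it.
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))

at_range = exponential(100.0, partial_sill=10.0, rng=100.0)
far_beyond = exponential(1000.0, partial_sill=10.0, rng=100.0)
```

Unlike a model that flattens exactly at the range, this curve only approaches its sill, which matches the statement that autocorrelation disappears completely only at an infinite distance.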
Understanding a semivariogram—the range, sill, and nugget
As previously stated, the semivariogram depicts the spatial autocorrelation of the measured sample points. Because of a basic principle of geography (things that are closer are more alike), measured points that are close will generally have a smaller difference squared than those farther apart. Once each pair of locations is plotted (after being binned), a model is fit through them. Range, sill, and nugget are commonly used to describe these models.
The range and sill
When you look at the model of a semivariogram, you will notice that at a certain distance, the model levels out. The distance where the model first flattens out is known as the range. Sample locations separated by distances shorter than the range are spatially autocorrelated, whereas locations farther apart than the range are not.
The value that the semivariogram model attains at the range (the value on the y-axis) is called the sill. The partial sill is the sill minus the nugget (see the following section).
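One way to estimate the range and sill from binned semivariogram points is to search for the model parameters that best match them. The sketch below assumes a spherical model and a brute-force least-squares search over candidate values; this is an illustrative technique, not necessarily the fitting procedure ArcGIS uses. The function names, the candidate grids, and the synthetic input (generated from known parameters rather than measured) are all assumptions.

```python
from itertools import product

def spherical(h, partial_sill, rng, nugget=0.0):
    """Spherical semivariogram (standard textbook form)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def fit_spherical(lags, gammas, nuggets, partial_sills, ranges):
    """Return (nugget, partial_sill, range) minimizing the sum of squared errors."""
    best = None
    for c0, c, a in product(nuggets, partial_sills, ranges):
        sse = sum((spherical(h, c, a, c0) - g) ** 2 for h, g in zip(lags, gammas))
        if best is None or sse < best[0]:
            best = (sse, c0, c, a)
    return best[1], best[2], best[3]

# Synthetic binned points generated from nugget 2, partial sill 8, range 40.
lags = [5.0, 15.0, 25.0, 35.0, 45.0, 55.0]
gammas = [spherical(h, 8.0, 40.0, 2.0) for h in lags]

nugget, partial_sill, rng = fit_spherical(
    lags, gammas,
    nuggets=[0.0, 1.0, 2.0, 3.0],
    partial_sills=[6.0, 7.0, 8.0, 9.0],
    ranges=[20.0, 30.0, 40.0, 50.0],
)
sill = nugget + partial_sill  # the sill is the partial sill plus the nugget
```

Because the synthetic points were generated from parameters that lie on the candidate grid, the search recovers them exactly. With real data the fit would only be approximate, and a finer grid or a numerical optimizer would be more appropriate.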
The nugget
Theoretically, at zero separation distance (that is, lag = 0), the semivariogram value is zero. However, at an infinitesimally small separation distance, the semivariogram often exhibits a nugget effect, which is some value greater than zero. For example, if the semivariogram model intercepts the y-axis at 2, then the nugget is 2.
The nugget effect can be attributed to measurement errors or spatial sources of variation at distances smaller than the sampling interval (or both). Measurement error occurs because of the error inherent in measuring devices. Natural phenomena can vary spatially over a range of scales. Variation at microscales smaller than the sampling distances will appear as part of the nugget effect. Before collecting data, it is important to gain some understanding of the scales of spatial variation that you are interested in.
Making a prediction
After you have uncovered the dependence or autocorrelation in your data (see Variography) and finished the first use of the data, which is using its spatial information to compute distances and model the spatial autocorrelation, you can make a prediction using the fitted model. From this point on, the empirical semivariogram is set aside.
You now use the data a second time to make predictions. Like IDW interpolation, Kriging forms weights from surrounding measured values to predict unmeasured locations. As with IDW interpolation, the measured values closest to the unmeasured locations have the most influence. However, the Kriging weights for the surrounding measured points are more sophisticated than those of IDW. IDW uses a simple algorithm based on distance, but Kriging weights come from a semivariogram that was developed by looking at the spatial nature of the data. To create a continuous surface of the phenomenon, predictions are made for each location (cell center) in the study area based on the semivariogram and the spatial arrangement of nearby measured values.
Search radius
A basic principle of geography says that things that are close to one another are more alike than things farther away. Using this principle, you can establish your search radius or neighborhood by assuming that as the locations get farther from the prediction location, the measured values will have less spatial autocorrelation with the unknown value for the location you are predicting. Thus, you can eliminate those locations that are farther away and have little influence. Not only is there less relationship with locations that are farther away, but those locations may also have a negative influence if they are located in an area much different from the prediction location.
The search radius also affects computational speed: the smaller the search radius, the faster the predictions can be made. As a result, it is common practice to limit the number of points that are used when making a prediction by specifying a search neighborhood. The specified shape of the neighborhood restricts how far and where to look for the measured values to be used in the prediction. Other neighborhood parameters restrict the point locations that will be used within that shape; for example, you can define the maximum and minimum number of measured points to use within the neighborhood.
Using the configuration of the valid points within the specified search radius around the prediction location, in conjunction with the model fit to the semivariogram, you can determine the weights for the measured locations. From the weights and the values, you can make a prediction for the unknown value at the prediction location.
ArcGIS Spatial Analyst neighborhood types
There are two neighborhood types in ArcGIS Spatial Analyst: fixed and variable.
- A fixed search radius requires a distance and a minimum number of points. The distance dictates the radius of the circle of the neighborhood (in map units). The distance of the radius is constant, so for each interpolated cell, the radius of the circle used to find input points is the same. The minimum number of points indicates the minimum number of measured points to use within the neighborhood. All the measured points that fall within the radius will be used in the calculation of each interpolated cell. When there are fewer measured points in the neighborhood than the specified minimum, the search radius will increase until it can encompass the minimum number of points. The specified fixed search radius will be used for each interpolated cell (cell center) in the study area; thus, if your measured points are not spread out equally, which they rarely are, there will likely be a different number of measured points used in the different neighborhoods for the various predictions.
- With a variable search radius, the number of points used in calculating the value of the interpolated cell is specified, which makes the radius distance vary for each interpolated cell, depending on how far it has to search around each interpolated cell to reach the specified number of input points. Thus, some neighborhoods can be small, and others can be large, depending on the density of the measured points near the interpolated cell. You can specify a maximum distance (in map units) that the search radius cannot exceed. If the radius for a particular neighborhood reaches the maximum before obtaining the specified number of points, the prediction for that location will be performed on the number of measured points within the maximum radius.
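The two neighborhood types can be expressed as simple point-selection rules. The sketch below follows the behavior described in the list above; it is not the ArcGIS implementation, and the function names, parameter names, and sample points are hypothetical.

```python
import math

def fixed_radius_neighbors(points, s0, radius, min_points):
    """Indices of all points within radius of s0, nearest first.

    If fewer than min_points fall inside, the search expands to the
    nearest min_points instead.
    """
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], s0))
    inside = [i for i in order if math.dist(points[i], s0) <= radius]
    return inside if len(inside) >= min_points else order[:min_points]

def variable_radius_neighbors(points, s0, n_points, max_distance=None):
    """Indices of the n_points nearest points, optionally capped by max_distance."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], s0))
    nearest = order[:n_points]
    if max_distance is not None:
        nearest = [i for i in nearest if math.dist(points[i], s0) <= max_distance]
    return nearest

# Hypothetical measured locations along a line, predicting at the origin.
points = [(1.0, 0.0), (2.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
s0 = (0.0, 0.0)
fixed_small = fixed_radius_neighbors(points, s0, radius=3.0, min_points=1)
fixed_grown = fixed_radius_neighbors(points, s0, radius=3.0, min_points=3)
variable_all = variable_radius_neighbors(points, s0, n_points=3)
variable_capped = variable_radius_neighbors(points, s0, n_points=3, max_distance=4.0)
```

With a radius of 3, only the two nearest points qualify unless the minimum forces the search outward. With a variable radius, the maximum distance can leave fewer points than requested.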
Kriging methods
There are two Kriging methods: ordinary and universal.
- Ordinary Kriging is the most general and widely used of the Kriging methods and is the default. It assumes the constant mean is unknown. This is a reasonable assumption unless there is some scientific reason to reject it.
- Universal Kriging assumes that there is an overriding trend in the data (for example, a prevailing wind), and it can be modeled by a deterministic function, a polynomial. This polynomial is subtracted from the original measured points, and the autocorrelation is modeled from the random errors. Once the model is fit to the random errors and before making a prediction, the polynomial is added back to the predictions to give you meaningful results. Universal Kriging should only be used when you know there is a trend in your data and you can provide a scientific justification to describe it.
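An ordinary Kriging prediction at a single location can be sketched by solving a small linear system for the weights. The text does not spell out this system, so the example assumes the standard ordinary Kriging formulation, in which a Lagrange multiplier forces the weights to sum to 1 (this is how the unknown constant mean is handled), together with a spherical model with no nugget. The function names, model parameters, and sample points are hypothetical.

```python
import math

def spherical(h, partial_sill, rng, nugget=0.0):
    """Spherical semivariogram (standard textbook form)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(points, values, s0, model):
    """Return (prediction, kriging variance, weights) at location s0."""
    n = len(points)
    # Semivariances between measured points, bordered by the sum-to-1 constraint.
    a = [[model(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each measured point and the prediction location.
    b = [model(math.dist(p, s0)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]   # mu is the Lagrange multiplier
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return prediction, variance, weights

# Hypothetical model and data: two measured points, predicting at their midpoint.
model = lambda h: spherical(h, partial_sill=10.0, rng=20.0)
points = [(0.0, 0.0), (10.0, 0.0)]
values = [4.0, 8.0]
prediction, variance, weights = ordinary_kriging(points, values, (5.0, 0.0), model)
```

By symmetry, each point receives a weight of 0.5 at the midpoint, so the prediction is the average of the two values. The returned variance is the quantity that would be mapped to show the accuracy of the predictions.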