Performs Geographically Weighted Regression (GWR), a local form of linear regression used to model spatially varying relationships. Requires an ArcInfo, Spatial Analyst, or Geostatistical Analyst license.
Learn more about how Geographically Weighted Regression works


This tool honors the Environment output coordinate system. However, feature geometry is projected to the output coordinate system after analysis is complete. Consequently, the value entered for the Distance parameter should be specified in the same units as the input feature class. Values entered for the Output Cell Size should be specified in the same units as the output coordinate system.
Using projected data is always recommended; it is especially important whenever distance is a component of your analysis, as it is for GWR when you select FIXED for Kernel type. It is strongly recommended that you project your data using a Projected Coordinate System (rather than a Geographic Coordinate System based on degrees, minutes, and seconds).
Some of the computations done by the GWR tool take advantage of multiple CPUs in order to increase performance, and will automatically use up to 8 threads/CPUs for processing.
GWR constructs a separate equation for every feature in the dataset, incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The shape and extent of the bandwidth depend on user input for the Kernel type, Bandwidth method, Distance, and Number of neighbors parameters, with one restriction: when the number of neighboring features would exceed 1000, only the closest 1000 are incorporated into each local equation.
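As a rough illustration of how a kernel turns neighbor distances into weights for one local equation, the sketch below assumes a Gaussian weighting function of the form exp(-0.5 * (d / bandwidth)^2); the exact formula the tool uses is not stated here. The function names, the truncation of the FIXED kernel at the bandwidth distance, and the use of the k-th neighbor distance as the ADAPTIVE bandwidth are all assumptions made for illustration. Only the 1000-neighbor cap comes from the description above.

```python
import math

MAX_NEIGHBORS = 1000  # cap on neighbors per local equation, as described above


def gaussian_weights(distances, bandwidth):
    """Weight distances with an assumed Gaussian kernel: exp(-0.5 * (d / b)^2)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]


def local_weights(distances, kernel_type, distance=None, n_neighbors=None):
    """Return (neighbor_index, weight) pairs for a single target feature.

    distances: distance from the target feature to every candidate neighbor.
    FIXED:     the same bandwidth distance is used for every target feature
               (neighbors beyond it are dropped; an assumption of this sketch).
    ADAPTIVE:  the bandwidth stretches to reach the n_neighbors closest features.
    """
    # Indices of candidate neighbors, nearest first.
    order = sorted(range(len(distances)), key=lambda i: distances[i])

    if kernel_type == "FIXED":
        if distance is None:
            raise ValueError("FIXED kernel needs a distance")
        chosen = [i for i in order if distances[i] <= distance]
        bandwidth = distance
    elif kernel_type == "ADAPTIVE":
        if not n_neighbors:
            raise ValueError("ADAPTIVE kernel needs a neighbor count")
        chosen = order[:n_neighbors]
        bandwidth = distances[chosen[-1]]
    else:
        raise ValueError("kernel_type must be FIXED or ADAPTIVE")

    # Keep only the closest 1000 neighbors.
    chosen = chosen[:MAX_NEIGHBORS]
    weights = gaussian_weights([distances[i] for i in chosen], bandwidth)
    return list(zip(chosen, weights))


if __name__ == "__main__":
    sample = [0.0, 100.0, 50.0, 400.0]  # hypothetical distances in map units
    print(local_weights(sample, "FIXED", distance=100.0))
    print(local_weights(sample, "ADAPTIVE", n_neighbors=2))
```

With a fixed kernel the number of contributing neighbors changes with feature density; with an adaptive kernel the neighbor count stays constant and the distance changes instead.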
In global regression models, such as OLS, results are unreliable when two or more variables exhibit multicollinearity (when two or more variables are redundant or together tell the same "story"). GWR builds a local regression equation for each feature in the dataset. When the values for a particular explanatory variable cluster spatially, you will very likely have problems with local multicollinearity. The condition number in the Output feature class indicates when results are unstable due to local multicollinearity. As a rule of thumb, do not trust results for features with a condition number larger than 30, equal to Null or, for shapefiles, equal to -1.7976931348623158e+308.
Caution should be used when including nominal/categorical data in a GWR model. Where categories cluster spatially, there is strong risk of encountering local collinearity issues. The condition number included in the GWR output indicates when local collinearity is a problem (a condition number less than zero, greater than 30, or set to Null). Results in the presence of local collinearity are unstable.
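The rule of thumb above can be sketched as a small filter over output records. This is an illustrative sketch only: the records are plain dictionaries standing in for rows read from the Output feature class, and the field names `FID` and `Cond` are assumptions, not names confirmed by this page.

```python
CONDITION_LIMIT = 30.0  # rule-of-thumb threshold from the text above


def is_unstable(condition_number, limit=CONDITION_LIMIT):
    """True when a local result should not be trusted.

    Covers the three cases named above: Null (None in Python),
    a value below zero (including the shapefile stand-in for Null),
    and a value above the limit.
    """
    if condition_number is None:
        return True
    return condition_number < 0 or condition_number > limit


def unstable_feature_ids(rows, id_field="FID", cond_field="Cond"):
    """Return the ids of features whose condition number flags local collinearity."""
    return [row[id_field] for row in rows if is_unstable(row[cond_field])]


if __name__ == "__main__":
    # Hypothetical rows; field names and values are made up for illustration.
    rows = [
        {"FID": 0, "Cond": 12.4},
        {"FID": 1, "Cond": 41.7},
        {"FID": 2, "Cond": None},
        {"FID": 3, "Cond": -1.7976931348623158e+308},
    ]
    print(unstable_feature_ids(rows))
```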
Don't use dummy explanatory variables to represent different spatial regimes in a GWR model. Because GWR allows explanatory variable coefficients to vary, these dummy variables are unnecessary, and if included, will create problems with local collinearity.
GWR should be applied to datasets with several hundred features for best results. It is not an appropriate method for small datasets. It will not work with multipoint data.
To better understand regional variation among the coefficients of your explanatory variables, examine the optional raster coefficient surfaces created by GWR. For polygon data, you can use graduated color or cold-to-hot rendering on each coefficient field in the Output Feature Class to observe changes across your study area.
If a prediction feature class is provided, but no prediction explanatory variables are specified, the output prediction feature class is created with computed coefficients for each location only (no predictions).
A regression model is misspecified if it is missing a key explanatory variable. Statistically significant spatial autocorrelation of the regression residuals and/or unexpected spatial variation among the coefficients of one or more explanatory variables suggests that your model is misspecified. You should make every effort (through OLS residual analysis and GWR coefficient variation analysis, for example) to discover what these key missing variables are so they may be included in the model.
Always question whether or not it makes sense for an explanatory variable to be non-stationary. For example, suppose you are modeling the density of a particular plant species as a function of several variables including ASPECT. If you find that the coefficient for the ASPECT variable changes across the study area, you are likely seeing evidence of a key missing explanatory variable (perhaps prevalence of competing vegetation, for example). You should make every effort to include all key explanatory variables in your regression model.
Whenever using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from non-shapefile inputs may, consequently, store null values as zero or as some very small negative number (-DBL_MAX = -1.7976931348623158e+308). This can lead to unexpected results.
When the result of a computation is infinity or undefined, the result for non-shapefiles will be Null; for shapefiles the result will be -DBL_MAX = -1.7976931348623158e+308.
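When post-processing values read from a shapefile, it can help to translate the -DBL_MAX stand-in back to a true null before doing any arithmetic. The helper below is a minimal sketch; its name is hypothetical, and it relies only on the fact that Python's `sys.float_info.max` is the same double as DBL_MAX.

```python
import sys

# -DBL_MAX, the value a shapefile stores where other formats store Null.
SHAPEFILE_NULL = -sys.float_info.max


def shapefile_value_to_none(value):
    """Map the shapefile Null stand-in to None; pass other values through."""
    return None if value == SHAPEFILE_NULL else value


if __name__ == "__main__":
    # Hypothetical field values read from a shapefile.
    raw = [0.73, -1.7976931348623158e+308, 0.0]
    print([shapefile_value_to_none(v) for v in raw])
```

Zeros are passed through unchanged: a zero that was substituted for a null cannot be told apart from a genuine zero, which is why the substitution can lead to unexpected results.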
Problems with local collinearity will prevent both the AIC and CV Bandwidth methods from resolving an optimal distance/number of neighbors. If you get an error indicating severe model design problems, try specifying a particular distance or neighbor count, then examining the condition numbers in the output feature class to see which features are associated with local collinearity.
Severe model design errors or errors indicating local equations do not include enough neighbors often indicate a problem with global or local collinearity. To determine where the problem is, run the model using OLS and examine the VIF value for each explanatory variable. If some of the VIF values are large (above 7.5, for example), global multicollinearity is preventing GWR from solving. More likely, however, local multicollinearity is the problem. Try creating a thematic map for each explanatory variable. If the map reveals spatial clustering of identical values, consider removing those variables from the model or combining those variables with other explanatory variables in order to increase value variation. If, for example, you are modeling home values and have variables for both bedrooms and bathrooms, you may want to combine these to increase value variation, or to represent them as bathroom/bedroom square footage. Avoid using dummy/binary variables, categorical/nominal variables, or variables with very few possible values when constructing GWR models.
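To make the VIF check concrete, the sketch below computes the variance inflation factor for the special case of exactly two explanatory variables, where VIF = 1 / (1 - r^2) and r is the Pearson correlation between them. This is not how the OLS tool reports VIF (it handles any number of variables); the function names and the bedroom and bathroom counts are made up for illustration. The 7.5 threshold is the example value given above.

```python
import math

VIF_LIMIT = 7.5  # example threshold from the text above


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def vif_two_variables(xs, ys):
    """VIF shared by both variables when the model has exactly two of them."""
    r_squared = pearson_r(xs, ys) ** 2
    if r_squared >= 1.0:
        return float("inf")  # perfectly redundant variables
    return 1.0 / (1.0 - r_squared)


if __name__ == "__main__":
    # Hypothetical counts for eight homes.
    bedrooms = [2, 3, 3, 4, 5, 2, 4, 3]
    bathrooms = [1, 2, 2, 2, 3, 1, 3, 2]
    vif = vif_two_variables(bedrooms, bathrooms)
    print("VIF: %.2f, above limit: %s" % (vif, vif > VIF_LIMIT))
```

A low global VIF does not rule out local multicollinearity; mapping each explanatory variable remains the way to spot spatial clusters of identical values.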
GWR is a linear model subject to the same requirements as OLS. Review the section titled "How Regression Models Go Bad" in the Regression Analysis Basics document as a check that your GWR model is properly specified.
| Parameter | Explanation | Datatype |
|---|---|---|
| Input feature class (Required) | The feature class containing the dependent and independent variables. | Feature Layer |
| Dependent variable (Required) | The numeric field containing values for what you are trying to model. | Field |
| Explanatory variable(s) (Required) | A list of fields representing independent explanatory variables in your regression model. | Field |
| Output feature class (Required) | The output feature class to receive dependent variable estimates and residuals. | Feature Class |
| Kernel type (Optional) | Specifies whether the kernel is always fixed or is allowed to vary in extent as a function of feature density. | String |
| Bandwidth method (Optional) | Specifies how the extent of the kernel should be determined: using the Akaike Information Criterion (AICc), using Cross Validation (CV), or allowing the user to specify either a fixed distance or a fixed number of neighbors as a bandwidth parameter. | String |
| Distance (Optional) | Specifies a fixed bandwidth extent whenever the user selects FIXED for Kernel type and BANDWIDTH PARAMETER for Bandwidth method. Enter this value using the units specified by the Environment output coordinate system. | Double |
| Number of neighbors (Optional) | An integer reflecting the exact number of neighbors to include in the local bandwidth of the Gaussian kernel in cases where the user selects ADAPTIVE for Kernel type and BANDWIDTH PARAMETER for Bandwidth method. | Integer |
| Weights (Optional) | The numeric field containing a spatial weighting for individual features. Primarily useful when the number of samples taken at different locations varies, values for the dependent and independent variables are averaged, and places with more samples are more reliable (should be weighted higher). | Field |
| Coefficient raster workspace (Optional) | A full pathname to the workspace where all of the coefficient rasters will be created. When this workspace is provided, rasters are created for the intercept and every explanatory variable. | Workspace |
| Output cell size (Optional) | The cell size (a number) or reference to the cell size (a pathname to a raster dataset) to use when creating the coefficient rasters. The default cell size is the shortest of the width or height of the extent specified in the Environment output coordinate system, divided by 250. | Analysis Cell Size |
| Prediction locations (Optional) | A feature class containing features representing locations where estimates should be computed. Each feature in this dataset should contain values for all of the explanatory variables specified; the dependent variable for these features will be estimated using the model calibrated for the input feature class data. | Feature Layer |
| Prediction explanatory variable(s) (Optional) | A list of fields representing explanatory variables in the Prediction locations feature class. These field names should be provided in the same order (a one-to-one correspondence) as those listed for the input feature class Explanatory variable(s) parameter. If no prediction explanatory variables are given, the output prediction feature class will only contain computed coefficient values for each prediction location. | Field |
| Output prediction feature class (Optional) | The output feature class to receive dependent variable estimates for each feature in the Prediction locations feature class. | Feature Class |
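The default described for Output cell size is simple arithmetic: the shorter side of the extent divided by 250. A minimal sketch, with a hypothetical function name and a made-up extent:

```python
def default_cell_size(xmin, ymin, xmax, ymax):
    """Shorter of the extent's width or height, divided by 250."""
    width = xmax - xmin
    height = ymax - ymin
    return min(width, height) / 250.0


if __name__ == "__main__":
    # Hypothetical extent in map units: 50,000 wide by 33,750 tall.
    print(default_cell_size(0.0, 0.0, 50000.0, 33750.0))
```

Because the result is in the units of the output coordinate system, an extent expressed in degrees yields a very small default; this is one more reason to work with projected data.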
```python
# Model 911 emergency calls using GWR

# Import system modules
import arcgisscripting

# Create the Geoprocessor object
gp = arcgisscripting.create(9.3)
gp.OverwriteOutput = 1

# Local variables...
# Raw string so the backslashes in the Windows path are taken literally
workspace = r"C:\Data\CallAnalysis"

try:
    # Set the current workspace (to avoid having to specify the full path
    # to the feature classes each time)
    gp.workspace = workspace

    # 911 Calls as a function of {number of businesses, number of rental units,
    # number of adults who didn't finish high school}
    # Process: Geographically Weighted Regression...
    # "#" leaves an optional parameter (here Distance, Weights, and
    # Prediction explanatory variables) at its default
    gwr = gp.GeographicallyWeightedRegression("CallData", "Calls",
                        "BUS_COUNT;RENTROCC00;NoHSDip",
                        "CallsGWR.shp", "ADAPTIVE", "BANDWIDTH PARAMETER", "#", "25", "#",
                        "CoefRasters", "135", "PredictionPoints", "#", "GWRCallPredictions.shp")

    # Create Spatial Weights Matrix to use with Global Moran's I tool
    # Process: Generate Spatial Weights Matrix...
    swm = gp.GenerateSpatialWeightsMatrix("CallsGWR.shp", "UniqID",
                        "CallData25Neighs.swm",
                        "K_NEAREST_NEIGHBORS",
                        "#", "#", "#", 25)

    # Calculate Moran's Index of Spatial Autocorrelation for
    # the GWR residuals using a SWM File.
    # Process: Spatial Autocorrelation (Morans I)...
    moransI = gp.SpatialAutocorrelation("CallsGWR.shp", "StdResid",
                        "false", "Get Spatial Weights From File",
                        "Euclidean Distance", "None", "#",
                        "CallData25Neighs.swm")

except:
    # If an error occurred when running the tool, print out the error message.
    print(gp.GetMessages())
```
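The sample passes explanatory variables as one semicolon-delimited string and uses "#" for optional parameters it leaves unset. The helpers below sketch one way to build those arguments from Python lists while checking the one-to-one correspondence required between explanatory variables and prediction explanatory variables. The helper names and the prediction field names are hypothetical; they are not part of the geoprocessor.

```python
SKIP = "#"  # placeholder the sample uses for optional parameters left unset


def multivalue(field_names):
    """Join field names into the semicolon-delimited form used in the sample."""
    return ";".join(field_names)


def prediction_fields_arg(explanatory, prediction=None):
    """Build the prediction explanatory variables argument.

    Returns "#" when no prediction fields are given; otherwise requires
    one prediction field per explanatory variable, in the same order.
    """
    if not prediction:
        return SKIP
    if len(prediction) != len(explanatory):
        raise ValueError("expected %d prediction fields, got %d"
                         % (len(explanatory), len(prediction)))
    return multivalue(prediction)


if __name__ == "__main__":
    explanatory = ["BUS_COUNT", "RENTROCC00", "NoHSDip"]
    # Hypothetical field names in the prediction feature class.
    prediction = ["BUS_COUNT_P", "RENTROCC_P", "NoHSDip_P"]
    print(multivalue(explanatory))
    print(prediction_fields_arg(explanatory))
    print(prediction_fields_arg(explanatory, prediction))
```

A length check cannot catch fields listed in the wrong order, so the ordering still has to be verified by eye.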