Managing intermediate data in shared models


Intermediate data is data that is created and deleted when a model tool is run. In the majority of cases, you do not have to be concerned about intermediate data, even with shared models. In some cases, however, a model tool cannot create its intermediate data and the model fails. The error messages you (or your user) receive may not point directly to the problem. For example, the message may contain "Unexpected error" when, in fact, the tool simply cannot write its output because the folder or geodatabase does not exist.
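Because a missing folder or geodatabase can hide behind a generic error message, it can help to check the scratch location before running a model. The following is a minimal sketch in plain Python, not an ArcGIS function: the name describe_workspace is hypothetical, and the check relies only on the fact that a file geodatabase is stored on disk as a folder whose name ends in .gdb. It does not confirm that the folder is a valid geodatabase.

```python
import os


def describe_workspace(path):
    """Classify a workspace path as 'missing', 'file_gdb', or 'folder'.

    Hypothetical helper: it only inspects the file system and the
    folder name; it does not open the geodatabase.
    """
    if not path or not os.path.isdir(path):
        return "missing"
    # A file geodatabase is a folder whose name ends in ".gdb".
    if path.rstrip("/\\").lower().endswith(".gdb"):
        return "file_gdb"
    return "folder"
```

A result of "missing" for the scratch workspace is a likely explanation for an otherwise unhelpful "Unexpected error" message.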

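The simplest technique to ensure that your models can create and delete intermediate data is to make all intermediate data variables managed and to set the scratch workspace to a file geodatabase. The steps are as follows: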

  1. In the ArcToolbox window, right-click the model tool and click Edit. This opens ModelBuilder.
  2. In ModelBuilder, right-click on all intermediate data variables and choose Managed, as illustrated below. After choosing Managed, a check will appear next to Managed (just like Intermediate in the illustration below).

     Making intermediate data managed

  3. Save and close the model.
  4. In the ArcToolbox window, right-click the ArcToolbox entry and click Environments.
  5. In the Environment Settings dialog box, expand General Settings.
  6. In the Scratch Workspace setting, enter the pathname to a file geodatabase.
  7. Click OK.
  8. Execute your model by double-clicking the model tool in the ArcToolbox window, entering parameters, and clicking OK. If there are still errors, follow the above steps again, double-checking that you've made all intermediate data variables managed and that the scratch workspace is set properly. If there are still problems, then either:

If you are writing script tools and you need a location to write scratch data (the equivalent of intermediate data in models), see the Scratch data in scripting section at the end of this topic.

The remainder of this topic contains an in-depth discussion of how models manage intermediate data and may be of interest to you if you are creating and sharing many complex models.

How models manage intermediate data

You need to make sure that any model you share has a location to create and delete its intermediate data. The simplest method is to make all your intermediate data managed data, as described above.

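The sections below take an in-depth look at how geoprocessing manages intermediate data and give you the insight you need to troubleshoot problems. They describe how output pathnames are auto-generated, the altered state of data variables, managed data, %scratchworkspace%, the in_memory workspace, and scratch data in scripting.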

How output pathnames are auto-generated

When you open a tool from ArcToolbox or inside ModelBuilder and provide input datasets, the location of the output data is automatically generated.

This auto-generated name is constructed using the following logic (details can be found in Specifying tool inputs and outputs):

The output data variable will contain the auto-generated name regardless of whether the variable eventually becomes intermediate data, managed data, or a tool parameter.

When you distribute your models, the recipient will surely have different settings for the scratch or current workspace, and they will want their environment settings to apply. That is, when they open and run the tool dialog for your model, they want all intermediate data to be written to their scratch workspace as set in their environments. This will occur as long as you don't alter the auto-generated name in your data variables, as described next.

The altered state of data variables

Whenever you modify the value of a variable within ModelBuilder, it is considered altered. Once a variable is altered, ArcGIS must assume that you want to use the altered value and will never again modify it. If the altered variable contains a pathname containing folders or workspaces that do not exist on another user's computer, the model will fail.

If the variable is an output dataset, and its value is empty or unaltered, geoprocessing tools will auto-generate a pathname. You want to take advantage of this fact and leave output dataset parameters unaltered so that geoprocessing will auto-generate a pathname for you.

In ModelBuilder, there is no way for you to determine if a data variable is considered altered, but you can reset the altered state of a variable by deleting (blanking out) the existing value and then validating the entire model. Validation will then see that the output value is blank and will auto-generate a new name for intermediate data, and then mark the data variable as unaltered. A better method, however, is to set the variable to Managed, as described next.
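The altered and unaltered behavior described above can be pictured with a small toy model. This is a sketch for illustration only, not how ModelBuilder is implemented: the class DataVariable, its methods, and the way the auto-generated name is built (scratch workspace plus variable name) are all assumptions.

```python
class DataVariable:
    """Toy model of an output data variable and its altered state."""

    def __init__(self, name):
        self.name = name
        self.value = ""       # empty until validation generates a pathname
        self.altered = False

    def user_edit(self, value):
        # Any edit made by the user marks the variable as altered,
        # including blanking out the value.
        self.value = value
        self.altered = True

    def validate(self, scratch_workspace):
        # An empty or unaltered value is (re)generated from the current
        # environment; an altered, non-empty value is left alone.
        if not self.value or not self.altered:
            self.value = f"{scratch_workspace}/{self.name}"
            self.altered = False
        return self.value


var = DataVariable("buffer_out")
var.validate("C:/work/scratch.gdb")   # auto-generated, follows the environment
var.user_edit("D:/mine/out.shp")      # altered: now fixed to this path
var.validate("E:/other/scratch.gdb")  # unchanged, still D:/mine/out.shp
var.user_edit("")                     # blank out the value...
var.validate("E:/other/scratch.gdb")  # ...and validation regenerates it
```

In this model, the hard-coded path D:/mine/out.shp survives every change of scratch workspace until the value is blanked out, which is the situation that breaks a shared model on another user's computer.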

Using Managed data

You may choose to have ModelBuilder manage the location of intermediate data (using the logic described above). You can set a data variable to be managed by right-clicking the variable and clicking the Managed option. Once you've set a variable to Managed, you cannot change the output path within ModelBuilder (the parameter control will always be disabled). This means that Managed data cannot have its altered status changed and will have a new auto-generated pathname for the data each time the model executes.

Learn more about managed data

Using %scratchworkspace%

Non-system script tools may or may not provide an auto-generated output pathname. If they don't provide an auto-generated output pathname, you can use variable substitution in your output pathnames, as shown below.

Using variable substitution

Learn more about variable substitution

Learn more about providing an auto-generated output pathname for your script tools

The main issue with using variable substitution is that you rarely know whether %scratchworkspace% will be a system folder or a geodatabase when the tool is executed. Suppose that, when you built your model in ModelBuilder, your scratch workspace was a shapefile workspace (a folder). ModelBuilder would have automatically appended ".shp" to the feature class name (that is, you entered "%scratchworkspace%/temp" and ModelBuilder automatically replaced it with "%scratchworkspace%/temp.shp"). Later, you change your scratch workspace to a file geodatabase and run the model tool from the ArcToolbox window. The model fails because it is trying to write "temp.shp" to the file geodatabase, and names in a geodatabase cannot contain special characters, such as the dot found in ".shp".
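The failure described above can be reproduced with plain string handling. The sketch below is not an ArcGIS function: expand_output is a hypothetical helper, it handles only the lowercase %scratchworkspace% token, and its validity check covers only the single rule mentioned in this topic (no dot in a name written to a geodatabase).

```python
def expand_output(template, scratch_workspace):
    """Expand %scratchworkspace% and flag names a geodatabase would reject.

    Returns (expanded_path, ok). ok is False when the scratch workspace
    is a geodatabase and the dataset name contains a dot.
    """
    path = template.replace("%scratchworkspace%", scratch_workspace)
    dataset = path.rsplit("/", 1)[-1]
    in_gdb = scratch_workspace.lower().endswith(".gdb")
    ok = not (in_gdb and "." in dataset)
    return path, ok


# Scratch workspace is a folder: temp.shp is a valid shapefile name.
folder_result = expand_output("%scratchworkspace%/temp.shp", "C:/work/scratch")

# Scratch workspace is a file geodatabase: temp.shp cannot be written.
gdb_result = expand_output("%scratchworkspace%/temp.shp", "C:/work/scratch.gdb")
```

The same template produces a usable path in the first case and an unusable one in the second, which is why the type of scratch workspace has to be known before relying on variable substitution.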

There are only two cases where you can safely predict the type of scratch workspace: when the tool is executed by ArcGIS Server, and when you use the ToolShare folder structure.

Both cases are examined in more detail below.

ArcGIS Server scratch workspaces

When a server tool is executed on the server, ArcGIS Server creates a unique job folder for the tool to use. Inside this job folder is a folder named scratch, and within this folder is a file geodatabase named scratch.gdb, as shown below.

Intermediate and output data location

ArcGIS Server sets the application level scratch workspace environment to the location of this unique scratch folder. It does not change the tool, model, or model process level settings. When the server tool is run, the location of any intermediate or managed output data variable will be reset, unless

Learn more about environment levels

Since ArcGIS Server always creates this scratch folder with a scratch geodatabase, and sets the scratch workspace environment to the scratch folder, you can safely use variable substitution for all output pathnames, such as:

%scratchworkspace%/output_buffer.shp
%scratchworkspace%/scratch.gdb/outBuffer

Using the share folder structure

The topic A structure for sharing tools describes a recommended folder structure, called the ToolShare folder, shown below.

Recommended directory structure

This ToolShare folder structure works well for sharing tools, whether you are packaging and shipping, sharing on a LAN, or publishing to an ArcGIS Server.

Note that, like the unique job folder created by ArcGIS Server, the ToolShare folder contains a scratch folder and a scratch.gdb. You can set up your models so that their intermediate data is always written to this scratch folder, as follows:

Using %scratchworkspace% in a model parameter will take the application level scratch workspace, not the model level scratch workspace, so you only want to use this technique for non-parameter data variables, such as intermediate data.

If you use this technique when sharing your toolbox across a LAN, any execution of your tools will write intermediate data to this scratch folder. Using the following configuration as an example:

Example configuration

Whether you want to use this technique for sharing across a LAN is up to you. The first consideration is whether you grant permissions to other users to write data to your shared folder. Secondly, writing data across a LAN is generally slower than writing to a local disk. It is preferable to use the scratch workspace environment set by the tool user. However, as noted above, you don't know if the user has set their scratch workspace to a folder or a geodatabase. Using this technique, you know the type of scratch workspace.

When publishing a toolbox to ArcGIS Server, you should never set the current and scratch workspace model environments. As noted above, ArcGIS Server will set the application level environments, but not the tool, model, or model process level environments. If you publish your toolbox to ArcGIS Server with a scratch (or current) workspace set at the model level, those settings will be used instead of the application level settings set by ArcGIS Server. Because the scratch workspace provided by ArcGIS Server is not in the same folder as the toolbox, relative pathnames (to the scratch folder) will not work—the folder won't exist and the model will fail.

Writing scratch data to the in_memory workspace

Geoprocessing provides an in-memory workspace where you can write features and tables.

Learn more about the in_memory workspace

Be very careful when using in-memory workspaces—you only want to write datasets that you know will be small to the in_memory workspace.

Scratch data in scripting

In your scripts, you often need to construct a location to write scratch data.
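As an illustration of the general pattern (pick a unique name in the scratch location, use it, then delete it), here is a minimal sketch in plain Python. It deliberately does not use the ArcGIS scripting functions for scratch data; the names unique_scratch_name and scratch_path are hypothetical, and the cleanup removes only a single file or folder, so it would not remove the sidecar files of a shapefile.

```python
import contextlib
import os
import shutil
import uuid


def unique_scratch_name(workspace, prefix="xx", suffix=""):
    """Return a path inside workspace that does not exist yet."""
    while True:
        name = f"{prefix}{uuid.uuid4().hex[:8]}{suffix}"
        candidate = os.path.join(workspace, name)
        if not os.path.exists(candidate):
            return candidate


@contextlib.contextmanager
def scratch_path(workspace, prefix="xx", suffix=""):
    """Yield a unique scratch path and delete it when the block ends."""
    path = unique_scratch_name(workspace, prefix, suffix)
    try:
        yield path
    finally:
        # Remove whatever the caller created at that path, if anything.
        if os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)
        elif os.path.exists(path):
            os.remove(path)
```

A script would wrap its intermediate step in `with scratch_path(workspace, "tmp", ".txt") as p:` and write to p inside the block; the scratch data is deleted even if the step raises an error.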

Learn more about creating scratch data in scripts
