Batch processing means running a number of processes, sequentially or simultaneously, to save time and effort. With a well-prepared dataset on a capable machine, batch processing lets you organize your tasks and achieve a number of goals in one click. In this post I will go through a few batch processing techniques you will find in ArcGIS.

In ArcGIS, batch work can be done in a number of ways. The most common practice is to use the Python scripting modules. Python is famous not only for its ease of coding, but also for its simplicity, its power in data handling and its automatic garbage collection. ArcGIS also offers a very intuitive ModelBuilder tool to lay out geoprocessing operations schematically. Besides these two, each tool in ArcGIS has a ‘batch’ option in its option menu.

The Python way

ArcGIS lets you use its own Python modules and scripts to process a number of tasks. You can use both IDLE and the ‘command line’ version of the Python coding environment. ArcGIS ships with a number of site packages, and arcpy is one of them. A module is a Python file that generally includes functions and classes. ArcPy is supported by a series of modules, including a mapping module (arcpy.mapping), a Spatial Analyst module (arcpy.sa), and a Geostatistical Analyst module (arcpy.ga). ArcPy itself builds on (and is a successor to) the successful arcgisscripting module. Its goal is to create the cornerstone for a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python.

Besides arcpy, you have Python’s core modules such as os and sys, and also many third-party modules and packages (e.g. matplotlib, scipy, numpy, xlrd).
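
If you are not sure which of these modules your interpreter can actually see, a quick check helps before you start a long batch job. The following is only a small sketch: the function name available_modules is my own, and the list of names is just the modules mentioned above.

```python
def available_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except Exception:
            # Any failure to import (missing module, missing license, ...) counts as unavailable
            status[name] = False
    return status

# The names below come from this post; swap in the ones you care about
report = available_modules(["os", "sys", "numpy", "xlrd", "arcpy"])
for name in sorted(report):
    print("%-8s %s" % (name, "ok" if report[name] else "missing"))
```

Run it with the interpreter you plan to script with; arcpy should only show up as ok under the ArcGIS Python installation (or one that has been pointed at it).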

An in-depth Python tutorial is beyond the scope of this post; please try these links to learn the basics of Python.

When you install a fresh copy of ArcGIS, Python is installed in a separate folder on your computer. You will find links to IDLE, the Python command line and other shortcuts in your Start menu. Besides these, ArcGIS also lets you access Python directly from its main window. You can also use Python from the Field Calculator and custom toolboxes. ArcGIS 10.0 typically installs Python 2.6 to C:\Python26\ArcGIS10.0\ and 10.1 installs Python 2.7 to C:\Python27\ArcGIS10.1\; you can use these paths to access Python from your favorite text editor or IDE. One way to put the 10.1 interpreter on the path for the current command prompt session is:

set Path=C:\Python27\ArcGIS10.1;%Path%
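
After changing the path, it is worth confirming which interpreter actually starts. This is a minimal sketch using only the standard library; the helper name interpreter_info is mine, not something ArcGIS provides.

```python
import struct
import sys

def interpreter_info():
    """Describe the running interpreter: location, version and 32/64-bit build."""
    # The size of a C pointer in bytes tells us whether this is a 32- or 64-bit build
    bits = struct.calcsize("P") * 8
    version = "%d.%d.%d" % sys.version_info[:3]
    return {"executable": sys.executable, "version": version, "bits": bits}

info = interpreter_info()
print("%s (Python %s, %d-bit)" % (info["executable"], info["version"], info["bits"]))
```

If the printed location is not inside the ArcGIS Python folder, another Python installation is ahead of it on the path.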

Initially you will have the 32-bit version of Python, unless you also install 64-bit background geoprocessing. To allow other Python installations to access arcpy, a file must be copied from the \Lib\site-packages\ folder of the ArcGIS Python installation into the corresponding folder of the non-ArcGIS Python. If you have not installed 64-bit background geoprocessing, the file is Desktop10.1.pth; if you have installed it, the file is DTBGGP64.pth.
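
The copy step described above can be scripted. Treat this as a sketch under assumptions: the function names are my own, the two file names are the ones given above for 10.1, and the folder paths in the commented example are placeholders you must adapt to your machine.

```python
import os
import shutil

def pth_file_name(has_64bit_background_gp):
    """Pick the .pth file name named in this post for ArcGIS 10.1."""
    return "DTBGGP64.pth" if has_64bit_background_gp else "Desktop10.1.pth"

def copy_arcpy_pth(arcgis_site_packages, other_site_packages,
                   has_64bit_background_gp=False):
    """Copy the ArcGIS .pth file into another Python's site-packages folder."""
    name = pth_file_name(has_64bit_background_gp)
    source = os.path.join(arcgis_site_packages, name)
    if not os.path.isfile(source):
        raise IOError("Expected .pth file not found: " + source)
    target = os.path.join(other_site_packages, name)
    shutil.copyfile(source, target)
    return target

# Example call (both paths are assumptions, adjust them to your installation):
# copy_arcpy_pth(r"C:\Python27\ArcGIS10.1\Lib\site-packages",
#                r"C:\[Other Python]\Lib\site-packages")
```

The other Python still has to match the ArcGIS one in version and in 32/64-bit build for the import to work.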

Here is a post describing all sorts of tweaks to the Python installation and runtime environment.

Coding with arcpy

Arcpy provides access to geoprocessing tools as well as additional functions, classes, and modules that allow you to create simple or complex workflows. Broadly speaking, arcpy is organized in tools, functions, classes, and modules. The arcpy site-package is built closely upon Python 2.6 and requires this version (or later) to be successfully imported.

Note that arcpy also offers functions that are not tools. Here is some arcpy vocabulary.

Below is an example of a Python script running in the Python window. The script runs the Buffer tool with a 500 meter distance on a shapefile called “rivers” and produces “rivers_500m_buffer”. In the Python window, each line of code you type follows the >>> prompt.

>>> import arcpy
>>> result = arcpy.Buffer_analysis("rivers", "rivers_500m_buffer", "500 METERS")
>>> print result
C:/[Default location]/rivers_500m_buffer

Return the number of features

>>> result = arcpy.GetCount_management("streets_50m_of_rivers")
>>> print result.getOutput(0)

Play some more.

>>> # Return a list of default spatial grid indexes for a feature class.
>>> result = arcpy.CalculateDefaultGridIndex_management("streets_50m_of_rivers")
>>> for i in range(0, result.outputCount):
...     print result.getOutput(i)
 
>>> # Set the cell size environment used by raster tools to 30, if it is not already
>>> if arcpy.env.cellSize != 30:
...     arcpy.env.cellSize = 30
  
>>> # Let's do something more with Buffer tool
>>> import arcpy
>>> arcpy.env.workspace = 'C:/some_location/example.gdb'
 
>>> layer = 'City_Trails'
>>> distances = ['100 meters', '200 meters', '400 meters']
 
>>> for dist in distances:
...     output = layer + '_' + dist.replace(' ', '_')
...     arcpy.Buffer_analysis(layer, output, dist)
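
To see what the loop above will name its outputs before running any geoprocessing, the naming expression can be tried on its own. This sketch needs no arcpy; the helper name buffer_output_name is mine, while the layer name and distances are the ones from the example.

```python
def buffer_output_name(layer, distance):
    """Build an output name such as 'City_Trails_100_meters'."""
    # Spaces are replaced because they are awkward in feature class names
    return layer + "_" + distance.replace(" ", "_")

layer = "City_Trails"
distances = ["100 meters", "200 meters", "400 meters"]

outputs = [buffer_output_name(layer, dist) for dist in distances]
for name in outputs:
    print(name)
# City_Trails_100_meters
# City_Trails_200_meters
# City_Trails_400_meters
```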

arcpy.mapping is a package within arcpy usually used to manipulate the contents of existing map documents (.mxd) and layer files (.lyr); it can also be used to automate map production. The software resource center has a tutorial page dedicated to the arcpy.mapping package.

The Spatial Analyst module (arcpy.sa) gives you access to Spatial Analyst functionality, including tools, operators, functions, and classes. The simplest way to use it is to import from the sa module (from arcpy.sa import *). With this import method you can access Spatial Analyst functionality without providing a namespace, and it also imports the overloaded operators, which allow rasters to be used with operators. You will need to check out the Spatial Analyst license, with arcpy.CheckOutExtension("Spatial"), before running a tool.
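
In a batch script it is easy to forget to hand the license back when something fails halfway. One possible pattern, offered as a sketch and not as an official arcpy facility, is a small context manager around arcpy's CheckExtension, CheckOutExtension and CheckInExtension functions. The name extension_checked_out is my own, and the arcpy module is passed in as an argument so the helper can be defined without ArcGIS installed; the raster name and output path in the commented usage are placeholders.

```python
from contextlib import contextmanager

@contextmanager
def extension_checked_out(arcpy_module, extension="Spatial"):
    """Check an extension out for the duration of a with-block, then check it in."""
    status = arcpy_module.CheckExtension(extension)
    if status != "Available":
        raise RuntimeError("%s extension is not available: %s" % (extension, status))
    arcpy_module.CheckOutExtension(extension)
    try:
        yield
    finally:
        # Runs even if a tool inside the block raises an error
        arcpy_module.CheckInExtension(extension)

# Typical use inside ArcGIS (needs arcpy and a Spatial Analyst license):
# import arcpy
# from arcpy.sa import *
# with extension_checked_out(arcpy):
#     slope = Slope("elevation")        # "elevation" is a placeholder raster
#     slope.save("C:/[output location]/slope")
```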

Here is a typical example of automation in one block.

import arcpy
arcpy.env.workspace = "C:/inputs"
for fc in arcpy.ListFeatureClasses():
    outfc = arcpy.Describe(fc).baseName + "_Dissolved"
    arcpy.Dissolve_management(fc, outfc)

In the above code, after the import, the second line sets the input workspace. The for statement loops through all the feature classes in that workspace. Inside the loop, the first line builds the output name by appending ‘_Dissolved’ to the base name of each feature class, and the last line creates that output with the Dissolve tool.
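
A loop like this stops at the first feature class that fails, which is annoying in a long batch. One way around it, sketched here without arcpy so it can be tried anywhere, is to collect failures and carry on. The names run_batch and demo_process are hypothetical; in a real script the process function would wrap a call such as arcpy.Dissolve_management.

```python
def run_batch(items, process):
    """Apply process to every item, collecting results and errors separately."""
    succeeded = []
    failed = []
    for item in items:
        try:
            succeeded.append((item, process(item)))
        except Exception as error:
            # Record the problem and move on to the next item
            failed.append((item, str(error)))
    return succeeded, failed

def demo_process(name):
    """Stand-in for a geoprocessing call; rejects empty names."""
    if not name:
        raise ValueError("empty feature class name")
    return name + "_Dissolved"

done, errors = run_batch(["roads", "", "rivers"], demo_process)
print(done)
print(errors)
```

At the end of the run you have one list to report as finished and another to inspect or retry.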

from arcpy import mapping
mxd = mapping.MapDocument("c:/[document location]/MapDocument.mxd")
# Each item is a layer or table whose data source cannot be found
lstBrokenDS = mapping.ListBrokenDataSources(mxd)
for layer in lstBrokenDS:
    print layer.name

The above block lists all the broken data sources in an mxd. To find and replace the missing sources, we can use the following code:

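from arcpy import mapping
mxd = mapping.MapDocument("c:/[document location]/MapDocument.mxd")
# The arguments are the old and new workspace paths (folders or geodatabases), not .mxd files
mxd.findAndReplaceWorkspacePaths("C:/[document location]", "C:/[New document location]")
# The change stays in memory until the document is saved with save() or saveACopy()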

Just like findAndReplaceWorkspacePaths(), you can use replaceWorkspaces() to replace a workspace, and replaceDataSource() to replace the source of an individual layer or table.

from arcpy import mapping
mxd = mapping.MapDocument("c:/[Document_location]/MapDocument.mxd")

# Get reference dataframe
df = mapping.ListDataFrames(mxd,"dataframe")[0]

# Find the layer
lyr = mapping.ListLayers(mxd,"theLayer",df)[0]

# Replace datasource as shapefile
lyr.replaceDataSource("c:/[Document_location]","SHAPEFILE_WORKSPACE","theLayer")

# Save a copy of the new document
mxd.saveACopy("c:/[Document_location]/newMapDocument.mxd")

Here we have replaced the data source of only the first matching layer (index 0 of the returned list). To list the broken data sources of every map document under a folder:

from arcpy import mapping
import os

path = "C:/[Document_location]"

# Store the names of broken data sources
f = open('BrokenDataList.txt','w')

# Use os.walk() to walk through the folder tree under path
for root, dirs, files in os.walk(path):
    # For each file, use os.path.splitext() to obtain the base file name and extension
    for filename in files:
        basename, extension = os.path.splitext(filename)
        # For each map document: build its full path, open it, write its name,
        # then write the name of each broken data source below it
        if extension.lower() == ".mxd":
            # Join with root (not path) so documents in subfolders resolve correctly
            fullPath = os.path.join(root, filename)
            mxd = mapping.MapDocument(fullPath)
            f.write("MXD: " + filename + "\n")
            brknList = mapping.ListBrokenDataSources(mxd)
            for brknItem in brknList:
                f.write("\t" + brknItem.name + "\n")
            # Release the reference to the map document before opening the next one
            del mxd
f.close()

You can fix them by hand, or use replaceDataSource() to do it.
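
The folder walk in the script above does not depend on arcpy, so you can test it separately and see which map documents would be opened. This is a minimal sketch; the function name find_map_documents is mine and the folder in the commented call is the same placeholder used earlier.

```python
import os

def find_map_documents(top_folder, extension=".mxd"):
    """Return the full paths of all files with the given extension under top_folder."""
    matches = []
    for root, dirs, files in os.walk(top_folder):
        for filename in files:
            # Compare in lower case so that .MXD is found as well
            if os.path.splitext(filename)[1].lower() == extension:
                # root is the folder currently being visited, so nested files resolve correctly
                matches.append(os.path.join(root, filename))
    return sorted(matches)

# Example call with a placeholder folder:
# for mxd_path in find_map_documents("C:/[Document_location]"):
#     print(mxd_path)
```

Each returned path can then be handed to MapDocument() in the same way as fullPath in the script above.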