ArcPy Selection Functions
An essential task in GIS is to select a subset of a layer or feature class that matches certain criteria.
The criteria could be based on the information carried within the data’s attributes, e.g., block groups with a population of more than 1,000, or based on locational traits, e.g., within 2 miles of any school.
In this section, we will take a look at several functions that are used for selecting (or querying) features.
1. Selection Query
Structured Query Language (SQL) is a powerful language used to define one or more criteria that can consist of attributes, operators, and calculations.
Hint
Many professional database developers pronounce SQL as S-Q-L or “sequel”.
Whether or not you are aware of it, as ArcGIS Pro users you have most likely already used SQL, perhaps through the query builder in ArcGIS Pro.
Query expressions in ArcGIS follow standard SQL syntax. Accordingly, if we turn the query builder’s “switch” to SQL, we see the equivalent expression written in SQL.
1.1 Writing SQL expressions in ArcPy
In general, a single SQL expression follows this pattern:
<field name> <comparison operator> <value>
Rules to remember when writing SQL expressions for ArcPy functions:
- A query in ArcPy functions is defined as a Python str.
- A field delimiter must be used to specify a field of an attribute table. For shapefiles and feature classes (in a file geodatabase), the field delimiter is double quotes, i.e., "<fieldname>".
- Text values must always be enclosed in single quotes, i.e., '<some text>'.
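The rules above can be captured in a few small helper functions. This is a minimal sketch in plain Python; the names quote_field, quote_text, and build_clause are hypothetical and not part of ArcPy. It assumes the double-quote field delimiter used by shapefiles and file geodatabases, and it follows the standard SQL convention of writing a single quote inside a text value as two single quotes. (For other data sources, ArcPy's own arcpy.AddFieldDelimiters(datasource, field) returns the field name with the delimiters appropriate to that source.)

```python
def quote_field(field_name):
    """Wrap a field name in double quotes, the field delimiter for
    shapefiles and file geodatabase feature classes (assumed here)."""
    return '"{}"'.format(field_name)


def quote_text(value):
    """Wrap a text value in single quotes; a single quote inside the
    value is doubled, which is how standard SQL escapes it."""
    return "'{}'".format(value.replace("'", "''"))


def build_clause(field_name, operator, value):
    """Build '<field name> <comparison operator> <value>'.
    Text values are quoted; numbers are inserted as they are."""
    if isinstance(value, str):
        value = quote_text(value)
    return "{} {} {}".format(quote_field(field_name), operator, value)


print(build_clause("POP2010", ">", 10000))          # "POP2010" > 10000
print(build_clause("PO_NAME", "=", "GAINESVILLE"))  # "PO_NAME" = 'GAINESVILLE'
print(quote_text("O'BRIEN"))                        # 'O''BRIEN'
```

Keeping the quoting in one place means a text value and a numeric value can be passed the same way, and only build_clause decides whether single quotes are needed.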
History of Geodatabases
Read this blog for a brief history of the geodatabase and why personal geodatabases are no longer supported in ArcGIS Pro, and learn what a mobile geodatabase is.
Since there are multiple rules to follow when writing SQL expressions, we must combine the methods we have learned for defining Python strings, including " " (double quotes), ' ' (single quotes), """ """ (triple quotes), and \ (the escape character).
Now, let’s work on two examples using the zip_boundaries feature class. Specifically, we will write SQL expressions (in Python) to query against a field containing numeric values and another field containing text values.
- numeric value: POP2010 > 10000 (zip codes with more than 10,000 people)
- text value: PO_NAME = Gainesville (zip codes in the City of Gainesville)
1.2 Expression with triple quotes
Strings defined by “triple quotes” are commonly used for SQL expressions in ArcPy because this form can contain both double quotes and single quotes without escaping.
# numeric value
print(""""POP2010" > 10000""")
"POP2010" > 10000
# text value
print(""""PO_NAME" = 'GAINESVILLE'""")
"PO_NAME" = 'GAINESVILLE'
1.3 Expression with escape character
Remember that we can include double quotes and single quotes in a string by using the escape character (\).
print("\"")
print("\'")
"
'
query_numeric = "\"POP2010\" > 10000"
print(query_numeric)
"POP2010" > 10000
query_text = "\"PO_NAME\" = \'GAINESVILLE\'"
print(query_text)
"PO_NAME" = 'GAINESVILLE'
1.4 Expression with .format
query_numeric = "{} > {}".format('"POP2010"', 10000)
print(query_numeric)
"POP2010" > 10000
query_text = "{} = {}".format('"PO_NAME"', "'GAINESVILLE'")
print(query_text)
"PO_NAME" = 'GAINESVILLE'
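All three approaches build exactly the same string. The short sketch below checks this and also shows a fourth option not covered above: a Python f-string (Python 3.6 and later), which fills the braces directly from variables. The variable names field and city are illustrative only.

```python
field = "PO_NAME"
city = "GAINESVILLE"

q_triple = """"PO_NAME" = 'GAINESVILLE'"""                 # triple quotes
q_escape = "\"PO_NAME\" = \'GAINESVILLE\'"                 # escape character
q_format = "{} = {}".format('"PO_NAME"', "'GAINESVILLE'")  # .format
q_fstring = f""""{field}" = '{city}'"""                    # f-string

print(q_fstring)                                       # "PO_NAME" = 'GAINESVILLE'
print(q_triple == q_escape == q_format == q_fstring)   # True
```

Whichever style you choose, the double quotes around the field name and the single quotes around the text value stay the same; only the Python syntax used to produce them differs.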
2. Define Compound Criteria
Consider the following compound criteria.
- population greater than 10,000 AND name equals Gainesville
- population greater than 10,000 OR name equals Gainesville
We use AND and OR to connect two expressions. Note that all letters in both words are capitalized.
❓ Do you still remember how logical operators are written in Python? Are they upper case or lower case?
query_comp = "{} > {} AND {} = {}".format('"POP2010"', 10000,
'"PO_NAME"', "'GAINESVILLE'")
print(query_comp)
"POP2010" > 10000 AND "PO_NAME" = 'GAINESVILLE'
query_comp = "{} > {} OR {} = {}".format('"POP2010"', 10000,
'"PO_NAME"', "'GAINESVILLE'")
print(query_comp)
"POP2010" > 10000 OR "PO_NAME" = 'GAINESVILLE'
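When a query mixes AND and OR, parentheses matter, because in SQL AND is evaluated before OR. Below is a minimal sketch of a hypothetical helper, combine, that wraps each clause in parentheses before joining them, so compound criteria can be nested without ambiguity. The 50000 threshold is an illustrative value only.

```python
def combine(clauses, operator="AND"):
    """Join SQL clauses with AND or OR, wrapping each in parentheses."""
    operator = operator.upper()  # SQL keywords written in capitals
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return " {} ".format(operator).join("({})".format(c) for c in clauses)


pop = '"POP2010" > 10000'
city = "\"PO_NAME\" = 'GAINESVILLE'"

print(combine([pop, city]))
# ("POP2010" > 10000) AND ("PO_NAME" = 'GAINESVILLE')
print(combine([pop, city], "or"))
# ("POP2010" > 10000) OR ("PO_NAME" = 'GAINESVILLE')

# The result of combine() is itself a clause, so it can be nested
print(combine([combine([pop, city], "OR"), '"POP2010" < 50000']))
# (("POP2010" > 10000) OR ("PO_NAME" = 'GAINESVILLE')) AND ("POP2010" < 50000)
```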
3. Select Function
arcpy.analysis.Select extracts features from an input feature class or input feature layer, typically using a SQL expression, and stores them in an output feature class, i.e., saves them physically on the drive.
Now, let’s apply what we have learned about SQL expressions in this function.
import arcpy
gdb_worksp = r"..\data\class_data.gdb"
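arcpy.env.workspace = gdb_worksp
# allow rerunning the cells below without "output already exists" errors
arcpy.env.overwriteOutput = True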
zip_fc = "zip_boundaries"
pop_query = """"POP2010" > 10000"""
zip_output = "zip_q1_out"
arcpy.analysis.Select(zip_fc, zip_output, pop_query)
print("{} zip codes selected.".format(arcpy.management.GetCount(zip_output)))
13 zip codes selected.
city_query = "\"PO_NAME\" = \'GAINESVILLE\'"
zip_output = "zip_q2_out"
arcpy.analysis.Select(zip_fc, zip_output, city_query)
print("{} zip codes selected.".format(arcpy.management.GetCount(zip_output)))
12 zip codes selected.
comp_query = "{} > {} AND {} = {}".format('"POP2010"', 10000,
'"PO_NAME"', "'GAINESVILLE'")
zip_output = "zipbnd_q3_out"
arcpy.analysis.Select(zip_fc, zip_output, comp_query)
print("{} zip codes selected.".format(arcpy.management.GetCount(zip_output)))
8 zip codes selected.
4. Select by Attributes
arcpy.management.SelectLayerByAttribute() adds, updates, or removes a selection based on an attribute query.
zip_fc = "zip_boundaries"
pop_query = """"POP2010" > 10000"""
zip_lyr = arcpy.management.SelectLayerByAttribute(zip_fc,
"NEW_SELECTION",
pop_query)
print("{} records selected.".format(arcpy.management.GetCount(zip_lyr)))
13 records selected.
Selection on layers
Select by attribute, and Select by location (introduced below), only “temporarily” select features from a specified feature class or layer, meaning they DO NOT physically save files on the hard disk. See creating and using layer selections.
It is a good idea to assign the output to a variable, as in the example above, so that we can reference that selection later in the script.
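Because the output of SelectLayerByAttribute can be passed back in as the input layer, a selection can also be built up in steps by changing the selection type. The sketch below assumes the zip_boundaries data from this section and an ArcGIS Pro Python environment; arcpy is imported inside the function, so merely defining it does not require ArcGIS Pro. The function name select_in_steps is hypothetical. Starting from a NEW_SELECTION on population, a SUBSET_SELECTION on PO_NAME should mirror the AND query, while ADD_TO_SELECTION should mirror the OR query.

```python
# Selection types that modify an existing selection with a second query
STEP_TYPES = ("ADD_TO_SELECTION", "REMOVE_FROM_SELECTION", "SUBSET_SELECTION")


def select_in_steps(zip_fc, step_type):
    """Select zip codes with POP2010 > 10000, then modify that selection
    with a PO_NAME query; return the number of selected records.

    Requires ArcGIS Pro's Python environment and the zip_boundaries data.
    """
    if step_type not in STEP_TYPES:
        raise ValueError("step_type must be one of {}".format(STEP_TYPES))

    import arcpy  # only available inside ArcGIS Pro's Python environment

    pop_query = '"POP2010" > 10000'
    city_query = "\"PO_NAME\" = 'GAINESVILLE'"

    # Step 1: start a new selection based on population
    zip_lyr = arcpy.management.SelectLayerByAttribute(
        zip_fc, "NEW_SELECTION", pop_query)
    # Step 2: add to, remove from, or subset the current selection
    zip_lyr = arcpy.management.SelectLayerByAttribute(
        zip_lyr, step_type, city_query)

    # GetCount returns a Result; its first output is the count as a string
    return int(arcpy.management.GetCount(zip_lyr)[0])
```

For example, with the workspace set as above, select_in_steps("zip_boundaries", "SUBSET_SELECTION") should report the same count as the AND query in the Select section.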
5. Select by Location
arcpy.management.SelectLayerByLocation selects features based on a spatial relationship to features in another dataset.
Each feature in the Input Features parameter is evaluated against the features in the Selecting Features parameter. If the specified Relationship parameter value is met, the input feature is selected.
See also
All spatial relationships supported by ArcGIS Pro.
blkgrp_fc = "blockgroups"
cntbnd_fc = "county_boundary"
# search distance and invert options are left at their defaults
blkgrp_lyr = arcpy.management.SelectLayerByLocation(
    blkgrp_fc, "WITHIN", cntbnd_fc, selection_type="NEW_SELECTION"
)
Note the difference between the two print statements below. The first one prints the number of features in the feature class. The second, on the other hand, prints the number of records selected in the layer that references the same feature class.
print("{} features in the feature class.".format(
    arcpy.management.GetCount(blkgrp_fc))
)
178 features in the feature class.
print("{} records selected.".format(
arcpy.management.GetCount(blkgrp_lyr))
)
142 records selected.
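The introduction mentioned a location-based criterion such as “within 2 miles of any school”. SelectLayerByLocation handles this with the WITHIN_A_DISTANCE relationship and a search distance given as a linear unit string. Below is a minimal sketch, assuming an ArcGIS Pro Python environment and a hypothetical point feature class named schools in the workspace (it is not part of the data used above); the function name count_blockgroups_near is also hypothetical.

```python
def count_blockgroups_near(select_fc, miles=2, in_fc="blockgroups"):
    """Count block groups within `miles` of any feature in select_fc.

    Requires ArcGIS Pro's Python environment; select_fc (e.g. a
    hypothetical "schools" feature class) must exist in the workspace.
    """
    if miles <= 0:
        raise ValueError("miles must be a positive number")

    import arcpy  # only available inside ArcGIS Pro's Python environment

    # A linear unit string combines a number and a unit, e.g. "2 Miles"
    distance = "{} Miles".format(miles)
    bg_lyr = arcpy.management.SelectLayerByLocation(
        in_fc, "WITHIN_A_DISTANCE", select_fc, distance, "NEW_SELECTION")
    return int(arcpy.management.GetCount(bg_lyr)[0])
```

Passing invert_spatial_relationship="INVERT" to the same tool would instead select the block groups that are not within the search distance.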
6. Save Layer Selection to a Feature Class
There are two options to save a “temporary” selection to an output feature class.
arcpy.conversion.FeatureClassToFeatureClass(<input>, <path>, <fc name>)
arcpy.management.CopyFeatures(<input>, <full path>)
It is critical to understand how to set the output path and name in each method.
If the path is not set, the output feature class is stored in the current workspace.
To specify a path, use one of the following methods:
- use "\\" to concatenate the path and name
- use the os.path.join() function
- define the full path name altogether
output_path = r"..\data\output_data.gdb"
output_name = "bg_within_cnt"
result = arcpy.conversion.FeatureClassToFeatureClass(
blkgrp_lyr, output_path, output_name
)
print("output is {}".format(result.getOutput(0)))
output is ..\data\output_data.gdb\bg_within_cnt
# path not specified
result = arcpy.management.CopyFeatures(blkgrp_lyr)
print("output is {}".format(result.getOutput(0)))
output is ..\data\class_data.gdb\blockgroups_Layer1_CopyFeatures
Note that if the output is not specified, the output is stored in the current workspace and named after the temporary layer (whose name ArcGIS Pro creates automatically), with the tool name appended.
See also
The Result object is used to retrieve the output locations. To learn more about it, see the ArcPy Result documentation.
The full path of the output feature class can be defined in any of the following ways. Note that they all result in the same value.
output_path + "\\" + output_name
'..\\data\\output_data.gdb\\bg_within_cnt'
import os
output_fc = os.path.join(output_path, output_name)
output_fc
'..\\data\\output_data.gdb\\bg_within_cnt'
output_fc = r'..\data\output_data.gdb\bg_within_cnt'
output_fc
'..\\data\\output_data.gdb\\bg_within_cnt'
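FeatureClassToFeatureClass expects the path and the name separately, while CopyFeatures expects the full path, so it helps to convert between the two forms. The sketch below uses ntpath, which is the module os.path refers to on Windows (where ArcGIS Pro runs), so it shows Windows-style results on any operating system. It also shows why os.path.join() is safer than manual "\\" concatenation when a path already ends with a backslash.

```python
import ntpath  # os.path is ntpath on Windows

output_path = r"..\data\output_data.gdb"
output_name = "bg_within_cnt"

# join() adds a separator only when one is needed
full_path = ntpath.join(output_path, output_name)
print(full_path)                                # ..\data\output_data.gdb\bg_within_cnt

path_with_sep = output_path + "\\"              # same path with a trailing backslash
print(ntpath.join(path_with_sep, output_name))  # ..\data\output_data.gdb\bg_within_cnt

# Manual concatenation doubles the separator in that case
print(path_with_sep + "\\" + output_name)       # ..\data\output_data.gdb\\bg_within_cnt

# split() goes the other way: full path -> (path, name),
# the two pieces FeatureClassToFeatureClass asks for
out_path, out_name = ntpath.split(full_path)
print(out_path, out_name)                       # ..\data\output_data.gdb bg_within_cnt
```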