Use NumPy in ArcGIS
NumPy is a numerical computing library for Python and a fundamental package of the scientific Python ecosystem. It provides fast operations on n-dimensional (n-d) arrays and includes functions for linear algebra and a comprehensive set of mathematical operations.
Other packages in the scientific Python ecosystem include:
pandas: Python’s “Excel sheet” for tabular data
SciPy: optimization, interpolation, mathematical algorithms
matplotlib: plotting graphs and interactive visualization
Seaborn: based on matplotlib, with more graph types and styles
scikit-learn: regression, classification, clustering, machine learning
NetworkX: creating and manipulating complex networks (graphs)
statsmodels: estimating statistical models and conducting statistical tests
As before, when working with a new package, we first need to learn how to import it. Please follow the convention below when importing NumPy.
import numpy as np
1. Create a numpy.ndarray#
numpy.ndarray is a fundamental data structure in the NumPy library for Python, which is used to represent homogeneous multidimensional arrays of data. It is a collection of elements of the same data type that can be indexed and manipulated efficiently.
1.1 From a Python list#
We can convert a list to np.ndarray using np.array().
my_list = [1, 2, 3, 4]
my_list
[1, 2, 3, 4]
Tip
Recall that we can also create a list using the range object.
my_list = list(range(1, 5))
np.array(my_list)
array([1, 2, 3, 4])
type(np.array(my_list))
numpy.ndarray
Make sure the list is homogeneous
If a list is not homogeneous, NumPy will try to cast all the elements to the same data type. This may result in unexpected behavior or errors if the elements cannot be converted to a common data type. For example, if a list contains both strings and integers, NumPy may cast all the elements to strings, resulting in a NumPy array of dtype ‘U’.
In general, it is recommended to use NumPy arrays with homogeneous data types to take full advantage of the performance benefits and functionality provided by the library.
mix_list = [1, 2, 3, 'four', 'five']
np.array(mix_list)
array(['1', '2', '3', 'four', 'five'], dtype='<U11')
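As a small sketch of the casting behavior described above (the variable names are illustrative and not part of the original lesson), mixing integers with a float upcasts everything to a float dtype, and passing `dtype` explicitly makes the intended type unambiguous.

```python
import numpy as np

# Integers mixed with a float are upcast to a common float dtype.
int_float_arr = np.array([1, 2, 3.5])
print(int_float_arr.dtype)   # float64

# Integers mixed with strings are all cast to strings.
mixed_arr = np.array([1, 2, 'three'])
print(mixed_arr.dtype.kind)  # U (Unicode string)

# Request the dtype explicitly when the default is not what you want.
float_arr = np.array([1, 2, 3], dtype=float)
print(float_arr)             # [1. 2. 3.]
```

Checking `dtype` right after creating an array is a cheap way to catch an unintended cast early.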
1.2 Use np.arange()#
np.arange() returns an array of evenly spaced values within a specified interval. The interval is defined by a start value, a stop value (which is excluded), and a step size. Think of it as the NumPy equivalent of the range object.
np.arange([start,] stop[, step,], dtype=None)
np.arange(1, 5)
array([1, 2, 3, 4])
As shown in the syntax, if start is omitted, the function uses a default start of 0. Therefore, the two statements below are equivalent.
print(np.arange(10))
print(np.arange(0, 10))
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
Similar to the step argument in range, step here is an optional argument that specifies the step size (increment or decrement) between consecutive values in the interval. If step is not provided, the default value of 1 is used.
np.arange(0, 10, 2)
array([0, 2, 4, 6, 8])
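The step can also be negative or non-integer. The following is a minimal sketch with made-up values, not taken from the original lesson.

```python
import numpy as np

# A negative step counts down; the stop value (0) is still excluded.
countdown = np.arange(10, 0, -2)
print(countdown)  # [10  8  6  4  2]

# Unlike range, np.arange also accepts a non-integer step.
halves = np.arange(0, 2, 0.5)
print(halves)     # [0.  0.5 1.  1.5]
```

For non-integer steps, np.linspace is often the safer choice because the number of items is stated explicitly rather than derived from floating-point arithmetic.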
1.3 Use np.linspace()#
np.linspace() returns an array of evenly spaced values within a specified interval, and lets you specify the number (num) of items in the interval.
np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
np.linspace(1, 5, 5) # start, stop, num
array([1., 2., 3., 4., 5.])
The spacing is calculated as (stop - start) / (num - 1). In the example, (5 - 1) / (5 - 1) equals 1, hence the result.
Let’s see another example, in which we create an array with a spacing of 0.5.
np.linspace(0, 5, 11) # (5-0)/(11-1)
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
np.linspace(0, 5, 21)
array([0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. , 2.25, 2.5 ,
2.75, 3. , 3.25, 3.5 , 3.75, 4. , 4.25, 4.5 , 4.75, 5. ])
NumPy is powerful and flexible at generating arrays of numbers. Imagine how much more code you would need to create such an array or list using loops.
The default value of the num argument is 50, and the default value of endpoint is True, meaning the stop value is included by default.
np.linspace(1, 50)
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13.,
14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.,
27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.,
40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50.])
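The effect of `endpoint` and `retstep`, both listed in the syntax above, can be sketched as follows. The values are made up for illustration.

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed.
values, step = np.linspace(0, 5, 11, retstep=True)
print(step)  # 0.5

# With endpoint=False the stop value is excluded, so the spacing
# becomes (stop - start) / num instead of (stop - start) / (num - 1).
open_values, open_step = np.linspace(0, 5, 10, endpoint=False, retstep=True)
print(open_values[-1], open_step)  # 4.5 0.5
```

Both calls give a spacing of 0.5, but the second needs one fewer item because the value 5 is left out.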
The len function also works for NumPy arrays. len(arr) returns the length of the first dimension of the array. For a 1-dimensional array, this equals the total number of elements.
len(np.linspace(1, 50))
50
2. Indexing and slicing#
Indexing and slicing in NumPy arrays are similar to those in Python lists, but with additional features that make them more powerful and flexible.
Indexing starts at 0, as usual.
A negative index counts from the end of the array.
Use a colon to specify the start and end of a slice.
Slicing a NumPy array with a step is also allowed.
my_arr = np.arange(10)
my_arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(my_arr[4])
print(my_arr[-2])
4
8
my_arr[1:6] # slicing
array([1, 2, 3, 4, 5])
my_arr[1:6:2] # slicing with step
array([1, 3, 5])
my_arr[:] # omit both ends
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
my_arr[::-1] # reverse the order
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
Fancy indexing
You can use a list or an array of indices to access specific elements of an array. The result will be another np.ndarray.
This is another demonstration of NumPy’s power in manipulating arrays. Consider how you would get this result without fancy indexing.
my_arr[[1, 3, 5]] # to access elements at index 1, 3, and 5.
array([1, 3, 5])
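To answer the question above, here is a sketch comparing a loop-based selection with fancy indexing. The array values are made up so that indices and values are easy to tell apart.

```python
import numpy as np

my_arr = np.arange(10) * 10   # 0, 10, 20, ..., 90
wanted = [1, 3, 5]

# Without fancy indexing: loop over the indices one by one.
picked_loop = np.array([my_arr[i] for i in wanted])

# With fancy indexing: pass the whole list of indices at once.
picked_fancy = my_arr[wanted]

print(picked_loop)   # [10 30 50]
print(picked_fancy)  # [10 30 50]

# Fancy indexing returns a copy, so changing it leaves my_arr untouched.
picked_fancy[0] = -1
print(my_arr[1])     # 10
```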
3. Generate random numbers with NumPy#
The behavior of numpy.random.randint differs from that of Python’s built-in random.randint function when it comes to the inclusion of the upper bound.
random.randint: [low, high], i.e., high is included.
np.random.randint: [low, high), i.e., high is excluded. This is also called a “half-open” interval.
np.random.randint(2, 10)
8
If high is None (the default), then results are drawn from [0, low).
np.random.randint(10)
5
You can generate any number of integers within such an interval in one call. The resulting random integers follow a “discrete uniform” distribution, meaning every integer within that range has an equal probability of being drawn each time.
np.random.randint(0, 10, 20)
array([9, 2, 4, 6, 0, 0, 6, 5, 4, 9, 6, 7, 9, 3, 5, 7, 3, 8, 9, 6])
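The draws above change on every run. When repeatable results are needed, one option is a seeded generator. The sketch below uses `np.random.default_rng` and its `integers` method, NumPy's newer random interface, which the original lesson does not use. The seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.integers(0, 10, size=20)  # high (10) is excluded
draws_b = rng_b.integers(0, 10, size=20)

print(np.array_equal(draws_a, draws_b))        # True
print(draws_a.min() >= 0, draws_a.max() <= 9)  # True True
```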
Another function commonly used in experiments generates random floating-point numbers in [0, 1), which can be done with np.random.rand. The function lets you specify the shape of the resulting NumPy array.
np.random.rand(3, 2) # an array of size 6 and shape 3 (rows) by 2 (columns)
array([[0.20191371, 0.30418614],
[0.78891467, 0.41619982],
[0.09169838, 0.12678858]])
Two useful attributes of np.ndarray are shape and size. shape returns a tuple consisting of the number of elements along each dimension. size is the total number of elements regardless of dimension.
arr = np.random.rand(6, 3)
print(arr)
print(f"The size of arr: {arr.size}.")
print(f"The shape of arr: {arr.shape}.")
[[0.27431737 0.51009214 0.8001273 ]
[0.7378156 0.72025623 0.16373974]
[0.77804814 0.76287854 0.38625261]
[0.82533601 0.63923246 0.41923371]
[0.83303691 0.11121103 0.48612382]
[0.3593643 0.52778161 0.04028845]]
The size of arr: 18.
The shape of arr: (6, 3).
Note that len returns the length of the first dimension only. So, for arrays with more than one dimension, len and size generally do not return the same value.
print(len(arr))
print(arr.size)
6
18
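The relationship between len, shape, and size can be checked directly. This sketch uses an array of zeros as a stand-in for the random array above.

```python
import numpy as np

arr = np.zeros((6, 3))          # 6 rows, 3 columns

print(len(arr))                 # 6  (length of the first dimension)
print(arr.shape[0])             # 6  (same value, read from the shape)
print(arr.size)                 # 18 (total number of elements)
print(int(np.prod(arr.shape)))  # 18 (product of the shape entries)
```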
4. Work with NumPy in ArcGIS#
import arcpy
gdb_worksp = r"../data/class_data.gdb"
arcpy.env.workspace = gdb_worksp
arcpy.ListFeatureClasses()
['county_boundary',
'hospitals',
'schools',
'I75',
'roads',
'law_enforcement',
'major_highways',
'zip_boundaries',
'major_roads',
'landuse',
'crash',
'blockgroups',
'I75_2mi_buff',
'schools_2mile_I75',
'blockgroups_school_spjoin',
'zip_q1_out',
'zip_q2_out',
'zipbnd_q3_out',
'blockgroups_Layer1_CopyFeatures']
Set up to work with the schools feature class.
school_fc = "schools"
4.1 Use arcpy.ListFields() to access fields#
A field object represents a column in a table. A field has many properties, the most obvious ones being its name and its type.
An example of working with field objects
feature_class = school_fc  # any feature class name or path works here
fields = arcpy.ListFields(feature_class)
# Iterate through the list of fields
for field in fields:
    # Print field properties
    print("Field: {0}".format(field.name))
    print("Alias: {0}".format(field.aliasName))
    print("Type: {0}".format(field.type))
    print("Is Editable: {0}".format(field.editable))
    print("Required: {0}".format(field.required))
    print("Scale: {0}".format(field.scale))
    print("Precision: {0}".format(field.precision))
school_fields = []
for field in arcpy.ListFields(school_fc):
    school_fields.append(field.name)
print(school_fields)
['OBJECTID_1', 'Shape', 'OBJECTID', 'STATUS', 'SCORE', 'SIDE', 'MATCH_ADDR', 'FEDERAL_ID', 'STATE_ID', 'SCHOOL_ID', 'NAME', 'ADDRESS', 'CITY', 'ZIPCODE', 'PHONE', 'COUNTY', 'OPERATING', 'OP_CLASS', 'ENROLLMENT', 'PROGRAMS', 'COMMON_USE', 'USE', 'TYPE', 'ACTIVITY', 'GRADES', 'LOW_GRADE', 'HIGH_GRADE', 'PRINCIPAL', 'TEACHERS', 'STDTCH_RT', 'MIGRNT_STD', 'TITLE1SCHO', 'MAGNETINFO', 'FREE_LUNCH', 'REDUCED_LU', 'FISH_FAC1', 'FISH_FAC2', 'COMMENTS', 'BBSERVICE', 'BBPROVIDER', 'BBSPEED', 'DSTREAMSPD', 'YR_BUILT', 'PARCEL_ID', 'LAT_DD', 'LONG_DD', 'USNG_FL_1K', 'FDOE_MSID', 'NCES_PUB', 'NCES_PRIV', 'FDOE_PRV', 'SOURCE', 'DESCRIPT', 'FLAG', 'UPDATE_DAY', 'FGDLAQDATE', 'AUTOID']
We can use the join method of strings and the newline escape sequence, i.e., "\n", to print individual field names on separate lines.
print('\n'.join(school_fields))
OBJECTID_1
Shape
OBJECTID
STATUS
SCORE
SIDE
MATCH_ADDR
FEDERAL_ID
STATE_ID
SCHOOL_ID
NAME
ADDRESS
CITY
ZIPCODE
PHONE
COUNTY
OPERATING
OP_CLASS
ENROLLMENT
PROGRAMS
COMMON_USE
USE
TYPE
ACTIVITY
GRADES
LOW_GRADE
HIGH_GRADE
PRINCIPAL
TEACHERS
STDTCH_RT
MIGRNT_STD
TITLE1SCHO
MAGNETINFO
FREE_LUNCH
REDUCED_LU
FISH_FAC1
FISH_FAC2
COMMENTS
BBSERVICE
BBPROVIDER
BBSPEED
DSTREAMSPD
YR_BUILT
PARCEL_ID
LAT_DD
LONG_DD
USNG_FL_1K
FDOE_MSID
NCES_PUB
NCES_PRIV
FDOE_PRV
SOURCE
DESCRIPT
FLAG
UPDATE_DAY
FGDLAQDATE
AUTOID
4.2 Convert a feature class to NumPy array#
arcpy.da.FeatureClassToNumPyArray converts a feature class to a structured array. A NumPy structured array is a special kind of array that allows you to store and manipulate heterogeneous data (i.e., data of different types) in a tabular format.
Each element in a structured array is a tuple, where each field of the tuple can be of a different data type.
x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
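As a sketch of how such an array can be inspected, the example below repeats the definition of `x` so that it runs on its own, then reads field names, a whole field, and a single record.

```python
import numpy as np

x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
             dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

print(x.dtype.names)  # ('name', 'age', 'weight')
print(x['age'])       # [9 3]  (one field across all records)
print(x[0])           # the first record, printed like a tuple
print(x[0]['name'])   # Rex
```

Indexing by field name gives a column, while indexing by position gives a row. The same two access patterns apply to arrays returned by arcpy.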
Data access module
The “da” in arcpy.da stands for Data Access. See the arcpy.da documentation for a list of functions in this module.
The syntax of this function is the following.
FeatureClassToNumPyArray(in_table, field_names, {where_clause}, {spatial_reference},
{explode_to_points}, {skip_nulls}, {null_value})
Specify the list of fields to include in the returned array.
school_arr = arcpy.da.FeatureClassToNumPyArray(
school_fc, ["NAME", 'OP_CLASS', 'ENROLLMENT', 'TYPE', 'TEACHERS']
)
school_arr
array([('GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'PRIVATE', 0., 'SENIOR HIGH', 0. ),
('FAMILY LIFE ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & SECONDARY', 0. ),
('FOREST GROVE CHRISTIAN ACADEMY', 'PRIVATE', 53., 'COMBINATION ELEMENTARY & SECONDARY', 9.4),
('VAISHNAVA ACADEMY FOR GIRLS', 'PRIVATE', 19., 'COMBINATION JR. HIGH & SENIOR HIGH', 0. ),
('BHAKTIVEDANTA ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('DESTINY CHRISTIAN ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('INCAF MONTESSORI SCHOOL', 'PRIVATE', 0., 'ELEMENTARY', 0. ),
('GREAT AMERICAN VISIONS ENTERPRISES,INC', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('JORDAN GLEN SCHOOL INC.', 'PRIVATE', 115., 'COMBINATION ELEMENTARY & MIDDLE', 14.5),
('QUEEN OF PEACE CATHOLIC ACADEMY', 'PRIVATE', 358., 'COMBINATION ELEMENTARY & MIDDLE', 28.4),
('THE ROCK SCHOOL', 'PRIVATE', 207., 'COMBINATION ELEMENTARY & SECONDARY', 17.9),
('CHRISTIAN LIFE ACADEMY', 'PRIVATE', 52., 'COMBINATION ELEMENTARY & SECONDARY', 5.9),
('SAINT FRANCIS CATHOLIC HIGH SCHOOL', 'PRIVATE', 260., 'SENIOR HIGH', 19.8),
('COUNTRYSIDE CHRISTIAN SCHOOL', 'PRIVATE', 106., 'COMBINATION ELEMENTARY & SECONDARY', 6.5),
('TRILOGY SCHOOL OF LEARNING ALTERNATIVE', 'PRIVATE', 84., 'COMBINATION ELEMENTARY & SECONDARY', 11.5),
('MILLHOPPER MONTESSORI SCHOOL', 'PRIVATE', 209., 'COMBINATION ELEMENTARY & MIDDLE', 19.1),
('BNAI ISRAEL DAY SCHOOL', 'PRIVATE', 22., 'ELEMENTARY', 0. ),
('GAINESVILLE CONDUCTIVE EDUCATION ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & SECONDARY', 0. ),
('BRENTWOOD SCHOOL', 'PRIVATE', 246., 'ELEMENTARY', 10.8),
('CORNERSTONE ACADEMY', 'PRIVATE', 216., 'COMBINATION ELEMENTARY & SECONDARY', 24.8),
('WESTWOOD HILLS CHRISTIAN SCHOOL', 'PRIVATE', 264., 'COMBINATION ELEMENTARY & SECONDARY', 12.6),
('FLOWERS MONTESSORI SCHOOL', 'PRIVATE', 45., 'PRE-KINDERGARTEN-KINDERGARTEN', 6.1),
('Z.L. SUNG S.D.A. SCHOOL', 'PRIVATE', 19., 'COMBINATION ELEMENTARY & MIDDLE', 2. ),
('GAINESVILLE CONDUCTIVE EDUCATION ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('KIDS N ALL CHRISTIAN ACADEMY', 'PRIVATE', 0., 'ELEMENTARY', 0. ),
('OAK HALL SCHOOL', 'PRIVATE', 753., 'COMBINATION ELEMENTARY & SECONDARY', 63.6),
('FREEDOM CHRISTIAN ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('GAINESVILLE COUNTRY DAY SCHOOL', 'PRIVATE', 211., 'ELEMENTARY', 39.5),
('FAITH TABERNACLE OF PRAISE SCHOOL OF MINISTRY', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
("THE CHILDREN'S CENTER", 'PRIVATE', 55., 'ELEMENTARY', 0. ),
('CITY COLLEGE', 'PRIVATE', 0., 'COLLEGE/UNIVERSITY', 0. ),
('SAINT PATRICK INTERPARISH SCHOOL', 'PRIVATE', 336., 'COMBINATION ELEMENTARY & MIDDLE', 17.5),
('STAR CHRISTIAN CENTER AND ACADEMY', 'PRIVATE', 43., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('COMPASSIONATE OUTREACH MINISTRIES', 'PRIVATE', 0., 'ELEMENTARY', 0. ),
('WINDSOR CHRISTIAN ACADEMY', 'PRIVATE', 32., 'COMBINATION ELEMENTARY & SECONDARY', 2.1),
('OAK HILL COMMUNITY PRIVATE SCHOOL SYSTEM', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('NORTH FLORIDA SDA ELEMENTARY', 'PRIVATE', 16., 'COMBINATION ELEMENTARY & SECONDARY', 1. ),
('MICANOPY AREA COOPERATIVE SCHOOL, INC.', 'PUBLIC', 116., 'ELEMENTARY', 9. ),
('ARCHER COMMUNITY SCHOOL', 'PUBLIC', 443., 'ELEMENTARY', 31.5),
('SANTA FE COLLEGE - DAVIS CENTER', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('MICANOPY MIDDLE SCHOOL, INC.', 'PUBLIC', 76., 'MIDDLE/JR. HIGH', 4. ),
('OAK VIEW MIDDLE SCHOOL', 'PUBLIC', 583., 'COMBINATION ELEMENTARY & MIDDLE', 34. ),
('NEWBERRY ELEMENTARY SCHOOL', 'PUBLIC', 520., 'ELEMENTARY', 37. ),
('NEWBERRY HIGH SCHOOL', 'PUBLIC', 596., 'SENIOR HIGH', 32. ),
('HIGH SPRINGS COMMUNITY - ELEMENTARY AND MIDDLE SCHOOL', 'PUBLIC', 938., 'COMBINATION ELEMENTARY & MIDDLE', 54. ),
('UNIVERSITY OF FLORIDA - COMPARITIVE MEDICINE', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('NORTH AMERICAN FAMILY INSTITUTE ALACHUA ACADEMY', 'PUBLIC', 0., 'COMBINATION JR. HIGH & SENIOR HIGH', 0. ),
('PACE', 'PUBLIC', 39., 'COMBINATION JR. HIGH & SENIOR HIGH', 0. ),
('HOSPITAL HOMEBOUND', 'PUBLIC', 24., 'COMBINATION ELEMENTARY & SECONDARY', 0. ),
('SWEETWATER BRANCH ACADEMY', 'PUBLIC', 132., 'ELEMENTARY', 9. ),
('PROFESSIONAL ACADEMY MAGNET AT LOFTEN HIGH SCHOOL', 'PUBLIC', 259., 'COMBINATION JR. HIGH & SENIOR HIGH', 22. ),
('SANTA FE HIGH SCHOOL', 'PUBLIC', 1129., 'SENIOR HIGH', 49. ),
('MEBANE MIDDLE SCHOOL', 'PUBLIC', 453., 'MIDDLE/JR. HIGH', 27. ),
('ALACHUA LEARNING CENTER - PUBLIC CHARTER SCHOOL', 'PUBLIC', 167., 'ELEMENTARY', 9. ),
('IRBY ELEMENTARY SCHOOL', 'PUBLIC', 521., 'ELEMENTARY', 37. ),
('ALACHUA ELEMENTARY SCHOOL', 'PUBLIC', 446., 'ELEMENTARY', 34. ),
('UNIVERSITY OF FLORIDA', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('UNIVERSITY OF FLORIDA - AGRONOMY LAB', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('SANTA FE COLLEGE - NORTHWEST CAMPUS', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('BUCHHOLZ HIGH SCHOOL', 'PUBLIC', 2221., 'SENIOR HIGH', 100. ),
('FINLEY ELEMENTARY SCHOOL', 'PUBLIC', 439., 'ELEMENTARY', 34. ),
('P.K. YONGE DEVELOPMENTAL RESEARCH SCHOOL', 'PUBLIC', 1139., 'COMBINATION ELEMENTARY & SECONDARY', 122. ),
('CHILES ELEMENTARY SCHOOL', 'PUBLIC', 711., 'ELEMENTARY', 52. ),
('FT CLARKE MIDDLE SCHOOL', 'PUBLIC', 801., 'MIDDLE/JR. HIGH', 47. ),
('HIDDEN OAK ELEMENTARY SCHOOL', 'PUBLIC', 836., 'ELEMENTARY', 56. ),
('TERWILLIGER ELEMENTARY SCHOOL', 'PUBLIC', 711., 'ELEMENTARY', 46. ),
('TALBOT ELEMENTARY SCHOOL', 'PUBLIC', 731., 'ELEMENTARY', 50. ),
('NORTON ELEMENTARY SCHOOL', 'PUBLIC', 657., 'ELEMENTARY', 48. ),
('LITTLEWOOD ELEMENTARY SCHOOL', 'PUBLIC', 628., 'ELEMENTARY', 50.5),
('WESTWOOD MIDDLE SCHOOL', 'PUBLIC', 1047., 'MIDDLE/JR. HIGH', 62. ),
('HEALTHY LEARNING ACADEMY CHARTER SCHOOL', 'PUBLIC', 47., 'ELEMENTARY', 3. ),
('GLEN SPRINGS ELEMENTARY SCHOOL', 'PUBLIC', 463., 'ELEMENTARY', 32.5),
('GAINESVILLE HIGH SCHOOL', 'PUBLIC', 1928., 'SENIOR HIGH', 96. ),
('FOSTER ELEMENTARY SCHOOL', 'PUBLIC', 466., 'ELEMENTARY', 38. ),
('GENESIS PREPARATORY SCHOOL', 'PUBLIC', 71., 'ELEMENTARY', 4. ),
('LANIER CENTER & ANCHOR CENTER', 'PUBLIC', 105., 'COMBINATION ELEMENTARY & SECONDARY', 25. ),
('UNIVERSITY OF FLORIDA', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('A. QUINN JONES CENTER', 'PUBLIC', 67., 'COMBINATION ELEMENTARY & SECONDARY', 21. ),
('CHARACTER COUNTS CENTER', 'PUBLIC', 19., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('SANTA FE COMMUNITY COLLEGE', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('KANAPAHA MIDDLE SCHOOL', 'PUBLIC', 932., 'MIDDLE/JR. HIGH', 53. ),
('WILES ELEMENTARY SCHOOL', 'PUBLIC', 644., 'ELEMENTARY', 42. ),
('FAMILY PROGRESS CENTER - WILES HEAD START', 'PUBLIC', 53., 'PRE-KINDERGARTEN', 0. ),
('EINSTEIN MONTESSORI SCHOOL', 'PUBLIC', 106., 'COMBINATION ELEMENTARY & MIDDLE', 6.5),
('IDYLWILD ELEMENTARY SCHOOL AND MULTI-COUNTY MIGRANT PROGRAM', 'PUBLIC', 570., 'ELEMENTARY', 41. ),
('EXPRESSIONS LEARNING ARTS ACADEMY', 'PUBLIC', 86., 'ELEMENTARY', 7. ),
('GAINESVILLE WILDERNESS INSTITUTE', 'PUBLIC', 35., 'COMBINATION JR. HIGH & SENIOR HIGH', 1. ),
('CARING & SHARING LEARNING SCHOOL', 'PUBLIC', 131., 'ELEMENTARY', 6. ),
('PRAIRIE VIEW ELEMENTARY SCHOOL', 'PUBLIC', 100., 'PRE-KINDERGARTEN', 15. ),
('LINCOLN MIDDLE SCHOOL', 'PUBLIC', 709., 'MIDDLE/JR. HIGH', 42.5),
('WILLIAMS ELEMENTARY SCHOOL', 'PUBLIC', 572., 'ELEMENTARY', 40. ),
('ALACHUA COUNTY SUPERINTENDENT OFFICE - KIRBY-SMITH CENTER', 'PUBLIC', 0., 'SUPERINTENDENT OFFICE', 4. ),
('DUVAL ELEMENTARY SCHOOL', 'PUBLIC', 445., 'ELEMENTARY', 42. ),
('BISHOP MIDDLE SCHOOL', 'PUBLIC', 717., 'MIDDLE/JR. HIGH', 43. ),
('METCALFE ELEMENTARY SCHOOL', 'PUBLIC', 531., 'ELEMENTARY', 32.5),
('THE ONE ROOM SCHOOL HOUSE PROJECT', 'PUBLIC', 114., 'ELEMENTARY', 10. ),
('RAWLINGS ELEMENTARY SCHOOL', 'PUBLIC', 395., 'ELEMENTARY', 28. ),
('FEARNSIDE FAMILY SERVICES CENTER AND HEAD START / PREK E I CENTER', 'PUBLIC', 63., 'PRE-KINDERGARTEN', 5. ),
('HOGGETOWNE MIDDLE SCHOOL', 'PUBLIC', 109., 'MIDDLE/JR. HIGH', 7.5),
('HORIZON CENTER. ALTERNATIVE SCHOOL', 'PUBLIC', 79., 'COMBINATION JR. HIGH & SENIOR HIGH', 17. ),
('LAKE FOREST ELEMENTARY SCHOOL', 'PUBLIC', 408., 'ELEMENTARY', 33. ),
('EASTSIDE HIGH SCHOOL', 'PUBLIC', 1549., 'SENIOR HIGH', 89. ),
('FLORIDA SIATECH AT GAINESVILLE,INC.', 'PUBLIC', 202., 'SENIOR HIGH', 5. ),
('SHELL ELEMENTARY SCHOOL', 'PUBLIC', 193., 'ELEMENTARY', 16. ),
('HAWTHORNE JR/SR HIGH SCHOOL', 'PUBLIC', 395., 'COMBINATION JR. HIGH & SENIOR HIGH', 23. ),
('WALDO COMMUNITY SCHOOL', 'PUBLIC', 215., 'ELEMENTARY', 17. ),
('UNIVERSITY OF FLORIDA', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('SANTA FE COLLEGE - CHARLES L BLOUNT DOWNTOWN CENTER', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('SANTA FE COLLEGE - KIRKPATRICK CENTER', 'PUBLIC', 0., 'COLLEGE/UNIVERSITY', 0. ),
('ALACHUA COUNTY JAIL', 'PUBLIC', 0., 'COMBINATION JR. HIGH & SENIOR HIGH', 0. ),
('ALACHUA COUNTY STUDENT SERVICES/ MIGRANT/ VIRTUAL PROGRAM/ MCKAY SCHOLARSHIP/ HOMEBOUND PROGRAM', 'PUBLIC', 0., 'MIGRANT EDUCATION PROGRAM', 4. ),
('OAK HALL LOWER SCHOOL', 'PRIVATE', 365., 'ELEMENTARY', 16. )],
dtype=[('NAME', '<U100'), ('OP_CLASS', '<U12'), ('ENROLLMENT', '<f8'), ('TYPE', '<U35'), ('TEACHERS', '<f8')])
Check the shape of the array. Note that it returns a tuple with one element, since the structured array is considered a 1-d array.
school_arr.shape
(112,)
View the first five elements of a structured array
school_arr[:5]
array([('GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'PRIVATE', 0., 'SENIOR HIGH', 0. ),
('FAMILY LIFE ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & SECONDARY', 0. ),
('FOREST GROVE CHRISTIAN ACADEMY', 'PRIVATE', 53., 'COMBINATION ELEMENTARY & SECONDARY', 9.4),
('VAISHNAVA ACADEMY FOR GIRLS', 'PRIVATE', 19., 'COMBINATION JR. HIGH & SENIOR HIGH', 0. ),
('BHAKTIVEDANTA ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & MIDDLE', 0. )],
dtype=[('NAME', '<U100'), ('OP_CLASS', '<U12'), ('ENROLLMENT', '<f8'), ('TYPE', '<U35'), ('TEACHERS', '<f8')])
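View the 1st, 5th, 10th, and 100th elements of the array by providing a list (or array) of indices. Recall that this is called fancy indexing. Do not use a tuple here: NumPy interprets a tuple as one index per dimension, which raises an error for a 1-d array.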
school_arr[[0, 4, 9, 99]]
array([('GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'PRIVATE', 0., 'SENIOR HIGH', 0. ),
('BHAKTIVEDANTA ACADEMY', 'PRIVATE', 0., 'COMBINATION ELEMENTARY & MIDDLE', 0. ),
('QUEEN OF PEACE CATHOLIC ACADEMY', 'PRIVATE', 358., 'COMBINATION ELEMENTARY & MIDDLE', 28.4),
('HORIZON CENTER. ALTERNATIVE SCHOOL', 'PUBLIC', 79., 'COMBINATION JR. HIGH & SENIOR HIGH', 17. )],
dtype=[('NAME', '<U100'), ('OP_CLASS', '<U12'), ('ENROLLMENT', '<f8'), ('TYPE', '<U35'), ('TEACHERS', '<f8')])
Get a field (or column) from a structured array using the field name.
school_arr['ENROLLMENT']
array([ 0., 0., 53., 19., 0., 0., 0., 0., 115.,
358., 207., 52., 260., 106., 84., 209., 22., 0.,
246., 216., 264., 45., 19., 0., 0., 753., 0.,
211., 0., 55., 0., 336., 43., 0., 32., 0.,
16., 116., 443., 0., 76., 583., 520., 596., 938.,
0., 0., 39., 24., 132., 259., 1129., 453., 167.,
521., 446., 0., 0., 0., 2221., 439., 1139., 711.,
801., 836., 711., 731., 657., 628., 1047., 47., 463.,
1928., 466., 71., 105., 0., 67., 19., 0., 932.,
644., 53., 106., 570., 86., 35., 131., 100., 709.,
572., 0., 445., 717., 531., 114., 395., 63., 109.,
79., 408., 1549., 202., 193., 395., 215., 0., 0.,
0., 0., 0., 365.])
4.3 Compute statistics of a np.ndarray#
maximum: np.max()
minimum: np.min()
mean: np.mean()
standard deviation: np.std()
enroll_arr = school_arr['ENROLLMENT']
np.max(enroll_arr)
2221.0
np.min(enroll_arr)
0.0
np.mean(enroll_arr)
294.35714285714283
np.std(enroll_arr)
397.6106732080762
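By default, np.std computes the population standard deviation (dividing by n). Passing `ddof=1` gives the sample standard deviation (dividing by n - 1). A minimal sketch with made-up numbers:

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.mean(sample))         # 5.0
print(np.std(sample))          # 2.0 (population: divides by n)
print(np.std(sample, ddof=1))  # sample: divides by n - 1, slightly larger
```

Which version is appropriate depends on whether the array holds the whole population or only a sample of it.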
4.4 Simple query against NumPy array#
Consider the following questions:
Which school has the largest enrollment? Use argmax.
Which school has the smallest enrollment? Use argmin.
enroll_arr.argmax() # returns the index of the largest value
59
school_arr['NAME'][enroll_arr.argmax()]
'BUCHHOLZ HIGH SCHOOL'
school_arr['NAME'][enroll_arr.argmin()]
'GRACE CHRISTIAN SCHOOL OF ALACHUA CO.'
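Many schools share the minimum enrollment of 0, but argmin reports only the first one. The sketch below, using made-up names and numbers, shows how to list every tied record and how to rank records with np.argsort.

```python
import numpy as np

names = np.array(['A', 'B', 'C', 'D', 'E'])    # made-up school names
enroll = np.array([0., 120., 0., 560., 310.])  # made-up enrollments

# argmin returns only the first of the tied minimum values.
print(enroll.argmin())                # 0

# A boolean comparison finds every school that shares the minimum.
print(names[enroll == enroll.min()])  # ['A' 'C']

# argsort gives indices in ascending order; reverse them for a ranking.
top_two = np.argsort(enroll)[::-1][:2]
print(names[top_two])                 # ['D' 'E']
```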
4.5 Generate new arrays based on a conditional statement#
Let’s try to filter by the following criteria:
schools whose enrollment is positive, and
schools that are public.
enroll_arr > 0 # returns an ndarray of booleans
array([False, False, True, True, False, False, False, False, True,
True, True, True, True, True, True, True, True, False,
True, True, True, True, True, False, False, True, False,
True, False, True, False, True, True, False, True, False,
True, True, True, False, True, True, True, True, True,
False, False, True, True, True, True, True, True, True,
True, True, False, False, False, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, False, True, True, False, True,
True, True, True, True, True, True, True, True, True,
True, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, False, False,
False, False, False, True])
school_type_arr = school_arr['OP_CLASS']
school_type_arr == 'PUBLIC'
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, True, False, False, False, False, False, False, False,
False, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, False])
Boolean Indexing
Another great feature of NumPy that contributes to its advantage in data processing is boolean indexing, in which a boolean array (an array of True or False values) is used to specify which elements of an array should be selected. The boolean array is typically the result of some condition applied to the original array. The result of boolean indexing is a new array that contains only the elements that correspond to the True values in the boolean array.
Note that the boolean array must have the same shape as the original array.
num_arr = np.array([1, 2, 3, 4])
num_arr
array([1, 2, 3, 4])
bool_arr = np.array([True, False, True, True])
num_arr[bool_arr]
array([1, 3, 4])
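The two criteria listed at the start of this section can be combined into a single boolean array with the `&` operator. The sketch below uses a small made-up structured array as a stand-in for school_arr so that it runs without arcpy.

```python
import numpy as np

# Made-up records standing in for school_arr.
demo_arr = np.array(
    [('ALPHA', 'PRIVATE', 0.), ('BETA', 'PUBLIC', 450.),
     ('GAMMA', 'PUBLIC', 0.), ('DELTA', 'PRIVATE', 80.)],
    dtype=[('NAME', 'U20'), ('OP_CLASS', 'U12'), ('ENROLLMENT', 'f8')])

# Combine element-wise conditions with & (and) or | (or).
# Each condition needs its own parentheses.
mask = (demo_arr['ENROLLMENT'] > 0) & (demo_arr['OP_CLASS'] == 'PUBLIC')

print(mask)                    # [False  True False False]
print(demo_arr[mask]['NAME'])  # ['BETA']
```

Python's `and` and `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.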
To find the schools that do not have positive enrollment, we can use np.invert, which negates each value in the boolean array. Then, we use the field name again to get a 1-d array of the names of these schools.
school_arr[np.invert(enroll_arr > 0)]['NAME']
array(['GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'FAMILY LIFE ACADEMY',
'BHAKTIVEDANTA ACADEMY', 'DESTINY CHRISTIAN ACADEMY',
'INCAF MONTESSORI SCHOOL',
'GREAT AMERICAN VISIONS ENTERPRISES,INC',
'GAINESVILLE CONDUCTIVE EDUCATION ACADEMY',
'GAINESVILLE CONDUCTIVE EDUCATION ACADEMY',
'KIDS N ALL CHRISTIAN ACADEMY', 'FREEDOM CHRISTIAN ACADEMY',
'FAITH TABERNACLE OF PRAISE SCHOOL OF MINISTRY', 'CITY COLLEGE',
'COMPASSIONATE OUTREACH MINISTRIES',
'OAK HILL COMMUNITY PRIVATE SCHOOL SYSTEM',
'SANTA FE COLLEGE - DAVIS CENTER',
'UNIVERSITY OF FLORIDA - COMPARITIVE MEDICINE',
'NORTH AMERICAN FAMILY INSTITUTE ALACHUA ACADEMY',
'UNIVERSITY OF FLORIDA', 'UNIVERSITY OF FLORIDA - AGRONOMY LAB',
'SANTA FE COLLEGE - NORTHWEST CAMPUS', 'UNIVERSITY OF FLORIDA',
'SANTA FE COMMUNITY COLLEGE',
'ALACHUA COUNTY SUPERINTENDENT OFFICE - KIRBY-SMITH CENTER',
'UNIVERSITY OF FLORIDA',
'SANTA FE COLLEGE - CHARLES L BLOUNT DOWNTOWN CENTER',
'SANTA FE COLLEGE - KIRKPATRICK CENTER', 'ALACHUA COUNTY JAIL',
'ALACHUA COUNTY STUDENT SERVICES/ MIGRANT/ VIRTUAL PROGRAM/ MCKAY SCHOLARSHIP/ HOMEBOUND PROGRAM'],
dtype='<U100')
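For boolean arrays, the `~` operator is shorthand for np.invert, and reversing the comparison gives the same mask. A short sketch with made-up enrollments:

```python
import numpy as np

enroll = np.array([0., 53., 19., 0.])  # made-up enrollments
mask = enroll > 0

print(np.invert(mask))  # [ True False False  True]
print(~mask)            # same result using the ~ operator
print(enroll <= 0)      # same result by reversing the comparison
```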
4.6 Simple numeric operation#
We can apply basic arithmetic operations using the normal operators on arrays themselves.
enroll_arr
array([ 0., 0., 53., 19., 0., 0., 0., 0., 115.,
358., 207., 52., 260., 106., 84., 209., 22., 0.,
246., 216., 264., 45., 19., 0., 0., 753., 0.,
211., 0., 55., 0., 336., 43., 0., 32., 0.,
16., 116., 443., 0., 76., 583., 520., 596., 938.,
0., 0., 39., 24., 132., 259., 1129., 453., 167.,
521., 446., 0., 0., 0., 2221., 439., 1139., 711.,
801., 836., 711., 731., 657., 628., 1047., 47., 463.,
1928., 466., 71., 105., 0., 67., 19., 0., 932.,
644., 53., 106., 570., 86., 35., 131., 100., 709.,
572., 0., 445., 717., 531., 114., 395., 63., 109.,
79., 408., 1549., 202., 193., 395., 215., 0., 0.,
0., 0., 0., 365.])
The following doubles the size of each school’s enrollment.
enroll_arr * 2
array([ 0., 0., 106., 38., 0., 0., 0., 0., 230.,
716., 414., 104., 520., 212., 168., 418., 44., 0.,
492., 432., 528., 90., 38., 0., 0., 1506., 0.,
422., 0., 110., 0., 672., 86., 0., 64., 0.,
32., 232., 886., 0., 152., 1166., 1040., 1192., 1876.,
0., 0., 78., 48., 264., 518., 2258., 906., 334.,
1042., 892., 0., 0., 0., 4442., 878., 2278., 1422.,
1602., 1672., 1422., 1462., 1314., 1256., 2094., 94., 926.,
3856., 932., 142., 210., 0., 134., 38., 0., 1864.,
1288., 106., 212., 1140., 172., 70., 262., 200., 1418.,
1144., 0., 890., 1434., 1062., 228., 790., 126., 218.,
158., 816., 3098., 404., 386., 790., 430., 0., 0.,
0., 0., 0., 730.])
enroll_arr + 100
array([ 100., 100., 153., 119., 100., 100., 100., 100., 215.,
458., 307., 152., 360., 206., 184., 309., 122., 100.,
346., 316., 364., 145., 119., 100., 100., 853., 100.,
311., 100., 155., 100., 436., 143., 100., 132., 100.,
116., 216., 543., 100., 176., 683., 620., 696., 1038.,
100., 100., 139., 124., 232., 359., 1229., 553., 267.,
621., 546., 100., 100., 100., 2321., 539., 1239., 811.,
901., 936., 811., 831., 757., 728., 1147., 147., 563.,
2028., 566., 171., 205., 100., 167., 119., 100., 1032.,
744., 153., 206., 670., 186., 135., 231., 200., 809.,
672., 100., 545., 817., 631., 214., 495., 163., 209.,
179., 508., 1649., 302., 293., 495., 315., 100., 100.,
100., 100., 100., 465.])
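These element-wise operations are a key difference from Python lists, where `*` repeats the list instead of multiplying its elements. A minimal illustration with made-up values:

```python
import numpy as np

values_list = [1, 2, 3]
values_arr = np.array(values_list)

print(values_list * 2)   # [1, 2, 3, 1, 2, 3]  (the list is repeated)
print(values_arr * 2)    # [2 4 6]             (each element is doubled)
print(values_arr + 100)  # [101 102 103]
```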
pos_school_arr = school_arr[school_arr['ENROLLMENT'] > 0]
We can perform a division to find the number of teachers per student directly using two fields, i.e., two np.ndarray objects.
# ratio of number of teachers and students
pos_school_arr["TEACHERS"] / pos_school_arr["ENROLLMENT"]
array([0.17735849, 0. , 0.12608696, 0.07932961, 0.08647343,
0.11346154, 0.07615385, 0.06132075, 0.13690476, 0.09138756,
0. , 0.04390244, 0.11481481, 0.04772727, 0.13555556,
0.10526316, 0.08446215, 0.18720379, 0. , 0.05208333,
0. , 0.065625 , 0.0625 , 0.07758621, 0.07110609,
0.05263158, 0.05831904, 0.07115385, 0.05369128, 0.0575693 ,
0. , 0. , 0.06818182, 0.08494208, 0.04340124,
0.05960265, 0.05389222, 0.07101727, 0.07623318, 0.04502476,
0.07744875, 0.1071115 , 0.07313643, 0.05867665, 0.06698565,
0.06469761, 0.06839945, 0.07305936, 0.08041401, 0.05921681,
0.06382979, 0.07019438, 0.04979253, 0.08154506, 0.05633803,
0.23809524, 0.31343284, 0. , 0.05686695, 0.06521739,
0. , 0.06132075, 0.07192982, 0.08139535, 0.02857143,
0.04580153, 0.15 , 0.05994358, 0.06993007, 0.09438202,
0.05997211, 0.06120527, 0.0877193 , 0.07088608, 0.07936508,
0.06880734, 0.21518987, 0.08088235, 0.05745642, 0.02475248,
0.08290155, 0.05822785, 0.07906977, 0.04383562])
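The following helps us answer which school in Alachua County, among those with positive enrollment, has the highest ratio of teachers to students. Note that pos_school_arr is filtered by enrollment only, so it includes both public and private schools.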
max_ratio_id = (pos_school_arr["TEACHERS"] / pos_school_arr["ENROLLMENT"]).argmax()
print(f"the max value occurs at {max_ratio_id} index.")
the max value occurs at 56 index.
pos_school_arr['NAME'][max_ratio_id]
'A. QUINN JONES CENTER'
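Filtering out zero-enrollment schools first, as done above, avoids dividing by zero. An alternative sketch keeps the arrays at full length and divides only where enrollment is positive, using the `where` and `out` arguments of np.divide. The values are made up.

```python
import numpy as np

teachers = np.array([0., 9.4, 0., 14.5])   # made-up values
enrollment = np.array([0., 53., 19., 0.])  # two schools have no students

# Divide only where enrollment is positive; other positions stay NaN.
ratio = np.full(teachers.shape, np.nan)
np.divide(teachers, enrollment, out=ratio, where=enrollment > 0)

print(ratio)                # NaN wherever enrollment is zero
print(np.nanargmax(ratio))  # 1 (index of the largest non-NaN ratio)
```

Because the result has the same length as the original arrays, the index returned by np.nanargmax can be used directly on the unfiltered array.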