In this tutorial, we will look at how to filter a numpy array.
How to filter numpy arrays?
You can filter a numpy array by creating a list or an array of boolean values that indicate whether to keep the element at the corresponding position. This method is called boolean masking (also known as boolean indexing). For example, if you filter the array [1, 2, 3] with the boolean list [True, False, True], the filtered array would be [1, 3].
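The small [1, 2, 3] example above can be checked directly. This is only a minimal sketch; the names `values`, `keep`, and `filtered` are illustrative and do not come from the tutorial's own code.

```python
import numpy as np

# illustrative names, not part of the tutorial's own code
values = np.array([1, 2, 3])
keep = [True, False, True]  # one boolean per element of values

# converting the list to a boolean array makes the mask explicit
mask = np.array(keep)
filtered = values[mask]
print(filtered)  # [1 3]
```

The mask must have the same length as the array it filters, since each boolean decides the fate of exactly one element.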
The following is the syntax to filter a numpy array using this method –
# arr is a numpy array
# boolean array of which elements to keep, here elements less than 4
mask = arr < 4
# filter the array
arr_filtered = arr[mask]
# above filtering in a single line
arr_filtered = arr[arr < 4]
Alternatively, you can use np.where() to get the indexes of the elements to keep and filter the numpy array based on those indexes. The following is the syntax:
# arr is a numpy array
# indexes to keep based on the condition, here elements less than 4
indexes_to_keep = np.where(arr < 4)
# filter the array
arr_filtered = arr[indexes_to_keep]
# above filtering in a single line
arr_filtered = arr[np.where(arr < 4)]
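As a quick sanity check, the two syntaxes above can be run side by side on the same data. The sketch below uses a made-up array named `data` (an assumption for illustration only) and confirms that both approaches select the same elements.

```python
import numpy as np

# sample values chosen only for illustration
data = np.array([6, 1, 3, 8])

by_mask = data[data < 4]             # boolean mask
by_index = data[np.where(data < 4)]  # index arrays from np.where

print(by_mask)   # [1 3]
print(by_index)  # [1 3]
print(np.array_equal(by_mask, by_index))  # True
```

For a plain filter like this, the boolean mask is the shorter form; np.where() is mainly useful when the positions themselves are needed.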
Examples
Let’s look at some examples to better understand the usage of the above methods for different use-cases.
First, we will create a numpy array that we will be using throughout this tutorial –
import numpy as np

# create a numpy array
arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])
# print the array
print(arr)
Output:
[1 4 2 7 9 3 5 8]
1. Filter array based on a single condition
Let’s filter the above array arr on a single condition, say elements greater than 5, using the boolean masking method.
# boolean mask of elements to keep
mask = arr > 5
print(mask)
# filter the array
arr_filtered = arr[mask]
# show the filtered array
print(arr_filtered)
Output:
[False False False  True  True False False  True]
[7 9 8]
You can see that we printed the boolean mask and the filtered array. Masking and filtering can be done in a single line –
# filter array
filtered_arr = arr[arr > 5]
print(filtered_arr)
Output:
[7 9 8]
Let’s now perform the same filtering, this time using np.where() instead of a boolean list or array.
# indexes of elements to keep
indexes_to_keep = np.where(arr > 5)
print(indexes_to_keep)
# filter the array
arr_filtered = arr[indexes_to_keep]
# show the filtered array
print(arr_filtered)
Output:
(array([3, 4, 7], dtype=int64),)
[7 9 8]
The indexes of the elements to keep are printed first, followed by the filtered array. The np.where() function returns the indexes of the elements that satisfy the condition, and those indexes are then used to filter the array. A shorter version of the above code is:
# filter array
filtered_arr = arr[np.where(arr > 5)]
print(filtered_arr)
Output:
[7 9 8]
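The output of np.where() shown earlier is a tuple rather than a bare array: with only a condition as argument, it returns one index array per dimension of the input. The sketch below unpacks that tuple for the one-dimensional arr; the names `result` and `positions` are illustrative, and np.flatnonzero() is shown only as a related NumPy function that returns the same positions directly.

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

result = np.where(arr > 5)
print(type(result).__name__)  # tuple
print(len(result))            # 1, since arr is one-dimensional

# the index array itself is the first (and only) item of the tuple
positions = result[0]
print(positions.tolist())     # [3, 4, 7]

# np.flatnonzero gives the same positions without the tuple wrapper
flat_positions = np.flatnonzero(arr > 5)
print(flat_positions.tolist())  # [3, 4, 7]
```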
2. Filter array based on two conditions
To filter the array on multiple conditions, wrap each condition in parentheses () and combine them with the element-wise “and” operator &: ((condition1) & (condition2) & ...)
Let’s filter the array “arr” on two conditions – greater than 5 and less than 9 using boolean masking.
# filter array
filtered_arr = arr[(arr > 5) & (arr < 9)]
print(filtered_arr)
Output:
[7 8]
The returned array only contains elements from the original array that are greater than 5 and less than 9, satisfying both the conditions.
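It may help to see why the & operator is used here rather than Python's `and` keyword. The sketch below is illustrative only (the names `both` and `raised` are not from the tutorial): & combines the two boolean arrays element by element, whereas `and` asks each array for a single truth value, which NumPy rejects with a ValueError for arrays holding more than one element.

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# element-wise "and": each comparison is wrapped in parentheses
both = (arr > 5) & (arr < 9)
print(arr[both])  # [7 8]

# the `and` keyword needs a single True/False from each operand,
# so it fails on multi-element arrays
try:
    arr[(arr > 5) and (arr < 9)]
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

The parentheses matter as well: & binds more tightly than the comparison operators, so each comparison has to be grouped before the results are combined.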
Let’s now perform the same filtering using np.where():
# filter array
filtered_arr = arr[np.where((arr > 5) & (arr < 9))]
print(filtered_arr)
Output:
[7 8]
We get the same result satisfying both the conditions.
In the above examples, we filtered the array on two conditions, but the same approach extends to any number of conditions by chaining additional & terms.
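As a rough sketch of filtering on more than two conditions, the example below adds a third condition (odd values), which is an assumption made purely for illustration. It shows the chained & form alongside np.logical_and.reduce(), which combines a list of boolean arrays and can be convenient when the conditions are collected programmatically.

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# three conditions chained with &
chained = arr[(arr > 2) & (arr < 9) & (arr % 2 == 1)]
print(chained)  # [7 3 5]

# the same conditions collected in a list and combined in one call
conditions = [arr > 2, arr < 9, arr % 2 == 1]
combined_mask = np.logical_and.reduce(conditions)
reduced = arr[combined_mask]
print(reduced)  # [7 3 5]
```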
With this, we come to the end of this tutorial. The code examples and results presented here were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel and numpy version 1.18.5.