In this tutorial, we’ll look at how to filter pandas dataframe rows based on a list of values for a column.
How to filter a pandas dataframe on a set of values?
To filter rows of a dataframe on a set or collection of values, you can use the isin() membership function. This keeps only the rows whose column value appears in the given list. The following is the syntax:
df_filtered = df[df['Col1'].isin(allowed_values)]
Here, allowed_values is the list of values of column Col1 that you want to filter the dataframe for. Any row with its Col1 value not present in the given list is filtered out.
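To make the syntax above more concrete, the following minimal sketch splits it into two steps: building the boolean mask and then indexing with it. The dataframe contents and the values in allowed_values are hypothetical and used only for illustration.

```python
import pandas as pd

# hypothetical data, used only to illustrate the syntax
df = pd.DataFrame({'Col1': ['a', 'b', 'c', 'a'], 'Col2': [1, 2, 3, 4]})
allowed_values = ['a', 'c']

# isin() returns a boolean Series: True where Col1 is in allowed_values
mask = df['Col1'].isin(allowed_values)

# indexing the dataframe with the mask keeps only the rows marked True
df_filtered = df[mask]
print(df_filtered)
```

Writing the mask as a separate variable is optional. It is shown here only to make clear that the filtering is ordinary boolean indexing.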
Example
Let’s look at an example to see the filtering in action. Suppose you have data on the operators and store locations of a fast-food joint, and you only want to see the operators at a few specific locations:
import pandas as pd
# store locations of a fast-food joint
data = {
    'Operator': ['Sam', 'Mike', 'Harvey', 'Susan', 'Jim', 'Kevin', 'Diane'],
    'City': ['New York', 'Seattle', 'New York', 'Los Angeles', 'Scranton',
             'Houston', 'Miami'],
}
# create a dataframe
store_df = pd.DataFrame(data)
print(store_df)
Output:
Operator City
0 Sam New York
1 Mike Seattle
2 Harvey New York
3 Susan Los Angeles
4 Jim Scranton
5 Kevin Houston
6 Diane Miami
Now, if you want to know the names of the operators only in New York and Los Angeles, you can filter the above dataframe using isin:
# filter for New York and Los Angeles
store_df_filtered = store_df[store_df['City'].isin(['New York', 'Los Angeles'])]
print(store_df_filtered)
Output:
Operator City
0 Sam New York
2 Harvey New York
3 Susan Los Angeles
Here, you can see that we passed the list of cities for which we wanted to filter the dataframe to isin. The isin membership function checks whether the value in the column City is present in the passed list or not. This method is quite useful when you want to filter a dataframe column on multiple values.
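As a rough illustration of why isin is convenient for multiple values, the sketch below compares it with the equivalent chain of == comparisons joined by the | operator, using the same store data as above. It also shows that the mask can be inverted with the ~ operator to keep every row not in the list. The variable names with_isin, with_or and others are made up for this example.

```python
import pandas as pd

# same store data as in the example above
store_df = pd.DataFrame({
    'Operator': ['Sam', 'Mike', 'Harvey', 'Susan', 'Jim', 'Kevin', 'Diane'],
    'City': ['New York', 'Seattle', 'New York', 'Los Angeles', 'Scranton',
             'Houston', 'Miami'],
})
cities = ['New York', 'Los Angeles']

# filter with isin
with_isin = store_df[store_df['City'].isin(cities)]

# the same filter written as one comparison per value, combined with |
with_or = store_df[(store_df['City'] == 'New York') |
                   (store_df['City'] == 'Los Angeles')]

# both approaches select the same rows
print(with_isin.equals(with_or))

# ~ inverts the boolean mask, keeping rows whose City is NOT in the list
others = store_df[~store_df['City'].isin(cities)]
print(others)
```

With two values the chained comparison is still manageable, but it grows by one condition per value, whereas the isin call stays the same length.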
Also, notice that the filtered dataframe retains the index from the original dataframe. If you want to reset the index of the resulting dataframe, you can use the reset_index() function to get a fresh index.
# reset the dataframe index
store_df_filtered = store_df_filtered.reset_index(drop=True)
print(store_df_filtered)
Output:
Operator City
0 Sam New York
1 Harvey New York
2 Susan Los Angeles
Note that we passed drop=True to the reset_index() function. This is done because we do not want the previous index as an additional column in our dataframe.
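To see what drop=True changes, here is a small sketch that calls reset_index() both ways on the filtered store data. The variable names kept and dropped are made up for this comparison.

```python
import pandas as pd

# same store data and filter as in the example above
store_df = pd.DataFrame({
    'Operator': ['Sam', 'Mike', 'Harvey', 'Susan', 'Jim', 'Kevin', 'Diane'],
    'City': ['New York', 'Seattle', 'New York', 'Los Angeles', 'Scranton',
             'Houston', 'Miami'],
})
filtered = store_df[store_df['City'].isin(['New York', 'Los Angeles'])]

# default behaviour: the old index is kept as a new column named 'index'
kept = filtered.reset_index()
print(list(kept.columns))

# drop=True discards the old index instead of turning it into a column
dropped = filtered.reset_index(drop=True)
print(list(dropped.columns))
```

If you need to trace filtered rows back to their position in the original dataframe, leaving drop at its default may be the better choice.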
For more on isin and other Python membership functions, refer to our guide on Python.
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.