Filter DataFrame rows on a list of values

In this tutorial, we’ll look at how to filter pandas dataframe rows based on a list of values for a column.

To filter the rows of a dataframe on a set or other collection of values, you can use the isin() method of a pandas Series. It lets you keep only the rows whose column value appears in the given collection. The following is the syntax:

df_filtered = df[df['Col1'].isin(allowed_values)]

Here, allowed_values is the list of values of column Col1 that you want to filter the dataframe for. Any row with its Col1 value not present in the given list is filtered out.

Let’s look at an example to see the filtering in action. Suppose you have data on the operators and store locations of a fast-food joint, and you only want to see the operators at a few specific locations:

import pandas as pd

# store locations of a fast-food joint
data = {
    'Operator': ['Sam', 'Mike', 'Harvey', 'Susan', 'Jim', 'Kevin', 'Diane'],
    'City': ['New York', 'Seattle', 'New York', 'Los Angeles', 'Scranton', 
             'Houston', 'Miami'],
}
# create a dataframe
store_df = pd.DataFrame(data)
print(store_df)

Output:

  Operator         City
0      Sam     New York
1     Mike      Seattle
2   Harvey     New York
3    Susan  Los Angeles
4      Jim     Scranton
5    Kevin      Houston
6    Diane        Miami

Now, if you want to know the names of the operators only in New York and Los Angeles, you can filter the above dataframe using isin:

# filter for New York and Los Angeles
store_df_filtered = store_df[store_df['City'].isin(['New York', 'Los Angeles'])]
print(store_df_filtered)

Output:

  Operator         City
0      Sam     New York
2   Harvey     New York
3    Susan  Los Angeles

Here, we passed the list of cities we wanted to keep to isin. For each row, isin checks whether the value in the City column is present in the passed list, and only the rows where it is present are kept. This method is quite useful when you want to filter a dataframe column on multiple values.
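To make the check more concrete, the sketch below prints the boolean mask that isin returns before it is used for indexing. It rebuilds the same store_df so that it runs on its own; the names mask, kept and excluded are only illustrative. It also shows that negating the mask with ~ keeps the rows whose city is not in the list.

```python
import pandas as pd

# same data as in the example above
store_df = pd.DataFrame({
    'Operator': ['Sam', 'Mike', 'Harvey', 'Susan', 'Jim', 'Kevin', 'Diane'],
    'City': ['New York', 'Seattle', 'New York', 'Los Angeles', 'Scranton',
             'Houston', 'Miami'],
})

# isin returns a boolean Series with one True/False value per row
mask = store_df['City'].isin(['New York', 'Los Angeles'])
print(mask.tolist())
# [True, False, True, True, False, False, False]

# indexing with the mask keeps only the rows marked True
kept = store_df[mask]

# negating the mask with ~ keeps the rows whose city is NOT in the list
excluded = store_df[~mask]
print(excluded['Operator'].tolist())
# ['Mike', 'Jim', 'Kevin', 'Diane']
```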

Also notice that the filtered dataframe retains the index labels (0, 2, 3) from the original dataframe. If you want a fresh index on the resulting dataframe, you can use the reset_index() function.

# reset the dataframe index
store_df_filtered = store_df_filtered.reset_index(drop=True)
print(store_df_filtered)

Output:

  Operator         City
0      Sam     New York
1   Harvey     New York
2    Susan  Los Angeles

Note that we passed drop=True to the reset_index() function. This is done because we do not want the previous index as an additional column in our dataframe.
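As a small sketch of the difference, the snippet below calls reset_index() on the filtered data both with and without drop=True. It rebuilds the data so that it runs on its own, and the variable names filtered, with_old_index and fresh are only illustrative.

```python
import pandas as pd

# same data as in the example above
store_df = pd.DataFrame({
    'Operator': ['Sam', 'Mike', 'Harvey', 'Susan', 'Jim', 'Kevin', 'Diane'],
    'City': ['New York', 'Seattle', 'New York', 'Los Angeles', 'Scranton',
             'Houston', 'Miami'],
})
filtered = store_df[store_df['City'].isin(['New York', 'Los Angeles'])]

# default (drop=False): the old index is kept as a new column named 'index'
with_old_index = filtered.reset_index()
print(with_old_index.columns.tolist())
# ['index', 'Operator', 'City']

# drop=True: the old index is discarded and only the original columns remain
fresh = filtered.reset_index(drop=True)
print(fresh.columns.tolist())
# ['Operator', 'City']
```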

For more on isin and other Python membership functions, refer to our guide on Python.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.