In this tutorial, we will look at how to remove categories from a category type column in Pandas with the help of some examples.
How to remove categories from a Pandas Categorical Column?
You can use the Pandas remove_categories() method to remove categories from a categorical field in Pandas. For a Pandas series, use the .cat accessor to apply this function.
The following is the syntax –
# remove a category value from a category type column in Pandas
df["Col"] = df["Col"].cat.remove_categories("category_value_to_remove")
Pass the category or a list of categories (if removing multiple categories) as an argument to the function. The passed categories are removed from the list of possible category values for that field.
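As a minimal sketch of the two call forms, the snippet below removes a single category and then a list of categories from a small series. The series name `sizes` and its values are made up for illustration. Note that remove_categories() returns a new series and leaves the original unchanged, which is why the syntax above assigns the result back to the column.

```python
import pandas as pd

# hypothetical series used only for this illustration
sizes = pd.Series(["S", "M", "L", "XL"], dtype="category")

# remove a single category by passing the value directly
one_removed = sizes.cat.remove_categories("L")

# remove several categories by passing a list
several_removed = sizes.cat.remove_categories(["M", "XL"])

print(list(one_removed.cat.categories))
print(list(several_removed.cat.categories))

# the original series still has all four categories
print(list(sizes.cat.categories))
```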
Examples
Let’s look at some examples of removing categories from a categorical field. First, we’ll create a Pandas dataframe that we will be using throughout this tutorial.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["S", "M", "XL", "S", "L"]
})

# change to category dtype
df["Shirt Size"] = df["Shirt Size"].astype("category")

# display the dataframe
print(df)
Output:
    Name Shirt Size
0    Tim          S
1  Sarah          M
2  Hasan         XL
3  Jyoti          S
4   Jack          L
We now have a dataframe containing the names and the corresponding t-shirt sizes of students in a university. The “Shirt Size” column is of category type. Let’s print out the category column to see its data and the possible category values.
# display the "Shirt Size" column
print(df["Shirt Size"])
Output:
0     S
1     M
2    XL
3     S
4     L
Name: Shirt Size, dtype: category
Categories (4, object): ['L', 'M', 'S', 'XL']
You can see that “L”, “M”, “S”, and “XL” are the possible category values in the “Shirt Size” column. These values are inferred from the data when the column is converted to the category type.
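If you only want the possible category values rather than the whole column, the .cat accessor also exposes them through its categories attribute. The following is a small sketch that rebuilds the same shirt sizes as a standalone series (the variable name `shirt_sizes` is chosen here for illustration). It also shows that inferred categories are unordered by default.

```python
import pandas as pd

# same shirt sizes as in the dataframe, as a standalone series
shirt_sizes = pd.Series(["S", "M", "XL", "S", "L"], dtype="category")

# the possible category values, inferred from the data
categories = list(shirt_sizes.cat.categories)
print(categories)

# inferred categories carry no ordering unless you ask for one
is_ordered = shirt_sizes.cat.ordered
print(is_ordered)
```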
Remove category value from a categorical column
In the above dataframe, let’s remove “L” as a possible category value for the “Shirt Size” categorical column. For this, we apply the remove_categories() function with the help of the .cat accessor on the “Shirt Size” column and pass “L” as an argument.
# remove category value "L" from "Shirt Size" column
df["Shirt Size"] = df["Shirt Size"].cat.remove_categories("L")

# display the "Shirt Size" column
print(df["Shirt Size"])
Output:
0      S
1      M
2     XL
3      S
4    NaN
Name: Shirt Size, dtype: category
Categories (3, object): ['M', 'S', 'XL']
You can see that now “L” is not a possible category value for the “Shirt Size” column. Note that the data having “L” as the value is now NaN.
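Because removing a category turns the matching rows into missing values, it can be worth checking how many rows were affected. The sketch below is one possible way to do that, using a standalone series (the names `without_l` and `cleaned` are illustrative) and the standard isna() and dropna() methods.

```python
import pandas as pd

shirt_sizes = pd.Series(["S", "M", "XL", "S", "L"], dtype="category")

# removing "L" turns every "L" entry into NaN
without_l = shirt_sizes.cat.remove_categories("L")

# count the rows that became missing
n_missing = int(without_l.isna().sum())
print(n_missing)

# optionally drop those rows if they are no longer needed
cleaned = without_l.dropna()
print(len(cleaned))
```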
Remove multiple categories from a categorical column
To remove multiple categories from a categorical field, pass the categories to remove as a list to the remove_categories() function. Let’s remove “M” and “XL” as possible values from the “Shirt Size” column.
# remove categories "M" and "XL" from "Shirt Size" column
df["Shirt Size"] = df["Shirt Size"].cat.remove_categories(["M", "XL"])

# display the "Shirt Size" column
print(df["Shirt Size"])
Output:
0      S
1    NaN
2    NaN
3      S
4    NaN
Name: Shirt Size, dtype: category
Categories (1, object): ['S']
You can see that the “Shirt Size” column does not contain “M” and “XL” as possible category values. We now only have “S” as a possible category value because we removed “L” in the previous example and “M” and “XL” in this example.
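One thing to keep in mind when removing categories in several steps: remove_categories() raises a ValueError if you ask it to remove a value that is not currently one of the categories. The sketch below illustrates this on a standalone series (the `raised` flag is only there for demonstration) by trying to remove “L” a second time.

```python
import pandas as pd

shirt_sizes = pd.Series(["S", "M", "XL", "S", "L"], dtype="category")

# first removal succeeds
shirt_sizes = shirt_sizes.cat.remove_categories("L")

# "L" is no longer a category, so removing it again fails
try:
    shirt_sizes.cat.remove_categories("L")
    raised = False
except ValueError as err:
    raised = True
    print(err)
```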
Remove unused categories from a categorical column in Pandas
There’s an additional function for a specific use case: removing unused category values from a category type column. Unused categories are values that are part of the possible category values but do not occur in the data.
You can use the remove_unused_categories() function to remove unused categories from a categorical field in Pandas. Its usage is similar to the remove_categories() function, except that it takes no arguments. Let’s look at an example.
# series of shirt sizes
shirt_sizes = pd.Series(pd.Categorical(["L", "M", "L", "M", "L"], categories=["S", "M", "L", "XL"]))

# display the series
print(shirt_sizes)
Output:
0    L
1    M
2    L
3    M
4    L
dtype: category
Categories (4, object): ['S', 'M', 'L', 'XL']
The above Pandas series is of category type and has its set of possible values as “S”, “M”, “L”, and “XL”. If you look at the data in the series, the categories “S” and “XL” do not occur in the data. Let’s remove these categories as possible category values.
# remove unused categories
shirt_sizes = shirt_sizes.cat.remove_unused_categories()

# display the series
print(shirt_sizes)
Output:
0    L
1    M
2    L
3    M
4    L
dtype: category
Categories (2, object): ['M', 'L']
You can see that the resulting series doesn’t have any unused category values.
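Unused categories often appear after filtering, since a filtered categorical column keeps its full list of possible values. The following sketch, assuming the same names and shirt sizes as the earlier dataframe (the variable `small` is made up here), filters the rows and then drops the categories that no longer occur.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["S", "M", "XL", "S", "L"],
})
df["Shirt Size"] = df["Shirt Size"].astype("category")

# keep only the rows with sizes "S" and "M"
small = df[df["Shirt Size"].isin(["S", "M"])].copy()

# the filtered column still lists all four categories
before = list(small["Shirt Size"].cat.categories)
print(before)

# drop the categories that no longer occur in the data
small["Shirt Size"] = small["Shirt Size"].cat.remove_unused_categories()
after = list(small["Shirt Size"].cat.categories)
print(after)
```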