remove categories from a category type column in Pandas

Pandas – Remove Categories From a Categorical Column

In this tutorial, we will look at how to remove categories from a category type column in Pandas with the help of some examples.

How to remove categories from a Pandas Categorical Column?

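You can use the remove_categories() method to remove categories from a categorical field in Pandas. On a Pandas series (or a dataframe column), the method is available through the .cat accessor. It returns a new series rather than modifying the original, so assign the result back to the column.
The following is the syntax: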

# remove a category value from a category type column in Pandas
df["Col"] = df["Col"].cat.remove_categories("category_value_to_remove")

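Pass a single category, or a list of categories to remove several at once, as the argument. The passed categories are removed from the set of possible category values for that field, and any data values that belonged to a removed category become NaN. Passing a value that is not an existing category raises a ValueError.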
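As a quick illustration of the behavior described above, here is a minimal sketch on a small standalone series. The names sizes and without_l are made up for this example and are not part of the dataframe used later.

```python
import pandas as pd

# a small categorical series (illustrative data only)
sizes = pd.Series(["S", "M", "XL", "S", "L"], dtype="category")

# remove_categories() returns a new series; the original is left unchanged
without_l = sizes.cat.remove_categories("L")
print(list(without_l.cat.categories))
print(list(sizes.cat.categories))

# asking to remove a value that is not an existing category raises ValueError
try:
    sizes.cat.remove_categories("XXL")
    raised = False
except ValueError:
    raised = True
print(raised)
```

If you are not sure whether a value is a category, you can check it against sizes.cat.categories before calling the method.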

Examples

Let’s look at some examples of removing categories from a categorical field. First, we’ll create a Pandas dataframe that we will be using throughout this tutorial.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["S", "M", "XL", "S", "L"]
})
# change to category dtype
df["Shirt Size"] = df["Shirt Size"].astype("category")
# display the dataframe
print(df)

Output:

    Name Shirt Size
0    Tim          S
1  Sarah          M
2  Hasan         XL
3  Jyoti          S
4   Jack          L

We now have a dataframe containing the names and the corresponding t-shirt sizes of students in a university. The “Shirt Size” column is of category type. Let’s print out the category column to see its data and the possible category values.

# display the "Shirt Size" column
print(df["Shirt Size"])

Output:

0     S
1     M
2    XL
3     S
4     L
Name: Shirt Size, dtype: category
Categories (4, object): ['L', 'M', 'S', 'XL']

You can see that “L”, “M”, “S”, and “XL” are the possible category values in the “Shirt Size” column. Because we did not specify the categories ourselves, Pandas inferred them from the unique values in the data when the column was converted.
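If you only want the category values and not the full column printout, the .cat accessor also exposes them directly. The following is a small sketch on a standalone series (the name sizes is used only for this illustration).

```python
import pandas as pd

# same shirt sizes as in the dataframe, as a standalone categorical series
sizes = pd.Series(["S", "M", "XL", "S", "L"]).astype("category")

# .cat.categories holds the possible category values as an Index
categories = list(sizes.cat.categories)
print(categories)

# .cat.ordered tells you whether the categories have a meaningful order
is_ordered = sizes.cat.ordered
print(is_ordered)
```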

Remove category value from a categorical column

In the above dataframe, let’s remove “L” as a possible category value for the “Shirt Size” categorical column. For this, we apply the remove_categories() function with the help of the .cat accessor on the “Shirt Size” column and pass “L” as an argument.

# remove category value "L" from "Shirt Size" column
df["Shirt Size"] = df["Shirt Size"].cat.remove_categories("L")
# display the "Shirt Size" column
print(df["Shirt Size"])

Output:

0      S
1      M
2     XL
3      S
4    NaN
Name: Shirt Size, dtype: category
Categories (3, object): ['M', 'S', 'XL']

You can see that “L” is no longer a possible category value for the “Shirt Size” column. Note that the row that had “L” as its value (index 4) is now NaN, because a categorical column cannot hold a value that is not one of its categories.
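Since removing a category turns the matching rows into NaN, it can be useful to check how many rows are affected and decide what to do with them. The sketch below rebuilds the same dataframe so that it runs on its own; the variable names rows_affected, missing_after, and cleaned are illustrative, and dropping the rows is just one possible way to handle them.

```python
import pandas as pd

# rebuild the example dataframe so this snippet is self-contained
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": pd.Categorical(["S", "M", "XL", "S", "L"]),
})

# count the rows that currently hold the category we are about to remove
rows_affected = int((df["Shirt Size"] == "L").sum())

df["Shirt Size"] = df["Shirt Size"].cat.remove_categories("L")

# the same number of rows is now missing
missing_after = int(df["Shirt Size"].isna().sum())

# optionally drop the rows that lost their value
cleaned = df.dropna(subset=["Shirt Size"])
print(rows_affected, missing_after, len(cleaned))
```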

Remove multiple categories from a categorical column

To remove multiple categories from a categorical field, pass the categories to remove as a list to the remove_categories() function. Let’s remove “M” and “XL” as possible values from the “Shirt Size” column.

# remove categories "M" and "XL" from "Shirt Size" column
df["Shirt Size"] = df["Shirt Size"].cat.remove_categories(["M", "XL"])
# display the "Shirt Size" column
print(df["Shirt Size"])

Output:

0      S
1    NaN
2    NaN
3      S
4    NaN
Name: Shirt Size, dtype: category
Categories (1, object): ['S']

You can see that the “Shirt Size” column does not contain “M” and “XL” as possible category values. We now only have “S” as a possible category value because we removed “L” in the previous example and “M” and “XL” in this example.
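Removals are cumulative, so removing “L” first and then “M” and “XL” should leave the column in the same state as removing all three in a single call. Here is a minimal sketch that compares the two approaches on a standalone series (the names sizes, step_by_step, and one_call are made up for this example).

```python
import pandas as pd

sizes = pd.Series(["S", "M", "XL", "S", "L"], dtype="category")

# two separate removals, as in the examples above
step_by_step = sizes.cat.remove_categories("L")
step_by_step = step_by_step.cat.remove_categories(["M", "XL"])

# a single removal with all three categories in one list
one_call = sizes.cat.remove_categories(["L", "M", "XL"])

# both leave the same categories and the same missing rows
print(list(step_by_step.cat.categories), list(one_call.cat.categories))
print(step_by_step.isna().tolist() == one_call.isna().tolist())
```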

Remove unused categories from a categorical column in Pandas

There’s an additional function for one specific use case: removing unused category values from a category type column. Unused categories are values that are part of the possible category values but do not occur in the data.

You can use the remove_unused_categories() function to remove unused categories from a categorical field in Pandas. Its usage is similar to the remove_categories() function, except that it takes no arguments. Let’s look at an example.

# series of shirt sizes
shirt_sizes = pd.Series(pd.Categorical(["L", "M", "L", "M", "L"], categories=["S", "M", "L", "XL"]))
# display the series
print(shirt_sizes)

Output:

0    L
1    M
2    L
3    M
4    L
dtype: category
Categories (4, object): ['S', 'M', 'L', 'XL']

The above Pandas series is of category type and has its set of possible values as “S”, “M”, “L”, and “XL”. If you look at the data in the series, the categories “S” and “XL” do not occur in the data. Let’s remove these categories as possible category values.

# remove unused categories
shirt_sizes = shirt_sizes.cat.remove_unused_categories()
# display the series
print(shirt_sizes)

Output:

0    L
1    M
2    L
3    M
4    L
dtype: category
Categories (2, object): ['M', 'L']

You can see that the resulting series doesn’t have any unused category values.
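Unused categories often show up after filtering, because selecting rows does not change the set of possible category values. The sketch below rebuilds the earlier dataframe so that it runs on its own; the names small_medium, before, and after are used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": pd.Categorical(["S", "M", "XL", "S", "L"]),
})

# keep only the rows with small and medium shirts
small_medium = df[df["Shirt Size"].isin(["S", "M"])].copy()

# the filter removed rows, not categories
before = list(small_medium["Shirt Size"].cat.categories)
print(before)

# drop the categories that no longer occur in the filtered data
small_medium["Shirt Size"] = small_medium["Shirt Size"].cat.remove_unused_categories()
after = list(small_medium["Shirt Size"].cat.categories)
print(after)
```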

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.