add new categories to pandas categorical column

Add New Categories to a Category Column in Pandas

In this tutorial, we will look at how to add new categories to a category type column in Pandas with the help of some examples.

How to add new categories to a Pandas categorical column?


You can use the Pandas add_categories() method to add new categories to a categorical column in Pandas. On a Pandas series, call it through the .cat accessor. The syntax is:

# add new category value to category type column in Pandas
df["Col"] = df["Col"].cat.add_categories("new_category_value")

Pass a single category, or a list of categories if you are adding several, as the argument. The new categories are appended after the column's existing categories. The method returns a new series rather than modifying the column in place, which is why the result is assigned back to df["Col"].
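The sketch below checks two behaviours of add_categories(). First, it returns a new object and leaves the original series unchanged. Second, it raises a ValueError if a value you pass is already a category. The series name `sizes` is only for illustration.

```python
import pandas as pd

# a small illustrative categorical series
sizes = pd.Series(["S", "M", "S"], dtype="category")

# add_categories() returns a new series; `sizes` itself is not modified
bigger = sizes.cat.add_categories("L")
print(list(sizes.cat.categories))   # ['M', 'S']
print(list(bigger.cat.categories))  # ['M', 'S', 'L']

# passing a value that is already a category raises a ValueError
try:
    sizes.cat.add_categories("S")
except ValueError as err:
    print("ValueError:", err)
```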

Examples

Let’s look at some examples of adding categories to a categorical field. First, we’ll create a Pandas dataframe that we will be using throughout this tutorial.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
        "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
        "Shirt Size": ["S", "S", "M", "S", "M"]
})
# change to category dtype
df["Shirt Size"] = df["Shirt Size"].astype("category")
# display the dataframe
print(df)

Output:

    Name Shirt Size
0    Tim          S
1  Sarah          S
2  Hasan          M
3  Jyoti          S
4   Jack          M

We now have a dataframe containing the names and the corresponding t-shirt sizes of students in a university. The “Shirt Size” column is of category type. Let’s print out the possible category values in that column.

# display categories
print(df["Shirt Size"].cat.categories)

Output:

Index(['M', 'S'], dtype='object')

You can see that we get “M” and “S” as the possible category values in the “Shirt Size” column. When the column is converted with astype("category"), Pandas infers these values from the distinct values in the data and sorts them.
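To check this, the sketch below rebuilds the same column and compares the inferred categories with the sorted distinct values of the raw data. The variable `raw_sizes` is introduced only for this comparison.

```python
import pandas as pd

raw_sizes = ["S", "S", "M", "S", "M"]
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": raw_sizes,
})
df["Shirt Size"] = df["Shirt Size"].astype("category")

print(df["Shirt Size"].dtype)  # category

# the inferred categories are the distinct values of the data, sorted
inferred = list(df["Shirt Size"].cat.categories)
print(inferred == sorted(set(raw_sizes)))  # True
```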

Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.

Add a new category to Pandas categorical column

Let’s add “L” as a possible category value to the “Shirt Size” column. For this, we call add_categories() through the .cat accessor on the “Shirt Size” column and pass “L” as the argument.

# add new category value to "Shirt Size" column
df["Shirt Size"] = df["Shirt Size"].cat.add_categories("L")
# display categories
print(df["Shirt Size"].cat.categories)

Output:

Index(['M', 'S', 'L'], dtype='object')

You can see that “L” is now a possible category value in the “Shirt Size” column. Note that adding a new category does not change the data. It only adds a new value that the column is allowed to hold.
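The sketch below shows this on the same dataframe. The stored values stay the same, "L" appears in value_counts() with a count of zero, and "L" can now be assigned to a row. Changing Jack's size to "L" is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["S", "S", "M", "S", "M"],
})
df["Shirt Size"] = df["Shirt Size"].astype("category")
df["Shirt Size"] = df["Shirt Size"].cat.add_categories("L")

# the stored values are unchanged by add_categories()
print(df["Shirt Size"].tolist())  # ['S', 'S', 'M', 'S', 'M']

# "L" is a known category, so it is counted even though no row uses it
counts = df["Shirt Size"].value_counts()
print(counts["L"])  # 0

# because "L" is now a valid category, it can be assigned to a row
df.loc[4, "Shirt Size"] = "L"
print(df["Shirt Size"].tolist())  # ['S', 'S', 'M', 'S', 'L']
```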

Add multiple new categories to Pandas categorical column

To add multiple categories to a categorical field, pass a list of the new categories as the argument to add_categories(). Let’s add “XL” and “XXL” as possible values to the “Shirt Size” column.

# add multiple categories to "Shirt Size" column
df["Shirt Size"] = df["Shirt Size"].cat.add_categories(["XL", "XXL"])
# display categories
print(df["Shirt Size"].cat.categories)

Output:

Index(['M', 'S', 'L', 'XL', 'XXL'], dtype='object')

You can see that “XL” and “XXL” are now added as category values for the “Shirt Size” column.
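Because add_categories() rejects values that are already categories, a list that might overlap with the existing categories needs to be filtered first. The sketch below does this with `add_missing_categories`, a hypothetical helper name used only for this example. It is not part of Pandas.

```python
import pandas as pd

def add_missing_categories(series: pd.Series, new_values: list) -> pd.Series:
    """Hypothetical helper: add only the values that are not already categories."""
    existing = set(series.cat.categories)
    to_add = []
    for value in new_values:
        # skip existing categories and repeats within new_values, keeping order
        if value not in existing and value not in to_add:
            to_add.append(value)
    return series.cat.add_categories(to_add)

sizes = pd.Series(["S", "S", "M", "S", "M"], dtype="category")
# "M" already exists and "L" is repeated; neither causes a ValueError here
sizes = add_missing_categories(sizes, ["M", "L", "XL", "L"])
print(list(sizes.cat.categories))  # ['M', 'S', 'L', 'XL']
```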

When to use the .cat accessor?

Notice that we used the .cat accessor to apply the add_categories() method in the above examples. This is because a Pandas series exposes its categorical methods through the .cat accessor, which is only available when the series has the category dtype.
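For example, the sketch below calls .cat on a series of names that has not been converted to the category dtype. Pandas raises an AttributeError. The names series is only for illustration.

```python
import pandas as pd

names = pd.Series(["Tim", "Sarah", "Hasan"])  # not a category dtype

try:
    names.cat.add_categories("Jack")
except AttributeError as err:
    print("AttributeError:", err)

# after converting to category, the .cat accessor is available
names = names.astype("category").cat.add_categories("Jack")
print(list(names.cat.categories))  # ['Hasan', 'Sarah', 'Tim', 'Jack']
```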

If, on the other hand, you are working with a Pandas Categorical object, you can call the method directly on it without the .cat accessor. Let’s look at an example.

# create a Categorical object
shirt_sizes = pd.Categorical(["S", "S", "M", "S", "M"])
# add new category
shirt_sizes = shirt_sizes.add_categories("L")
# display the object
print(shirt_sizes)

Output:

['S', 'S', 'M', 'S', 'M']
Categories (3, object): ['M', 'S', 'L']

Here, we first create a Pandas Categorical object storing the shirt sizes. We then use the add_categories() method to add an additional category value, “L”. Notice that here we didn’t use the .cat accessor.
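To compare the two forms side by side, the sketch below adds “L” once directly on the Categorical and once through a series that wraps it. Both produce the same categories. It also shows that a categorical series' .array attribute returns the underlying Categorical.

```python
import pandas as pd

shirt_sizes = pd.Categorical(["S", "S", "M", "S", "M"])

# Categorical object: call the method directly
from_categorical = shirt_sizes.add_categories("L")

# series with category dtype: go through the .cat accessor
from_series = pd.Series(shirt_sizes).cat.add_categories("L")

print(list(from_categorical.categories))  # ['M', 'S', 'L']
print(list(from_series.cat.categories))   # ['M', 'S', 'L']

# the series stores a Categorical underneath
print(isinstance(from_series.array, pd.Categorical))  # True
```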



Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
