change category order of a categorical column in pandas

Change Category Order of a Pandas Column

In this tutorial, we will look at how to change the category order for a category type column in Pandas with the help of examples.

How to change the category order in Pandas?

To change the category order of a categorical column in Pandas, use the reorder_categories() method, which is available on a Series through the .cat accessor. Pass ordered=True if the result should be treated as an ordered categorical. The following is the syntax:

# change category order
df["Col"] = df["Col"].cat.reorder_categories(category_order_list, ordered=True)

Note that all the old categories must be included in the new order and no new categories are allowed.
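As a minimal sketch of that rule, the snippet below checks that a proposed order contains exactly the existing categories before calling reorder_categories(). The `sizes` series and the `new_order` list are made-up illustration data, not part of this tutorial's dataframe.

```python
import pandas as pd

# hypothetical example data (not the tutorial's dataframe)
sizes = pd.Series(["M", "S", "L", "M"], dtype="category")

# the order we would like the categories to have
new_order = ["S", "M", "L"]

# reorder_categories() needs exactly the existing categories,
# so compare the two sets before calling it
same_items = set(new_order) == set(sizes.cat.categories)
print(same_items)

if same_items:
    sizes = sizes.cat.reorder_categories(new_order, ordered=True)

# inspect the resulting category order
print(list(sizes.cat.categories))
```

A check like this is optional, since reorder_categories() raises a ValueError on a mismatch anyway, but it can make the failure easier to handle in a longer script.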

Examples

Let’s look at some examples of changing the order of categories for a categorical column. First, we’ll create a dataframe with an ordered category type column to use in this tutorial.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
        "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
        "Ticket Class": ["B", "A", "B", "C", "B"]
})
# change to category dtype
df["Ticket Class"] = df["Ticket Class"].astype("category")
# set and order categories for "Ticket Class" column
df["Ticket Class"] = df["Ticket Class"].cat.set_categories(["A", "B", "C"], ordered=True)
# display the dataframe
print(df)

Output:

    Name Ticket Class
0    Tim            B
1  Sarah            A
2  Hasan            B
3  Jyoti            C
4   Jack            B

We now have a dataframe containing the names and the ticket classes of passengers on a cruise ship. Note that the “Ticket Class” column is a category type column.

Let’s print out the “Ticket Class” column to see its values and the order of its categories.

# display "Ticket Class" column
print(df["Ticket Class"])

Output:

0    B
1    A
2    B
3    C
4    B
Name: Ticket Class, dtype: category
Categories (3, object): ['A' < 'B' < 'C']

You can see that the categories have the order “A” < “B” < “C”. That is, ticket class “C” ranks higher than classes “A” and “B”, and class “B” ranks higher than class “A”.
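The same information can also be read programmatically through the .cat accessor. The following is a small sketch that rebuilds the column as a standalone series (the variable name `ticket_class` is only for illustration) and inspects its categories, its ordered flag, and the integer codes Pandas stores for each value.

```python
import pandas as pd

# rebuild the "Ticket Class" column as a standalone series
ticket_class = pd.Series(["B", "A", "B", "C", "B"], name="Ticket Class")
ticket_class = ticket_class.astype("category").cat.set_categories(
    ["A", "B", "C"], ordered=True
)

# categories in their current order
print(list(ticket_class.cat.categories))  # ['A', 'B', 'C']

# whether the categorical is ordered
print(ticket_class.cat.ordered)  # True

# position of each value within the category order
print(ticket_class.cat.codes.tolist())  # [1, 0, 1, 2, 1]

# for an ordered categorical, min and max follow the category order
print(ticket_class.min(), ticket_class.max())  # A C
```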

Change category order in an ordered category column

Let’s reverse the order of the categories in the “Ticket Class” column so that “A” > “B” > “C”. After the change, ticket class “A” ranks higher than classes “B” and “C”, and ticket class “B” ranks higher than “C”.

To change the order, use the reorder_categories() method and list the categories from lowest to highest.

# change category order in "Ticket Class" column
df["Ticket Class"] = df["Ticket Class"].cat.reorder_categories(["C", "B", "A"], ordered=True)
# display "Ticket Class" column
print(df["Ticket Class"])

Output:

0    B
1    A
2    B
3    C
4    B
Name: Ticket Class, dtype: category
Categories (3, object): ['C' < 'B' < 'A']

You can see that the category order of the “Ticket Class” column has changed. Note that we have to use the .cat accessor to call reorder_categories() because we’re applying it to a Pandas series. Also note that the values in the column are unchanged; only the order of the categories is different.
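Although the values stay the same, operations that depend on the order behave differently afterwards. The sketch below, which uses standalone series named `s` and `reordered` purely for illustration, compares sorting and comparisons before and after the reorder.

```python
import pandas as pd

# original order: A < B < C
s = pd.Series(["B", "A", "B", "C", "B"]).astype(
    pd.CategoricalDtype(["A", "B", "C"], ordered=True)
)

# reversed order: C < B < A
reordered = s.cat.reorder_categories(["C", "B", "A"], ordered=True)

# the stored values are identical
print(s.tolist() == reordered.tolist())  # True

# sorting follows the category order, not alphabetical order
print(s.sort_values().tolist())          # ['A', 'B', 'B', 'B', 'C']
print(reordered.sort_values().tolist())  # ['C', 'B', 'B', 'B', 'A']

# comparisons also follow the category order
print((s > "B").tolist())          # [False, False, False, True, False]
print((reordered > "B").tolist())  # [False, True, False, False, False]
```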

What if you pass a category that does not already exist to reorder_categories()? Let’s find out.

# change category order in "Ticket Class" column
df["Ticket Class"] = df["Ticket Class"].cat.reorder_categories(["C", "B", "D"], ordered=True)

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [5], in <module>
      1 # change category order in "Ticket Class" column
----> 2 df["Ticket Class"] = df["Ticket Class"].cat.reorder_categories(["C", "B", "D"], ordered=True)

...

ValueError: items in new_categories are not the same as in old categories

It results in an error because the list passed to reorder_categories() must contain exactly the existing categories. Here, “D” is not an existing category and “A” is missing from the list. If you want to add a new category value, use the add_categories() method instead.
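One possible way to end up with an order that includes a new category is to add it first and reorder afterwards. The following is a minimal sketch on a standalone series (the name `s` and the new category “D” are illustrative assumptions), not the only way to do it.

```python
import pandas as pd

# ordered categorical with A < B < C
s = pd.Series(["B", "A", "B", "C", "B"]).astype(
    pd.CategoricalDtype(["A", "B", "C"], ordered=True)
)

# add_categories() appends the new category at the end of the order
s = s.cat.add_categories(["D"])
print(list(s.cat.categories))  # ['A', 'B', 'C', 'D']

# now "D" is an existing category, so it can be part of the new order
s = s.cat.reorder_categories(["D", "C", "B", "A"], ordered=True)
print(list(s.cat.categories))  # ['D', 'C', 'B', 'A']
```

The values in the series are not affected; “D” is simply an allowed category that no row uses yet.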

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.