In this tutorial, we will look at how to change the category order for a category type column in Pandas with the help of examples.
How to change the category order in Pandas?
To change the category order of an ordered categorical column in Pandas, use the reorder_categories() method through the .cat accessor. The syntax is as follows:
# change category order
df["Col"] = df["Col"].cat.reorder_categories(category_order_list, ordered=True)
Note that all the old categories must be included in the new order and no new categories are allowed.
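To make that rule concrete, here is a minimal sketch that checks a proposed order before applying it. The helper name is_valid_reorder and the sample series are made up for this illustration and are not part of Pandas.

```python
import pandas as pd


def is_valid_reorder(series, new_order):
    """Hypothetical helper: True if new_order holds exactly the existing categories."""
    current = list(series.cat.categories)
    return len(new_order) == len(current) and set(new_order) == set(current)


# a small categorical series with categories "A", "B" and "C"
s = pd.Series(["B", "A", "B", "C"], dtype="category")

# same categories in a different order: acceptable
print(is_valid_reorder(s, ["C", "B", "A"]))

# "D" is new and "A" is missing: reorder_categories() would reject this
print(is_valid_reorder(s, ["C", "B", "D"]))
```

A check like this is optional, since reorder_categories() raises a ValueError on its own when the categories do not match.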
Examples
Let’s look at some examples of changing the order of categories for a categorical column. First, we’ll create a dataframe with an ordered category type column to use in this tutorial.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Ticket Class": ["B", "A", "B", "C", "B"]
})

# change to category dtype
df["Ticket Class"] = df["Ticket Class"].astype("category")

# set and order categories for "Ticket Class" column
df["Ticket Class"] = df["Ticket Class"].cat.set_categories(["A", "B", "C"], ordered=True)

# display the dataframe
print(df)
Output:
    Name Ticket Class
0    Tim            B
1  Sarah            A
2  Hasan            B
3  Jyoti            C
4   Jack            B
We now have a dataframe containing the names and the ticket classes of passengers on a cruise ship. Note that the “Ticket Class” column is a category type column.
Let’s print out the “Ticket Class” column to see its values and the order of its categories.
# display "Ticket Class" column
print(df["Ticket Class"])
Output:
0    B
1    A
2    B
3    C
4    B
Name: Ticket Class, dtype: category
Categories (3, object): ['A' < 'B' < 'C']
You can see that the category values have the order “A” < “B” < “C”. For example, ticket class “C” ranks higher in this order than classes “A” and “B”.
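The order matters because comparisons, min() and max() on an ordered categorical follow it. The sketch below illustrates this on a standalone series with the same values and order as the “Ticket Class” column; the variable names classes and above_a are chosen only for this example.

```python
import pandas as pd

# same values and category order as the "Ticket Class" column
classes = pd.Series(["B", "A", "B", "C", "B"]).astype(
    pd.CategoricalDtype(categories=["A", "B", "C"], ordered=True)
)

# comparison against a category value uses the category order
above_a = classes > "A"
print(above_a.tolist())

# min() and max() return the lowest and highest category present
print(classes.min(), classes.max())
```

With the order “A” < “B” < “C”, every value except “A” compares as greater than “A”.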
Change category order in ordered category column in Pandas.
Let’s reverse the order of category values in the “Ticket Class” column to “A” > “B” > “C”, so that ticket class “A” ranks higher than classes “B” and “C”, and ticket class “B” ranks higher than “C”.
To change the order, use the reorder_categories() method.
# change category order in "Ticket Class" column
df["Ticket Class"] = df["Ticket Class"].cat.reorder_categories(["C", "B", "A"], ordered=True)

# display "Ticket Class" column
print(df["Ticket Class"])
Output:
0    B
1    A
2    B
3    C
4    B
Name: Ticket Class, dtype: category
Categories (3, object): ['C' < 'B' < 'A']
You can see that the order has changed in the “Ticket Class” column. Note that we have to use the .cat accessor to call reorder_categories() because we’re applying it to a Pandas series. Also note that the values in the column stay the same; only the order of the category values changes.
What if you pass a new category to reorder_categories()? Let’s find out.
# change category order in "Ticket Class" column
df["Ticket Class"] = df["Ticket Class"].cat.reorder_categories(["C", "B", "D"], ordered=True)
Output:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [5], in <module>
      1 # change category order in "Ticket Class" column
----> 2 df["Ticket Class"] = df["Ticket Class"].cat.reorder_categories(["C", "B", "D"], ordered=True)
...
ValueError: items in new_categories are not the same as in old categories
This results in an error because reorder_categories() does not accept new categories, and here “D” is new while “A” is missing. If you want to add a new category value, use the add_categories() method instead.
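As a rough sketch of that approach, the example below adds a category “D” and then reorders. It uses a standalone series rather than the dataframe above, and the category “D” and the chosen final order are assumptions made purely for illustration.

```python
import pandas as pd

# same values and category order as the original "Ticket Class" column
classes = pd.Series(["B", "A", "B", "C", "B"]).astype(
    pd.CategoricalDtype(categories=["A", "B", "C"], ordered=True)
)

# add the new category first; it is appended to the end of the order
classes = classes.cat.add_categories(["D"])
print(list(classes.cat.categories))

# "D" is now an existing category, so it can appear in the new order
classes = classes.cat.reorder_categories(["D", "C", "B", "A"], ordered=True)
print(list(classes.cat.categories))
```

No value in the series uses “D” yet; adding a category only extends the set of allowed values.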