In this tutorial, we will look at how to set a category order in a Pandas category type column with the help of some examples.
How to set category order in Pandas?
You can use the Pandas categorical set_categories() function to set and order categories in a category type column. Use the .cat accessor to apply this function on a Pandas column. The following is the syntax:
# set and order categories
df["Col"] = df["Col"].cat.set_categories(category_order_list, ordered=True)
Pass the categories in the order you want as a list, along with ordered=True, to make the column an ordered categorical column with the given category order.
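One detail worth checking before relying on the syntax above: set_categories() replaces the whole category list, so any existing value that is missing from the new list becomes NaN. The sketch below illustrates this on a small made-up Series (the "low"/"medium"/"high" values are hypothetical and not part of this tutorial's data).

```python
import pandas as pd

# hypothetical sample data, used only to illustrate the behaviour
s = pd.Series(["low", "high", "medium", "low"], dtype="category")

# "medium" is deliberately left out of the new category list
s_ordered = s.cat.set_categories(["low", "high"], ordered=True)

# the categories are now ordered: low < high
print(s_ordered.cat.ordered)

# values that match no category ("medium") have been replaced by NaN
missing_count = int(s_ordered.isna().sum())
print(missing_count)
```

If you do not want to lose values this way, make sure the list you pass contains every value that occurs in the column.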
Examples
Let’s look at some examples of setting the category order for a category type column in Pandas. First, we will create a sample dataframe that we will be using throughout this tutorial.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Year": ["Junior", "Senior", "Freshman", "Junior", "Freshman"],
    "Shirt Size": ["S", "M", "L", "S", "L"]
})

# change to category dtype
df["Year"] = df["Year"].astype("category")
df["Shirt Size"] = df["Shirt Size"].astype("category")

# display the dataframe
print(df)
Output:
    Name      Year Shirt Size
0    Tim    Junior          S
1  Sarah    Senior          M
2  Hasan  Freshman          L
3  Jyoti    Junior          S
4   Jack  Freshman          L
We now have a dataframe containing the name, year, and t-shirt size of some students in a university. Note that the “Year” and “Shirt Size” columns are of category dtype.
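If you want to confirm which columns are categorical, one option is to inspect the dtypes. The snippet below is a small sketch on a shortened copy of the data (two rows only, for brevity); the variable names year_is_category and name_is_category are illustrative.

```python
import pandas as pd

# shortened version of the sample data, for illustration only
df = pd.DataFrame({
    "Name": ["Tim", "Sarah"],
    "Year": ["Junior", "Senior"],
})
df["Year"] = df["Year"].astype("category")

# list the dtype of every column
print(df.dtypes)

# check a single column's dtype programmatically
year_is_category = isinstance(df["Year"].dtype, pd.CategoricalDtype)
name_is_category = isinstance(df["Name"].dtype, pd.CategoricalDtype)
print(year_is_category, name_is_category)
```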
Let’s print out the “Year” column.
# display the column
print(df["Year"])
Output:
0      Junior
1      Senior
2    Freshman
3      Junior
4    Freshman
Name: Year, dtype: category
Categories (3, object): ['Freshman', 'Junior', 'Senior']
You can see that the “Year” column is of category dtype. Note that the category values in this column are not ordered. That is, for example, the information that “Senior” is greater than “Junior” is not encoded in the values.
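To see what "not ordered" means in practice, you can check the .cat.ordered attribute and try an order-based comparison. The following sketch uses a standalone Series rather than the tutorial's dataframe; the comparison_failed flag is just an illustrative helper variable.

```python
import pandas as pd

# standalone unordered categorical Series, for illustration
year = pd.Series(["Junior", "Senior", "Freshman"], dtype="category")

# False for a column created with a plain astype("category")
print(year.cat.ordered)

# order-based comparisons are rejected on unordered categoricals
comparison_failed = False
try:
    year > "Junior"
except TypeError as err:
    comparison_failed = True
    print("Comparison failed:", err)
```

Equality checks such as year == "Junior" still work on unordered categoricals; only comparisons that depend on an order raise an error.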
Set Category Order of a Category type column in Pandas
You can convert an unordered categorical type column to an ordered categorical column. Let’s convert the “Year” column to an ordered category column with category order “Freshman” < “Sophomore” < “Junior” < “Senior”. For this, we will use the Pandas categorical set_categories() function.
# set and order categories
df["Year"] = df["Year"].cat.set_categories(["Freshman", "Sophomore", "Junior", "Senior"], ordered=True)

# display the column
print(df["Year"])
Output:
0      Junior
1      Senior
2    Freshman
3      Junior
4    Freshman
Name: Year, dtype: category
Categories (4, object): ['Freshman' < 'Sophomore' < 'Junior' < 'Senior']
You can see that the categories are now ordered. Note that “Sophomore” does not appear in the data, but it is still one of the column's categories and has its place in the category order.
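Once a column is ordered, order-based operations become available. As a rough sketch (rebuilding the “Year” values in a standalone Series so the snippet runs on its own), comparisons and min/max follow the category order rather than alphabetical order:

```python
import pandas as pd

# standalone copy of the "Year" values from the sample dataframe
year = pd.Series(
    ["Junior", "Senior", "Freshman", "Junior", "Freshman"],
    dtype="category",
)
year = year.cat.set_categories(
    ["Freshman", "Sophomore", "Junior", "Senior"], ordered=True
)

# element-wise comparison against a category uses the category order
above_freshman = year > "Freshman"
print(above_freshman.tolist())  # [True, True, False, True, False]

# min and max also respect the category order
print(year.min(), year.max())  # Freshman Senior
```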
Let’s look at another example. First, let’s print out the “Shirt Size” column.
# display the column
print(df["Shirt Size"])
Output:
0    S
1    M
2    L
3    S
4    L
Name: Shirt Size, dtype: category
Categories (3, object): ['L', 'M', 'S']
You can see that this column is also of category type but is currently unordered. Let’s set the category order for the “Shirt Size” column to “S” < “M” < “L”.
# set and order categories
df["Shirt Size"] = df["Shirt Size"].cat.set_categories(["S", "M", "L"], ordered=True)

# display the column
print(df["Shirt Size"])
Output:
0    S
1    M
2    L
3    S
4    L
Name: Shirt Size, dtype: category
Categories (3, object): ['S' < 'M' < 'L']
The categories in the “Shirt Size” column are now ordered.
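A practical effect of the ordering is that sorting follows “S” < “M” < “L” instead of alphabetical order. The sketch below rebuilds the “Shirt Size” values in a standalone Series to show this:

```python
import pandas as pd

# standalone copy of the "Shirt Size" values from the sample dataframe
sizes = pd.Series(["S", "M", "L", "S", "L"], dtype="category")
sizes = sizes.cat.set_categories(["S", "M", "L"], ordered=True)

# sort_values() uses the category order, not the alphabetical order
sorted_sizes = sizes.sort_values()
print(sorted_sizes.tolist())  # ['S', 'S', 'M', 'L', 'L']
```

An alphabetical sort of the same values would instead place “L” first.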
If, on the other hand, you want to change the order of categories in an ordered categorical column, use the Pandas categorical reorder_categories() function.
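As a minimal sketch of reorder_categories(), the snippet below reverses the shirt size order on a standalone Series. Unlike set_categories(), this function expects exactly the same set of categories in a new order, and raises a ValueError if categories are added or dropped.

```python
import pandas as pd

# ordered categorical with S < M < L, as in the example above
sizes = pd.Series(["S", "M", "L", "S", "L"], dtype="category")
sizes = sizes.cat.set_categories(["S", "M", "L"], ordered=True)

# same categories, new order: L < M < S
reversed_sizes = sizes.cat.reorder_categories(["L", "M", "S"], ordered=True)

print(reversed_sizes.cat.categories.tolist())  # ['L', 'M', 'S']

# the data values are unchanged; only the order information differs
print(reversed_sizes.max())  # S
```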
You might also be interested in –
- Add New Categories to a Category Column in Pandas
- Pandas – Remove Categories From a Categorical Column
- Get List of Categories in Pandas Category Column