In this tutorial, we will look at how to add new categories to a category type column in Pandas with the help of some examples.
How to add new categories to a Pandas categorical column?
You can use the Pandas add_categories() method to add new categories to a categorical field in Pandas. For a Pandas Series, use the .cat accessor to apply this function. The following is the syntax –
# add new category value to category type column in Pandas
df["Col"] = df["Col"].cat.add_categories("new_category_value")
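Pass the new category, or a list of categories if you are adding several, as the argument to the function. It returns a new categorical object in which the additional categories are appended to the existing list of possible category values, so assign the result back to the column. The new categories must not already be present; otherwise, a ValueError is raised.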
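To make the syntax concrete, here is a small sketch on a standalone category Series (the name sizes is just for illustration). It passes a single category as a scalar, then several as a list, and finally shows the ValueError that pandas raises when a category already exists.

```python
import pandas as pd

# Illustrative category Series; its inferred categories are ['M', 'S']
sizes = pd.Series(["S", "M", "S"], dtype="category")

# A single new category can be passed as a scalar ...
one_added = sizes.cat.add_categories("L")

# ... and several at once as a list; they are appended in the given order
many_added = sizes.cat.add_categories(["L", "XL"])

print(list(one_added.cat.categories))   # ['M', 'S', 'L']
print(list(many_added.cat.categories))  # ['M', 'S', 'L', 'XL']

# Adding a value that is already a category raises a ValueError
duplicate_rejected = False
try:
    sizes.cat.add_categories("M")
except ValueError as err:
    duplicate_rejected = True
    print("ValueError:", err)
```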
Examples
Let’s look at some examples of adding categories to a categorical field. First, we’ll create a Pandas dataframe that we will be using throughout this tutorial.
import pandas as pd
# create a dataframe
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["S", "S", "M", "S", "M"]
})
# change to category dtype
df["Shirt Size"] = df["Shirt Size"].astype("category")
# display the dataframe
print(df)
Output:
    Name Shirt Size
0    Tim          S
1  Sarah          S
2  Hasan          M
3  Jyoti          S
4   Jack          M
We now have a dataframe containing the names and the corresponding t-shirt sizes of students at a university. The “Shirt Size” column is of category type. Let’s print out the possible category values in that column.
# display categories
print(df["Shirt Size"].cat.categories)
Output:
Index(['M', 'S'], dtype='object')
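You can see that we get “M” and “S” as the possible category values in the “Shirt Size” column. These values are inferred from the unique values in the data when the column is converted to the category type, and they are sorted (here, alphabetically) rather than listed in order of appearance.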
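As a quick check of how the categories are inferred, the sketch below builds a small Series (the name sizes is illustrative) whose values appear in the order “S”, “M”, “S”. The inferred categories come out as the sorted unique values, which assumes values that pandas can sort, such as these strings.

```python
import pandas as pd

# Values appear in the order S, M, S ...
sizes = pd.Series(["S", "M", "S"]).astype("category")

# ... but the inferred categories are the sorted unique values
print(list(sizes.cat.categories))  # ['M', 'S']
```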
Add a new category to Pandas categorical column
Let’s add “L” as a possible category value to the “Shirt Size” column. To do this, we apply the add_categories() function through the .cat accessor on the “Shirt Size” column and pass “L” as the argument.
# add new category value to "Shirt Size" column
df["Shirt Size"] = df["Shirt Size"].cat.add_categories("L")
# display categories
print(df["Shirt Size"].cat.categories)
Output:
Index(['M', 'S', 'L'], dtype='object')
You can see that “L” is now a possible category value in the “Shirt Size” column. Note that adding a new category value doesn’t change the data; it only adds a new possible value that can be used in the category column.
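Because a category column only accepts values that are among its categories, adding “L” is what makes it possible to store “L” in the column. The sketch below recreates the dataframe from above and tries to set Tim’s shirt size to “L” before and after calling add_categories(). The exception type for the rejected assignment has varied across pandas versions (recent releases raise a TypeError, older ones a ValueError), so the sketch catches both.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["S", "S", "M", "S", "M"],
})
df["Shirt Size"] = df["Shirt Size"].astype("category")

# "L" is not a category yet, so pandas refuses to store it
# (TypeError in recent versions, ValueError in some older ones)
rejected_before = False
try:
    df.loc[0, "Shirt Size"] = "L"
except (TypeError, ValueError) as err:
    rejected_before = True
    print("Rejected:", err)

# After adding "L" as a category, the same assignment succeeds
df["Shirt Size"] = df["Shirt Size"].cat.add_categories("L")
df.loc[0, "Shirt Size"] = "L"
print(df)
print(df["Shirt Size"].dtype)  # category
```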
Add multiple new categories to Pandas categorical column
To add multiple categories to a categorical field, pass a list of the new categories as the argument to the add_categories() function. Let’s add “XL” and “XXL” as possible values to the “Shirt Size” column.
# add multiple categories to "Shirt Size" column
df["Shirt Size"] = df["Shirt Size"].cat.add_categories(["XL", "XXL"])
# display categories
print(df["Shirt Size"].cat.categories)
Output:
Index(['M', 'S', 'L', 'XL', 'XXL'], dtype='object')
You can see that “XL” and “XXL” have now been added as category values for the “Shirt Size” column, after the existing categories and in the order they were passed.
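One practical consequence is that categories without any rows are still part of the column. In the sketch below (the Series name sizes is illustrative), value_counts() reports the newly added sizes with a count of zero, and remove_unused_categories() drops them again if you no longer need them.

```python
import pandas as pd

sizes = pd.Series(["S", "S", "M", "S", "M"], dtype="category")
sizes = sizes.cat.add_categories(["L", "XL", "XXL"])

# Every category is listed, including those with no rows yet
counts = sizes.value_counts()
print(counts)

# Drop categories that are not used by any row
trimmed = sizes.cat.remove_unused_categories()
print(list(trimmed.cat.categories))  # ['M', 'S']
```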
When to use the .cat accessor?
Notice that we used the .cat accessor to apply the add_categories() function in the above examples. This is because add_categories() is a method of the Pandas Categorical object, and a Pandas Series of category type exposes such methods through its .cat accessor.
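A short sketch of this relationship, using an illustrative Series named sizes: the data behind a category-typed Series is a pd.Categorical, which the Series’ .array attribute returns. The Series itself has no add_categories attribute, so the method is reached through .cat or called on the underlying Categorical directly.

```python
import pandas as pd

sizes = pd.Series(["S", "S", "M"], dtype="category")

# The data behind a category-typed Series is a pd.Categorical
underlying = sizes.array
print(isinstance(underlying, pd.Categorical))  # True

# The Series itself does not expose add_categories() ...
print(hasattr(sizes, "add_categories"))  # False

# ... so it is reached through the .cat accessor, or called on the
# Categorical directly; both give the same categories
via_accessor = sizes.cat.add_categories("L")
direct = underlying.add_categories("L")
print(list(via_accessor.cat.categories))  # ['M', 'S', 'L']
print(list(direct.categories))            # ['M', 'S', 'L']
```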
If, on the other hand, you want to apply the same function to a Pandas Categorical object, you can call it directly without using the .cat accessor. Let’s look at an example.
# create a Categorical object
shirt_sizes = pd.Categorical(["S", "S", "M", "S", "M"])
# add new category
shirt_sizes = shirt_sizes.add_categories("L")
# display the object
print(shirt_sizes)
Output:
['S', 'S', 'M', 'S', 'M']
Categories (3, object): ['M', 'S', 'L']
Here, we first create a Pandas Categorical object storing the shirt sizes. We then use the add_categories() function to add an additional category value, “L”. Notice that here we didn’t use the .cat accessor.
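Whether you call it on a Series or on a Categorical, add_categories() returns a new object rather than modifying the one it is called on, which is why every example above assigns the result back. The sketch below checks this by comparing the original Categorical with the returned one (the name extended is illustrative).

```python
import pandas as pd

shirt_sizes = pd.Categorical(["S", "S", "M", "S", "M"])

# add_categories() returns a new Categorical; shirt_sizes is left unchanged
extended = shirt_sizes.add_categories(["L", "XL"])

print(list(shirt_sizes.categories))  # ['M', 'S']
print(list(extended.categories))     # ['M', 'S', 'L', 'XL']

# The stored values are the same in both objects
print(list(shirt_sizes) == list(extended))  # True
```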