In this tutorial, we will look at how to get the list of categories from a Pandas category column with the help of some examples.
How to get all possible category values in a category type column in Pandas?
Categorical data in Pandas has a categories and an ordered property. The categories property stores the list of possible values for the categorical data.
You can use the .cat accessor to get the categories property of a category type column in Pandas. The following is the syntax:
    # get all categories of a category type column
    df["Col"].cat.categories
It returns a Pandas Index containing the possible category values in the column.
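As a quick illustration, here is a minimal sketch on a standalone series. The `colors` data is hypothetical and used only for demonstration. It shows that the result is an Index object, how it can be converted to a regular Python list, and how the companion ordered property can be read through the same accessor.

```python
import pandas as pd

# hypothetical data, used only for illustration
colors = pd.Series(["red", "green", "red", "blue"], dtype="category")

# .cat.categories gives back a pandas Index, not a plain Python list
categories = colors.cat.categories
print(type(categories).__name__)

# convert to a regular list when one is needed
category_list = categories.tolist()
print(category_list)

# .cat.ordered tells whether the categories have an order
is_ordered = colors.cat.ordered
print(is_ordered)
```

Calling `.tolist()` is handy when the categories need to be passed to code that expects a plain list rather than an Index.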
Examples
Let’s look at some examples of using the above method to get the list of categories in a category type column in Pandas. First, we’ll create a dataframe that we will be using throughout this tutorial.
    import pandas as pd

    # create a dataframe
    df = pd.DataFrame({
        "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
        "Gender": ["Male", "Female", "Male", "Female", "Male"]
    })

    # change to category dtype
    df["Gender"] = df["Gender"].astype("category")

    # display the dataframe
    print(df)
Output:
        Name  Gender
    0    Tim    Male
    1  Sarah  Female
    2  Hasan    Male
    3  Jyoti  Female
    4   Jack    Male
We now have a dataframe containing the names and the respective gender of some students in a university.
The “Gender” column is of category type.
    # display the Gender column
    print(df["Gender"])
Output:
    0      Male
    1    Female
    2      Male
    3    Female
    4      Male
    Name: Gender, dtype: category
    Categories (2, object): ['Female', 'Male']
Let’s now get all the possible categories for this column using the syntax mentioned above.
    # get all categories
    print(df["Gender"].cat.categories)
Output:
Index(['Female', 'Male'], dtype='object')
You can see that we get all the category values for the “Gender” column.
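Notice that “Female” is listed before “Male” even though “Male” appears first in the data. When the categories are inferred by `astype("category")`, Pandas sorts the unique values (provided they are sortable). The sketch below uses a hypothetical `grades` series to show this, along with one way to keep a custom order by passing an explicit `pd.CategoricalDtype`.

```python
import pandas as pd

# hypothetical values, deliberately not in alphabetical order
grades = pd.Series(["pass", "fail", "pass", "distinction"])

# categories inferred by pandas come back sorted
inferred = grades.astype("category").cat.categories.tolist()
print(inferred)

# an explicit CategoricalDtype keeps the order you specify
explicit_dtype = pd.CategoricalDtype(categories=["pass", "fail", "distinction"])
explicit = grades.astype(explicit_dtype).cat.categories.tolist()
print(explicit)
```

So the order of the returned categories reflects how the category type was defined, not the order in which values occur in the column.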
Let’s look at another example.
Let’s add an additional column to our dataframe to store the shirt size of the students.
    # add column to store shirt size
    df["Shirt Size"] = ["M", "S", "M", "M", "S"]

    # change type to category
    df["Shirt Size"] = df["Shirt Size"].astype("category")

    # set and order categories for the shirt size column
    df["Shirt Size"] = df["Shirt Size"].cat.set_categories(["S", "M", "L"], ordered=True)

    # display the column
    print(df["Shirt Size"])
Output:
    0    M
    1    S
    2    M
    3    M
    4    S
    Name: Shirt Size, dtype: category
    Categories (3, object): ['S' < 'M' < 'L']
Note that the “Shirt Size” column contains categorical values that are ordered (S < M < L). Let’s print out the possible category values for this column.
    # get all categories
    print(df["Shirt Size"].cat.categories)
Output:
Index(['S', 'M', 'L'], dtype='object')
You can see that we get “S”, “M”, and “L” as the possible values for the “Shirt Size” column. Note that the size “L” does not appear in our data, but since it’s a declared category, the result still includes it.
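If you want to tell the declared categories apart from the values that actually occur, the following sketch may help. It rebuilds the shirt size data as a standalone series (the variable names are illustrative) and compares the declared categories with the observed values. It also uses `remove_unused_categories()`, which drops categories that have no rows.

```python
import pandas as pd

# rebuild the shirt size data as a standalone series
sizes = pd.Series(["M", "S", "M", "M", "S"], dtype="category")
sizes = sizes.cat.set_categories(["S", "M", "L"], ordered=True)

# all declared categories, including ones with no rows
declared = sizes.cat.categories.tolist()
print(declared)

# values that actually occur in the data
observed = list(sizes.unique())
print(observed)

# value_counts() reports unused categories with a count of zero
counts = sizes.value_counts()
print(counts)

# drop categories that never occur
trimmed = sizes.cat.remove_unused_categories().cat.categories.tolist()
print(trimmed)
```

Whether to keep unused categories depends on the use case. Keeping them is useful when a value is valid but simply has not been observed yet.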