In this tutorial, we will look at how to convert a column in a pandas dataframe to the category type with the help of some examples.
How to convert object type to category in Pandas?
You can use the pandas astype() function to convert the data type of one or more columns. Pass “category” as an argument to convert to the category dtype. The following is the syntax:
# convert column "Col" to category dtype
df["Col"] = df["Col"].astype("category")
Note that category values are unordered by default. You can, however, specify an order for the categories.
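One way to specify an order is to build a CategoricalDtype with ordered=True and pass it to astype(). The snippet below is a minimal sketch: the sizes series and the S < M < L < XL ordering are assumptions made for illustration, not part of the dataframe used in the rest of this tutorial.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# hypothetical shirt sizes, used only for illustration
sizes = pd.Series(["M", "M", "L", "S", "XL"])

# define an ordered category dtype with the assumed order S < M < L < XL
size_dtype = CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
ordered_sizes = sizes.astype(size_dtype)

# the column now reports itself as ordered
print(ordered_sizes.cat.ordered)

# ordered categories support min/max and comparisons against a category value
print(ordered_sizes.min(), ordered_sizes.max())
print((ordered_sizes > "M").tolist())
```

With an ordered dtype, comparisons follow the order of the categories list rather than alphabetical order.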
Examples
Let’s look at some examples of converting column(s) to the category type. First, we will create a dataframe that we’ll be using throughout this tutorial.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "Name": ["Harold", "Carter", "John", "Shaw", "Lionel"],
    "Age": [22, 20, 21, 19, 18],
    "Shirt Size": ["M", "M", "L", "S", "XL"],
    "Major": ["CS", "English", "Math", "Chemistry", "Economics"],
    "Admission Year": [2018, 2020, 2019, 2020, 2021]
})

# display the dataframe
print(df)
Output:
     Name  Age  Shirt Size      Major  Admission Year
0  Harold   22           M         CS            2018
1  Carter   20           M    English            2020
2    John   21           L       Math            2019
3    Shaw   19           S  Chemistry            2020
4  Lionel   18          XL  Economics            2021
We now have a dataframe containing information like the name, age, t-shirt size, major, and the admission year for some students in a university.
Let’s look at the dtypes for all the columns in the dataframe.
print(df.dtypes)
Output:
Name              object
Age                int64
Shirt Size        object
Major             object
Admission Year     int64
dtype: object
The columns “Name”, “Shirt Size”, and “Major” are of object type, whereas the columns “Age” and “Admission Year” are of int64 type.
Convert object type column to category type in pandas
Let’s convert the “Shirt Size” column to the category dtype using the pandas astype() function.
# convert "Shirt Size" to category type
df["Shirt Size"] = df["Shirt Size"].astype("category")

# display the column
print(df["Shirt Size"])
Output:
0     M
1     M
2     L
3     S
4    XL
Name: Shirt Size, dtype: category
Categories (4, object): ['L', 'M', 'S', 'XL']
You can see that the “Shirt Size” column is now of category type. Note that it is unordered by default.
# check if category column is ordered
print(df["Shirt Size"].cat.ordered)
Output:
False
Here we use the ordered property of the .cat accessor to check whether a category column is ordered.
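If a column has already been converted and is unordered, one option is to reorder its categories with cat.set_categories() and mark them as ordered. The following is a small sketch that rebuilds the shirt size values as a standalone series; the S < M < L < XL order is an assumption chosen for illustration.

```python
import pandas as pd

# standalone copy of the shirt size values, converted to an unordered category
shirt = pd.Series(["M", "M", "L", "S", "XL"], name="Shirt Size").astype("category")
print(shirt.cat.ordered)

# reorder the categories and mark them as ordered (assumed order: S < M < L < XL)
shirt_ordered = shirt.cat.set_categories(["S", "M", "L", "XL"], ordered=True)
print(shirt_ordered.cat.ordered)

# sorting now follows the category order instead of alphabetical order
print(shirt_ordered.sort_values().tolist())
```

Note that set_categories() returns a new series, so the result has to be assigned back if you want to keep it.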
Convert multiple columns to category type
You can also use the astype() function to change the dtype of more than one column. For example, let’s change the columns “Admission Year” and “Major” to category dtype.
# convert "Major" and "Admission Year" to category type
df[["Major", "Admission Year"]] = df[["Major", "Admission Year"]].astype("category")

# display the column dtypes
print(df[["Major", "Admission Year"]].dtypes)
Output:
Major             category
Admission Year    category
dtype: object
The “Admission Year” and the “Major” columns are now of category dtype.
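As an alternative, astype() also accepts a dictionary that maps column names to dtypes, which can be convenient when only some columns should change. The sketch below uses a small hypothetical students dataframe (a cut-down version of the one above) so that it runs on its own.

```python
import pandas as pd

# hypothetical cut-down dataframe, used only for illustration
students = pd.DataFrame({
    "Major": ["CS", "English", "Math"],
    "Admission Year": [2018, 2020, 2019],
    "Age": [22, 20, 21],
})

# map each column to its target dtype; columns not listed keep their dtype
converted = students.astype({"Major": "category", "Admission Year": "category"})

# display the column dtypes
print(converted.dtypes)
```

Here astype() returns a new dataframe and leaves students unchanged.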