Pandas – Change Column Type to Category

In this tutorial, we will look at how to convert a column in a pandas dataframe to the category type with the help of some examples.

How to convert object type to category in Pandas?

You can use the pandas astype() method to convert the data type of one or more columns. Pass “category” as the argument to convert to the category dtype. The following is the syntax:

# convert column "Col" to category dtype
df["Col"] = df["Col"].astype("category")

Note that categories are unordered by default. You can, however, specify an order for the categories.
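One way to specify an order at conversion time is to pass a CategoricalDtype to astype() instead of the string “category”. The following is a minimal sketch; the series sizes and the chosen order S < M < L < XL are assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# hypothetical sample data, used only for this illustration
sizes = pd.Series(["M", "M", "L", "S", "XL"])

# assumed order: S < M < L < XL
size_dtype = CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)

# convert to an ordered category dtype
ordered_sizes = sizes.astype(size_dtype)

# check that the order was applied
print(ordered_sizes.cat.ordered)
# with an ordered category, min() and max() follow the category order
print(ordered_sizes.min(), ordered_sizes.max())
```

Values that are not listed in categories would become NaN after the conversion, so the list should cover every value present in the column.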

Examples

Let’s look at some examples of converting column(s) to the category type. First, we will create a dataframe that we’ll be using throughout this tutorial.

import pandas as pd

# create a dataframe 
df = pd.DataFrame({
    "Name": ["Harold", "Carter", "John", "Shaw", "Lionel"],
    "Age": [22, 20, 21, 19, 18],
    "Shirt Size": ["M", "M", "L", "S", "XL"],
    "Major": ["CS", "English", "Math", "Chemistry", "Economics"],
    "Admission Year": [2018, 2020, 2019, 2020, 2021]
})

# display the dataframe
print(df)

Output:

     Name  Age Shirt Size      Major  Admission Year
0  Harold   22          M         CS            2018
1  Carter   20          M    English            2020
2    John   21          L       Math            2019
3    Shaw   19          S  Chemistry            2020
4  Lionel   18         XL  Economics            2021

We now have a dataframe containing information like the name, age, t-shirt size, major, and the admission year for some students in a university.

Let’s look at the dtypes for all the columns in the dataframe.

print(df.dtypes)

Output:

Name              object
Age                int64
Shirt Size        object
Major             object
Admission Year     int64
dtype: object

Columns “Name”, “Shirt Size”, and “Major” are of object dtype, whereas the columns “Age” and “Admission Year” are of int64 dtype.

Convert object type column to category type in pandas

Let’s convert the “Shirt Size” column to the category dtype using the pandas astype() method.

# convert "Shirt Size" to category type
df["Shirt Size"] = df["Shirt Size"].astype("category")
# display the column
print(df["Shirt Size"])

Output:

0     M
1     M
2     L
3     S
4    XL
Name: Shirt Size, dtype: category
Categories (4, object): ['L', 'M', 'S', 'XL']

You can see that the “Shirt Size” column is now of category dtype. Note that it is unordered by default.

# check if category column is ordered
print(df["Shirt Size"].cat.ordered)

Output:

False

Here we use the cat.ordered property to check whether a category column is ordered.
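If a column has already been converted and is unordered, an order can still be applied afterwards. The sketch below uses the cat.reorder_categories() accessor method on a standalone series; the series shirt_sizes and the order S < M < L < XL are assumptions made for illustration.

```python
import pandas as pd

# hypothetical unordered category series, mirroring the "Shirt Size" values
shirt_sizes = pd.Series(["M", "M", "L", "S", "XL"]).astype("category")

# apply an assumed order: S < M < L < XL
shirt_sizes = shirt_sizes.cat.reorder_categories(["S", "M", "L", "XL"], ordered=True)

# the column now reports itself as ordered
print(shirt_sizes.cat.ordered)
# sorting follows the category order rather than alphabetical order
print(shirt_sizes.sort_values().tolist())
```

Note that reorder_categories() expects exactly the same set of categories that the column already has, only in a different order.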

Convert multiple columns to category type

You can also use the astype() method to change the dtype of more than one column at once. For example, let’s change the columns “Major” and “Admission Year” to category dtype.

# convert "Major" and "Admission Year" to category type
df[["Major", "Admission Year"]] = df[["Major", "Admission Year"]].astype("category")
# display the column dtypes
print(df[["Major", "Admission Year"]].dtypes)

Output:

Major             category
Admission Year    category
dtype: object

The “Admission Year” and the “Major” columns are now of category dtype.
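As an alternative sketch, astype() also accepts a dictionary that maps column names to dtypes, which returns a new dataframe and leaves the remaining columns unchanged. The dataframe students and the name converted below are assumptions made for illustration.

```python
import pandas as pd

# hypothetical dataframe with a subset of the tutorial's columns
students = pd.DataFrame({
    "Age": [22, 20, 21, 19, 18],
    "Major": ["CS", "English", "Math", "Chemistry", "Economics"],
    "Admission Year": [2018, 2020, 2019, 2020, 2021]
})

# map each column to its target dtype; unlisted columns keep their dtype
converted = students.astype({"Major": "category", "Admission Year": "category"})

# display the column dtypes
print(converted.dtypes)
```

This form is convenient when different columns need different target dtypes in a single call.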

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.