In this tutorial, we will look at how to convert a column in a pandas dataframe to the category type with the help of some examples.
How to convert object type to category in Pandas?
You can use the pandas astype() function to convert the data type of one or more columns. Pass "category" as an argument to convert to the category dtype. The following is the syntax:
# convert column "Col" to category dtype
df["Col"] = df["Col"].astype("category")
Note that category values are unordered by default. You can, however, specify an order for the categories.
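One way to specify an order at conversion time is to pass a pd.CategoricalDtype to astype() instead of the string "category". The following is a minimal sketch. The sizes series and the order S < M < L < XL are assumptions made for illustration and are not part of the tutorial's dataframe.

```python
import pandas as pd

# hypothetical data used only for this illustration
sizes = pd.Series(["M", "S", "XL", "L", "M"])

# an ordered category dtype; the order S < M < L < XL is an assumption
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)

# convert to the ordered category dtype
sizes_cat = sizes.astype(size_dtype)

# ordered categories support min, max, and sorting by category order
print(sizes_cat.cat.ordered)
print(sizes_cat.min(), sizes_cat.max())
print(sizes_cat.sort_values().tolist())
```

With an ordered dtype, sorting follows the declared category order rather than alphabetical order.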
Examples
Let’s look at some examples of converting column(s) to the category type. First, we will create a dataframe that we’ll be using throughout this tutorial.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "Name": ["Harold", "Carter", "John", "Shaw", "Lionel"],
    "Age": [22, 20, 21, 19, 18],
    "Shirt Size": ["M", "M", "L", "S", "XL"],
    "Major": ["CS", "English", "Math", "Chemistry", "Economics"],
    "Admission Year": [2018, 2020, 2019, 2020, 2021]
})

# display the dataframe
print(df)
Output:
     Name  Age Shirt Size      Major  Admission Year
0  Harold   22          M         CS            2018
1  Carter   20          M    English            2020
2    John   21          L       Math            2019
3    Shaw   19          S  Chemistry            2020
4  Lionel   18         XL  Economics            2021
We now have a dataframe containing information like the name, age, t-shirt size, major, and the admission year for some students in a university.
Let’s look at the dtypes for all the columns in the dataframe.
print(df.dtypes)
Output:
Name              object
Age                int64
Shirt Size        object
Major             object
Admission Year     int64
dtype: object
The columns “Name”, “Shirt Size”, and “Major” are of object type, whereas the columns “Age” and “Admission Year” are of int64 type.
Convert object type column to category type in pandas
Let’s convert the “Shirt Size” column to category dtype using the pandas astype() function.
# convert "Shirt Size" to category type
df["Shirt Size"] = df["Shirt Size"].astype("category")

# display the column
print(df["Shirt Size"])
Output:
0     M
1     M
2     L
3     S
4    XL
Name: Shirt Size, dtype: category
Categories (4, object): ['L', 'M', 'S', 'XL']
You can see that the “Shirt Size” column is now of category type. Note that it is unordered by default.
# check if category column is ordered
print(df["Shirt Size"].cat.ordered)
Output:
False
Here we use the ordered property of the .cat accessor to check whether a category column is ordered.
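If a column that is already of category dtype should be ordered, one option is cat.set_categories() with ordered=True. The following is a minimal sketch on a standalone series that mirrors the “Shirt Size” column. The order S < M < L < XL is an assumption about how the sizes should rank.

```python
import pandas as pd

# standalone series mirroring the "Shirt Size" column
shirt = pd.Series(["M", "M", "L", "S", "XL"], name="Shirt Size").astype("category")

# set an explicit category order (assumed here: S < M < L < XL)
shirt_ordered = shirt.cat.set_categories(["S", "M", "L", "XL"], ordered=True)

# the ordered property is now True
print(shirt_ordered.cat.ordered)

# ordered categories can be compared against one of their category values
print(shirt_ordered > "M")
```

set_categories() returns a new series, so the original unordered column stays unchanged unless you assign the result back to it.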
Convert multiple columns to category type
You can also use the astype() function to change the dtype of more than one column. For example, let’s change the columns “Admission Year” and “Major” to category dtype.
# convert "Major" and "Admission Year" to category type
df[["Major", "Admission Year"]] = df[["Major", "Admission Year"]].astype("category")

# display the column dtypes
print(df[["Major", "Admission Year"]].dtypes)
Output:
Major             category
Admission Year    category
dtype: object
The “Admission Year” and the “Major” columns are now of category dtype.
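As an alternative sketch, DataFrame.astype() also accepts a dictionary that maps column names to dtypes, which converts only the listed columns and leaves the rest as they are. The smaller dataframe below is a trimmed copy of the tutorial's data, built here so the example runs on its own.

```python
import pandas as pd

# trimmed copy of the tutorial's dataframe, rebuilt so this example is standalone
df = pd.DataFrame({
    "Major": ["CS", "English", "Math", "Chemistry", "Economics"],
    "Admission Year": [2018, 2020, 2019, 2020, 2021],
    "Age": [22, 20, 21, 19, 18],
})

# map each column to its target dtype; columns not listed keep their dtype
df = df.astype({"Major": "category", "Admission Year": "category"})

# display the column dtypes
print(df.dtypes)
```

The dictionary form can be handy when different columns need different target dtypes in a single call.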