In this tutorial, we will look at how to get the max value in an ordered categorical column (or series) in Pandas.
How to get the max value in an ordered categorical column?
You can use the Pandas Series max() method to get the max value in a categorical Pandas column (or series). The following is the syntax:
```python
# s is a categorical type ordered pandas series
s.max()
```
It returns the maximum value in the series based on the order of its categories. If the categorical data is not ordered, it raises a TypeError.
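As a minimal sketch, the snippet below builds an ordered categorical series directly with `pd.Categorical(..., ordered=True)`. The sample values and the name `s` are illustrative only. It then checks the `.cat.ordered` flag and calls `max()`. The result follows the declared category order, not alphabetical order.

```python
import pandas as pd

# Illustrative data: the categories list defines the order "S" < "M" < "L"
s = pd.Series(pd.Categorical(["M", "S", "L", "M"],
                             categories=["S", "M", "L"],
                             ordered=True))

# .cat.ordered tells you whether max() can be used on this series
print(s.cat.ordered)  # True

# max() follows the category order, so "L" wins even though "S" sorts last alphabetically
print(s.max())  # L
```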
Examples
Let’s look at some examples of using the above method to get the maximum value in a category type series in Pandas.
Applying the max() function to an unordered categorical field in Pandas
First, let’s see what happens if we apply the max() function to an unordered categorical type series in Pandas.
```python
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["M", "S", "M", "M", "L"]
})
# change to category dtype
df["Shirt Size"] = df["Shirt Size"].astype("category")
# get the max value in shirt size
print(df["Shirt Size"].max())
```
Output:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [6], in <module>
      9 df["Shirt Size"] = df["Shirt Size"].astype("category")
     10 # get the max value in shirt size
---> 11 print(df["Shirt Size"].max())

TypeError: Categorical is not ordered for operation max
you can use .as_ordered() to change the Categorical to an ordered one
```
We get a TypeError. Here, we first create a Pandas dataframe with the names and shirt sizes of students at a university. We then convert the “Shirt Size” column to the category dtype and, finally, apply the max() function to it.
Categorical fields are unordered by default unless an order is specified. An unordered series has no defined way to compare one category with another, so a maximum value is undefined and Pandas raises a TypeError.
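The error message suggests `.as_ordered()`, and that does make `max()` run. However, when the categories are inferred by `astype("category")`, Pandas sorts them lexically, and `as_ordered()` keeps that inferred order. The sketch below reuses the dataframe from above. The name `ordered_sizes` is illustrative. It shows that the result is “S” rather than the largest shirt size, which is why the next section sets the order explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["M", "S", "M", "M", "L"],
})
df["Shirt Size"] = df["Shirt Size"].astype("category")

# astype("category") infers the categories and sorts them lexically
print(list(df["Shirt Size"].cat.categories))  # ['L', 'M', 'S']

# as_ordered() keeps that inferred order, so "L" < "M" < "S"
ordered_sizes = df["Shirt Size"].cat.as_ordered()
print(ordered_sizes.max())  # S
```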
Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.
Max value in an ordered category field in Pandas
Let’s now modify the “Shirt Size” column to an ordered categorical field with the order of sizes as “S” < “M” < “L”.
```python
# set and order categories for the shirt size column
df["Shirt Size"] = df["Shirt Size"].cat.set_categories(["S", "M", "L"], ordered=True)
# display the shirt size column
print(df["Shirt Size"])
```
Output:
```
0    M
1    S
2    M
3    M
4    L
Name: Shirt Size, dtype: category
Categories (3, object): ['S' < 'M' < 'L']
```
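An alternative to `set_categories()` is to define the order once as a `pd.CategoricalDtype` and convert the column with `astype()` in a single step. The sketch below is one possible approach, not the method used above. The name `size_dtype` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["M", "S", "M", "M", "L"],
})

# Define the allowed sizes and their order once, as a reusable dtype
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)

# Convert straight from strings to the ordered categorical dtype
df["Shirt Size"] = df["Shirt Size"].astype(size_dtype)

print(df["Shirt Size"].cat.ordered)  # True
print(df["Shirt Size"].max())        # L
```

Reusing one dtype object this way can help keep the category order consistent when several columns or dataframes hold the same kind of value.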
The “Shirt Size” column is now ordered. Let’s now get the maximum value in the column with the max() function.
```python
# get the max value in shirt size
print(df["Shirt Size"].max())
```
Output:
L
We get “L” as the maximum value.
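To see why the order matters, note that a categorical stores each value as an integer code: its position in the category list. For an ordered categorical, `max()` effectively returns the category with the largest code present. The sketch below checks this with `.cat.codes` and also calls `min()`, which uses the same order. The name `manual_max` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["M", "S", "M", "M", "L"],
})
df["Shirt Size"] = df["Shirt Size"].astype(
    pd.CategoricalDtype(["S", "M", "L"], ordered=True)
)

# Each value's code is its position in the category list: S=0, M=1, L=2
codes = df["Shirt Size"].cat.codes
print(codes.tolist())  # [1, 0, 1, 1, 2]

# The category at the largest code matches what max() returns
manual_max = df["Shirt Size"].cat.categories[codes.max()]
print(manual_max, df["Shirt Size"].max())  # L L

# min() uses the same order and returns the smallest size
print(df["Shirt Size"].min())  # S
```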
Let’s look at another example. Suppose the possible categories are “S”, “M”, and “L”, but the data contains only “S” and “M”. What do you think the max() function returns in this case?
```python
import pandas as pd

# create a pandas series
shirt_size = pd.Series(["M", "S", "S", "M"], dtype="category")
# set and order categories
shirt_size = shirt_size.cat.set_categories(["S", "M", "L"], ordered=True)
# get the max value in the series
print(shirt_size.max())
```
Output:
M
We get “M” as the maximum value. max() considers only the values that actually occur in the data, so “L” is ignored even though it is the highest declared category.
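One way to confirm that “L” is still a declared category, just unused, is to inspect `.cat.categories` and `value_counts()`. For a categorical series, `value_counts()` reports every category, including those with a count of zero. This is a small sketch that reuses the series above; the name `counts` is illustrative.

```python
import pandas as pd

shirt_size = pd.Series(["M", "S", "S", "M"], dtype="category")
shirt_size = shirt_size.cat.set_categories(["S", "M", "L"], ordered=True)

# "L" is a declared category even though no value in the series is "L"
print(list(shirt_size.cat.categories))  # ['S', 'M', 'L']

# value_counts() lists every category, including the unused "L" with a count of 0
counts = shirt_size.value_counts()
print(counts["L"])  # 0

# max() only looks at values that actually occur
print(shirt_size.max())  # M
```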