The category dtype is used to store categorical values in Pandas. In this tutorial, we will look at how to compare data in a category type column in Pandas.
Compare categorical data with other objects in Pandas
There are three common use-cases for comparing category type data with other objects in Pandas.
- Compare category data with a scalar value.
- Equality (== or !=) comparison to a list-like object (for example, list, series, etc.) of the same length as the categorical data.
- Compare category data with other category data.
Examples
Let’s look at the above comparisons with the help of some examples.
Category data vs scalar value
Let’s create a Pandas series of category dtype storing the class (grade) information of some students in a primary school.
import pandas as pd

# category type pandas series
grades = pd.Series([3, 2, 4, 1, 5]).astype('category')

# display the series
print(grades)
Output:
0    3
1    2
2    4
3    1
4    5
dtype: category
Categories (5, int64): [1, 2, 3, 4, 5]
Note that the possible category values for the above series are 1, 2, 3, 4, and 5.
Let’s compare this series with a scalar value, for example, 4.
print(grades == 4)
Output:
0    False
1    False
2     True
3    False
4    False
dtype: bool
Here, we check whether each value in the series is equal to the scalar value 4. We get True if the value in the series is equal to 4 and False otherwise.
Let’s use the < operator this time. This tells us whether the class is less than 4 or not.
print(grades < 4)
Output:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [6], in <module>
----> 1 print(grades < 4)
...
TypeError: Unordered Categoricals can only compare equality or not
We get an error because unordered categoricals can only be compared for equality. To use other comparisons (<, >, <=, >=), we first have to make the categorical data ordered.
# set category order
grades = grades.cat.set_categories([1, 2, 3, 4, 5], ordered=True)

# compare with scalar using <
print(grades < 4)
Output:
0     True
1     True
2    False
3     True
4    False
dtype: bool
Now, we get the expected output: True if the value in the series is less than 4 and False otherwise.
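An ordered category series can also be created in a single step. The following is a minimal sketch, assuming a reasonably recent Pandas version, that passes a pd.CategoricalDtype with ordered=True to astype. The names grade_dtype and grades_ordered are only illustrative and are not used elsewhere in this tutorial.

```python
import pandas as pd

# illustrative names; the category order is the order of the list passed here
grade_dtype = pd.CategoricalDtype(categories=[1, 2, 3, 4, 5], ordered=True)
grades_ordered = pd.Series([3, 2, 4, 1, 5]).astype(grade_dtype)

# check that the series is ordered, then compare with a scalar
print(grades_ordered.cat.ordered)
print(grades_ordered < 4)
```

The order of the categories list is what <, >, <=, and >= use, so it should be listed from the lowest category to the highest.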
Category data vs a list-like object
When comparing category type data with a list-like object of the same length, we can only compare for equality (that is, == or !=).
Let’s look at an example. First, let’s print out the Pandas series already created above.
# display the series
print(grades)
Output:
0    3
1    2
2    4
3    1
4    5
dtype: category
Categories (5, int64): [1 < 2 < 3 < 4 < 5]
Let’s now compare it for equality with a list of the same length.
# list of some grade values
ls = [1, 1, 1, 1, 2]

# compare category series with list
print(grades == ls)
Output:
0    False
1    False
2    False
3     True
4    False
dtype: bool
Here, we use the Pandas series created above and compare it with a list of values of the same length. You can see that we get the result as a boolean series, where each value is compared with the list element at the same position. We would get the same result had the above series been unordered.
Note that you cannot use non-equality comparisons (for example, <, >, <=, >=, etc.) for comparing a category type series with a list-like object.
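The restriction above can be sketched as follows. This is a hedged example, assuming a recent Pandas version: it shows the TypeError raised by < with a list, and one possible workaround that compares the underlying values with NumPy. The list values and the names raised and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# ordered category series, built the same way as in the tutorial
grades = pd.Series([3, 2, 4, 1, 5]).astype('category')
grades = grades.cat.set_categories([1, 2, 3, 4, 5], ordered=True)

# illustrative list of the same length
ls = [4, 1, 4, 1, 6]

# a non-equality comparison with a list raises a TypeError
try:
    grades < ls
    raised = False
except TypeError:
    raised = True
print(raised)

# possible workaround: compare the plain values position by position
result = np.asarray(grades) < np.asarray(ls)
print(result)
```

Keep in mind that the workaround compares the raw values and ignores the category order. Here the two agree only because the categories are numbers listed in ascending order.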
Category data vs Category data
To compare two category type series with one another, the categories (possible category values) must be the same for both series. Otherwise, Pandas raises a TypeError.
Compare two unordered category type Pandas series
Note that unordered category type series can only be compared for equality with another unordered category type series.
# create two category type pandas series
grades1 = pd.Series([1, 2, 3, 3]).astype('category')
grades2 = pd.Series([1, 2, 3, 1]).astype('category')

# compare category data for equality
print(grades1 == grades2)
Output:
0     True
1     True
2     True
3    False
dtype: bool
Here, we compare two unordered category type series with different values but having the same categories (1, 2, and 3). We get True where the data is the same and False otherwise.
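To illustrate why the categories need to match, here is a small sketch, assuming a recent Pandas version, in which the second series only contains the values 1 and 2 and therefore ends up with different categories. The names mismatch_raised and grades2_aligned are only illustrative.

```python
import pandas as pd

grades1 = pd.Series([1, 2, 3, 3]).astype('category')  # categories: 1, 2, 3
grades2 = pd.Series([1, 2, 2, 1]).astype('category')  # categories: 1, 2

# comparing series with different categories raises a TypeError
try:
    grades1 == grades2
    mismatch_raised = False
except TypeError:
    mismatch_raised = True
print(mismatch_raised)

# give grades2 the same categories as grades1, then compare
grades2_aligned = grades2.cat.set_categories(grades1.cat.categories)
print(grades1 == grades2_aligned)
```

Using set_categories with the categories of the other series is one way to align them. It works here because every value in grades2 is also a category of grades1.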
Compare two ordered category type Pandas series
Let’s now compare two ordered category type series. Again, the caveat is that the categories for both the category series should be the same.
# create two ordered category type pandas series
grades1 = pd.Series([1, 2, 3, 3]).astype('category')
grades1 = grades1.cat.set_categories([1, 2, 3], ordered=True)
grades2 = pd.Series([1, 2, 3, 1]).astype('category')
grades2 = grades2.cat.set_categories([1, 2, 3], ordered=True)

# compare category data for equality
print(grades1 == grades2)
Output:
0     True
1     True
2     True
3    False
dtype: bool
We get the expected output.
Ordered category series (with the same categories in the same order) can also be compared with non-equality operators such as <, >, <=, and >=.
# compare ordered category data using >
print(grades1 > grades2)
Output:
0    False
1    False
2    False
3     True
dtype: bool
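The requirement that the category order must also match can be sketched as follows. This is a hedged example, assuming a recent Pandas version. The names grades_a, grades_b, order_raised, and grades_b_fixed are made up for illustration.

```python
import pandas as pd

# same categories, but listed in opposite orders
grades_a = pd.Series([1, 2, 3, 3]).astype(
    pd.CategoricalDtype(categories=[1, 2, 3], ordered=True)
)
grades_b = pd.Series([1, 2, 3, 1]).astype(
    pd.CategoricalDtype(categories=[3, 2, 1], ordered=True)
)

# ordered series with a different category order cannot be compared
try:
    grades_a > grades_b
    order_raised = False
except TypeError:
    order_raised = True
print(order_raised)

# reorder the categories of grades_b to match grades_a, then compare
grades_b_fixed = grades_b.cat.reorder_categories([1, 2, 3], ordered=True)
print(grades_a > grades_b_fixed)
```

Here reorder_categories only changes the order of the categories. The values stored in the series stay the same.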