Compare Category Type Data in Pandas

The category dtype is used to store categorical values in Pandas. In this tutorial, we will look at how to compare data in a category type column in Pandas.

Compare categorical data with other objects in Pandas

There are three common use-cases for comparing category type data with other objects in Pandas.

  • Compare category data with a scalar value.
  • Equality (== or !=) comparison to a list-like object (for example, a list or a series) of the same length as the categorical data.
  • Compare category data with other category data.

Examples

Let’s look at the above comparisons with the help of some examples.

Category data vs scalar value

Let’s create a Pandas series of category dtype storing the class information of some students in a primary school.

import pandas as pd

# category type pandas series
grades = pd.Series([3, 2, 4, 1, 5]).astype('category')
# display the series
print(grades)

Output:

0    3
1    2
2    4
3    1
4    5
dtype: category
Categories (5, int64): [1, 2, 3, 4, 5]

Note that the possible category values for the above series are 1, 2, 3, 4, and 5.

Let’s compare this series with a scalar value, for example, 4.

print(grades == 4)

Output:

0    False
1    False
2     True
3    False
4    False
dtype: bool

Here, we check each value in the series for equality with the scalar value 4. We get True where the value in the series is equal to 4 and False otherwise.
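The != operator works the same way, and an equality check against a value that is not one of the categories simply returns False everywhere rather than raising an error. A minimal sketch (the value 7 is just an illustrative choice of a value outside the categories):

```python
import pandas as pd

# same unordered category series as above
grades = pd.Series([3, 2, 4, 1, 5]).astype("category")

# inequality check against a scalar
not_four = grades != 4
print(not_four.tolist())  # [True, True, False, True, True]

# 7 is not one of the categories, so no value can be equal to it
is_seven = grades == 7
print(is_seven.tolist())  # [False, False, False, False, False]
```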

Let’s use the < operator this time. This tells us whether the class is less than 4 or not.

print(grades < 4)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [6], in <module>
----> 1 print(grades < 4)
...
TypeError: Unordered Categoricals can only compare equality or not

We get an error because unordered categoricals can only be compared for equality. To use other comparisons, we’ll first have to make the categorical data ordered.

# set category order
grades = grades.cat.set_categories([1,2,3,4,5], ordered=True)
# compare with scalar using <
print(grades < 4)

Output:

0     True
1     True
2    False
3     True
4    False
dtype: bool

Now, we get the expected output: True if the value in the series is less than 4 and False otherwise.
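Calling set_categories is one way to get an ordered categorical. As a sketch of two alternatives, you can declare the order up front with a CategoricalDtype, or flip an existing series to ordered with cat.as_ordered(). The variable names below are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# declare the categories and their order when the series is created
grade_dtype = CategoricalDtype(categories=[1, 2, 3, 4, 5], ordered=True)
grades_ordered = pd.Series([3, 2, 4, 1, 5], dtype=grade_dtype)

less_than_four = grades_ordered < 4
print(less_than_four.tolist())  # [True, True, False, True, False]

# as_ordered() keeps the existing categories and only marks them as ordered
grades_unordered = pd.Series([3, 2, 4, 1, 5]).astype("category")
also_ordered = grades_unordered.cat.as_ordered()
print(also_ordered.cat.ordered)  # True
```

Note that as_ordered() uses the categories in their current order, so it fits best when that order is already the one you want.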

Category data vs a list-like object

When comparing category type data with a list-like object of the same length, we can only compare for equality (that is, == or !=).

Let’s look at an example. First, let’s print out the Pandas series already created above.

# display the series
print(grades)

Output:

0    3
1    2
2    4
3    1
4    5
dtype: category
Categories (5, int64): [1 < 2 < 3 < 4 < 5]

Let’s now compare it for equality with a list of the same length.

# list of some grade values
ls = [1,1,1,1,2]
# compare category series with list
print(grades == ls)

Output:

0    False
1    False
2    False
3     True
4    False
dtype: bool

Here, we use the Pandas series created above and compare it with a list of values of the same length. You can see that we get the result as a boolean series. We would get the same result had the above series been unordered.

Note that you cannot use non-equality comparisons (for example, <, >, <=, or >=) to compare a category type series with a list-like object, even if the series is ordered. Doing so raises a TypeError.
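The sketch below illustrates this restriction and two possible workarounds: comparing the underlying values as a NumPy array, or giving the list the same categorical dtype first. The variable names are illustrative, and the second workaround assumes every value in the list is one of the series' categories.

```python
import numpy as np
import pandas as pd

grades_ord = pd.Series([3, 2, 4, 1, 5]).astype("category").cat.as_ordered()
other = [1, 1, 1, 1, 2]

# an ordering comparison against a plain list is rejected
try:
    grades_ord > other
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# workaround 1: compare the underlying values instead of the categorical
as_values = np.asarray(grades_ord) > np.asarray(other)
print(as_values.tolist())  # [True, True, True, False, True]

# workaround 2: give the list the same ordered categorical dtype
other_cat = pd.Series(other, dtype=grades_ord.dtype)
as_categories = grades_ord > other_cat
print(as_categories.tolist())  # [True, True, True, False, True]
```

The first workaround compares raw values and ignores the category order, while the second one respects it. Here both agree because the category order matches the numeric order.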

Category data vs Category data

To compare two category type series with one another, both series must have the same categories (possible category values).

Compare two unordered category type Pandas series

Note that unordered category type series can only be compared for equality with another unordered category type series.

# create two category type pandas series
grades1 = pd.Series([1, 2, 3, 3]).astype('category')
grades2 = pd.Series([1, 2, 3, 1]).astype('category')
# compare category data for equality
print(grades1 == grades2)

Output:

0     True
1     True
2     True
3    False
dtype: bool

Here, we compare two unordered category type series with different values but having the same categories (1, 2, and 3). We get True where the data is the same and False otherwise.
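If the categories differ, the comparison raises a TypeError. The sketch below shows this with two hypothetical series, left and right, and one way to align them by giving the second series the categories of the first with set_categories.

```python
import pandas as pd

left = pd.Series([1, 2, 3, 3]).astype("category")   # categories: 1, 2, 3
right = pd.Series([1, 2, 2, 1]).astype("category")  # categories: 1, 2

# the categories differ, so the comparison is rejected
try:
    left == right
    mismatch_raised = False
except TypeError:
    mismatch_raised = True
print(mismatch_raised)  # True

# align the categories of the second series with the first, then compare
right_aligned = right.cat.set_categories(left.cat.categories)
result = left == right_aligned
print(result.tolist())  # [True, True, False, False]
```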

Compare two ordered category type Pandas series

Let’s now compare two ordered category type series. Again, the caveat is that the categories for both the category series should be the same.

# create two ordered category type pandas series
grades1 = pd.Series([1, 2, 3, 3]).astype('category')
grades1 = grades1.cat.set_categories([1,2,3], ordered=True)

grades2 = pd.Series([1, 2, 3, 1]).astype('category')
grades2 = grades2.cat.set_categories([1,2,3], ordered=True)

# compare category data for equality
print(grades1 == grades2)

Output:

0     True
1     True
2     True
3    False
dtype: bool

We get the expected output.

Ordered category series (with the same categories in the same order) can also be compared with non-equality operators.

# compare category data using >
print(grades1 > grades2)

Output:

0    False
1    False
2    False
3     True
dtype: bool
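With numeric categories, the category order happens to match the numeric order. To see that the comparison really follows the declared category order, here is a small sketch with string categories. The size labels and variable names are hypothetical and not part of the examples above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# declared order: small < medium < large
size_dtype = CategoricalDtype(categories=["small", "medium", "large"], ordered=True)

ordered_sizes = pd.Series(["small", "large", "medium"], dtype=size_dtype)
stocked_sizes = pd.Series(["medium", "medium", "medium"], dtype=size_dtype)

# the comparison follows the category order, not alphabetical order
bigger = ordered_sizes > stocked_sizes
print(bigger.tolist())  # [False, True, False]
```

A plain string comparison would give the opposite answer for the first two rows, since "small" sorts after "medium" alphabetically and "large" sorts before it.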


Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
