
Count Frequency of Category Values in Pandas

In this tutorial, we will look at how to count the frequency of values in a Pandas category type column or series with the help of some examples.

How to get a count of category values in a Pandas series?

You can apply the Pandas series value_counts() function to a category type series, just as you would to any other series, to get the count of each value. The following is the syntax –

# count of each category value
df["cat_col"].value_counts()

It returns the frequency of each category value in the series. It also includes categories that do not occur in the series, with a count of 0.
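To see the zero-count behavior in isolation, here is a minimal sketch on a standalone series whose declared categories include a value that never occurs. The series `s` and its categories are illustrative assumptions, not part of the tutorial's data. Passing `sort=False` returns the counts in category order instead of sorting them by frequency.

```python
import pandas as pd

# Hypothetical series: "C" is a declared category but never appears as a value
s = pd.Series(
    ["A", "B", "B", "A", "A"],
    dtype=pd.CategoricalDtype(categories=["A", "B", "C"]),
)

# Default: counts sorted from most to least frequent, unused categories included
counts = s.value_counts()
print(counts)

# sort=False keeps the order in which the categories were declared
counts_in_category_order = s.value_counts(sort=False)
print(counts_in_category_order)
```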

Examples

Let’s look at some examples of getting a count of each value in a categorical column in Pandas. First, let’s create a dataframe that we will be using throughout this tutorial –

import pandas as pd

# create pandas dataframe
df = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2018, 2019],
    "Winner": ["A", "B", "B", "A", "A"],
    "Runners-up": ["C", "C", "A", "B", "C"]
})
# convert to category type
df["Winner"] = df["Winner"].astype("category")
df["Runners-up"] = df["Runners-up"].astype("category")
# display the dataframe
print(df)

Output:

   Year Winner Runners-up
0  2015      A          C
1  2016      B          C
2  2017      B          A
3  2018      A          B
4  2019      A          C

We now have a dataframe containing the winners and runners-up of a tri-university sports competition. The “Winner” column, which we converted to category dtype, contains the name of the winning university for each year.
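The printed dataframe does not show dtypes, so one quick way to confirm the conversion is to inspect `df.dtypes`, or to test the dtype programmatically with `pd.CategoricalDtype`. A minimal sketch, rebuilding the same dataframe so it runs on its own:

```python
import pandas as pd

# Rebuild the tutorial's dataframe so this snippet is self-contained
df = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2018, 2019],
    "Winner": ["A", "B", "B", "A", "A"],
    "Runners-up": ["C", "C", "A", "B", "C"]
})
df["Winner"] = df["Winner"].astype("category")
df["Runners-up"] = df["Runners-up"].astype("category")

# The dtypes attribute lists the dtype of every column
print(df.dtypes)

# A programmatic check for the categorical dtype
is_categorical = isinstance(df["Winner"].dtype, pd.CategoricalDtype)
print(is_categorical)
```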

Let’s now see how many times each university won from the above dataframe. For this, we apply the Pandas value_counts() function to the “Winner” column.

# count of each category in Winner column
print(df["Winner"].value_counts())

Output:

A    3
B    2
Name: Winner, dtype: int64

You can see that university “A” won three times and university “B” won two times.

Notice that we do not get a count for university “C”. This is because “C” does not occur in the “Winner” column, so it is not one of its categories. When a column is converted with astype("category"), the categories are inferred from the values present in it, which here are just “A” and “B”.

# display the Winner column
print(df["Winner"])

Output:

0    A
1    B
2    B
3    A
4    A
Name: Winner, dtype: category
Categories (2, object): ['A', 'B']
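If the full set of valid values is known in advance, one option (a sketch, not the approach the rest of this tutorial follows) is to declare it when converting the columns, using `pd.CategoricalDtype`. The name `university_dtype` is an illustrative choice.

```python
import pandas as pd

# All three universities are valid values, even if one never wins
university_dtype = pd.CategoricalDtype(categories=["A", "B", "C"])

df = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2018, 2019],
    "Winner": ["A", "B", "B", "A", "A"],
    "Runners-up": ["C", "C", "A", "B", "C"]
})
# Convert both columns using the same declared categories
df["Winner"] = df["Winner"].astype(university_dtype)
df["Runners-up"] = df["Runners-up"].astype(university_dtype)

print(df["Winner"].cat.categories.tolist())  # ['A', 'B', 'C']
print(df["Winner"].value_counts())
```

With the categories declared up front, value_counts() reports “C” with a count of 0 without any later call to add categories.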

You can also add categories to an existing categorical column (or series). Let’s add “C” as a valid category for the “Winner” column using the cat.add_categories() method.

# add "C" to the categories for the Winner column
df["Winner"] = df["Winner"].cat.add_categories("C")
# display the Winner column
print(df["Winner"])

Output:

0    A
1    B
2    B
3    A
4    A
Name: Winner, dtype: category
Categories (3, object): ['A', 'B', 'C']

Note that we’re not changing any of the records; we’re just adding another possible value for this categorical column.
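One way to check that claim is to compare the series before and after the call. A minimal sketch on a standalone copy of the “Winner” values: the categories grow, while the integer codes that pandas stores for each value stay the same.

```python
import pandas as pd

winner = pd.Series(["A", "B", "B", "A", "A"], dtype="category", name="Winner")
before = winner.copy()

# add_categories returns a new series; the values themselves are untouched
winner = winner.cat.add_categories("C")

print(before.cat.categories.tolist())  # ['A', 'B']
print(winner.cat.categories.tolist())  # ['A', 'B', 'C']

# The integer codes behind each value are identical before and after
print((winner.cat.codes == before.cat.codes).all())  # True
```

The codes are compared rather than the series themselves because pandas refuses to compare two categoricals whose categories differ.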

Now, if you apply the Pandas value_counts() function, you get the count of each category value, irrespective of whether it occurs in the series.

# count of each category in Winner column
print(df["Winner"].value_counts())

Output:

A    3
B    2
C    0
Name: Winner, dtype: int64

Now we get the number of times each university won the competition: university “A” won three times, “B” won two times, and “C” did not win at all.
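The same approach extends to the “Runners-up” column. As a sketch, assuming you want both counts side by side, cat.set_categories() can give both columns the same categories before counting, so every university appears in every column of the summary. The `universities` list and the column labels are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2018, 2019],
    "Winner": ["A", "B", "B", "A", "A"],
    "Runners-up": ["C", "C", "A", "B", "C"]
})
df["Winner"] = df["Winner"].astype("category")
df["Runners-up"] = df["Runners-up"].astype("category")

# Align both columns on the same set of universities
universities = ["A", "B", "C"]
wins = df["Winner"].cat.set_categories(universities).value_counts(sort=False)
runner_ups = df["Runners-up"].cat.set_categories(universities).value_counts(sort=False)

# One row per university, one column per finishing position
summary = pd.DataFrame({"Wins": wins, "Runner-up finishes": runner_ups})
print(summary)
```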



Author

  • Piyush

    Piyush is a data scientist passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.