Pandas Groupby – Count of rows in each group

Pandas groupby is a great way to group values of a dataframe on one or more column values. When performing such operations, it might happen that you need to know the number of rows in each group. In this tutorial, we will look at how to count the number of rows in each group of a pandas groupby object.

You can use the pandas groupby size() function to count the number of rows in each group of a groupby object. The following is the syntax:

df.groupby('Col1').size()

It returns a pandas series with the count of rows for each group. It determines the number of rows by determining the size of each group (similar to how to get the size of a dataframe, e.g. len(df)) hence is not affected by NaN values in the dataset. That is, it gives a count of all rows for each group whether they are NaN or not. (See the examples below)

Let’s look at some examples of counting the number of rows in each group of a pandas groupby object. First, we will create a sample dataframe that we will be using throughout this tutorial for demonstrating the usage.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Team' : ['A', 'A', 'B', 'B', 'B', 'C'],
    'Points': [15, 18, 11, 17, 10, 13]
})
# display the dataframe
print(df)

Output:

  Team  Points
0    A      15
1    A      18
2    B      11
3    B      17
4    B      10
5    C      13

We now have a dataframe of the top scorers in a debating competitions from across teams.

Let’s group the above dataframe on the column “Team” and get the number of rows in each group using the groupby size() function.

# size of each group
print(df.groupby('Team').size())

Output:

Team
A    2
B    3
C    1
dtype: int64

You can see that we get the count of rows for each group. There are two rows for team A, three rows for team B, and one row for team C in the dataframe df.

You can also use the pandas groupby count() function which gives the “count” of values in each column for each group. For example, let’s group the dataframe df on the “Team” column and apply the count() function.

# count in each group
print(df.groupby('Team').count())

Output:

      Points
Team        
A          2
B          3
C          1

We get a dataframe of counts of values for each group and each column. Note that counts are similar to the row sizes we got above. This is because there are no NaN values present in the dataframe. Alternatively, you can use the pandas value_counts() function if you’re grouping by a single column and want the counts.

# using value_counts()
print(df['Team'].value_counts())

Output:

B    3
A    2
C    1
Name: Team, dtype: int64

Note that the count() function is fundamentally different than the size() function as it gives you the “count of values in each group” which may not necessarily be equal to the number of rows. This is because the count() function will not count any NaN values it encounters. Let’s see the difference between the two through an example.

First, let’s create a new dataframe so that it has some NaN values.

import numpy as np

# create a dataframe
df = pd.DataFrame({
    'Team' : ['A', 'A', 'B', 'B', 'B', 'C'],
    'Points': [15, 18, 11, 17, np.nan, 13]
})
# display the dataframe
print(df)

Output:

  Team  Points
0    A    15.0
1    A    18.0
2    B    11.0
3    B    17.0
4    B     NaN
5    C    13.0

Now, let’s group the dataframe on “Team” and apply the size() function.

# size of each group
print(df.groupby('Team').size())

Output:

Team
A    2
B    3
C    1
dtype: int64

Let’s group the dataframe on “Team” and apply the count() function.

# count in each group
print(df.groupby('Team').count())

Output:

      Points
Team        
A          2
B          2
C          1

You can see that with the count() function we only got the count of non-null values for each group whereas with the size() function we got the actual number of rows for each group.

Thus, if your aim is to get the number of rows in group, use the size() function on the groupby object.

For more on the pandas groupby size() function, refer to its documentation.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having numpy version 1.18.5 and pandas version 1.0.5


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Tutorials on common row operations in pandas –