count rows in each group of pandas groupby

Pandas Groupby – Count of rows in each group

Pandas groupby is a great way to group values of a dataframe on one or more column values. When performing such operations, you might need to know the number of rows in each group. In this tutorial, we will look at how to count the number of rows in each group of a pandas groupby object.

count rows in each group of pandas groupby

You can use the pandas groupby size() function to count the number of rows in each group of a groupby object. The following is the syntax:

df.groupby('Col1').size()

It returns a pandas series with the count of rows for each group.

It determines the number of rows by determining the size of each group (similar to how to get the size of a dataframe, e.g. len(df)) hence is not affected by NaN values in the dataset. That is, it gives a count of all rows for each group whether they are NaN or not. (See the examples below)

Let’s look at some examples of counting the number of rows in each group of a pandas groupby object. First, we will create a sample dataframe that we will be using throughout this tutorial.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Team' : ['A', 'A', 'B', 'B', 'B', 'C'],
    'Points': [15, 18, 11, 17, 10, 13]
})
# display the dataframe
df

Output:

team points dataframe

We now have a dataframe of the top scorers in a debating competition from different teams.

Let’s group the above dataframe on the column “Team” and get the number of rows in each group using the groupby size() function.

📚 Data Science Programs By Skill Level

Introductory

Intermediate ⭐⭐⭐

Advanced ⭐⭐⭐⭐⭐

🔎 Find Data Science Programs 👨‍💻 111,889 already enrolled

Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.

# size of each group
print(df.groupby('Team').size())

Output:

Team
A    2
B    3
C    1
dtype: int64

You can see that we get the count of rows for each group. There are two rows for team A, three rows for team B, and one row for team C in the dataframe df.

You can also use the pandas groupby count() function which gives the “count” of values in each column for each group. For example, let’s group the dataframe df on the “Team” column and apply the count() function.

# count in each group
print(df.groupby('Team').count())

Output:

      Points
Team        
A          2
B          3
C          1

We get a dataframe of counts of values for each group and each column. Note that counts are similar to the row sizes we got above. This is because there are no NaN values present in the dataframe.

Alternatively, you can use the pandas value_counts() function if you’re grouping by a single column and want the counts.

# using value_counts()
print(df['Team'].value_counts())

Output:

B    3
A    2
C    1
Name: Team, dtype: int64

Note that the count() function is fundamentally different than the size() function as it gives you the “count of values in each group” which may not necessarily be equal to the number of rows. This is because the count() function will not count any NaN values it encounters. Let’s see the difference between the two through an example.

First, let’s create a new dataframe so that it has some NaN values.

import numpy as np

# create a dataframe
df = pd.DataFrame({
    'Team' : ['A', 'A', 'B', 'B', 'B', 'C'],
    'Points': [15, 18, 11, 17, np.nan, 13]
})
# display the dataframe
df

Output:

team points dataframe with NaN values

Now, let’s group the dataframe on “Team” and apply the size() function.

# size of each group
print(df.groupby('Team').size())

Output:

Team
A    2
B    3
C    1
dtype: int64

Let’s group the dataframe on “Team” and apply the count() function.

# count in each group
print(df.groupby('Team').count())

Output:

      Points
Team        
A          2
B          2
C          1

You can see that with the count() function we only got the count of non-null values for each group whereas with the size() function we got the actual number of rows for each group.

Thus, if you aim to get the number of rows in each group, use the size() function on the groupby object.

For more on the pandas groupby size() function, refer to its documentation.

Summary – Count Rows in Each Group

In this tutorial, we looked at how we can get the count of rows in each group of a groupby object in Pandas. The following are the key takeaways –

  • Use the pandas groupby size() function to count rows for each group. It counts the rows irrespective of their values.
  • In certain use-cases, the pandas groupby count() can also be used but it only counts the non-Nan values in each group.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having numpy version 1.18.5 and pandas version 1.0.5


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Tutorials on common row operations in pandas –

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.

Scroll to Top