Pandas groupby is a great way to group values of a dataframe on one or more column values. When performing such operations, it might happen that you need to know the number of rows in each group. In this tutorial, we will look at how to count the number of rows in each group of a pandas groupby object.

## How to count rows in each group of pandas groupby?

You can use the pandas groupby `size()`

function to count the number of rows in each group of a groupby object. The following is the syntax:

df.groupby('Col1').size()

It returns a pandas series with the count of rows for each group. It determines the number of rows by determining the size of each group (similar to how to get the size of a dataframe, e.g. *len(df)*) hence is not affected by NaN values in the dataset. That is, it gives a count of all rows for each group whether they are NaN or not. (See the examples below)

## Example

Let’s look at some examples of counting the number of rows in each group of a pandas groupby object. First, we will create a sample dataframe that we will be using throughout this tutorial for demonstrating the usage.

import pandas as pd # create a dataframe df = pd.DataFrame({ 'Team' : ['A', 'A', 'B', 'B', 'B', 'C'], 'Points': [15, 18, 11, 17, 10, 13] }) # display the dataframe print(df)

Output:

Team Points 0 A 15 1 A 18 2 B 11 3 B 17 4 B 10 5 C 13

We now have a dataframe of the top scorers in a debating competitions from across teams.

### Using pandas groupby `size()`

Let’s group the above dataframe on the column “Team” and get the number of rows in each group using the groupby `size()`

function.

# size of each group print(df.groupby('Team').size())

Output:

Team A 2 B 3 C 1 dtype: int64

You can see that we get the count of rows for each group. There are two rows for team A, three rows for team B, and one row for team C in the dataframe df.

### Using pandas groupby `count()`

You can also use the pandas groupby `count()`

function which gives the “count” of values in each column for each group. For example, let’s group the dataframe df on the “Team” column and apply the `count()`

function.

# count in each group print(df.groupby('Team').count())

Output:

Points Team A 2 B 3 C 1

We get a dataframe of counts of values for each group and each column. Note that counts are similar to the row sizes we got above. This is because there are no NaN values present in the dataframe. Alternatively, you can use the pandas `value_counts()`

function if you’re grouping by a single column and want the counts.

# using value_counts() print(df['Team'].value_counts())

Output:

B 3 A 2 C 1 Name: Team, dtype: int64

### Difference between `size()`

and `count()`

Note that the `count()`

function is fundamentally different than the `size()`

function as it gives you the “count of values in each group” which may not necessarily be equal to the number of rows. This is because the `count()`

function will not count any NaN values it encounters. Let’s see the difference between the two through an example.

First, let’s create a new dataframe so that it has some NaN values.

import numpy as np # create a dataframe df = pd.DataFrame({ 'Team' : ['A', 'A', 'B', 'B', 'B', 'C'], 'Points': [15, 18, 11, 17, np.nan, 13] }) # display the dataframe print(df)

Output:

Team Points 0 A 15.0 1 A 18.0 2 B 11.0 3 B 17.0 4 B NaN 5 C 13.0

Now, let’s group the dataframe on “Team” and apply the `size()`

function.

# size of each group print(df.groupby('Team').size())

Output:

Team A 2 B 3 C 1 dtype: int64

Let’s group the dataframe on “Team” and apply the `count()`

function.

# count in each group print(df.groupby('Team').count())

Output:

Points Team A 2 B 2 C 1

You can see that with the count() function we only got the count of non-null values for each group whereas with the `size()`

function we got the actual number of rows for each group.

Thus, if your aim is to get the number of rows in group, use the `size()`

function on the groupby object.

For more on the pandas groupby size() function, refer to its documentation.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having numpy version 1.18.5 and pandas version 1.0.5

**Subscribe to our newsletter for more informative guides and tutorials. ****We do not spam and you can opt out any time.**

Tutorials on common row operations in pandas –