Pandas groupby is a great way to group values of a dataframe on one or more column values. When performing such operations, you might need to know the number of rows in each group. In this tutorial, we will look at how to count the number of rows in each group of a pandas groupby object.
How to count rows in each group of pandas groupby?
You can use the pandas groupby size()
function to count the number of rows in each group of a groupby object. The following is the syntax:
df.groupby('Col1').size()
It returns a pandas series with the count of rows for each group.
It determines the number of rows by determining the size of each group (similar to how to get the size of a dataframe, e.g. len(df)) hence is not affected by NaN values in the dataset. That is, it gives a count of all rows for each group whether they are NaN or not. (See the examples below)
Example
Let’s look at some examples of counting the number of rows in each group of a pandas groupby object. First, we will create a sample dataframe that we will be using throughout this tutorial.
import pandas as pd # create a dataframe df = pd.DataFrame({ 'Team' : ['A', 'A', 'B', 'B', 'B', 'C'], 'Points': [15, 18, 11, 17, 10, 13] }) # display the dataframe df
Output:
We now have a dataframe of the top scorers in a debating competition from different teams.
Using pandas groupby size()
Let’s group the above dataframe on the column “Team” and get the number of rows in each group using the groupby size()
function.
Introductory ⭐
- Harvard University Data Science: Learn R Basics for Data Science
- Standford University Data Science: Introduction to Machine Learning
- UC Davis Data Science: Learn SQL Basics for Data Science
- IBM Data Science: Professional Certificate in Data Science
- IBM Data Analysis: Professional Certificate in Data Analytics
- Google Data Analysis: Professional Certificate in Data Analytics
- IBM Data Science: Professional Certificate in Python Data Science
- IBM Data Engineering Fundamentals: Python Basics for Data Science
Intermediate ⭐⭐⭐
- Harvard University Learning Python for Data Science: Introduction to Data Science with Python
- Harvard University Computer Science Courses: Using Python for Research
- IBM Python Data Science: Visualizing Data with Python
- DeepLearning.AI Data Science and Machine Learning: Deep Learning Specialization
Advanced ⭐⭐⭐⭐⭐
- UC San Diego Data Science: Python for Data Science
- UC San Diego Data Science: Probability and Statistics in Data Science using Python
- Google Data Analysis: Professional Certificate in Advanced Data Analytics
- MIT Statistics and Data Science: Machine Learning with Python - from Linear Models to Deep Learning
- MIT Statistics and Data Science: MicroMasters® Program in Statistics and Data Science
🔎 Find Data Science Programs 👨💻 111,889 already enrolled
Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.
# size of each group print(df.groupby('Team').size())
Output:
Team A 2 B 3 C 1 dtype: int64
You can see that we get the count of rows for each group. There are two rows for team A, three rows for team B, and one row for team C in the dataframe df.
Using pandas groupby count()
You can also use the pandas groupby count()
function which gives the “count” of values in each column for each group. For example, let’s group the dataframe df on the “Team” column and apply the count()
function.
# count in each group print(df.groupby('Team').count())
Output:
Points Team A 2 B 3 C 1
We get a dataframe of counts of values for each group and each column. Note that counts are similar to the row sizes we got above. This is because there are no NaN values present in the dataframe.
Alternatively, you can use the pandas value_counts()
function if you’re grouping by a single column and want the counts.
# using value_counts() print(df['Team'].value_counts())
Output:
B 3 A 2 C 1 Name: Team, dtype: int64
Difference between size()
and count()
Note that the count()
function is fundamentally different than the size()
function as it gives you the “count of values in each group” which may not necessarily be equal to the number of rows. This is because the count()
function will not count any NaN values it encounters. Let’s see the difference between the two through an example.
First, let’s create a new dataframe so that it has some NaN values.
import numpy as np # create a dataframe df = pd.DataFrame({ 'Team' : ['A', 'A', 'B', 'B', 'B', 'C'], 'Points': [15, 18, 11, 17, np.nan, 13] }) # display the dataframe df
Output:
Now, let’s group the dataframe on “Team” and apply the size()
function.
# size of each group print(df.groupby('Team').size())
Output:
Team A 2 B 3 C 1 dtype: int64
Let’s group the dataframe on “Team” and apply the count()
function.
# count in each group print(df.groupby('Team').count())
Output:
Points Team A 2 B 2 C 1
You can see that with the count() function we only got the count of non-null values for each group whereas with the size()
function we got the actual number of rows for each group.
Thus, if you aim to get the number of rows in each group, use the size()
function on the groupby object.
For more on the pandas groupby size() function, refer to its documentation.
Summary – Count Rows in Each Group
In this tutorial, we looked at how we can get the count of rows in each group of a groupby object in Pandas. The following are the key takeaways –
- Use the pandas groupby
size()
function to count rows for each group. It counts the rows irrespective of their values. - In certain use-cases, the pandas groupby
count()
can also be used but it only counts the non-Nan values in each group.
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having numpy version 1.18.5 and pandas version 1.0.5
Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.
Tutorials on common row operations in pandas –