In this tutorial, we will look at how to calculate the average for each row in a pandas dataframe with the help of some examples.
How to find the row-wise mean in Pandas?
To get the average for each row in a pandas dataframe, use the pandas dataframe mean() function with axis=1. The following is the syntax:
# get mean for each row
df.mean(axis=1)
With axis=1, it returns the mean for each row. Note that the pandas mean() function calculates the mean for each column, not each row, by default (axis=0). Thus, make sure to pass 1 to the axis parameter if you want to get the average for each row.
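To make the difference between the two axes concrete, here is a minimal sketch. The small dataframe df and the names col_means and row_means are hypothetical and used only for illustration.

```python
import pandas as pd

# hypothetical two-row dataframe, used only to contrast the two axes
df = pd.DataFrame({'a': [1, 3], 'b': [5, 7]})

# default (axis=0): one mean per column
col_means = df.mean()

# axis=1: one mean per row
row_means = df.mean(axis=1)

print(col_means)
print(row_means)
```

The first result is indexed by the column labels, while the second is indexed by the row labels.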
Examples
Let’s look at some examples of using the above syntax. First, we will create a dataframe that we will be using throughout this tutorial.
import pandas as pd

# create a pandas dataframe
scores_df = pd.DataFrame({
    'Name': ['Sam', 'Soniya', 'Neeraj'],
    'Maths': [49, 81, 83],
    'History': [88, 70, 76],
    'Science': [61, 76, 90]
})

# display the dataframe
print(scores_df)
Output:
     Name  Maths  History  Science
0     Sam     49       88       61
1  Soniya     81       70       76
2  Neeraj     83       76       90
We created a dataframe with three rows, each storing the scores of a student in the subjects – Maths, History, and Science.
1. Average for each row in the dataframe
To get the mean for each row in the dataframe, apply the pandas dataframe mean() function with axis=1. For example, let’s find the average score for each of the students in the dataframe scores_df.
# get mean for each row
print(scores_df.mean(axis=1))
Output:
0    66.000000
1    75.666667
2    83.000000
dtype: float64
We get the mean for each row as a pandas series.
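The call above relies on pandas silently leaving out the non-numeric Name column, which is how the pandas 1.0.5 used in this tutorial behaves. In pandas 2.0 and later, the same call raises an error because of the string column. A minimal sketch that should work on both older and newer versions passes numeric_only=True; the name row_means is only illustrative.

```python
import pandas as pd

scores_df = pd.DataFrame({
    'Name': ['Sam', 'Soniya', 'Neeraj'],
    'Maths': [49, 81, 83],
    'History': [88, 70, 76],
    'Science': [61, 76, 90]
})

# numeric_only=True restricts the calculation to numeric columns,
# so the string column 'Name' is left out explicitly
row_means = scores_df.mean(axis=1, numeric_only=True)
print(row_means)
```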
Let’s add a new column to the scores_df dataframe representing the mean scores for each student.
# add new column with average score of each student
scores_df['Average Score'] = scores_df.mean(axis=1)

# display the dataframe
print(scores_df)
Output:
     Name  Maths  History  Science  Average Score
0     Sam     49       88       61      66.000000
1  Soniya     81       70       76      75.666667
2  Neeraj     83       76       90      83.000000
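Once the Average Score column is part of the dataframe, any later row-wise mean over the whole dataframe would include it as well. One way to keep the calculation limited to the subject scores is to select those columns by name first. The following is a sketch of that idea; the list name subject_cols is an assumption made for this example.

```python
import pandas as pd

scores_df = pd.DataFrame({
    'Name': ['Sam', 'Soniya', 'Neeraj'],
    'Maths': [49, 81, 83],
    'History': [88, 70, 76],
    'Science': [61, 76, 90]
})

# columns that should contribute to the average (illustrative name)
subject_cols = ['Maths', 'History', 'Science']

# the mean is computed only over the selected columns
scores_df['Average Score'] = scores_df[subject_cols].mean(axis=1)

# running the same line again gives the same values, because
# 'Average Score' itself is not in subject_cols
scores_df['Average Score'] = scores_df[subject_cols].mean(axis=1)
print(scores_df)
```

Selecting the columns explicitly also avoids any dependence on how a given pandas version treats the string column Name.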
2. Average if the row contains NaN values
By default, the pandas mean() function doesn’t take into account the NA values when computing the average. To demonstrate this, let’s create a scores dataframe with some missing values.
import numpy as np

# dataframe with some missing values
scores_df = pd.DataFrame({
    'Name': ['Sam', 'Soniya', 'Neeraj'],
    'Maths': [49, np.nan, 83],
    'History': [np.nan, 70, 76],
    'Science': [61, np.nan, 90]
})

# display the dataframe
print(scores_df)
Output:
     Name  Maths  History  Science
0     Sam   49.0      NaN     61.0
1  Soniya    NaN     70.0      NaN
2  Neeraj   83.0     76.0     90.0
Now let’s see what the result looks like when getting the average for each row.
# add new column with average score of each student
scores_df['Average Score'] = scores_df.mean(axis=1)

# display the dataframe
print(scores_df)
Output:
     Name  Maths  History  Science  Average Score
0     Sam   49.0      NaN     61.0           55.0
1  Soniya    NaN     70.0      NaN           70.0
2  Neeraj   83.0     76.0     90.0           83.0
You can see that the average value for each row doesn’t take the NaN values into account.
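To see what skipping NaN values means in practice, the row mean can be reproduced by dividing the sum of the available values by the number of available values. This is a sketch for illustration only; the dataframe subjects (the scores without the Name column) and the names row_sum, row_count, and manual_mean are assumptions made here.

```python
import numpy as np
import pandas as pd

# subject scores only, with the same missing values as above
subjects = pd.DataFrame({
    'Maths': [49, np.nan, 83],
    'History': [np.nan, 70, 76],
    'Science': [61, np.nan, 90]
})

row_sum = subjects.sum(axis=1)      # NaN values are skipped by default
row_count = subjects.count(axis=1)  # number of non-NaN values in each row
manual_mean = row_sum / row_count

print(manual_mean)
print(subjects.mean(axis=1))
```

For Sam, for example, the sum is 49 + 61 = 110 over 2 available scores, which gives 55.0.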
If you want NaN values to propagate into the result instead of being skipped, pass skipna=False to the pandas mean() function.
# add new column with average score of each student
scores_df['Average Score'] = scores_df.mean(axis=1, skipna=False)

# display the dataframe
print(scores_df)
Output:
     Name  Maths  History  Science  Average Score
0     Sam   49.0      NaN     61.0            NaN
1  Soniya    NaN     70.0      NaN            NaN
2  Neeraj   83.0     76.0     90.0           83.0
We get a NaN in the average if any of the values in the row is NaN.
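If a missing score should instead count as zero, one option is to fill the NaN values before averaging. Whether zero is the right replacement depends on your data and is purely an assumption in this sketch, as are the names subjects and zero_filled_mean.

```python
import numpy as np
import pandas as pd

# subject scores only, with the same missing values as above
subjects = pd.DataFrame({
    'Maths': [49, np.nan, 83],
    'History': [np.nan, 70, 76],
    'Science': [61, np.nan, 90]
})

# replace each NaN with 0, then average over all three subjects
zero_filled_mean = subjects.fillna(0).mean(axis=1)
print(zero_filled_mean)
```

Here every row is divided by 3, so a student with missing scores gets a lower average than with the default NaN-skipping behavior.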
For more on the pandas mean() function, refer to its documentation.
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel having numpy version 1.18.5 and pandas version 1.0.5.