In this tutorial, we will look at how to get the sum of one or more columns in a pandas dataframe.
How to calculate the sum of a pandas column?
You can use the pandas series sum() function to get the sum of a single column or the pandas dataframe sum() function to get the sum of each column in the dataframe. The following is the syntax:
# sum of single column
df['Col'].sum()

# sum of all columns in dataframe
df.sum()
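To make the difference between the two calls concrete, here is a minimal sketch on a tiny, hypothetical dataframe (the column names "a" and "b" and their values are made up for illustration). Calling sum() on one column returns a single scalar, while calling it on a dataframe returns one total per column as a pandas series.

```python
import pandas as pd

# Hypothetical two-column dataframe, used only for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.sum(): one column in, one scalar out
col_total = df["a"].sum()

# DataFrame.sum(): one total per column, returned as a Series indexed by column name
col_totals = df.sum()

print(col_total)  # 6
print(col_totals)
```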
Let’s create a sample dataframe that we will be using throughout this tutorial to demonstrate the usage of the methods and syntax mentioned.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5],
    'petal_width': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2],
    'species': ['setosa']*8
})

# display the dataframe
print(df)
Output:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
5           5.4          3.9           1.7          0.4  setosa
6           4.6          3.4           1.4          0.3  setosa
7           5.0          3.4           1.5          0.2  setosa
The sample dataframe is taken from a section of the Iris dataset. It contains the petal and sepal dimensions of eight data points of the “Setosa” species.
Sum of a single column
You can use the pandas series sum() function to get the sum of the values in an individual column (each column of a dataframe is a pandas series). For example, let’s get the sum of the “sepal_length” column in the above dataframe.
# sum of sepal_length column
print(df['sepal_length'].sum())
Output:
39.3
You can see that we get the total of all the values in the “sepal_length” column as the scalar value 39.3.
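Real columns often contain missing values. The sketch below, on a small hypothetical series rather than the tutorial’s data, shows how the series sum() function treats NaN through its skipna and min_count parameters: by default missing values are skipped, and an all-missing column sums to 0.0 unless you ask for at least one valid value.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
s = pd.Series([5.1, np.nan, 4.7])

# Default (skipna=True): NaN values are ignored
total_skip = s.sum()

# skipna=False: a single NaN makes the whole result NaN
total_no_skip = s.sum(skipna=False)

# An all-NaN column sums to 0.0 by default (min_count=0) ...
all_nan_default = pd.Series([np.nan, np.nan]).sum()
# ... while min_count=1 requires at least one valid value, otherwise NaN
all_nan_min_count = pd.Series([np.nan, np.nan]).sum(min_count=1)
```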
Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.
Sum of more than one column
To get the sum of multiple columns, first select those columns as a dataframe and then apply the pandas dataframe sum() function to it. For example, let’s get the sum of the values in the columns “sepal_length” and “sepal_width”.
# sum of more than one column
print(df[['sepal_length', 'sepal_width']].sum())
Output:
sepal_length    39.3
sepal_width     27.1
dtype: float64
Here, we first created a subset of the dataframe “df” containing only the columns “sepal_length” and “sepal_width” and then applied the sum() function to it. We get the sum of each of the two columns, and the result is returned as a pandas series.
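Sometimes “the sum of multiple columns” means something other than one total per column. The following sketch, using the two sepal columns from the sample data above, contrasts three readings: per-column totals (as above), a single grand total obtained by summing the per-column totals, and row-wise totals using axis=1 of the dataframe sum() function. Which one you need depends on your use case.

```python
import pandas as pd

# The two sepal columns from the sample dataframe in this tutorial
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4],
})
cols = ["sepal_length", "sepal_width"]

# One total per selected column (a Series), as in the example above
per_column = df[cols].sum()

# A single number for both columns: sum the per-column totals
grand_total = df[cols].sum().sum()

# One total per row: axis=1 adds the selected columns within each row
per_row = df[cols].sum(axis=1)
```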
Sum of all the columns
To get the sum of all the columns, use the same method as above but this time on the entire dataframe. Let’s use this function on the dataframe “df” created above.
# sum of all the columns
print(df.sum(numeric_only=True))
Output:
sepal_length    39.3
sepal_width     27.1
petal_length    11.6
petal_width      1.9
dtype: float64
We get the sum of all the numeric columns in the dataframe. Note that we passed numeric_only=True so that the sum is calculated only for the numeric columns and the text column holding the species name is skipped.
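An alternative to numeric_only=True is to select the numeric columns explicitly with the dataframe select_dtypes() method and then call sum(). The default value of numeric_only has changed across pandas releases, so being explicit either way is the safer choice. The sketch below uses a trimmed, hypothetical three-row version of the sample data and checks that both approaches give the same result.

```python
import pandas as pd

# Trimmed sample: one numeric column and one text column
df = pd.DataFrame({
    "petal_width": [0.2, 0.2, 0.4],
    "species": ["setosa"] * 3,
})

# Option 1: let sum() skip the non-numeric columns
totals_flag = df.sum(numeric_only=True)

# Option 2: keep only the numeric columns first, then sum
totals_select = df.select_dtypes(include="number").sum()

print(totals_flag.equals(totals_select))  # True
```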
For more on the function, refer to the pandas documentation for DataFrame.sum().
You can use similar methods, such as mean(), median(), and std(), to get descriptive statistics like the mean, median, and standard deviation of the values in pandas columns.
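As a sketch of that calling pattern, the example below applies mean(), median(), and std() to a trimmed version of the sample data (the “sepal_length” column plus a text species column). On a dataframe, numeric_only=True skips the text column just as it does for sum(); on a single column, no flag is needed.

```python
import pandas as pd

# Trimmed sample: sepal_length from the tutorial data plus a text column
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    "species": ["setosa"] * 8,
})

# Same pattern as sum(): one value per numeric column
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)
stds = df.std(numeric_only=True)  # sample standard deviation (ddof=1) by default

# On a single column (a Series), the method returns a scalar directly
col_mean = df["sepal_length"].mean()
```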