Pandas – Get Sum of one or more Columns

In this tutorial, we will look at how to get the sum of one or more columns in a pandas dataframe.

Sum of column values

You can use the pandas series sum() function to get the sum of a single column or the pandas dataframe sum() function to get the sum of each column in the dataframe. The following is the syntax:

# sum of single column
df['Col'].sum()
# sum of all columns in dataframe
df.sum()

Let’s create a sample dataframe that we will be using throughout this tutorial to demonstrate the usage of the methods and syntax mentioned.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5],
    'petal_width': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2],
    'species': ['setosa']*8
})
# display the dataframe
print(df)

Output:

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
5           5.4          3.9           1.7          0.4  setosa
6           4.6          3.4           1.4          0.3  setosa
7           5.0          3.4           1.5          0.2  setosa

The sample dataframe is taken from a section of the Iris dataset. This sample has the petal and sepal dimensions of eight data points of the “Setosa” species.

You can use the pandas series sum() function to get the sum of values in individual columns (which essentially are pandas series). For example, let’s get the sum of the “sepal_length” column in the above dataframe.

# sum of sepal_length column
print(df['sepal_length'].sum())

Output:

39.3

You can see that we get the total of all values in the “sepal_length” column as the scalar value 39.3.
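If a column contains missing values, it helps to know how sum() treats them. The following is a minimal sketch, assuming a small hypothetical series that is not part of the sample dataframe above. It uses the skipna parameter of the pandas series sum() function.

```python
import pandas as pd

# Hypothetical column with one missing value (not from the tutorial's dataframe)
s = pd.Series([5.1, 4.9, None, 4.6])

# By default (skipna=True), missing values are ignored in the total
total_skipping_nan = s.sum()

# With skipna=False, a single missing value makes the whole result NaN
total_with_nan = s.sum(skipna=False)

print(total_skipping_nan)
print(total_with_nan)
```

The default is usually what you want for a quick total, while skipna=False can be useful when a missing value should be flagged rather than silently ignored.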

To get the sum of several columns at once, first select the columns you want as a dataframe and then apply the pandas dataframe sum() function. This returns one total per column. For example, let’s get the sum of the values in the columns “sepal_length” and “sepal_width”.

# sum of more than one column
print(df[['sepal_length', 'sepal_width']].sum())

Output:

sepal_length    39.3
sepal_width     27.1
dtype: float64

Here, we first created a subset of the dataframe “df” with only the columns “sepal_length” and “sepal_width” and then applied the sum function. You can see that we get the sum for both the columns. Note that we get the result as a pandas series.
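The example above returns one total per column. If you instead want the selected columns added together, either row by row or as a single number, the following sketch shows one possible way. It assumes a shortened, three-row version of the sample data, and the variable names (cols, row_totals, grand_total) are illustrative only.

```python
import pandas as pd

# Shortened version of the sample data (first three rows only)
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7],
    'sepal_width': [3.5, 3.0, 3.2],
})

cols = ['sepal_length', 'sepal_width']

# axis=1 adds across the columns, giving one total per row
row_totals = df[cols].sum(axis=1)

# summing the per-column totals gives a single scalar for all selected values
grand_total = df[cols].sum().sum()

print(row_totals)
print(grand_total)
```

Here row_totals is a pandas series with the same index as the dataframe, while grand_total is a single scalar.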

To get the sum of all the columns, use the same method as above but this time on the entire dataframe. Let’s use this function on the dataframe “df” created above.

# sum of all the columns
print(df.sum(numeric_only=True))

Output:

sepal_length    39.3
sepal_width     27.1
petal_length    11.6
petal_width      1.9
dtype: float64

We get the sum of all the numerical columns present in the dataframe. Note that we passed numeric_only=True so that the sum is calculated only for the numeric columns and the text column with the species name is left out of the result.
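As an alternative to numeric_only=True, you can select the numeric columns yourself before summing. The sketch below assumes a shortened version of the sample data and uses the dataframe select_dtypes() function. It is one possible approach, not the only one.

```python
import pandas as pd

# Shortened version of the sample data with one text column
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7],
    'petal_width': [0.2, 0.2, 0.2],
    'species': ['setosa'] * 3,
})

# keep only the numeric columns, then sum each of them
numeric_sums = df.select_dtypes(include='number').sum()

# for comparison: the same totals using the numeric_only flag
same_as_flag = df.sum(numeric_only=True)

print(numeric_sums)
```

Selecting the columns explicitly can be handy when you want to reuse the numeric subset for other calculations as well.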

For more on the function, refer to its documentation.

You can use similar methods to get other descriptive statistics, like the mean, median, and standard deviation, of values in pandas columns.
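As a rough illustration of these related functions, here is a minimal sketch on a shortened, four-row version of the sample data. The variable names are illustrative only.

```python
import pandas as pd

# Shortened version of the sample data (first four rows only)
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6],
    'sepal_width': [3.5, 3.0, 3.2, 3.1],
})

# statistics for a single column
col_mean = df['sepal_length'].mean()
col_median = df['sepal_length'].median()
col_std = df['sepal_length'].std()  # sample standard deviation (ddof=1 by default)

# several statistics for every column at once
summary = df.agg(['sum', 'mean', 'median', 'std'])

print(col_mean, col_median, col_std)
print(summary)
```

The agg() call returns a dataframe with one row per statistic and one column per original column, which is convenient when you want several summaries side by side.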
