Standard Deviation of pandas column values

Pandas – Get Standard Deviation of one or more Columns

Standard deviation is a measure of spread in the values. It’s used in a number of statistical tests and it can be handy to know how to quickly calculate it in pandas. In this tutorial, we will look at how to get the standard deviation of one or more columns in a pandas dataframe.

You can use the pandas series std() function to get the standard deviation of a single column or the pandas dataframe std() function to get the standard deviation of all numerical columns in the dataframe. The following is the syntax:

# std dev of single column
df['Col'].std()
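# std dev of all numerical columns in dataframe
# (numeric_only=True skips text columns; recent pandas versions require it when they are present)
df.std(numeric_only=True)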

Let’s create a sample dataframe that we will be using throughout this tutorial to demonstrate the usage of the methods and syntax mentioned.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5],
    'petal_width': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2],
    'species': ['setosa']*8
})
# display the dataframe
print(df)

Output:

   sepal_length  sepal_width  petal_length  petal_width species
0          5.1          3.5           1.4          0.2  setosa
1          4.9          3.0           1.4          0.2  setosa
2          4.7          3.2           1.3          0.2  setosa
3          4.6          3.1           1.5          0.2  setosa
4          5.0          3.6           1.4          0.2  setosa
5          5.4          3.9           1.7          0.4  setosa
6          4.6          3.4           1.4          0.3  setosa
7          5.0          3.4           1.5          0.2  setosa

The sample dataframe is taken from a section of the Iris dataset. It contains the petal and sepal dimensions of eight data points of the “Setosa” species.

First, let’s see how to get the standard deviation of a single dataframe column.

You can use the pandas series std() function to get the std dev of individual columns (which essentially are pandas series). For example, let’s get the std dev of the “sepal_length” column in the above dataframe.

# std dev of sepal_length column
print(df['sepal_length'].std())

Output:

0.27483761439387144

You can see that we get the standard deviation of the values in the “sepal_length” column as a scalar value.
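It can help to see what this number represents. By default, the pandas std() function computes the sample standard deviation (ddof=1, that is, it divides by n - 1), while passing ddof=0 gives the population standard deviation. The following is a minimal sketch that recomputes the value by hand; the variable names are only illustrative and are not part of the original tutorial.

```python
import math
import pandas as pd

# Same values as the "sepal_length" column in the sample dataframe
sepal_length = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0])

# Sample standard deviation by hand: sum of squared deviations divided by n - 1
n = len(sepal_length)
mean = sepal_length.sum() / n
squared_devs = ((sepal_length - mean) ** 2).sum()
manual_sample_std = math.sqrt(squared_devs / (n - 1))

# pandas default (ddof=1) and the population version (ddof=0)
sample_std = sepal_length.std()
population_std = sepal_length.std(ddof=0)

print(manual_sample_std, sample_std, population_std)
```

The first two printed values should agree, and the population value should be slightly smaller. Keep in mind that numpy.std() defaults to ddof=0, so the two libraries can give different results on the same data unless you set ddof explicitly.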

To get the standard deviation of more than one column, first create a subset of the dataframe with the columns you want and then apply the pandas dataframe std() function. For example, let’s get the std dev of the columns “petal_length” and “petal_width”.

# std dev of more than one column
print(df[['petal_length', 'petal_width']].std())

Output:

petal_length    0.119523
petal_width     0.074402
dtype: float64

We get the result as a pandas series. Here, we first created a subset of the dataframe “df” with only the columns “petal_length” and “petal_width” and then applied the std() function.

To get the std dev of all the columns, use the same method as above but this time on the entire dataframe. Let’s use this function on the dataframe “df” created above.

# std dev of all the numerical columns (numeric_only=True skips the text column)
print(df.std(numeric_only=True))

Output:

sepal_length    0.274838
sepal_width     0.290012
petal_length    0.119523
petal_width     0.074402
dtype: float64

You can see that we get the standard deviation of all the numerical columns present in the dataframe.
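How non-numeric columns are handled depends on the pandas version. In the 1.0.x release used for this tutorial they are dropped silently, whereas from pandas 2.0 onward calling std() on a dataframe that contains a string column raises a TypeError unless those columns are excluded. The sketch below shows two ways to restrict the calculation to numeric columns that should work in both cases; the dataframe is rebuilt here only so that the snippet runs on its own.

```python
import pandas as pd

# Rebuild the sample dataframe so this snippet is self-contained
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5],
    'petal_width': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2],
    'species': ['setosa'] * 8
})

# Option 1: keep only the numeric columns, then take the std dev
numeric_std = df.select_dtypes(include='number').std()

# Option 2: ask std() itself to ignore non-numeric columns
also_numeric_std = df.std(numeric_only=True)

print(numeric_std)
```

Both options return a series indexed by the four numeric column names. The select_dtypes() approach is handy when you want to reuse the numeric subset for other statistics as well.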

Note that you can also use the pandas describe() function to look at statistics including the standard deviation of columns in the dataframe.

# get dataframe statistics
df.describe()

Output:

Dataframe statistics including standard deviation from the describe() function.
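If you only need the standard deviation from the describe() summary, you can select its “std” row. The following is a small sketch, assuming the same sample data as above; the names stats and std_row are illustrative.

```python
import pandas as pd

# Rebuild the sample dataframe so this snippet is self-contained
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5],
    'petal_width': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2],
    'species': ['setosa'] * 8
})

# describe() summarizes the numeric columns; its index holds the statistic names
stats = df.describe()

# Pick out the row of standard deviations as a pandas series
std_row = stats.loc['std']
print(std_row)
```

The values in this row should match what std() returns for the numeric columns, since describe() also reports the sample standard deviation.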

For more on the pandas dataframe std() function, refer to its documentation.

You might also be interested in: Pandas – Get Mean of one or more Columns

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.



Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
