Treating missing values is an important step in cleaning the data and making it ready for further analysis and usage. It is thus important to correctly identify them. Having an idea of how these values are present can give us directions in treating them. For example, if a column has mostly missing values with very little information, it can be dropped altogether. Or, there could be a pattern to these missing values that could be of use when treating or imputing them. In this tutorial, we will look at how to count the number of missing values in each column of a pandas dataframe.
Sum of Missing Values in each column
To get the count of missing values in each column of a dataframe, you can use the pandas isnull()
and sum()
functions together. The following is the syntax:
# count of missing values in each column df.isnull().sum()
It gives you pandas series of column names along with the sum of missing values in each column. If you instead want to know the total number of missing values in the entire dataset, you can use the sum()
function twice which results in a scaler count. The following is the syntax:
# total number of missing values in the dataframe df.isnull().sum().sum()
Example
Let’s demonstrate the usage of the above syntax on a dataframe to count the missing values in each column. For this, we’ll load the rain in Australia dataset from a CSV file present locally.
import pandas as pd # read the data df = pd.read_csv('weatherAUS.csv') # display the dataframe head df.head()
Output:
Let’s get the count of missing values in each column of the dataframe df
# count of null values in each column print(df.isnull().sum())
Output:
Date 0 Location 0 MinTemp 1485 MaxTemp 1261 Rainfall 3261 Evaporation 62790 Sunshine 69835 WindGustDir 10326 WindGustSpeed 10263 WindDir9am 10566 WindDir3pm 4228 WindSpeed9am 1767 WindSpeed3pm 3062 Humidity9am 2654 Humidity3pm 4507 Pressure9am 15065 Pressure3pm 15028 Cloud9am 55888 Cloud3pm 59358 Temp9am 1767 Temp3pm 3609 RainToday 3261 RainTomorrow 3267 dtype: int64
You can see that except for the “Date” and “Location” columns all the other columns have missing values present. Columns “Evaporation”, “Sunshine”, “Cloud9am”, “Cloud3pm” have a considerably large number of missing values.
Introductory ⭐
- Harvard University Data Science: Learn R Basics for Data Science
- Standford University Data Science: Introduction to Machine Learning
- UC Davis Data Science: Learn SQL Basics for Data Science
- IBM Data Science: Professional Certificate in Data Science
- IBM Data Analysis: Professional Certificate in Data Analytics
- Google Data Analysis: Professional Certificate in Data Analytics
- IBM Data Science: Professional Certificate in Python Data Science
- IBM Data Engineering Fundamentals: Python Basics for Data Science
Intermediate ⭐⭐⭐
- Harvard University Learning Python for Data Science: Introduction to Data Science with Python
- Harvard University Computer Science Courses: Using Python for Research
- IBM Python Data Science: Visualizing Data with Python
- DeepLearning.AI Data Science and Machine Learning: Deep Learning Specialization
Advanced ⭐⭐⭐⭐⭐
- UC San Diego Data Science: Python for Data Science
- UC San Diego Data Science: Probability and Statistics in Data Science using Python
- Google Data Analysis: Professional Certificate in Advanced Data Analytics
- MIT Statistics and Data Science: Machine Learning with Python - from Linear Models to Deep Learning
- MIT Statistics and Data Science: MicroMasters® Program in Statistics and Data Science
🔎 Find Data Science Programs 👨💻 111,889 already enrolled
Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.
Let’s get the total number of missing values in the entire dataframe by using the sum()
function twice.
# total number of null values in the dataset print(df.isnull().sum().sum())
Output:
343248
The dataset has 343248 missing values.
Using pandas info()
function
Alternatively, you can also use the pandas info()
function to quickly check which columns have missing values present. It also tells you the count of non-null values. So, if the number of non-null values in a column is equal to the number of rows in the dataframe then it does not have any missing values.
# using pandas info() print(df.info())
Output:
<class 'pandas.core.frame.DataFrame'> RangeIndex: 145460 entries, 0 to 145459 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 145460 non-null object 1 Location 145460 non-null object 2 MinTemp 143975 non-null float64 3 MaxTemp 144199 non-null float64 4 Rainfall 142199 non-null float64 5 Evaporation 82670 non-null float64 6 Sunshine 75625 non-null float64 7 WindGustDir 135134 non-null object 8 WindGustSpeed 135197 non-null float64 9 WindDir9am 134894 non-null object 10 WindDir3pm 141232 non-null object 11 WindSpeed9am 143693 non-null float64 12 WindSpeed3pm 142398 non-null float64 13 Humidity9am 142806 non-null float64 14 Humidity3pm 140953 non-null float64 15 Pressure9am 130395 non-null float64 16 Pressure3pm 130432 non-null float64 17 Cloud9am 89572 non-null float64 18 Cloud3pm 86102 non-null float64 19 Temp9am 143693 non-null float64 20 Temp3pm 141851 non-null float64 21 RainToday 142199 non-null object 22 RainTomorrow 142193 non-null object dtypes: float64(16), object(7) memory usage: 25.5+ MB None
The pandas info()
function is great to get quick exploratory insights on the dataset like the type of columns it has, or whether there are columns with missing values or not.
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having pandas version 1.0.5
Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.