Missing values count in each column of pandas dataframe

Pandas – Count Missing Values in Each Column

Treating missing values is an important step in cleaning the data and making it ready for further analysis and usage. It is thus important to correctly identify them. Having an idea of how these values are present can give us directions in treating them. For example, if a column has mostly missing values with very little information, it can be dropped altogether. Or, there could be a pattern to these missing values that could be of use when treating or imputing them. In this tutorial, we will look at how to count the number of missing values in each column of a pandas dataframe.

To get the count of missing values in each column of a dataframe, you can use the pandas isnull() and sum() functions together. The following is the syntax:

# count of missing values in each column
df.isnull().sum()

It gives you pandas series of column names along with the sum of missing values in each column. If you instead want to know the total number of missing values in the entire dataset, you can use the sum() function twice which results in a scaler count. The following is the syntax:

# total number of missing values in the dataframe
df.isnull().sum().sum()

Let’s demonstrate the usage of the above syntax on a dataframe to count the missing values in each column. For this, we’ll load the rain in Australia dataset from a CSV file present locally.

import pandas as pd

# read the data
df = pd.read_csv('weatherAUS.csv')
# display the dataframe head
df.head()

Output:

first five rows of the dataframe

Let’s get the count of missing values in each column of the dataframe df

# count of null values in each column
print(df.isnull().sum())

Output:

Date                 0
Location             0
MinTemp           1485
MaxTemp           1261
Rainfall          3261
Evaporation      62790
Sunshine         69835
WindGustDir      10326
WindGustSpeed    10263
WindDir9am       10566
WindDir3pm        4228
WindSpeed9am      1767
WindSpeed3pm      3062
Humidity9am       2654
Humidity3pm       4507
Pressure9am      15065
Pressure3pm      15028
Cloud9am         55888
Cloud3pm         59358
Temp9am           1767
Temp3pm           3609
RainToday         3261
RainTomorrow      3267
dtype: int64

You can see that except for the “Date” and “Location” columns all the other columns have missing values present. Columns “Evaporation”, “Sunshine”, “Cloud9am”, “Cloud3pm” have a considerably large number of missing values.

📚 Data Science Programs By Skill Level

Introductory

Intermediate ⭐⭐⭐

Advanced ⭐⭐⭐⭐⭐

🔎 Find Data Science Programs 👨‍💻 111,889 already enrolled

Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.

Let’s get the total number of missing values in the entire dataframe by using the sum() function twice.

# total number of null values in the dataset
print(df.isnull().sum().sum())

Output:

343248

The dataset has 343248 missing values.

Alternatively, you can also use the pandas info() function to quickly check which columns have missing values present. It also tells you the count of non-null values. So, if the number of non-null values in a column is equal to the number of rows in the dataframe then it does not have any missing values.

# using pandas info()
print(df.info())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           145460 non-null  object 
 1   Location       145460 non-null  object 
 2   MinTemp        143975 non-null  float64
 3   MaxTemp        144199 non-null  float64
 4   Rainfall       142199 non-null  float64
 5   Evaporation    82670 non-null   float64
 6   Sunshine       75625 non-null   float64
 7   WindGustDir    135134 non-null  object 
 8   WindGustSpeed  135197 non-null  float64
 9   WindDir9am     134894 non-null  object 
 10  WindDir3pm     141232 non-null  object 
 11  WindSpeed9am   143693 non-null  float64
 12  WindSpeed3pm   142398 non-null  float64
 13  Humidity9am    142806 non-null  float64
 14  Humidity3pm    140953 non-null  float64
 15  Pressure9am    130395 non-null  float64
 16  Pressure3pm    130432 non-null  float64
 17  Cloud9am       89572 non-null   float64
 18  Cloud3pm       86102 non-null   float64
 19  Temp9am        143693 non-null  float64
 20  Temp3pm        141851 non-null  float64
 21  RainToday      142199 non-null  object 
 22  RainTomorrow   142193 non-null  object 
dtypes: float64(16), object(7)
memory usage: 25.5+ MB
None

The pandas info() function is great to get quick exploratory insights on the dataset like the type of columns it has, or whether there are columns with missing values or not.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having pandas version 1.0.5


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.

Scroll to Top