Skip to Content

Pandas – Get Columns with Missing Values

This tutorial will look at how to get the columns with missing values in a pandas dataframe.

How to find columns with missing values in a pandas dataframe?

get column names with missing values in pandas

To get the columns containing missing values, you can use a combination of the pandas isna() function and the any() function in Python.

The idea is to find the columns containing any missing values. The following is the syntax –

# get names of columns with missing values
df.columns[df.isna().any()]

Here, we first, create a boolean index of columns with missing values and use it to filter df.columns to get the columns that contain any missing values.

Examples

Let’s now look at some examples of using the above syntax –

First, we will create a pandas dataframe.

import numpy as np
import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, np.nan, 27, 32],
    "Department": ["Sales", "Sales", None, np.nan]
}

# create pandas dataframe
df = pd.DataFrame(data)

# display the dataframe
df

Output:

employee dataframe with some missing values

Here, we created a dataframe with information about some employees in an office. The dataframe has the columns – “Name”, “Age”, and “Department”. You can see that some of the values in the above dataframe are missing.

Let’s now get the columns in the above dataframe containing missing values.

We can use the pandas dataframe isna() function to check whether a value in a dataframe is a missing value or not.

Let’s apply this function to the above dataframe.

df.isna()

Output:

result of isna() on the employee dataframe

We get a dataframe with boolean values. Each cell here represents whether the value in this position (in the original dataframe) is missing (na) or not.

Now, we want columns that contain any missing values. You can apply the any() function on top of the resulting dataframe from the isna() function.

df.isna().any()

Output:

Name          False
Age            True
Department     True
dtype: bool

We get a boolean series with the columns containing any missing values marked at True.

You can use the above result as a boolean index to filter the df.columns and get the names of the columns with missing values.

df.columns[df.isna().any()]

Output:

Index(['Age', 'Department'], dtype='object')

We get the names of the columns with missing values (“Age” and “Department”) in the above dataframe.

You may also be interested in –


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Author

  • Piyush

    Piyush is a data scientist passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.