get rows with nan values in pandas

Get Rows with NaN values in Pandas

In this tutorial, we will learn how to get rows with NaN values from a Pandas dataframe.

How to get rows with NaN values in a Pandas DataFrame?

get rows with nan values in pandas

To get the rows with NaN values in Pandas we use the following syntax-

#Create a mask for the rows containing atleast one NaN value.
mask = df.isna().any(axis=1)
#Pass the mask to df.loc[] to obtain the required rows.
rows_with_null_values = df.loc[mask]

Here,

  • df — A Pandas DataFrame object.
  • df.loc[] — A Pandas property to fetch rows using row labels or mask.
  • df.isna() — A Pandas DataFrame method to get a boolean mask for NaN values
  • df.isna().any(axis=1) — Gets the boolean mask for rows containing any (at least one) null value.

We use df.isna() to create a mask for null values. df.isna() returns a dataframe with the same shape as the original dataframe, but the NaN values are replaced by True while all other values are replaced by False. Since the object returned by df.isna() is a dataframe, we can further use Pandas dataframe methods on it. For our purpose, we used any() method.

any() method takes the axis as input. If we pass axis = 0 (or axis = 'index'), it checks for a value to be True along each column. If at least one value is True, it returns True for that column.

Similarly, if we pass axis = 1 (or axis = 'columns'), the any() method checks for any value to be True along each row. If a value is True along a row, it returns True for that row; else it returns False.

The idea we have implemented here is that df.isna() will have the same number of rows and columns as our original dataframe, except that NaN values are replaced by True, and all others are replaced by False. So, if we use df.isna().any(axis=1), the any() method will check each row for a True value. As discussed earlier, all NaN values are replaced by True in df.isna(). So, if there is a NaN value along a row, the any() method will return True for that row. Thus, we have a boolean mask for rows containing one or more NaN values.

We pass the mask to df.loc to get the required rows.

📚 Data Science Programs By Skill Level

Introductory

Intermediate ⭐⭐⭐

Advanced ⭐⭐⭐⭐⭐

🔎 Find Data Science Programs 👨‍💻 111,889 already enrolled

Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.

Similarly, if we want to get rows containing NaN values only (all the values are NaN), then we use the following syntax-

#Create a mask for the rows containing all NaN values.
mask = df.isna().all(axis=1)
#Pass the mask to df.loc[] to obtain the required rows.
rows_with_null_values = df.loc[mask]

Instead of the any() method, we have used the all() method here. all(axis=1) method checks each row if all its values are True. Since we are using all() method on df.isna(), it returns True if all the row elements in the dataframe have NaN value. Thus, we create a mask for the rows containing only NaN values.

We pass this mask to df.loc[ ] to get rows containing only NaN values.

Examples

We will see some examples to understand the syntax we discussed above. Let’s consider weather data for a city for a week.

import pandas as pd

#Create data for dataframe
d = {
    "Max. Temp." : [41.7, None, 37.7, 35.9, None, 40.2, None],
    "Min. Temp." : [28.8, None, 26.6, 33.1, None, 28.8, 29.6],
    "RH%"        : [None, None, 66.6, 78.3, None, 72.7, 91.0],
    }

#Create row labels for dataframe
index = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

#Create the dataframe
df = pd.DataFrame(d, index=index)

#Print the dataframe
print(df)

Output:

           Max. Temp.  Min. Temp.   RH%
Sunday           41.7        28.8   NaN
Monday            NaN         NaN   NaN
Tuesday          37.7        26.6  66.6
Wednesday        35.9        33.1  78.3
Thursday          NaN         NaN   NaN
Friday           40.2        28.8  72.7
Saturday          NaN        29.6  91.0

In this dataframe, we have replaced some values with None. Pandas represents None values of Python as NaN, as seen in this case.

Example 1: Obtain rows containing at least one NaN value from a dataframe

Let’s create a mask for the NaN values in our dataframe.

#Create a mask for the NaN values in dataframe df
mask_for_nan_values = df.isna()
#Print the mask created
print(mask_for_nan_values)

Output:

           Max. Temp.  Min. Temp.    RH%
Sunday          False       False   True
Monday           True        True   True
Tuesday         False       False  False
Wednesday       False       False  False
Thursday         True        True   True
Friday          False       False  False
Saturday         True       False  False

We can see that the NaN values in the original dataframe are replaced by True, while False replaces all other values. To get the rows containing at least one NaN value, we can create a mask as-

#Create a mask for the rows containing atleast one NaN value.
mask_for_rows_wth_atleast_1_nan_value = mask_for_nan_values.any(axis=1)
#Print the mask
print(mask_for_rows_wth_atleast_1_nan_value)

Output:

Sunday        True
Monday        True
Tuesday      False
Wednesday    False
Thursday      True
Friday       False
Saturday      True
dtype: bool

As seen in the output above, the mask has True values for rows containing NaN values, while for others, it has False values.

Now, let’s pass this mask to df.loc[ ]

#Pass the mask to df.loc[] to obtain the required rows.
rows_cont_atleast_1_nan_value = df.loc[mask_for_rows_wth_atleast_1_nan_value]
#Print the obtained rows.
print(rows_cont_atleast_1_nan_value)

Output:

          Max. Temp.  Min. Temp.   RH%
Sunday          41.7        28.8   NaN
Monday           NaN         NaN   NaN
Thursday         NaN         NaN   NaN
Saturday         NaN        29.6  91.0

Thus, we have obtained the rows containing one or more NaN values. The above code in one cell will look like this-

#Create a mask for the rows containing atleast one NaN value.
mask = df.isna().any(axis=1)
#Pass the mask to df.loc[] to obtain the required rows.
required_rows = df.loc[mask]
#Print the obtained rows.
print(required_rows)

Output:

          Max. Temp.  Min. Temp.   RH%
Sunday          41.7        28.8   NaN
Monday           NaN         NaN   NaN
Thursday         NaN         NaN   NaN
Saturday         NaN        29.6  91.0

We get the same output as before since the code is the same, only this time it is compact.

Example 2: Obtain rows containing only NaN values from a dataframe

To obtain rows containing only NaN values for a dataframe, we will do the same as in the previous example, except we replace any() with all().

#Create a mask for the rows containing all NaN values.
mask = df.isna().all(axis=1)
#Pass the mask to df.loc[] to obtain the required rows.
required_rows = df.loc[mask]
#Print the obtained rows.
print(required_rows)

Output:

          Max. Temp.  Min. Temp.  RH%
Monday           NaN         NaN  NaN
Thursday         NaN         NaN  NaN

In the above output, the rows with only NaN values are present.

Summary

From this tutorial, we looked at how to:

  • Get the rows containing one or more NaN values using the loc property, isna(), and any() methods of the dataframe.
  • Get the rows containing only NaN values using loc property, isna(), and all() methods of the dataframe.

You might also be interested in –


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Author

Scroll to Top