In this tutorial, we will learn how to get rows with NaN values from a Pandas dataframe.
How to get rows with NaN values in a Pandas DataFrame?
To get the rows with NaN values in Pandas we use the following syntax-
#Create a mask for the rows containing atleast one NaN value. mask = df.isna().any(axis=1) #Pass the mask to df.loc[] to obtain the required rows. rows_with_null_values = df.loc[mask]
Here,
df
— A Pandas DataFrame object.df.loc[]
— A Pandas property to fetch rows using row labels or mask.df.isna()
— A Pandas DataFrame method to get a boolean mask for NaN valuesdf.isna().any(axis=1)
— Gets the boolean mask for rows containing any (at least one) null value.
We use df.isna()
to create a mask for null values. df.isna()
returns a dataframe with the same shape as the original dataframe, but the NaN values are replaced by True
while all other values are replaced by False
. Since the object returned by df.isna()
is a dataframe, we can further use Pandas dataframe methods on it. For our purpose, we used any()
method.
any()
method takes the axis as input. If we pass axis = 0
(or axis = 'index'
), it checks for a value to be True
along each column. If at least one value is True, it returns True for that column.
Similarly, if we pass axis = 1
(or axis = 'columns'
), the any()
method checks for any value to be True along each row. If a value is True
along a row, it returns True
for that row; else it returns False
.
The idea we have implemented here is that df.isna()
will have the same number of rows and columns as our original dataframe, except that NaN values are replaced by True, and all others are replaced by False. So, if we use df.isna().any(axis=1)
, the any()
method will check each row for a True
value. As discussed earlier, all NaN values are replaced by True in df.isna()
. So, if there is a NaN value along a row, the any()
method will return True for that row. Thus, we have a boolean mask for rows containing one or more NaN values.
We pass the mask to df.loc
to get the required rows.
Introductory ⭐
- Harvard University Data Science: Learn R Basics for Data Science
- Standford University Data Science: Introduction to Machine Learning
- UC Davis Data Science: Learn SQL Basics for Data Science
- IBM Data Science: Professional Certificate in Data Science
- IBM Data Analysis: Professional Certificate in Data Analytics
- Google Data Analysis: Professional Certificate in Data Analytics
- IBM Data Science: Professional Certificate in Python Data Science
- IBM Data Engineering Fundamentals: Python Basics for Data Science
Intermediate ⭐⭐⭐
- Harvard University Learning Python for Data Science: Introduction to Data Science with Python
- Harvard University Computer Science Courses: Using Python for Research
- IBM Python Data Science: Visualizing Data with Python
- DeepLearning.AI Data Science and Machine Learning: Deep Learning Specialization
Advanced ⭐⭐⭐⭐⭐
- UC San Diego Data Science: Python for Data Science
- UC San Diego Data Science: Probability and Statistics in Data Science using Python
- Google Data Analysis: Professional Certificate in Advanced Data Analytics
- MIT Statistics and Data Science: Machine Learning with Python - from Linear Models to Deep Learning
- MIT Statistics and Data Science: MicroMasters® Program in Statistics and Data Science
🔎 Find Data Science Programs 👨💻 111,889 already enrolled
Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.
Similarly, if we want to get rows containing NaN values only (all the values are NaN), then we use the following syntax-
#Create a mask for the rows containing all NaN values. mask = df.isna().all(axis=1) #Pass the mask to df.loc[] to obtain the required rows. rows_with_null_values = df.loc[mask]
Instead of the any()
method, we have used the all()
method here. all(axis=1)
method checks each row if all its values are True
. Since we are using all()
method on df.isna()
, it returns True
if all the row elements in the dataframe have NaN value. Thus, we create a mask for the rows containing only NaN values.
We pass this mask to df.loc[ ]
to get rows containing only NaN values.
Examples
We will see some examples to understand the syntax we discussed above. Let’s consider weather data for a city for a week.
import pandas as pd #Create data for dataframe d = { "Max. Temp." : [41.7, None, 37.7, 35.9, None, 40.2, None], "Min. Temp." : [28.8, None, 26.6, 33.1, None, 28.8, 29.6], "RH%" : [None, None, 66.6, 78.3, None, 72.7, 91.0], } #Create row labels for dataframe index = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"] #Create the dataframe df = pd.DataFrame(d, index=index) #Print the dataframe print(df)
Output:
Max. Temp. Min. Temp. RH% Sunday 41.7 28.8 NaN Monday NaN NaN NaN Tuesday 37.7 26.6 66.6 Wednesday 35.9 33.1 78.3 Thursday NaN NaN NaN Friday 40.2 28.8 72.7 Saturday NaN 29.6 91.0
In this dataframe, we have replaced some values with None. Pandas represents None values of Python as NaN, as seen in this case.
Example 1: Obtain rows containing at least one NaN value from a dataframe
Let’s create a mask for the NaN values in our dataframe.
#Create a mask for the NaN values in dataframe df mask_for_nan_values = df.isna() #Print the mask created print(mask_for_nan_values)
Output:
Max. Temp. Min. Temp. RH% Sunday False False True Monday True True True Tuesday False False False Wednesday False False False Thursday True True True Friday False False False Saturday True False False
We can see that the NaN values in the original dataframe are replaced by True
, while False
replaces all other values. To get the rows containing at least one NaN value, we can create a mask as-
#Create a mask for the rows containing atleast one NaN value. mask_for_rows_wth_atleast_1_nan_value = mask_for_nan_values.any(axis=1) #Print the mask print(mask_for_rows_wth_atleast_1_nan_value)
Output:
Sunday True Monday True Tuesday False Wednesday False Thursday True Friday False Saturday True dtype: bool
As seen in the output above, the mask has True
values for rows containing NaN values, while for others, it has False
values.
Now, let’s pass this mask to df.loc[ ]
#Pass the mask to df.loc[] to obtain the required rows. rows_cont_atleast_1_nan_value = df.loc[mask_for_rows_wth_atleast_1_nan_value] #Print the obtained rows. print(rows_cont_atleast_1_nan_value)
Output:
Max. Temp. Min. Temp. RH% Sunday 41.7 28.8 NaN Monday NaN NaN NaN Thursday NaN NaN NaN Saturday NaN 29.6 91.0
Thus, we have obtained the rows containing one or more NaN values. The above code in one cell will look like this-
#Create a mask for the rows containing atleast one NaN value. mask = df.isna().any(axis=1) #Pass the mask to df.loc[] to obtain the required rows. required_rows = df.loc[mask] #Print the obtained rows. print(required_rows)
Output:
Max. Temp. Min. Temp. RH% Sunday 41.7 28.8 NaN Monday NaN NaN NaN Thursday NaN NaN NaN Saturday NaN 29.6 91.0
We get the same output as before since the code is the same, only this time it is compact.
Example 2: Obtain rows containing only NaN values from a dataframe
To obtain rows containing only NaN values for a dataframe, we will do the same as in the previous example, except we replace any()
with all()
.
#Create a mask for the rows containing all NaN values. mask = df.isna().all(axis=1) #Pass the mask to df.loc[] to obtain the required rows. required_rows = df.loc[mask] #Print the obtained rows. print(required_rows)
Output:
Max. Temp. Min. Temp. RH% Monday NaN NaN NaN Thursday NaN NaN NaN
In the above output, the rows with only NaN values are present.
Summary
From this tutorial, we looked at how to:
- Get the rows containing one or more NaN values using the
loc
property,isna()
, andany()
methods of the dataframe. - Get the rows containing only NaN values using
loc
property,isna()
, andall()
methods of the dataframe.
You might also be interested in –
- Drop Rows with NaNs in Pandas DataFrame
- Pandas – Get Columns with Missing Values
- Missing Values in Pandas Category Column
- Pandas – Percentage of Missing Values in Each Column
- Pandas DataFrame – Get Row Count
Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.