Median of Numpy Array with NaN Values

The Numpy library in Python comes with a number of useful built-in functions for computing common descriptive statistics like mean, median, standard deviation, etc. In this tutorial, we will look at how to get the median value of a Numpy array containing one or more NaN values.

Can you use the `numpy.median()` function on an array with NaN values?

We use the numpy.median() function to get the median value of an array in Numpy. But what happens if the array contains one or more NaN values?

Let’s find out.

import numpy as np

# create array
ar = np.array([1, 2, np.nan, 3])
# get array median
print(np.median(ar))

Output:

nan

Here, we created a one-dimensional Numpy array containing some numbers and a NaN value. We then applied the numpy.median() function, which returned nan. This is because numpy.median() does not skip NaN values: if any value in the array (or in the slice along the axis being reduced) is NaN, the result is NaN.

Thus, you cannot use the numpy.median() function to get a meaningful median of an array that contains NaN values.
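NaN is not equal to anything, including itself, so a NaN result like the one above cannot be detected with `==`. Here is a minimal sketch that uses `np.isnan()` to test both the result and the input array:

```python
import numpy as np

ar = np.array([1, 2, np.nan, 3])
result = np.median(ar)

# NaN never compares equal, so this comparison is always False
print(result == np.nan)    # False
# np.isnan() is the reliable way to test for a NaN result
print(np.isnan(result))    # True
# Count the NaN values in the input that caused the NaN result
print(np.isnan(ar).sum())  # 1
```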

How to ignore NaN values when calculating the median of a Numpy array?

You can use the numpy.nanmedian() function to calculate the median of a Numpy array containing NaN values. Pass the array as an argument.

The following is the syntax –


# median of array with nan values
numpy.nanmedian(ar)

It returns the median value in the array ignoring all the NaN values.
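To see what "ignoring" means here, for a one-dimensional array numpy.nanmedian() gives the same result as dropping the NaN values with a boolean mask and calling numpy.median() on what remains. The mask approach below is shown only for comparison:

```python
import numpy as np

ar = np.array([1, 2, np.nan, 3])

# Boolean mask that is True wherever the value is not NaN
mask = ~np.isnan(ar)
print(ar[mask])             # [1. 2. 3.]

# Median of the remaining values vs. nanmedian on the full array
print(np.median(ar[mask]))  # 2.0
print(np.nanmedian(ar))     # 2.0
```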

Let’s look at some examples of using the numpy.nanmedian() function.

Example 1 – Median of one-dimensional array with NaN values

Let’s apply the numpy.nanmedian() function on the same array used in the example above.

import numpy as np

# create array
ar = np.array([1, 2, np.nan, 3])
# get array median, ignoring NaN values
print(np.nanmedian(ar))

Output:

2.0

We get the median in the above array as 2.0. The numpy.nanmedian() function ignores the NaN values when computing the median (2 is the median among 1, 2, 3).
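The array above has an odd number of non-NaN values, so the median is simply the middle one. When the count of non-NaN values is even, the median is the mean of the two middle values. The sketch below spells out that logic in a hypothetical helper, `median_ignoring_nan()`, which exists only for illustration (it is not part of Numpy, and Numpy's own implementation differs). The result is then compared with numpy.nanmedian():

```python
import numpy as np

def median_ignoring_nan(values):
    """Hypothetical helper: median of the non-NaN entries of a 1-D array."""
    arr = np.asarray(values, dtype=float)
    clean = np.sort(arr[~np.isnan(arr)])  # drop NaNs, then sort ascending
    n = clean.size
    if n == 0:
        return np.nan  # nothing left to take the median of
    mid = n // 2
    if n % 2 == 1:
        return float(clean[mid])  # odd count: the middle element
    # even count: mean of the two middle elements
    return float((clean[mid - 1] + clean[mid]) / 2)

ar = np.array([1, 2, np.nan, 3, 4])
print(median_ignoring_nan(ar))  # 2.5
print(np.nanmedian(ar))         # 2.5
```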

Example 2 – Median of multi-dimensional array with NaN values

The numpy.nanmedian() function takes largely the same arguments as the numpy.median() function. For example, you can use the axis parameter to specify the axis along which to compute the median.

First, let’s create a 2-D Numpy array.

# create 2-D numpy array
ar = np.array([[1, np.nan, 3],
               [np.nan, 5, np.nan]])
# display the array
print(ar)

Output:

[[ 1. nan  3.]
 [nan  5. nan]]

Here, we used the numpy.array() function to create a Numpy array with two rows and three columns. You can see that there are some NaN values present in the array.

If you call the Numpy nanmedian() function on an array without specifying the axis, it returns the median of all the non-NaN values in the array, as if the array were flattened.

# median of array
print(np.nanmedian(ar))

Output:

3.0

We get the median of all the non-NaN values (1, 3, and 5) inside the 2-D array, which is 3.

Use the numpy.nanmedian() function with axis=1 to get the median for each row in the array.

# median of each row in array
print(np.nanmedian(ar, axis=1))

Output:

[2. 5.]

We get the median of each row in the above 2-D array. The median of values in the first row is (1+3)/2 = 2 and the median of values in the second row is 5 (since it’s the only non-NaN value in that row).
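What if an entire row is NaN? In that case numpy.nanmedian() has no values to work with, so it returns nan for that row and emits a `RuntimeWarning` (the message reads "All-NaN slice encountered" in the Numpy versions we are aware of; treat the exact wording as an assumption). This sketch records the warning instead of letting it print:

```python
import warnings

import numpy as np

ar = np.array([[1, np.nan, 3],
               [np.nan, np.nan, np.nan]])  # second row is all NaN

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    row_medians = np.nanmedian(ar, axis=1)

print(row_medians)  # first row -> 2.0, all-NaN row -> nan
print(any(issubclass(w.category, RuntimeWarning) for w in caught))  # True
```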

Use the numpy.nanmedian() function with axis=0 to get the median of each column in the array.

# median of each column in array
print(np.nanmedian(ar, axis=0))

Output:

[1. 5. 3.]

We get the median of each column in the above 2-D array. In this example, each column has one NaN value and one non-NaN value (which naturally becomes the median since it’s the only value in the column).
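Besides axis, numpy.nanmedian() also accepts a `keepdims` parameter, as numpy.median() does. With `keepdims=True`, the reduced axis is kept with length 1, so the result broadcasts back against the original array. The sketch below uses this to center each row on its own median. The centering step is only an illustration of broadcasting:

```python
import numpy as np

ar = np.array([[1, np.nan, 3],
               [np.nan, 5, np.nan]])

row_medians = np.nanmedian(ar, axis=1)                      # shape (2,)
row_medians_kept = np.nanmedian(ar, axis=1, keepdims=True)  # shape (2, 1)
print(row_medians.shape, row_medians_kept.shape)            # (2,) (2, 1)

# The (2, 1) result broadcasts across each row; NaN entries stay NaN
centered = ar - row_medians_kept
print(centered)
```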

Summary – Median of Numpy array with NaN values

The following is a short summary of the important points mentioned in this tutorial.

Using the numpy.median() function on an array with NaN values results in NaN.
Use the numpy.nanmedian() function to get the median value in an array containing one or more NaN values. It computes the median by taking into account only the non-NaN values in the array.
Similar to the numpy.median() function, you can specify the axis along which you want to compute the median with the numpy.nanmedian() function.

Author

Piyush Raj

Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
