Numpy – Check If Array has any Duplicates

In this tutorial, we will look at how to check whether a Numpy array contains any duplicate values, with the help of some examples.

A numpy array is said to have duplicates if one or more values occur more than once in it. For example, in the array [1, 2, 2, 3, 4], the value 2 occurs more than once and is thus a duplicate.

Methods to check if a numpy array has duplicates

To check if a numpy array has any duplicates, check if the count of unique values in the array is less than the length of the array. The idea is, if an array has any duplicate values, the unique value count will be less than the original size of the array. You can use the following two methods –

  1. Get the unique values in the array using the numpy.unique() function and compare the length of the result with that of the original array.
  2. Convert the array to a set and compare the length of the set with that of the original array.

The following is the syntax –

# check if numpy array has duplicates

# method 1
len(numpy.unique(ar)) < len(ar)

# method 2
len(set(ar)) < len(ar)

Let’s now look at some examples of using the above methods.

Example 1 – Using the numpy.unique() function

The numpy.unique() function returns a numpy array of the unique values in the passed array. If the size of this array is less than the size of the original array, we can say that the array has some duplicate values.

Let’s look at an example.

import numpy as np

# create two arrays
ar1 = np.array([1, 2, 2, 3, 4, 5])
ar2 = np.array([1, 2, 3, 4, 5])

# check if array has duplicates
print(len(np.unique(ar1)) < len(ar1))
print(len(np.unique(ar2)) < len(ar2))

Output:

True
False

Here, we created two numpy arrays: ar1, which contains duplicates, and ar2, which does not. We then checked each for duplicates using the numpy.unique() function. We get True for ar1 and False for ar2, which are the expected results.
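
If you also want to know which values are repeated, one possible extension is to pass return_counts=True to numpy.unique(), which additionally returns how often each unique value occurs. The following is a minimal sketch that reuses ar1 from the example; the variable names values, counts, and duplicates are chosen here for illustration only.

```python
import numpy as np

ar1 = np.array([1, 2, 2, 3, 4, 5])

# return_counts=True also returns the number of occurrences of each unique value
values, counts = np.unique(ar1, return_counts=True)

# keep only the values that occur more than once
duplicates = values[counts > 1]
print(duplicates)
```

For ar1 this prints [2], since 2 is the only value that appears more than once.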

Example 2 – Using a set

Alternatively, you can convert the numpy array to a set to get only the unique values and then compare the length of the set and that of the original array.

Let’s take the same example as above.

import numpy as np

# create two arrays
ar1 = np.array([1, 2, 2, 3, 4, 5])
ar2 = np.array([1, 2, 3, 4, 5])

# check if array has duplicates
print(len(set(ar1)) < len(ar1))
print(len(set(ar2)) < len(ar2))

Output:

True
False

We get the same results as above.
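
Note that both checks above assume a one-dimensional array. For a two-dimensional array, len() returns only the number of rows, and set() raises a TypeError because the rows are themselves arrays and cannot be hashed. The sketch below shows one way to handle arrays of any shape by comparing element counts via the size attribute instead. The helper name has_duplicates and the array ar3 are hypothetical and not part of the examples above.

```python
import numpy as np

def has_duplicates(ar):
    """Return True if any value occurs more than once anywhere in ar."""
    ar = np.asarray(ar)
    # np.unique flattens its input by default, so compare against
    # the total number of elements rather than len(ar)
    return np.unique(ar).size < ar.size

# a 2-D array in which the value 1 appears twice
ar3 = np.array([[1, 2], [3, 1]])
print(has_duplicates(ar3))

# a 2-D array with no repeated values
print(has_duplicates(np.array([[1, 2], [3, 4]])))
```

This prints True for ar3 and False for the second array. For one-dimensional input it gives the same answers as the two methods shown earlier.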

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.