Pandas – Count of Unique Values in Each Column

Generally, the data in each column represents a different feature of the dataframe. It may be continuous, categorical, or something totally different like distinct texts. If you’re not sure about the nature of the values you’re dealing with, it might be a good exploratory step to know about the count of distinct values. In this tutorial, we’ll look at how to get the count of unique values in each column of a pandas dataframe.

To count the unique values of each column of a dataframe, you can use the pandas dataframe nunique() function. The following is the syntax:

counts = df.nunique()

Here, df is the dataframe for which you want to know the unique counts. It returns a pandas Series of counts. By default, the pandas dataframe nunique() function counts the distinct values along axis=0, that is, row-wise which gives you the count of distinct values in each column.

Let’s look at the some of the different use cases of getting unique counts through some examples. First, we’ll create a sample dataframe that we’ll be using throughout this tutorial.

import pandas as pd
import numpy as np

# create a sample dataframe
data = {
    'A': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'B': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'C': [27, 24, 29, 24, 25],
    'D': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales']
}
df = pd.DataFrame(data)

# print the dataframe
print(df)

Output:

    A       B   C           D
0  E1    Male  27  Accounting
1  E2  Female  24       Sales
2  E3  Female  29  Accounting
3  E4    Male  24         NaN
4  E5    Male  25       Sales

Using the pandas dataframe nunique() function with default parameters gives a count of all the distinct values in each column.

print(df.nunique())

Output:

A    5
B    2
C    4
D    2
dtype: int64

In the above example, the nunique() function returns a pandas Series with counts of distinct values in each column. Note that, for column D we only have two distinct values as the nunique() function, by default, ignores all NaN values.

You can also get the count of distinct values in each row by setting the axis parameter to 1 or 'columns' in the nunique() function.

print(df.nunique(axis=1))

Output:

0    4
1    4
2    4
3    3
4    4
dtype: int64

In the above example, you can see that we have 4 distinct values in each row except for the row with index 3 which has 3 unique values due to the presence of a NaN value.

For more on the pandas dataframe nunique() function, refer to its official documentation.

In case you want to know the count of each of the distinct values of a specific column, you can use the pandas value_counts() function. In the above dataframe df, if you want to know the count of each distinct value in the column B, you can use –

print(df['B'].value_counts())

Output:

Male      3
Female    2
Name: B, dtype: int64

In the above example, the pandas series value_counts() function is used to get the counts of 'Male' and 'Female', the distinct values in the column B of the dataframe df.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having numpy version 1.18.5 and pandas version 1.0.5


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.