
Pandas – Count of Unique Values in Each Column

Generally, each column of a pandas dataframe represents a different feature of the data. A column may hold continuous values, categorical values, or something else entirely, such as free text. If you’re not sure about the nature of the values you’re dealing with, counting the distinct values in each column is a good exploratory step. In this tutorial, we’ll look at how to get the count of unique values in each column of a pandas dataframe.

To count the unique values of each column of a dataframe, you can use the pandas dataframe nunique() function. The following is the syntax:

counts = df.nunique()

Here, df is the dataframe for which you want to know the unique counts. It returns a pandas Series of counts, indexed by column name. By default, the pandas dataframe nunique() function works along axis=0, that is, it scans down the rows of each column, which gives you the count of distinct values in each column.
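As a minimal sketch of what the default axis=0 behaviour means, the snippet below uses a small hypothetical dataframe named toy (it is not the dataframe used in the rest of this tutorial) and compares the result of nunique() with the counts obtained by calling the Series nunique() function on each column separately.

```python
import pandas as pd

# hypothetical toy dataframe, used only for this illustration
toy = pd.DataFrame({
    'A': [1, 1, 2],
    'B': ['x', 'y', 'z'],
})

# axis=0 (the default) collapses the rows, giving one count per column
counts = toy.nunique()
print(counts)

# the same counts, computed one column at a time with Series.nunique()
manual = {col: toy[col].nunique() for col in toy.columns}
print(manual)
```

Both approaches should report 2 distinct values for column A and 3 for column B; the dataframe method simply does the per-column work in one call.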

Let’s look at some of the different use cases for getting unique counts through some examples. First, we’ll create a sample dataframe that we’ll be using throughout this tutorial.

import pandas as pd
import numpy as np

# create a sample dataframe
data = {
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales']
}
df = pd.DataFrame(data)

# display the dataframe
df

Output:

  EmpCode  Gender  Age  Department
0      E1    Male   27  Accounting
1      E2  Female   24       Sales
2      E3  Female   29  Accounting
3      E4    Male   24         NaN
4      E5    Male   25       Sales

Here, we created a dataframe with information about some employees in an office. The dataframe has the following columns: “EmpCode”, “Gender”, “Age”, and “Department”. Note that the “Department” value for employee E4 is missing (NaN).

Using the pandas dataframe nunique() function with default parameters gives a count of all the distinct values in each column.

# count of unique values in each column
print(df.nunique())

Output:

EmpCode       5
Gender        2
Age           4
Department    2
dtype: int64

In the above example, the nunique() function returns a pandas Series with counts of distinct values in each column. Note that, for the Department column we only have two distinct values as the nunique() function, by default, ignores all NaN values.
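If you want missing values to be counted as a distinct value of their own, the nunique() function accepts a dropna parameter. The following is a small sketch that rebuilds the same sample dataframe so that it runs on its own; the variable name counts_with_nan is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# same sample dataframe as above, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales'],
})

# dropna=False makes NaN count as one more distinct value
counts_with_nan = df.nunique(dropna=False)
print(counts_with_nan)
```

With dropna=False, the count for the Department column becomes 3 (Accounting, Sales, and NaN), while the other columns are unchanged because they contain no missing values.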

You can also get the count of distinct values in each row by setting the axis parameter to 1 or 'columns' in the nunique() function.

# count of unique values in each row
print(df.nunique(axis=1))

Output:

0    4
1    4
2    4
3    3
4    4
dtype: int64

In the above example, you can see that we have 4 distinct values in each row except for the row with index 3 which has 3 unique values due to the presence of a NaN value.
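The dropna parameter also applies to the row-wise count. As a rough sketch (again rebuilding the sample dataframe so the snippet runs on its own, with row_counts as an illustrative name), passing dropna=False makes the NaN in the row with index 3 count as a value:

```python
import numpy as np
import pandas as pd

# same sample dataframe as above, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales'],
})

# row-wise count that treats NaN as a distinct value
row_counts = df.nunique(axis=1, dropna=False)
print(row_counts)
```

Here every row, including the one with index 3, should report 4 distinct values.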

For more on the pandas dataframe nunique() function, refer to its official documentation.

In case you want to know how many times each distinct value occurs in a specific column, you can use the pandas value_counts() function. In the above dataframe df, if you want to know the count of each distinct value in the column Gender, you can use:

# count of each unique value in the "Gender" column
print(df['Gender'].value_counts())

Output:

Male      3
Female    2
Name: Gender, dtype: int64

In the above example, the pandas series value_counts() function is used to get the counts of 'Male' and 'Female', the distinct values in the Gender column of the dataframe df.
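Like nunique(), the value_counts() function skips NaN values by default and accepts a dropna parameter. The sketch below applies it to the Department column of the same sample dataframe (rebuilt here so the snippet runs on its own; dept_counts is an illustrative name) and also shows how the two functions relate: the number of entries returned by value_counts() equals the nunique() count of that column.

```python
import numpy as np
import pandas as pd

# same sample dataframe as above, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales'],
})

# count of each distinct department, with NaN kept as its own entry
dept_counts = df['Department'].value_counts(dropna=False)
print(dept_counts)

# with default settings, both functions ignore NaN and agree on the count
same = len(df['Department'].value_counts()) == df['Department'].nunique()
print(same)
```

The first result should list Accounting and Sales with 2 occurrences each and NaN with 1.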

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were run in a Jupyter Notebook with a Python (version 3.8.3) kernel, numpy version 1.18.5, and pandas version 1.0.5.




Author

  • Piyush

    Piyush is a data scientist passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.