
Pandas – Count of Unique Values in Each Column

Generally, the data in each column represents a different feature of a pandas dataframe. It may be continuous, categorical, or something totally different like distinct texts. If you’re not sure about the nature of the values you’re dealing with, it might be a good exploratory step to know about the count of distinct values. In this tutorial, we’ll look at how to get the count of unique values in each column of a pandas dataframe.


To count the unique values of each column of a dataframe, you can use the pandas dataframe nunique() function. The following is the syntax:

counts = df.nunique()

Here, df is the dataframe for which you want to know the unique counts. It returns a pandas Series of counts indexed by column name. By default, the pandas dataframe nunique() function counts distinct values along axis=0, that is, down the rows of each column, which gives you the count of distinct values in each column.
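One way to picture what the default call does is to count the distinct values of each column separately and collect the results. The sketch below uses a small hypothetical dataframe named toy (not part of this tutorial's data) and compares df.nunique() with a manual per-column loop; it is only meant as an illustration, not a description of how pandas implements the function internally.

```python
import pandas as pd

# hypothetical toy dataframe, used only for this illustration
toy = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'y', 'z']})

# distinct count per column using the dataframe method
counts = toy.nunique()

# the same counts built by calling Series.nunique() on each column
manual = pd.Series({col: toy[col].nunique() for col in toy.columns})

print(counts)
print(manual)
```

Both Series should hold the same per-column counts, with the column names as the index.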

Let’s look at some of the different use cases for getting unique counts through some examples. First, we’ll create a sample dataframe that we’ll be using throughout this tutorial.

import pandas as pd
import numpy as np

# create a sample dataframe
data = {
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales']
}
df = pd.DataFrame(data)

# display the dataframe
df

Output:

the resulting dataframe with employee information

Here, we created a dataframe with information about some employees in an office. The dataframe has four columns: “EmpCode”, “Gender”, “Age”, and “Department”. Note that one value in the “Department” column is missing (NaN).

Using the pandas dataframe nunique() function with default parameters gives a count of all the distinct values in each column.

# count of unique values in each column
print(df.nunique())

Output:

EmpCode       5
Gender        2
Age           4
Department    2
dtype: int64

In the above example, the nunique() function returns a pandas Series with counts of distinct values in each column. Note that, for the Department column we only have two distinct values as the nunique() function, by default, ignores all NaN values.
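If you would rather treat a missing value as its own category, nunique() accepts a dropna parameter. The following is a minimal sketch that rebuilds the sample dataframe so that it runs on its own; the variable name counts_with_nan is only illustrative.

```python
import pandas as pd
import numpy as np

# same sample data as used in this tutorial
df = pd.DataFrame({
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales']
})

# count distinct values per column, treating NaN as a value of its own
counts_with_nan = df.nunique(dropna=False)
print(counts_with_nan)
```

With dropna=False, the Department column reports 3 distinct values (Accounting, Sales, and NaN) instead of 2, while the other columns are unchanged.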

You can also get the count of distinct values in each row by setting the axis parameter to 1 or 'columns' in the nunique() function.

# count of unique values in each row
print(df.nunique(axis=1))

Output:

0    4
1    4
2    4
3    3
4    4
dtype: int64

In the above example, you can see that we have 4 distinct values in each row except for the row with index 3 which has 3 unique values due to the presence of a NaN value.

For more on the pandas dataframe nunique() function, refer to its official documentation.

In case you want to know the count of each of the distinct values of a specific column, you can use the pandas value_counts() function. In the above dataframe df, if you want to know the count of each distinct value in the column Gender, you can use –

# count of each unique value in the "Gender" column
print(df['Gender'].value_counts())

Output:

Male      3
Female    2
Name: Gender, dtype: int64

In the above example, the pandas Series value_counts() function is used to get the counts of 'Male' and 'Female', the distinct values in the Gender column of the dataframe df.
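Like nunique(), value_counts() skips NaN values by default. The sketch below, which recreates the sample dataframe so that it runs independently, shows two optional parameters that may be useful here: dropna=False to include missing values in the tally, and normalize=True to get proportions instead of raw counts. The variable names dept_counts and gender_share are illustrative.

```python
import pandas as pd
import numpy as np

# same sample data as used in this tutorial
df = pd.DataFrame({
    'EmpCode': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Age': [27, 24, 29, 24, 25],
    'Department': ['Accounting', 'Sales', 'Accounting', np.nan, 'Sales']
})

# tally each department, keeping the missing value as its own entry
dept_counts = df['Department'].value_counts(dropna=False)
print(dept_counts)

# share of each gender as a fraction of all rows
gender_share = df['Gender'].value_counts(normalize=True)
print(gender_share)
```

Here the department tally includes one NaN entry alongside Accounting and Sales, and the gender shares add up to 1.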

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were run in a Jupyter Notebook with a Python (version 3.8.3) kernel, using numpy version 1.18.5 and pandas version 1.0.5. The exact formatting of the printed output may differ slightly in other versions.




Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
