Pandas dataframe describe() function

In this tutorial, we will look at the pandas dataframe describe() function with the help of some examples.

What does describe() do in Pandas dataframe?

The pandas dataframe describe() function is used to get the descriptive statistics for a dataframe. The following is the syntax –

# get dataframe's descriptive stats
df.describe()

You can also apply the describe() function to a pandas series.
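As a minimal sketch of the series case, the snippet below calls describe() on a small series. The values and the name `ages` are hypothetical and chosen only to mirror the “Age” column used later in this tutorial.

```python
import pandas as pd

# Hypothetical ages, mirroring the "Age" column used later in this tutorial
ages = pd.Series([26, 28, 27, 32], name="Age")

# describe() on a series returns a series indexed by statistic name
stats = ages.describe()
print(stats)

# Individual statistics can be looked up by label
print(stats["mean"])  # 28.25
```

The result is indexed by labels such as "count", "mean" and "max", so single values can be read with ordinary label lookup.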

The describe() function takes the following arguments.

  • percentiles (list or list-like of numbers) – The percentiles (for numeric fields) to include in the result. Each value must lie between 0 and 1. The default is [0.25, 0.5, 0.75], which gives the 25th, 50th, and 75th percentiles.
  • include (‘all’, None, or list-like of dtypes) – Indicates which type of fields to include when generating the description. By default it is None, in which case the description is generated only for numeric columns. You can use 'all' to include all the columns or pass a list of dtypes that you want to be included.
  • exclude (None, or list-like of dtypes) – Indicates which fields to exclude when generating the description. By default, it’s None, meaning don’t additionally exclude anything. You can also pass a list of dtypes that you want to be excluded.
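  • datetime_is_numeric (bool) – Whether to treat datetime fields (columns) as numeric types when generating the description. It is False by default. This parameter was added in pandas 1.1 and removed in pandas 2.0, where datetime columns are always summarized as numeric data, so passing it on newer versions raises an error.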

It returns the resulting descriptive statistics as a pandas dataframe (a pandas series if you apply it on a series).
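The percentiles and exclude arguments described above can be sketched as follows. The dataframe here is a hypothetical three-column sample, and the string 'number' is used as a shorthand for all numeric dtypes.

```python
import pandas as pd

# Hypothetical sample data, similar to the frame built later in this tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Salary": [55000, 60000, 52000, 45000],
})

# Ask for the 10th and 90th percentiles instead of the default quartiles
custom = df.describe(percentiles=[0.1, 0.9])
print(custom.index.tolist())

# Exclude numeric columns, which leaves only "Name" in the description
non_numeric = df.describe(exclude=["number"])
print(non_numeric.columns.tolist())

# Both results are dataframes
print(type(custom).__name__, type(non_numeric).__name__)
```

The requested percentiles appear in the result's index as the labels "10%" and "90%". Whether the median ("50%") is added automatically alongside custom percentiles can depend on the installed pandas version, so the sketch does not rely on it.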

Examples

Let’s now look at some examples of using the above syntax to generate descriptions of some dataframes.

First, we will create a pandas dataframe that we will be using throughout this tutorial.

import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000]
}

# create pandas dataframe
df = pd.DataFrame(data)

# display the dataframe
df

Output:

     Name  Age  Department  Salary
0     Jim   26       Sales   55000
1  Dwight   28       Sales   60000
2  Angela   27  Accounting   52000
3    Tobi   32          HR   45000

Here, we created a dataframe containing information about some employees in an office. The dataframe has four columns – “Name”, “Age”, “Department” and “Salary”.

Let’s check the data type of the columns in the above dataframe. You can use the pandas dataframe dtypes property.

# get column dtypes
df.dtypes

Output:

Name          object
Age            int64
Department    object
Salary         int64
dtype: object

We get the dtype of each column in the above dataframe. You can see that the “Age” and “Salary” columns are of int64 type (they are numeric) and the “Name” and “Department” columns are of object type (generally used for string and categorical fields).
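If you want to list the numeric and non-numeric columns programmatically rather than reading them off the dtypes output, one option is select_dtypes(). This is a small sketch that rebuilds the same employee data; the variable names `numeric_cols` and `other_cols` are only illustrative.

```python
import pandas as pd

# Same employee data as above
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
})

# 'number' matches every numeric dtype (int64 here)
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['Age', 'Salary']
print(other_cols)    # ['Name', 'Department']
```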

Let’s now look at examples of using the describe() function.

Example 1 – Get statistics for only numeric columns using pandas describe()

The pandas dataframe describe() function, by default, includes only the numeric columns when generating the dataframe’s description. (The default value for the include parameter is None).

Let’s apply the describe() function on the above dataframe without any parameters (that is, using the default values of the parameters).

# get dataframe's descriptive statistics
df.describe()

Output:

pandas describe results for numeric columns

We get the description only for the numeric columns – “Age” and “Salary”. The result contains descriptive statistics like count, mean, min, max, standard deviation, and percentile values for the 25th, 50th, and 75th percentile.
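Because the result is itself a dataframe, with statistic names as row labels and the original column names as columns, single values can be read with .loc. The following is a minimal sketch using only the two numeric columns of the employee data.

```python
import pandas as pd

# Numeric columns of the employee data
df = pd.DataFrame({
    "Age": [26, 28, 27, 32],
    "Salary": [55000, 60000, 52000, 45000],
})

stats = df.describe()

# Row labels are the statistic names
print(stats.index.tolist())

# Look up individual statistics by (row label, column name)
mean_age = stats.loc["mean", "Age"]
median_salary = stats.loc["50%", "Salary"]
max_salary = stats.loc["max", "Salary"]

print(mean_age)       # 28.25
print(median_salary)  # 53500.0
print(max_salary)     # 60000.0
```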

Example 2 – Get statistics for only non-numeric columns using pandas describe()

Let’s now get the statistics for only the object type columns in the above dataframe.

Pass the dtypes you want to be included as a list to the include parameter.

# get dataframe's descriptive statistics for non-numeric columns
df.describe(include=['object'])

Output:

pandas describe results for object type columns

We get the description only for the object type columns – “Name” and “Department”. The result contains statistics like count, unique (the number of distinct values), top (the most frequent value), and freq (the count of the most frequent value in the column).
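To make these four statistics concrete, here is a small sketch that reads them for the “Department” column. As an assumption for portability, the string columns are created with an explicit object dtype, since how string columns are inferred by default can depend on the installed pandas version.

```python
import pandas as pd

# String columns are given an explicit object dtype so the example does not
# depend on how the installed pandas version infers string columns
df = pd.DataFrame(
    {
        "Name": ["Jim", "Dwight", "Angela", "Tobi"],
        "Department": ["Sales", "Sales", "Accounting", "HR"],
    },
    dtype="object",
)
df["Age"] = [26, 28, 27, 32]

obj_stats = df.describe(include=["object"])

# "Sales" appears twice, so it is the most frequent department
print(obj_stats.loc["count", "Department"])   # 4
print(obj_stats.loc["unique", "Department"])  # 3
print(obj_stats.loc["top", "Department"])     # Sales
print(obj_stats.loc["freq", "Department"])    # 2
```

For “Name”, every value occurs once, so freq is 1 and the value reported as top is just one of the tied names.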

Example 3 – Get the statistics for all the columns using describe()

To get the statistics for all the columns using the pandas dataframe describe() function, pass include='all'.

# get dataframe's descriptive statistics for all columns
df.describe(include='all')

Output:

pandas describe results for all columns

We get the statistics for all the columns in the above dataframe.

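You can see that this result combines the two previous results. Statistics that do not apply to a column's type are shown as NaN. For example, mean is NaN for “Name” and top is NaN for “Age”.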
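The mix of numeric and non-numeric statistics can be checked with a short sketch. It uses a hypothetical two-column subset of the employee data and pd.isna() to test for the missing entries.

```python
import pandas as pd

# Two-column subset of the employee data: one text column, one numeric column
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
})

summary = df.describe(include="all")

# The index holds both the non-numeric and the numeric statistic names
print(summary.index.tolist())

# Numeric statistics are missing for the text column, and vice versa
print(pd.isna(summary.loc["mean", "Department"]))  # True
print(pd.isna(summary.loc["top", "Salary"]))       # True

# Statistics that do apply are filled in as usual
print(summary.loc["freq", "Department"])  # 2
print(summary.loc["mean", "Salary"])      # 53000.0
```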

Summary

In this tutorial, we looked at how to get descriptive statistics for a dataframe using the describe() function in pandas. The following are the key takeaways –

  • The describe() function, by default, generates the statistics for only the numeric columns.
  • To include specific column types in the result, pass the dtypes to include as a list to the include parameter.
  • If you want to get the statistics for all the columns, pass 'all' to the include parameter.

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.