In this tutorial, we will look at the pandas dataframe describe() function with the help of some examples.
What does describe() do in Pandas dataframe?
The pandas dataframe describe() function is used to get the descriptive statistics for a dataframe. The following is the syntax:
# get dataframe's descriptive stats
df.describe()
You can also apply the describe() function to a pandas series.
The describe() function takes the following arguments.
- percentiles (list or list-like of numbers): The percentiles (for numeric fields) to include in the result. Each value must lie between 0 and 1. By default, the percentiles [0.25, 0.5, 0.75] are included.
- include ('all', None, or list-like of dtypes): Indicates which types of fields to include when generating the description. By default, it's None, in which case the description is generated only for numeric columns. You can use 'all' to include all the columns or pass a list of the dtypes that you want to be included.
- exclude (None, or list-like of dtypes): Indicates which types of fields to exclude when generating the description. By default, it's None, meaning nothing is additionally excluded. You can also pass a list of the dtypes that you want to be excluded.
- datetime_is_numeric (bool): Whether to treat datetime fields (columns) as numeric types when generating the description. It is False by default. Note that this argument exists only in older pandas releases; it was removed in pandas 2.0, where datetime columns are always summarized as numeric data.
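As a rough illustration of the percentiles and exclude arguments described above, the sketch below uses a small made-up dataframe. The name df_demo, its columns and its values are assumptions for this example only and are not part of the tutorial's data.

```python
import numpy as np
import pandas as pd

# small made-up dataframe, used only for this illustration
df_demo = pd.DataFrame({
    "city": ["Pune", "Delhi", "Mumbai", "Chennai"],
    "temp_c": [31.0, 35.5, 29.0, 33.5],
    "humidity": [40, 25, 70, 65],
})

# ask for the 10th, 50th and 90th percentiles instead of the default quartiles
custom_stats = df_demo.describe(percentiles=[0.1, 0.5, 0.9])
print(custom_stats)

# exclude the numeric columns, so only the text column is described
text_stats = df_demo.describe(exclude=[np.number])
print(text_stats)
```

The median (0.5) is listed explicitly here so that the result does not depend on how a given pandas version treats it when it is left out of the list.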
It returns the resulting descriptive statistics as a pandas dataframe (a pandas series if you apply it on a series).
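To make the two return types concrete, here is a minimal sketch. The scores dataframe and its values are made up for this illustration.

```python
import pandas as pd

# made-up exam scores, used only for this illustration
scores = pd.DataFrame({"math": [70, 80, 90], "physics": [65, 75, 95]})

# describing a dataframe gives a dataframe:
# one column per field, one row per statistic
frame_summary = scores.describe()
print(type(frame_summary).__name__)

# individual statistics can be looked up by row label and column name
math_mean = frame_summary.loc["mean", "math"]
print(math_mean)

# describing a single column (a series) gives a series
# indexed by the statistic name
series_summary = scores["physics"].describe()
print(type(series_summary).__name__)
print(series_summary["max"])
```

Because the result is an ordinary dataframe or series, you can slice it, transpose it, or save it like any other pandas object.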
Examples
Let’s now look at some examples of using the above syntax to generate descriptions of some dataframes.
First, we will create a pandas dataframe that we will be using throughout this tutorial.
import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000]
}

# create pandas dataframe
df = pd.DataFrame(data)

# display the dataframe
df
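Output:
     Name  Age  Department  Salary
0     Jim   26       Sales   55000
1  Dwight   28       Sales   60000
2  Angela   27  Accounting   52000
3    Tobi   32          HR   45000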
Here, we created a dataframe containing information about some employees in an office. The dataframe has four columns: “Name”, “Age”, “Department” and “Salary”.
Let’s check the data type of the columns in the above dataframe. You can use the pandas dataframe dtypes property.
# get column dtypes
df.dtypes
Output:
Name          object
Age            int64
Department    object
Salary         int64
dtype: object
We get the dtype of each column in the above dataframe. You can see that the “Age” and “Salary” columns are of int64 type (they are numeric) and the “Name” and “Department” columns are of object type (generally used for string and categorical fields).
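Since describe() picks columns by dtype, it can help to preview which columns count as numeric. The sketch below rebuilds the employee dataframe from above and uses select_dtypes() for that preview. The variable names numeric_cols and other_cols are just illustrative.

```python
import pandas as pd

# same employee data as in the tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
})

# columns with a numeric dtype
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# every other column
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```

Here numeric_cols holds "Age" and "Salary", while other_cols holds "Name" and "Department".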
Let’s now look at examples of using the describe() function.
Example 1 – Get statistics for only numeric columns using pandas describe()
The pandas dataframe describe() function, by default, includes only the numeric columns when generating the dataframe’s description (the default value for the include parameter is None).
Let’s apply the describe() function on the above dataframe without any parameters (that is, using the default values of the parameters).
# get dataframe's descriptive statistics
df.describe()
Output:
We get the description only for the numeric columns – “Age” and “Salary”. The result contains descriptive statistics like count, mean, min, max, standard deviation, and percentile values for the 25th, 50th, and 75th percentile.
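If you want to see where these numbers come from, the sketch below compares a few entries of the describe() result with the matching column methods. It rebuilds the same employee dataframe so that it runs on its own.

```python
import pandas as pd

# same employee data as in the tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
})

stats = df.describe()

# each pair below prints the describe() entry next to the direct computation
print(stats.loc["mean", "Age"], df["Age"].mean())
print(stats.loc["std", "Age"], df["Age"].std())  # sample standard deviation (ddof=1)
print(stats.loc["25%", "Salary"], df["Salary"].quantile(0.25))
print(stats.loc["50%", "Salary"], df["Salary"].median())
```

Note that the std row is the sample standard deviation, and the 50% row is the median.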
Example 2 – Get statistics for only non-numeric columns using pandas describe()
Let’s now get the statistics for only the object type columns in the above dataframe.
Pass the dtypes you want to be included as a list to the include parameter.
# get dataframe's descriptive statistics for non-numeric columns
df.describe(include=['object'])
Output:
We get the description only for the object type columns: “Name” and “Department”. The result contains statistics like count, unique (the number of distinct values), top (the most frequent value), and freq (the count of the most frequent value in the column).
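To see how count, unique, top and freq relate to the data, the sketch below describes the “Department” column on its own and compares the result with value_counts(). It rebuilds the same employee dataframe; the names dept_stats and dept_counts are just illustrative.

```python
import pandas as pd

# same employee data as in the tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
})

# describe a single text column
dept_stats = df["Department"].describe()
print(dept_stats)

# value_counts() lists every distinct value with its frequency,
# most frequent first
dept_counts = df["Department"].value_counts()
print(dept_counts)
```

The top and freq entries correspond to the first row of value_counts(), and unique is the number of rows it returns.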
Example 3 – Get the statistics for all the columns using describe()
To get the statistics for all the columns using the pandas dataframe describe() function, pass include='all'.
# get dataframe's descriptive statistics for all columns
df.describe(include='all')
Output:
We get the statistics for all the columns in the above dataframe.
You can see that this result is a sort of combination of the above two results.
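One consequence of combining both kinds of statistics is that entries which do not apply to a column are left missing (NaN). The sketch below checks a couple of such entries. It rebuilds the same employee dataframe, and the variable names are just illustrative.

```python
import pandas as pd

# same employee data as in the tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
})

full_stats = df.describe(include="all")

# a mean makes no sense for a text column, so it is missing
name_mean_missing = pd.isna(full_stats.loc["mean", "Name"])

# a "top" value is not reported for a numeric column, so it is missing too
age_top_missing = pd.isna(full_stats.loc["top", "Age"])

print(name_mean_missing, age_top_missing)
print(full_stats.loc["top", "Department"])
```

If you plan to process the result further, keep in mind that you may need to handle these missing entries.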
Summary
In this tutorial, we looked at how to get descriptive statistics for a dataframe using the describe() function in pandas. The following are the key takeaways –
- The describe() function, by default, generates the statistics for only the numeric columns.
- To include specific column types in the result, pass the dtypes to include as a list to the include parameter.
- If you want to get the statistics for all the columns, pass 'all' to the include parameter.