In this tutorial, we’ll look at how to plot histograms by group in pandas with the help of some examples.

Plotting histograms using grouped data from a pandas DataFrame creates one histogram for each group in the DataFrame. For example, you group the data by values of column 1 and then show the distribution of values in column 2 for each group of data points using a histogram.

You can use the following methods to plot histograms by group in pandas:

- Plot Histograms by Group Using Multiple Plots – one histogram for each group
- Plot Histograms by Group Using One Plot – all the histograms on a single plot

Let’s now look at both methods in detail.

## Method 1 – Plot Histograms by Group Using Multiple Plots

You can use the `pandas.DataFrame.hist()` method to create histograms for different groups of data. Each group is plotted on a separate subplot.

You can specify the column to group the data by using the `by` parameter and the column whose distribution you want to show using the `column` parameter. You can also apply this method directly to an individual column of the dataframe (a pandas Series) and just specify the column(s) to group the data on.
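As a minimal sketch of the two call styles described above, the snippet below builds a small dataframe and draws the same grouped histograms both ways. The column names `col1` and `col2`, the group labels, and the seeded random data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative data: two groups of 50 values each (seeded for repeatability)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": np.repeat(["A", "B"], 50),
    "col2": rng.normal(loc=10, scale=2, size=100),
})

# DataFrame form: name the value column and the grouping column
axes_frame = df.hist(column="col2", by="col1")

# Series form: select the value column first, then pass the grouping values
axes_series = df["col2"].hist(by=df["col1"])
```

Both calls return the matplotlib axes they drew on, one per group. In a script, call `matplotlib.pyplot.show()` afterwards to display the figures; notebooks usually render them automatically.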

The following is the syntax –

**Basic Syntax:**

    DataFrame.hist(column=None, by=None, grid=True, xlabelsize=None, xrot=None,
                   ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False,
                   figsize=None, layout=None, bins=10, backend=None, legend=False,
                   **kwargs)

**Parameters:**

- **data**: The pandas object holding the data (the dataframe the method is called on).
- **column**: If passed, will be used to limit data to a subset of columns.
- **by**: If passed, then used to form histograms for separate groups.

For more details about the arguments, refer to the pandas documentation for `DataFrame.hist()`.

Now let us understand the above method with some worked out examples.

### Example 1 – Plot histogram of column values by group in a pandas dataframe on separate plots

Let’s create a pandas dataframe with two columns – “col1”, a column storing categorical data which will be used to group the data, and “col2”, a column with numerical data.

Then, plot the distribution of the values in “col2” for each group (decided by “col1” values) using the pandas `hist()` function.

    import pandas as pd
    import numpy as np

    # create DataFrame
    df = pd.DataFrame({'col1': np.repeat(['W', 'X', 'Y', 'Z'], 50),
                       'col2': np.random.normal(loc=10, scale=2, size=200)})

    # plot the histogram by group in multiple plots
    df['col2'].hist(by=df['col1'])

Output:

In the above example, we –

- Import the required modules.
- Create a dataframe with the first column filled with the values `W, X, Y, Z`, each repeated 50 times, and the second column filled with numerical values drawn using `numpy.random.normal` (see the NumPy documentation for details).
- Plot the histogram by group in multiple plots using the pandas `hist()` function.
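To make the data-creation step concrete, here is a small sketch of how `np.repeat` lays out the group labels and how you might confirm the group sizes. The seeded `np.random.default_rng` generator is an assumption made for repeatability; the example above calls `np.random.normal` directly.

```python
import numpy as np
import pandas as pd

# np.repeat repeats each element in turn, so the labels come out in blocks
labels = np.repeat(["W", "X"], 3)
print(labels.tolist())  # ['W', 'W', 'W', 'X', 'X', 'X']

# Same shape of data as in the example: four groups of 50 values each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": np.repeat(["W", "X", "Y", "Z"], 50),
    "col2": rng.normal(loc=10, scale=2, size=200),
})

# Count the rows in each group
group_sizes = df.groupby("col1").size()
print(group_sizes.to_dict())  # {'W': 50, 'X': 50, 'Y': 50, 'Z': 50}
```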

### Example 2 – Histogram by group in multiple plots with customizations

You can customize the resulting plots by passing additional parameters to the pandas `hist()` function. For example, let’s change the edge color of the histogram bars to red.

    import pandas as pd
    import numpy as np

    # create DataFrame
    df = pd.DataFrame({'col1': np.repeat(['W', 'X', 'Y', 'Z'], 50),
                       'col2': np.random.normal(loc=10, scale=2, size=200)})

    # plot the histogram by group in multiple plots with a red bar edge
    df['col2'].hist(by=df['col1'], edgecolor='red', figsize=(8, 6))

Output:

The histogram bars now have a red edge, and `figsize=(8, 6)` sets the overall figure size in inches.
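Other parameters from the signature shown earlier can be combined in the same way. The sketch below is one possible combination, not a recommended default: it places the four groups in a single row with `layout`, uses 15 bins, and shares both axes so the subplots are directly comparable. The specific values and the seeded data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": np.repeat(["W", "X", "Y", "Z"], 50),
    "col2": rng.normal(loc=10, scale=2, size=200),
})

axes = df.hist(
    column="col2",
    by="col1",
    bins=15,            # number of bins in each subplot
    layout=(1, 4),      # one row, four columns
    sharex=True,        # same x-axis range in every subplot
    sharey=True,        # same y-axis range in every subplot
    figsize=(12, 3),
    edgecolor="black",  # extra keyword forwarded to matplotlib
)
```

Sharing the axes makes it easier to compare the spread and peak height of the groups at a glance, at the cost of some empty space in subplots with a narrower range.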

## Method 2 – Plot Histograms by Group in One Plot

You can use the `matplotlib.pyplot.hist()` function to plot the histograms of groups of data in a single plot. This type of histogram shows the level of overlap in the distribution of the values across different groups.

The following is the syntax –

**Basic Syntax:**

    matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None,
                           cumulative=False, bottom=None, histtype='bar', align='mid',
                           orientation='vertical', rwidth=None, log=False, color=None,
                           label=None, stacked=False, *, data=None, **kwargs)

**Parameters:**

- **x**: Input values. This takes either a single array or a sequence of arrays that are not required to be of the same length.
- **range**: The lower and upper range of the bins. Lower and upper outliers are ignored. If not provided, range is `(x.min(), x.max())`. Range has no effect if bins is a sequence.

For more details about the arguments, refer to the matplotlib documentation for `matplotlib.pyplot.hist()`.
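The `x` and `range` parameters can be illustrated with a short sketch. It passes three arrays of different lengths in one call and fixes the bin range to `(0, 20)`. The sample sizes, means, and labels are made up for illustration, and `ax.hist` is the Axes-method counterpart of `matplotlib.pyplot.hist()`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Three illustrative samples of different lengths
rng = np.random.default_rng(1)
samples = [
    rng.normal(loc=8, scale=1, size=30),
    rng.normal(loc=10, scale=2, size=50),
    rng.normal(loc=12, scale=2, size=80),
]

fig, ax = plt.subplots()
# A sequence of arrays is drawn as side-by-side bars sharing the same bins
counts, edges, _ = ax.hist(samples, bins=10, range=(0, 20), label=["a", "b", "c"])
ax.legend()

print(np.asarray(counts).shape)  # (3, 10): one row of bin counts per input array
print(edges[0], edges[-1])       # 0.0 20.0
```

Any values falling outside `range` are simply left out of the counts.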

Now let us understand the usage of this method with an example.

We’ll take the same dataframe as above – a dataframe with two columns – “col1”, a column storing categorical data which will be used to group the data, and “col2”, a column with numerical data.

Now, the `matplotlib.pyplot.hist()` function, by itself, cannot group the data. So we’ll have to group the data separately, and then plot the histogram for each group on the same plot.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # create DataFrame (note the uppercase 'Z' so the Z group below is not empty)
    df = pd.DataFrame({'col1': np.repeat(['W', 'X', 'Y', 'Z'], 50),
                       'col2': np.random.normal(loc=10, scale=2, size=200)})

    # select the col2 values for each group
    W = df.loc[df['col1'] == 'W', 'col2']
    X = df.loc[df['col1'] == 'X', 'col2']
    Y = df.loc[df['col1'] == 'Y', 'col2']
    Z = df.loc[df['col1'] == 'Z', 'col2']

    # add four histograms to one plot
    plt.hist(W, alpha=0.5, label='W')
    plt.hist(X, alpha=0.5, label='X')
    plt.hist(Y, alpha=0.5, label='Y')
    plt.hist(Z, alpha=0.5, label='Z')

    # the legend entries are the col1 groups
    plt.legend(title='col1')
    plt.show()

Output:

In the above example, we –

- Import the required modules.
- Create a dataframe with the first column filled with the values `W, X, Y, Z`, each repeated 50 times, and the second column filled with numerical values drawn using `numpy.random.normal` (see the NumPy documentation for details).
- Group the data based on the values in “col1”.
- Plot the histogram for each group in the same plot using the `matplotlib.pyplot.hist()` function. Note that we use the `alpha` parameter to make the histograms semi-transparent so that we can easily see the overlap.
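The steps above can also be sketched with a `groupby` loop, which avoids writing one selection per group. This is an alternative under stated assumptions, not the approach used in the example: the shared bin edges computed with `np.histogram_bin_edges`, the choice of 20 bins, and the seeded data are all illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "col1": np.repeat(["W", "X", "Y", "Z"], 50),
    "col2": rng.normal(loc=10, scale=2, size=200),
})

# Shared bin edges so that bars from different groups line up
bin_edges = np.histogram_bin_edges(df["col2"], bins=20)

fig, ax = plt.subplots()
# Iterating over the grouped column yields (group name, values) pairs
for name, values in df.groupby("col1")["col2"]:
    ax.hist(values, bins=bin_edges, alpha=0.5, label=name)

ax.legend(title="col1")
ax.set_xlabel("col2")
ax.set_ylabel("count")
```

Without shared edges, each call picks bins from its own group's minimum and maximum, so bar widths can differ between groups and the overlap is harder to read.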
