Plot Histogram in Python using Matplotlib

Histograms show the frequency distribution of a variable's values across different buckets (bins), which makes them a great way to visualize how the variable is distributed. In this tutorial, we’ll look at how to plot a histogram in Python using matplotlib.
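To make the idea of buckets concrete, here is a minimal sketch in plain Python (no plotting) that counts how many values fall into each bucket. The 10-point bucket width and the use of the first ten Math scores from the example later in this tutorial are our own choices, made only for illustration.

```python
from collections import Counter

# First ten scores from the Math class example used later in this tutorial
scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63]

# Map each score to the lower edge of its 10-point bucket, e.g. 63 -> 60
buckets = Counter((score // 10) * 10 for score in scores)

# Print the frequency of each bucket in ascending order
for lower in sorted(buckets):
    print(f"{lower}-{lower + 9}: {buckets[lower]}")
```

A histogram draws these frequencies as bars; matplotlib's hist() function computes them for you.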

Matplotlib is a library in Python used for plotting visualizations and comes with a number of handy formatting and plot options. To plot a histogram you can use matplotlib pyplot’s hist() function. The following is the syntax:

import matplotlib.pyplot as plt
plt.hist(x)
plt.show()

Here, x is the array or sequence of values of the variable for which you want to construct a histogram. You can also specify the number of bins or the bin edges you want in the plot using the bins parameter (see the examples below).
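Besides drawing the plot, hist() returns the bar heights, the bin edges, and the drawn bar objects, which is handy for checking how the bins were formed. The sketch below uses a small made-up sample x (not from this tutorial) and prints the returned values instead of showing the figure.

```python
import matplotlib.pyplot as plt

# A small made-up sample, used only for illustration
x = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# An integer for bins: the data range is split into that many equal-width bins
plt.figure()
counts, edges, patches = plt.hist(x, bins=4)
print("counts:", counts)  # number of values in each bin
print("edges:", edges)    # bin boundaries, one more than the number of bins

# A sequence for bins: the values are used directly as the bin edges
plt.figure()
counts2, edges2, _ = plt.hist(x, bins=[1, 3, 5])
print("counts:", counts2)
```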

Let’s look at some examples of using the hist() function to plot a histogram.

Let’s say you want to plot a histogram of the marks obtained by 100 students in a high school Math class. You can use matplotlib pyplot’s hist() function for it. Let’s see what we get using just the default parameters.

import matplotlib.pyplot as plt

# scores in the Math class
math_scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63,
               62, 68, 52, 76, 62, 73, 72, 73, 71, 62,
               76, 53, 71, 79, 77, 35, 65, 59, 58, 70,
               73, 69, 59, 75, 73, 63, 65, 81, 46, 59,
               53, 71, 79, 80, 60, 60, 64, 40, 73, 75,
               68, 58, 81, 65, 55, 62, 82, 47, 85, 62,
               39, 77, 82, 78, 57, 58, 72, 75, 65, 68,
               86, 49, 39, 64, 54, 68, 85, 77, 62, 53,
               52, 76, 80, 84, 69, 61, 69, 65, 89, 97,
               71, 61, 77, 40, 83, 52, 78, 54, 64, 58]

# plot histogram
plt.hist(math_scores)
plt.show()

Output:

Plain histogram of math scores without any formatting.

This histogram somewhat resembles a normal distribution: a large number of students scored between 60 and 80 (closer to the mean), and the frequency tapers off at both ends. We can add some basic formatting to the above plot, such as axis labels and a chart title, to make it clearer.

import matplotlib.pyplot as plt

# scores in the Math class
math_scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63,
               62, 68, 52, 76, 62, 73, 72, 73, 71, 62,
               76, 53, 71, 79, 77, 35, 65, 59, 58, 70,
               73, 69, 59, 75, 73, 63, 65, 81, 46, 59,
               53, 71, 79, 80, 60, 60, 64, 40, 73, 75,
               68, 58, 81, 65, 55, 62, 82, 47, 85, 62,
               39, 77, 82, 78, 57, 58, 72, 75, 65, 68,
               86, 49, 39, 64, 54, 68, 85, 77, 62, 53,
               52, 76, 80, 84, 69, 61, 69, 65, 89, 97,
               71, 61, 77, 40, 83, 52, 78, 54, 64, 58]

# plot histogram
plt.hist(math_scores)
# add formatting
plt.xlabel("Score")
plt.ylabel("Students")
plt.title("Histogram of scores in the Math class")
plt.show()

Output:

Histogram of scores with axis labels and chart title

You can change the values on the y-axis from frequencies (counts) to probability densities using the density parameter, which is False by default. With density=True, the bar heights are scaled so that the total area of the histogram is 1.

import matplotlib.pyplot as plt

# scores in the Math class
math_scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63,
               62, 68, 52, 76, 62, 73, 72, 73, 71, 62,
               76, 53, 71, 79, 77, 35, 65, 59, 58, 70,
               73, 69, 59, 75, 73, 63, 65, 81, 46, 59,
               53, 71, 79, 80, 60, 60, 64, 40, 73, 75,
               68, 58, 81, 65, 55, 62, 82, 47, 85, 62,
               39, 77, 82, 78, 57, 58, 72, 75, 65, 68,
               86, 49, 39, 64, 54, 68, 85, 77, 62, 53,
               52, 76, 80, 84, 69, 61, 69, 65, 89, 97,
               71, 61, 77, 40, 83, 52, 78, 54, 64, 58]

# plot histogram
plt.hist(math_scores, density=True)
# add formatting
plt.xlabel("Score")
plt.ylabel("Density")
plt.title("Histogram of scores in the Math class")
plt.show()

Output:

Histogram plotted with probability densities in matplotlib

In the above chart, the height of each bar is the "density" of the values concentrated in its bin, so the bar areas add up to 1. That is, for a bin, density = count inside the bin / (total count × bin width).
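To see the formula in action, the sketch below computes the densities by hand and compares them with NumPy's np.histogram(..., density=True), which applies the same definition. The choice of 5 bins and the ten-score sample are arbitrary, picked only to keep the example short.

```python
import numpy as np

# First ten scores from the Math class example above
scores = np.array([72, 41, 65, 63, 82, 63, 51, 57, 39, 63])

counts, edges = np.histogram(scores, bins=5)
widths = np.diff(edges)  # width of each bin

# density = count inside the bin / (total count x bin width)
manual_density = counts / (counts.sum() * widths)

density, _ = np.histogram(scores, bins=5, density=True)
print(np.allclose(manual_density, density))  # the manual and NumPy values agree

# The bar areas (density x width) of a density histogram add up to 1
print(np.sum(density * widths))
```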

In the above examples, you can see that the hist() function, by default, uses 10 equal-width bins. You can specify your own bin count using the bins parameter. For instance, if you want the histogram to have 20 bins:

import matplotlib.pyplot as plt

# scores in the Math class
math_scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63,
               62, 68, 52, 76, 62, 73, 72, 73, 71, 62,
               76, 53, 71, 79, 77, 35, 65, 59, 58, 70,
               73, 69, 59, 75, 73, 63, 65, 81, 46, 59,
               53, 71, 79, 80, 60, 60, 64, 40, 73, 75,
               68, 58, 81, 65, 55, 62, 82, 47, 85, 62,
               39, 77, 82, 78, 57, 58, 72, 75, 65, 68,
               86, 49, 39, 64, 54, 68, 85, 77, 62, 53,
               52, 76, 80, 84, 69, 61, 69, 65, 89, 97,
               71, 61, 77, 40, 83, 52, 78, 54, 64, 58]

# plot histogram
plt.hist(math_scores, bins=20)
# add formatting
plt.xlabel("Score")
plt.ylabel("Students")
plt.title("Histogram of scores in the Math class")
plt.show()

Output:

Histogram of scores with 20 bins

You can see that with more bins, each bin is narrower and the histogram is more granular. Also, note that except for the last bin, each bin includes its lower bound and excludes its upper bound [include, exclude). The final bin includes both its lower and upper bounds [include, include].
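The half-open rule is easiest to see on a tiny example where some values sit exactly on bin edges. This sketch uses NumPy's np.histogram(), which follows the same [include, exclude) convention with a closed last bin; the values are made up for illustration.

```python
import numpy as np

# Made-up values, each sitting exactly on a bin edge
values = [0, 1, 2, 3]

# Bins are [0, 1), [1, 2) and [2, 3]; the last bin is closed on both ends
counts, edges = np.histogram(values, bins=[0, 1, 2, 3])
print(counts)  # 3 is counted in the last bin together with 2
```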

You can also specify your own bin edges, which can be unequally spaced. For this, pass a sequence of bin edges to the bins parameter instead of an integer. For example, if you want the bins 0 to 20, 20 to 50, 50 to 70, 70 to 90, and 90 to 100:

import matplotlib.pyplot as plt

# scores in the Math class
math_scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63,
               62, 68, 52, 76, 62, 73, 72, 73, 71, 62,
               76, 53, 71, 79, 77, 35, 65, 59, 58, 70,
               73, 69, 59, 75, 73, 63, 65, 81, 46, 59,
               53, 71, 79, 80, 60, 60, 64, 40, 73, 75,
               68, 58, 81, 65, 55, 62, 82, 47, 85, 62,
               39, 77, 82, 78, 57, 58, 72, 75, 65, 68,
               86, 49, 39, 64, 54, 68, 85, 77, 62, 53,
               52, 76, 80, 84, 69, 61, 69, 65, 89, 97,
               71, 61, 77, 40, 83, 52, 78, 54, 64, 58]

# specify the bin edges
bin_edges = [0,20,50,70,90,100]

# plot histogram
plt.hist(math_scores, bins=bin_edges)
# add formatting
plt.xlabel("Marks in Math")
plt.ylabel("Students")
plt.title("Histogram of scores in the Math class")
plt.show()

Output:

Histogram plotted with matplotlib having custom bin edges.

Here, the bins are unequally spaced because of the bin edges specified. Matplotlib’s hist() function also has a number of other parameters to customize your plots even further. For more, refer to its documentation.
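As a sketch of a few of those other parameters, the example below fixes the binned interval with range, accumulates the counts with cumulative, and draws only an outline with histtype. The specific values (and the ten-score sample) are illustrative choices; check the hist() documentation for the full list of options.

```python
import matplotlib.pyplot as plt

# First ten scores from the Math class example above
scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63]

counts, edges, patches = plt.hist(
    scores,
    bins=5,
    range=(0, 100),    # bin over 0-100 instead of the data's min/max
    cumulative=True,   # each bar counts all values up to its upper edge
    histtype="step",   # draw an outline instead of filled bars
    color="tab:blue",
)
plt.xlabel("Score")
plt.ylabel("Students (cumulative)")
plt.title("Cumulative histogram of scores in the Math class")
plt.show()
```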

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel having matplotlib version 3.2.2.

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.