In this tutorial, we will look at the purpose and the usage of the pandas get_dummies() function with the help of some examples.
What does the pandas get_dummies() function do?
The pandas get_dummies() function converts a categorical variable into indicator (dummy) variables, with one column per category. It returns the dummy-coded data as a pandas dataframe.
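To make this concrete, here is a minimal hand-rolled sketch of what the encoding produces. It is an illustration only, not how pandas implements get_dummies() internally: it builds one 0/1 column per unique value and compares the result with get_dummies(). The colors data is a hypothetical example, and dtype=int is passed so that both frames hold integers.

```python
import pandas as pd

# Hypothetical example data
colors = pd.Series(["red", "blue", "red", "green"])

# Manual one-hot encoding: one column per unique value (sorted),
# holding 1 where the row equals that value and 0 elsewhere
manual = pd.DataFrame(
    {value: (colors == value).astype(int) for value in sorted(colors.unique())}
)

# The same encoding from get_dummies(); dtype=int requests 0/1 integers
dummies = pd.get_dummies(colors, dtype=int)

print(manual)
print(manual.equals(dummies))  # True
```

Writing it out by hand shows the idea, but get_dummies() does the same work in one call and also handles prefixes and whole dataframes.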

Let’s apply this function to a list containing t-shirt sizes of 5 students in a class.
import pandas as pd
# list with t-shirt sizes
ls = ['M', 'L', 'S', 'XL', 'M']
# get dummies
pd.get_dummies(ls)
Output:

You can see that we get the dummy data for the above list as a dataframe. Note that there is one column for each unique value in the list, and each row represents a list item, with its t-shirt size marked in the corresponding column.
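The exact values in the output depend on your pandas version: since pandas 2.0, get_dummies() returns boolean True/False columns by default, while earlier versions returned 0/1 integers (uint8). The short sketch below passes the dtype parameter to request integers explicitly and checks the structure described above: the columns are the sorted unique sizes, and every row has exactly one 1.

```python
import pandas as pd

# list with t-shirt sizes (same data as above)
ls = ['M', 'L', 'S', 'XL', 'M']

# dtype=int asks for 0/1 integers instead of the version-dependent default
dummies = pd.get_dummies(ls, dtype=int)

print(list(dummies.columns))         # ['L', 'M', 'S', 'XL']
print(dummies.sum(axis=1).tolist())  # [1, 1, 1, 1, 1]
print(dummies.to_numpy().tolist())
# [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
```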
Encode Categorical Columns in Pandas Dataframe
Generally, the get_dummies() function is applied to categorical columns in a pandas dataframe to generate dummy (one-hot encoded) columns. This is an important step in data science/ML pipelines, because many models require their input data in numeric form.
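As a rough sketch of that step, the snippet below encodes a small hypothetical frame (the "age" column and its values are illustrative only) and converts the result to a NumPy float array, the kind of purely numeric matrix many ML libraries accept as input.

```python
import pandas as pd

# Hypothetical data: one numeric and one categorical column
df = pd.DataFrame({
    "age": [21, 19, 22],
    "shirt_size": ["M", "L", "S"],
})

# Encode the categorical column; dtype=float makes the dummy columns floats
encoded = pd.get_dummies(df, dtype=float)
print(list(encoded.columns))  # ['age', 'shirt_size_L', 'shirt_size_M', 'shirt_size_S']

# A purely numeric 2-D array, ready to be passed to a model
X = encoded.to_numpy()
print(X.dtype, X.shape)  # float64 (3, 4)
```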
Let’s look at some examples of using the pandas get_dummies() function to encode categorical columns.
Get Dummies for a Single Column
Here we pass a single dataframe column to the get_dummies() function. Let’s look at an example. First, we will create a sample dataframe.
import pandas as pd
# create dataframe
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "year": ["Senior", "Senior", "Junior", "Sophomore", "Freshman", "Freshman"],
    "shirt_size": ['M', 'L', 'S', 'S', 'M', 'M']
})
# display the dataframe
df
Output:

We have a dataframe containing the student_id, year, and t-shirt size of six students at a university. Let’s one-hot encode the “shirt_size” column.
# one-hot encode the "shirt_size" column
pd.get_dummies(df["shirt_size"])
Output:

It returns a dataframe resulting from encoding the “shirt_size” column. Note that each unique size (L, M, and S) gets a separate column.
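Because each row has exactly one indicator set, this encoding loses no information. One quick way to check that, sketched below with the same sample data, is to recover the original sizes with idxmax(axis=1), which returns the label of the column holding each row's largest value. Passing dtype=int here is just a choice to get 0/1 values on any pandas version.

```python
import pandas as pd

# same sample data as above
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "year": ["Senior", "Senior", "Junior", "Sophomore", "Freshman", "Freshman"],
    "shirt_size": ['M', 'L', 'S', 'S', 'M', 'M'],
})

# dtype=int gives 0/1 columns regardless of the pandas version
dummies = pd.get_dummies(df["shirt_size"], dtype=int)

# Every row has exactly one 1
print(dummies.sum(axis=1).tolist())  # [1, 1, 1, 1, 1, 1]

# idxmax(axis=1) returns the column label of that 1, recovering the sizes
recovered = dummies.idxmax(axis=1)
print(recovered.tolist())  # ['M', 'L', 'S', 'S', 'M', 'M']
```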
You can also specify a prefix to use for all the dummy columns. Pass your desired prefix as an argument to the prefix parameter of the get_dummies() function.
# one-hot encode the "shirt_size" column
pd.get_dummies(df["shirt_size"], prefix="shirt_size")
Output:

Here we use the column name, “shirt_size”, as the prefix for each dummy column, so the resulting columns are named shirt_size_L, shirt_size_M, and shirt_size_S.
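By default, get_dummies() joins the prefix and the category with an underscore; the prefix_sep parameter changes that separator. The sketch below prints the generated column names both ways. The "size" prefix and "-" separator are just illustrative choices.

```python
import pandas as pd

shirt_size = pd.Series(['M', 'L', 'S', 'S', 'M', 'M'], name="shirt_size")

# Default separator: prefix and category are joined with "_"
default_names = pd.get_dummies(shirt_size, prefix="shirt_size").columns.tolist()
print(default_names)  # ['shirt_size_L', 'shirt_size_M', 'shirt_size_S']

# prefix_sep changes the separator between prefix and category
custom_names = pd.get_dummies(shirt_size, prefix="size", prefix_sep="-").columns.tolist()
print(custom_names)  # ['size-L', 'size-M', 'size-S']
```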
Get Dummies for Multiple Columns
You can also pass a dataframe with multiple columns to the get_dummies() function. By default, it encodes every column with object, string, or category dtype, leaves the other columns unchanged, and returns the result as a single dataframe.
Let’s look at an example. This time, let’s pass the entire dataframe df used in the above example to the get_dummies() function.
# one-hot encode all categorical columns
pd.get_dummies(df)
Output:

We get one-hot encoded data for all the categorical columns, “year” and “shirt_size”, as a dataframe. Note that the numerical column “student_id” remains unchanged. Also, note that we didn’t have to specify a prefix here: when you pass a dataframe, get_dummies() uses each encoded column’s name as the prefix by default (for example, year_Senior and shirt_size_M).
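If you only want to encode some of the columns, get_dummies() also accepts a columns parameter listing the columns to convert; all other columns are kept as they are. The sketch below uses the same sample data and compares the default column layout with one where only “shirt_size” is encoded.

```python
import pandas as pd

# same sample data as above
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "year": ["Senior", "Senior", "Junior", "Sophomore", "Freshman", "Freshman"],
    "shirt_size": ['M', 'L', 'S', 'S', 'M', 'M'],
})

# Default: every categorical column is encoded, using column names as prefixes
default_cols = pd.get_dummies(df).columns.tolist()
print(default_cols)
# ['student_id', 'year_Freshman', 'year_Junior', 'year_Senior',
#  'year_Sophomore', 'shirt_size_L', 'shirt_size_M', 'shirt_size_S']

# columns= restricts encoding to the listed columns; "year" stays as text
subset_cols = pd.get_dummies(df, columns=["shirt_size"]).columns.tolist()
print(subset_cols)
# ['student_id', 'year', 'shirt_size_L', 'shirt_size_M', 'shirt_size_S']
```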