Pandas Get Dummies Function - get_dummies()

In this tutorial, we will look at the purpose and the usage of the pandas get_dummies() function with the help of some examples.

What does the pandas `get_dummies()` function do?

The pandas get_dummies() function is used to convert a categorical variable to indicator/dummy variables (columns). It returns the dummy coded data as a pandas dataframe.

Let’s apply this function to a list containing t-shirt sizes of 5 students in a class.

import pandas as pd

# list with t-shirt sizes
ls = ['M', 'L', 'S', 'XL', 'M']
# get dummies
pd.get_dummies(ls)

Output:

[Figure: dummy data for the t-shirt sizes list as a dataframe]

You can see that we get the dummy data for the above list as a dataframe. Note that we have one column for each unique value in the list and each row represents a list item with the respective t-shirt size.
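Since the output above was shown as an image, here is a minimal sketch that reproduces it as text. It assumes a pandas version that supports the dtype parameter of get_dummies(); dtype=int is passed only to get 0/1 values, because recent pandas versions return True/False columns by default. The variable names are illustrative.

```python
import pandas as pd

# t-shirt sizes from the example above
sizes = ['M', 'L', 'S', 'XL', 'M']

# dtype=int requests 0/1 indicator values instead of booleans
dummies = pd.get_dummies(sizes, dtype=int)
print(dummies)

# one column per unique size, in sorted order
print(list(dummies.columns))  # ['L', 'M', 'S', 'XL']

# every row has exactly one indicator set to 1
print(dummies.sum(axis=1).tolist())  # [1, 1, 1, 1, 1]
```

The row sums confirm that each list item is flagged in exactly one column, which is what one-hot encoding means.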

Encode Categorical Columns in Pandas Dataframe

Generally, the get_dummies() function is applied to categorical columns in a pandas dataframe to generate dummy (one-hot encoded) columns. This is an important step in data science / ML pipelines that require data in numeric form.

Let’s look at some examples of using the pandas get_dummies() function to encode categorical columns.

Get Dummies for a single column

Here we pass a single dataframe column to the get_dummies() function. Let’s look at an example. First, we will create a sample dataframe.

import pandas as pd

# create dataframe 
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "year": ["Senior", "Senior", "Junior", "Sophomore", "Freshman", "Freshman"],
    "shirt_size": ['M', 'L', 'S', 'S', 'M', 'M']
})
# display the dataframe
df

Output:

We have a dataframe containing the student_id, year, and the t-shirt sizes of some students in a university. Let’s one-hot encode the “shirt_size” column.

# one-hot encode the "shirt_size" column
pd.get_dummies(df["shirt_size"])

Output:

[Figure: result of pandas get_dummies() on a single column]

It returns a dataframe resulting from encoding the “shirt_size” column. Note that each unique size has a separate column.
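In practice you often want the dummy columns alongside the rest of the data. The following is a minimal sketch of one way to do that with pd.concat(). The names size_dummies and encoded are illustrative, and dtype=int is an assumption made here only to show 0/1 values.

```python
import pandas as pd

# same sample dataframe as above
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "year": ["Senior", "Senior", "Junior", "Sophomore", "Freshman", "Freshman"],
    "shirt_size": ['M', 'L', 'S', 'S', 'M', 'M']
})

# one-hot encode the "shirt_size" column as 0/1 values
size_dummies = pd.get_dummies(df["shirt_size"], dtype=int)

# drop the original column and attach the dummy columns in its place
encoded = pd.concat([df.drop(columns="shirt_size"), size_dummies], axis=1)
print(encoded)
```

The dummy dataframe keeps the index of the original column, so the concatenation lines up row by row.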

You can also specify a prefix to use for all the dummy columns. Pass your desired prefix as an argument to the prefix parameter of the get_dummies() function.

# one-hot encode the "shirt_size" column and prefix the dummy column names
pd.get_dummies(df["shirt_size"], prefix="shirt_size")

Output:

[Figure: dummies from the shirt_size column with a prefix]

Here we use the column name, “shirt_size” as the prefix for each dummy column.
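As a quick check of the resulting column names, here is a small sketch. It also uses the prefix_sep parameter of get_dummies(), which sets the separator between the prefix and the value (an underscore by default). The "size" prefix and ":" separator are arbitrary choices for illustration.

```python
import pandas as pd

# the "shirt_size" column from the sample dataframe, as a standalone series
shirt_size = pd.Series(['M', 'L', 'S', 'S', 'M', 'M'], name="shirt_size")

# the prefix is joined to each value with "_" by default
with_prefix = pd.get_dummies(shirt_size, prefix="shirt_size")
print(list(with_prefix.columns))  # ['shirt_size_L', 'shirt_size_M', 'shirt_size_S']

# prefix_sep changes the separator between prefix and value
custom_sep = pd.get_dummies(shirt_size, prefix="size", prefix_sep=":")
print(list(custom_sep.columns))  # ['size:L', 'size:M', 'size:S']
```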

Get Dummies for Multiple Columns

You can also pass a dataframe with multiple columns to the get_dummies() function. By default, it dummy-codes every column with an object, string, or category dtype and leaves the remaining columns as they are.

Let’s look at an example. This time, let’s pass the entire dataframe df used in the above example to the get_dummies() function.

# one-hot encode all the categorical columns
pd.get_dummies(df)

Output:

[Figure: result of pandas get_dummies() on the entire dataframe]

We get one-hot encoded data for all the categorical columns, “year” and “shirt_size”, as a dataframe. Note that the numerical column “student_id” remained unchanged. Also, note that we didn’t have to specify a prefix here: when you pass a dataframe, the function uses the column names as prefixes by default.
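If you only want to encode some of the categorical columns, get_dummies() also accepts a columns parameter. The sketch below compares encoding the whole dataframe with encoding only “shirt_size”. The variable names are illustrative, and dtype=int is used only to get 0/1 values.

```python
import pandas as pd

# same sample dataframe as above
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "year": ["Senior", "Senior", "Junior", "Sophomore", "Freshman", "Freshman"],
    "shirt_size": ['M', 'L', 'S', 'S', 'M', 'M']
})

# encode every categorical column; the numeric column is kept as it is
all_encoded = pd.get_dummies(df, dtype=int)
print(list(all_encoded.columns))

# encode only "shirt_size"; "year" stays as a plain string column
size_only = pd.get_dummies(df, columns=["shirt_size"], dtype=int)
print(list(size_only.columns))
```

In both results the columns that are not encoded come first, followed by the dummy columns named as column name, underscore, value.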

Author

Piyush Raj

Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
