In this tutorial, we will look at how to set a column in a pandas dataframe as the index of the dataframe with the help of some examples.
How to set a column as the index in a pandas dataframe?
You can use the pandas dataframe set_index() function to set a column as the index of a pandas dataframe. Pass the column name as an argument.
The following is the syntax –
# set a column as the dataframe's index
df.set_index("Col")

# set a combination of columns as the dataframe's index
df.set_index(["Col1", "Col2"])
By default, it returns a new dataframe with the column set as the index and leaves the original dataframe unchanged. You can also pass inplace=True to modify the dataframe in place, in which case the set_index() function changes the original dataframe and returns None.
You can also use a combination of columns as the index for the dataframe. In that case, pass the list of columns as the argument.
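The difference between the default behaviour and inplace=True can be seen in a small sketch. The dataframe and its column names ("Col", "Val") are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy dataframe, used only to illustrate the two call styles
df = pd.DataFrame({"Col": ["a", "b", "c"], "Val": [1, 2, 3]})

# Default call: a new dataframe is returned and df is left untouched
indexed = df.set_index("Col")
print(list(df.columns))       # df still has both columns
print(list(indexed.columns))  # "Col" has moved into the index

# inplace=True: df itself is modified and the call returns None
result = df.set_index("Col", inplace=True)
print(result)
print(df.index.name)
```

Because the in-place call returns None, assigning its result back to df (df = df.set_index("Col", inplace=True)) would replace the dataframe with None, so use one style or the other.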
Examples
Let’s now look at some examples of using the above syntax –
First, we will create a dataframe that we will use throughout this tutorial.
import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000]
}

# create pandas dataframe
df = pd.DataFrame(data)

# display the dataframe
df
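Output:
     Name  Age  Department  Salary
0     Jim   26       Sales   55000
1  Dwight   28       Sales   60000
2  Angela   27  Accounting   52000
3    Tobi   32          HR   45000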
Here, we created a dataframe with information about some employees in an office. You can see that the dataframe has the columns – “Name”, “Age”, “Department”, and “Salary”.
Example 1 – Set a column as the dataframe index
The dataframe created above has a default index. Let’s modify the dataframe so that it uses the “Name” column as its index.
# set Name column as index
df.set_index("Name", inplace=True)

# display the dataframe
df
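Output:
        Age  Department  Salary
Name
Jim      26       Sales   55000
Dwight   28       Sales   60000
Angela   27  Accounting   52000
Tobi     32          HR   45000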
You can see that now the dataframe’s index is the values from the “Name” column.
Let’s print out the dataframe’s index.
# display the dataframe index
df.index
Output:
Index(['Jim', 'Dwight', 'Angela', 'Tobi'], dtype='object', name='Name')
The dataframe’s index is the values from the “Name” column in the original dataframe.
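One practical effect of using “Name” as the index is that rows can be selected by name. The sketch below rebuilds the same employee data so that it runs on its own. The .loc lookup is an addition for illustration and is not otherwise covered in this tutorial.

```python
import pandas as pd

# Same employee data as in the tutorial, rebuilt so this snippet is self-contained
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
}
df = pd.DataFrame(data).set_index("Name")

# Select a single row by its index label
row = df.loc["Dwight"]
print(row["Department"])
print(row["Salary"])

# "Name" is now the index, so it is no longer among the regular columns
print("Name" in df.columns)
```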
You can use the pandas dataframe reset_index() function to reset the index of the dataframe back to its default index.
# reset the index
df.reset_index(inplace=True)

# display the dataframe
df
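Output:
     Name  Age  Department  Salary
0     Jim   26       Sales   55000
1  Dwight   28       Sales   60000
2  Angela   27  Accounting   52000
3    Tobi   32          HR   45000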
Here, we passed inplace=True to modify the dataframe in place.
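Like set_index(), reset_index() returns a new dataframe when inplace is not passed. The following sketch, which rebuilds the tutorial's employee data so it runs on its own, shows the round trip. It also shows the drop=True option, which discards the old index instead of restoring it as a column (this option is not used elsewhere in this tutorial).

```python
import pandas as pd

# Same employee data as in the tutorial, rebuilt so this snippet is self-contained
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
}
df = pd.DataFrame(data)

# Round trip without inplace: each call returns a new dataframe
indexed = df.set_index("Name")
restored = indexed.reset_index()
print(list(restored.columns))  # "Name" is back as a regular column
print(list(restored.index))    # default integer index

# drop=True throws the old index away instead of turning it into a column
dropped = indexed.reset_index(drop=True)
print(list(dropped.columns))
```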
Example 2 – Set multiple columns as dataframe index
You can also set the index of a dataframe as a combination of multiple columns.
For example, let’s use the combination of the “Name” and the “Age” columns as the index for the above dataframe.
# set combination of Name and Age columns as index
df.set_index(["Name", "Age"], inplace=True)

# display the dataframe
df
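Output:
            Department  Salary
Name   Age
Jim    26        Sales   55000
Dwight 28        Sales   60000
Angela 27   Accounting   52000
Tobi   32           HR   45000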
You can see that now the dataframe index is a combination of the “Name” and “Age” column values.
If you print out the dataframe index, you’ll see that it is now of MultiIndex type.
# display the dataframe index
df.index
Output:
MultiIndex([(   'Jim', 26),
            ('Dwight', 28),
            ('Angela', 27),
            (  'Tobi', 32)],
           names=['Name', 'Age'])
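As a rough illustration of what the MultiIndex means in practice, the sketch below inspects its levels and looks up one row by passing a (Name, Age) tuple to .loc. The data is rebuilt so that the snippet runs on its own, and the tuple lookup is an addition that goes slightly beyond the steps shown above.

```python
import pandas as pd

# Same employee data as in the tutorial, rebuilt so this snippet is self-contained
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
}
df = pd.DataFrame(data).set_index(["Name", "Age"])

# The index has one level per column that was passed to set_index()
print(type(df.index).__name__)
print(df.index.nlevels)
print(list(df.index.names))

# A full (Name, Age) tuple identifies a single row
salary = df.loc[("Jim", 26), "Salary"]
print(salary)
```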
Summary
In this tutorial, we looked at how to set a column in a pandas dataframe as its index. The following are the key takeaways –
- Use the pandas dataframe set_index() function to set a column (or combination of columns) as the index of the dataframe.
- If you set a combination of columns as the index, the dataframe index will be of MultiIndex type.
- Use the pandas dataframe reset_index() function to reset the index of the dataframe to its default.