In this tutorial, we will look at how to rename the columns of a dataframe resulting from the merge of two dataframes.
When you merge two dataframes that have columns with the same names, pandas appends the default suffixes _x and _y to those columns to keep the column names distinct in the resulting dataframe.
Let’s look at an example.
import pandas as pd

# Create the first DataFrame
df1 = pd.DataFrame({'Employee': ['John Smith', 'Jane Doe', 'Bob Johnson'],
                    'Job Title': ['Data Scientist', 'Product Manager', 'CEO'],
                    'Years of Experience': [7, 6, 15]})

# Create the second DataFrame
df2 = pd.DataFrame({'Employee': ['John Smith', 'Jane Doe', 'Bob Johnson', 'Alice Williams'],
                    'Job Title': ['Accountant', 'Marketing Manager', 'CFO', 'HR Manager'],
                    'Years of Experience': [2, 1, 10, 6]})
Here, we created two dataframes, df1 and df2. Both dataframes store information about some employees in an office. The only difference is that df1 stores the current information, whereas the data in df2 is from five years back.
Let’s merge the two dataframes on the “Employee” column and see what we get by default.
# merge df1 and df2 in a left join
df = df1.merge(df2, on="Employee", how="left")
# display the resulting dataframe
df
Output:
You can see that since both input dataframes had columns with the same names, the merge() function gave them suffixes to identify which column came from which dataframe. The columns with the _x suffix came from the left dataframe and the columns with the _y suffix came from the right dataframe (in the merge).
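To make the default behavior concrete, here is a minimal sketch using two one-row stand-ins for df1 and df2 (the names left, right, and merged are illustrative and not part of the example above). It assumes the merge is done on the "Employee" column, as in this tutorial, and shows that only the overlapping non-key column receives a suffix.

```python
import pandas as pd

# Hypothetical one-row stand-ins for df1 and df2
left = pd.DataFrame({"Employee": ["John Smith"], "Job Title": ["Data Scientist"]})
right = pd.DataFrame({"Employee": ["John Smith"], "Job Title": ["Accountant"]})

merged = left.merge(right, on="Employee", how="left")

# The join key keeps its name; the overlapping non-key column gets _x / _y
print(list(merged.columns))
# → ['Employee', 'Job Title_x', 'Job Title_y']
```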
How to change column names after the merge?
There are two ways to take this up –
- Use the pandas dataframe rename() function to change the column names of specific columns in the merged dataframe. Pass a dictionary of {old_col_name: new_col_name} as an argument to the columns parameter of the rename() function. You can also assign the dataframe new column names by using a list, for example, df.columns = new_col_names_ls.
- Alternatively, you can specify the suffixes in the merge() function itself to distinguish between the columns using the suffixes parameter. Pass a tuple, for example, (left_suffix, right_suffix) to the suffixes parameter.
Let’s now look at both methods.
Change column names after the merge
# merge df1 and df2 in a left join
df = df1.merge(df2, on="Employee", how="left")
# specify new column names
df.columns = ["Employee", "Job Title_Current", "Years of Experience_Current",
              "Job Title_Past", "Years of Experience_Past"]
# display the dataframe
df
Output:
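The example above assigns a full list to df.columns. The rename() option with a dictionary can be sketched as follows. This is a minimal illustration on trimmed-down, two-row versions of the dataframes (the names current, past, merged, and renamed are illustrative), not the only way to write it.

```python
import pandas as pd

# Trimmed-down, hypothetical versions of df1 and df2
current = pd.DataFrame({"Employee": ["John Smith", "Jane Doe"],
                        "Job Title": ["Data Scientist", "Product Manager"]})
past = pd.DataFrame({"Employee": ["John Smith", "Jane Doe"],
                     "Job Title": ["Accountant", "Marketing Manager"]})

merged = current.merge(past, on="Employee", how="left")

# Map only the columns to change; "Employee" is left untouched
renamed = merged.rename(columns={"Job Title_x": "Job Title_Current",
                                 "Job Title_y": "Job Title_Past"})

print(list(renamed.columns))
# → ['Employee', 'Job Title_Current', 'Job Title_Past']
```

Note that rename() returns a new dataframe and matches columns by name, whereas assigning a list to df.columns requires every column name to be listed in the right order.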
Specify the suffixes in the merge() function itself
Let’s specify the suffixes to be used by the common column names from the left and the right dataframes using the suffixes parameter.
# merge df1 and df2 in a left join with custom suffixes
df = df1.merge(df2, on="Employee", how="left", suffixes=("_Current", "_Past"))
# display the dataframe
df
Output:
You can see that the resulting dataframe has the passed suffixes.
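As a quick sanity check, the sketch below rebuilds the tutorial's two dataframes and compares the results of both methods. The variable names by_assignment and by_suffixes are illustrative. With this data, the two approaches should produce the same dataframe.

```python
import pandas as pd

df1 = pd.DataFrame({"Employee": ["John Smith", "Jane Doe", "Bob Johnson"],
                    "Job Title": ["Data Scientist", "Product Manager", "CEO"],
                    "Years of Experience": [7, 6, 15]})
df2 = pd.DataFrame({"Employee": ["John Smith", "Jane Doe", "Bob Johnson", "Alice Williams"],
                    "Job Title": ["Accountant", "Marketing Manager", "CFO", "HR Manager"],
                    "Years of Experience": [2, 1, 10, 6]})

# Method 1: merge with the default suffixes, then assign new column names
by_assignment = df1.merge(df2, on="Employee", how="left")
by_assignment.columns = ["Employee", "Job Title_Current", "Years of Experience_Current",
                         "Job Title_Past", "Years of Experience_Past"]

# Method 2: set the suffixes directly in merge()
by_suffixes = df1.merge(df2, on="Employee", how="left", suffixes=("_Current", "_Past"))

# Same column names, values, and dtypes in both results
print(by_assignment.equals(by_suffixes))
# → True
```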