Skip to Content

Pandas – Rename Columns After Merge

In this tutorial, we will look at how to rename the columns of a dataframe resulting from the merge of two dataframes.

rename column names after merge in pandas

When you’re merging two dataframes that have columns with the same names, pandas gives them default suffixes like _x, or _y to keep the names of the columns different in the resulting dataframe after the merge.

Let’s look at an example.

import pandas as pd

# Create the first DataFrame
df1 = pd.DataFrame({'Employee': ['John Smith', 'Jane Doe', 'Bob Johnson'],
                    'Job Title': ['Data Scientist', 'Product Manager', 'CEO'],
                    'Years of Experience': [7, 6, 15]})

# Create the second DataFrame
df2 = pd.DataFrame({'Employee': ['John Smith', 'Jane Doe', 'Bob Johnson', 'Alice Williams'],
                    'Job Title': ['Accountant', 'Marketing Manager', 'CFO', 'HR Manager'],
                    'Years of Experience': [2, 1, 10, 6]})

Here, we created two dataframes, df1 and df2. Now, both the dataframes store information about some employees in an office, the only difference is that df1 stores the current information whereas the data in df2 is from five years back.

Let’s merge the two dataframes on the “Employee” column and see what we get by default.

# merge df1 and df2 in a left join
df = df1.merge(df2, on="Employee", how="left")
# display the resulting dataframe
df

Output:

the resulting dataframe after the merge operation

You can see that since the merged dataframe had column names with the same name, the merge() function gave them suffixes to identify which column came from which dataframe. The columns with _x suffix came from the left dataframe and the columns with the _y suffix came from the right dataframe (in the merge).

How to change columns names after the merge?

There are two ways to take this up –

  1. Use the pandas dataframe rename() function to change the column names of specific columns in the merged dataframe.
    Pass a dictionary of {old_col_name: new_col_name} as an argument to the columns parameter of the rename() function. You can also assign the dataframe new column names by using a list, for example, df.columns = new_col_names_ls.
  2. Alternatively, you can specify the suffix in the merge() function itself to distinguish between the columns using the suffixes parameter. Pass a tuple, for example, (left_suffix, right_suffix) to the suffixes parameter.

Let’s now look at both methods.

Change column names after the merge

# merge df1 and df2 in a left join
df = df1.merge(df2, on="Employee", how="left")
# specify new column names
df.columns = ["Employee", "Job Title_Current", "Years of Experience_Current", "Job Title_Past", "Years of Experience_Past"]
# display the dataframe
df

Output:

column name changed of the merged dataframe

Specify the suffixes in the merge() function itself

Let’s specify the suffixes to be used by the common column names from the left and the right dataframes using the suffixes parameter.

# merge df1 and df2 in a left join with custom suffixes
df = df1.merge(df2, on="Employee", how="left", suffixes=("_Current", "_Past"))
# display the dataframe
df

Output:

columns with custom suffixes after the merge operation

You can see that the resulting dataframe has the passed suffixes.

You might also be interested in –


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.