In this tutorial, we’ll look at the difference between pandas join() and merge() functions and when exactly should you use them. If you’re looking for a refresher on the different types of joins, you can refer to Understanding Joins in Pandas
Difference between pandas join and merge
Both the functions are used to perform joins on pandas dataframes but they’re used in different scenarios. The join() function is generally used to join dataframes on index whereas the merge() function is a more versatile function that lets you join dataframes on indexes as well as columns.
You can, however, use the join() function to join the left dataframe on a column but the right dataframe must be joined on its index. So, if you want to perform database-style joins on dataframes use the more versatile merge() function.
To join dataframes using the join() function, use the following syntax:
df_merged = df_left.join(df_right, how='left')Note that, the join() function by default performs a join on index. If you want to join the left dataframe on a column, pass the column name to the on parameter. But remember, the right dataframe must be joined on its index. Also note, that the join() function performs a left join by default.
To join dataframes using the merge() function, use the following syntax:
df_merged = df_left.merge(df_right, on='Col1', how='inner')Here, the on parameter is used to pass the column to merge the dataframes on and the how parameter is used to determine the kind of join to perform. The merge() function performs an inner join by default.
Examples
Let’s look at some example use-cases to illustrate the difference between the two.
Introductory ⭐
- Harvard University Data Science: Learn R Basics for Data Science
- Standford University Data Science: Introduction to Machine Learning
- UC Davis Data Science: Learn SQL Basics for Data Science
- IBM Data Science: Professional Certificate in Data Science
- IBM Data Analysis: Professional Certificate in Data Analytics
- Google Data Analysis: Professional Certificate in Data Analytics
- IBM Data Science: Professional Certificate in Python Data Science
- IBM Data Engineering Fundamentals: Python Basics for Data Science
Intermediate ⭐⭐⭐
- Harvard University Learning Python for Data Science: Introduction to Data Science with Python
- Harvard University Computer Science Courses: Using Python for Research
- IBM Python Data Science: Visualizing Data with Python
- DeepLearning.AI Data Science and Machine Learning: Deep Learning Specialization
Advanced ⭐⭐⭐⭐⭐
- UC San Diego Data Science: Python for Data Science
- UC San Diego Data Science: Probability and Statistics in Data Science using Python
- Google Data Analysis: Professional Certificate in Advanced Data Analytics
- MIT Statistics and Data Science: Machine Learning with Python - from Linear Models to Deep Learning
- MIT Statistics and Data Science: MicroMasters® Program in Statistics and Data Science
🔎 Find Data Science Programs 👨💻 111,889 already enrolled
Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.
1. Merge dataframes on index
Let’s merge two dataframes on their indexes using join() and merge(). For this, we’ll create two dataframes “df_names” and “df_portfolio”. Both having a common column “Symbol”.
import pandas as pd
# stock names
df_names = pd.DataFrame({
    'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
    'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
    'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
    'Shares': [20, 50, 50, 100]
})The two dataframes look as follows:

Now, let’s set the “Symbol” column as the index for both the dataframes.
df_names.set_index('Symbol', inplace=True)
df_portfolio.set_index('Symbol', inplace=True)This is how the dataframes look after setting the index:

Left Join using pandas join()
Now, let’s perform a left join on the two dataframes by merging them on their indexes using the join() function.
df_merged = df_names.join(df_portfolio, how='left')
print(df_merged)Output:
             Name  Shares
Symbol                   
APPL        Apple    50.0
MSFT    Mircosoft     NaN
TSLA        Tesla    20.0
GOOG       Google    50.0
NFLX      Netflix     NaNYou can see the resulting dataframe with NaNs where a match was not found in the right dataframe, df_portfolio. 
Left Join using pandas merge()
Let’s perform the same left join on index as above but this time using the merge() function.
df_merged = df_names.merge(df_portfolio, left_index=True, right_index=True, how='left')
print(df_merged)Output:
             Name  Shares
Symbol                   
APPL        Apple    50.0
MSFT    Mircosoft     NaN
TSLA        Tesla    20.0
GOOG       Google    50.0
NFLX      Netflix     NaNYou can see that we get the same result as we got with the join() function. Note that, the merge() function required passing values for parameters left_index and right_index to specify that we wanted to perform a merge on the indexes.
2. Merge the left dataframe on column and right on index
Now, let’s merge the left dataframe on the column “Symbol” and the right dataframe on its index. First, we’ll recreate the two dataframes and then set the “Symbol” column as the index only for the right dataframe.
import pandas as pd
# stock names
df_names = pd.DataFrame({
    'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
    'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
    'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
    'Shares': [20, 50, 50, 100]
})
# set Symbol as axis for df_names
df_portfolio.set_index('Symbol', inplace=True)This is how the two dataframes look:

Left Join using join()
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)Output:
        Name Symbol  Shares
0      Apple   APPL    50.0
1  Mircosoft   MSFT     NaN
2      Tesla   TSLA    20.0
3     Google   GOOG    50.0
4    Netflix   NFLX     NaNYou can see that we were able to merge the left dataframe on the column “Symbol” and the right datafrome on its index using the pandas join() function. 
Left Join using merge()
df_merged = df_names.merge(df_portfolio, left_on='Symbol', right_index=True, how='left')
print(df_merged)Output:
        Name Symbol  Shares
0      Apple   APPL    50.0
1  Mircosoft   MSFT     NaN
2      Tesla   TSLA    20.0
3     Google   GOOG    50.0
4    Netflix   NFLX     NaNThe pandas merge() function was able to merge the left dataframe on the column “Symbol” and the right one on its index. Note that, we had to pass right_index=True to indicate that the right dataframe should be merged on its index.
So far so good.
3. Merge the left dataframe on index and right on column
Now, let’s merge the left dataframe on its index and the right dataframe on the column “Symbol”. First, we’ll recreate the two dataframes and then set the “Symbol” column as the index only for the left dataframe.
import pandas as pd
# stock names
df_names = pd.DataFrame({
    'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
    'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
    'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
    'Shares': [20, 50, 50, 100]
})
# set Symbol as axis for df_names
df_names.set_index('Symbol', inplace=True)This is how the two dataframes look:

Left Join using join()
If you try to join the left dataframe on index and the right dataframe on a column using join(), it’ll result in an error.
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)Output:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
----> 1 df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
      2 print(df_merged)
** lines skipped to keep the error message short **  
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
 This happened because the join() function requires the right dataframe to be merged on its index. 
Left Join using merge()
The merge() function allows you the flexibility to merge the left dataframe on index and the right one on a column.
df_merged = df_names.merge(df_portfolio, left_index=True, right_on='Symbol', how='left')
print(df_merged)Output:
          Name Symbol  Shares
1.0      Apple   APPL    50.0
NaN  Mircosoft   MSFT     NaN
0.0      Tesla   TSLA    20.0
2.0     Google   GOOG    50.0
NaN    Netflix   NFLX     NaNNote that, we had to pass left_index=True to indicate that the left dataframe should be merged on its index. 
4. Merge dataframes on column
Now, let’s merge the two dataframes directly on the common column “Symbol”.
import pandas as pd
# stock names
df_names = pd.DataFrame({
    'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
    'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
    'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
    'Shares': [20, 50, 50, 100]
})This is how the dataframes look:

Left join using join()
The pandas join() function results in an error if you try to merge the two dataframes on a common column.
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)Output:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
----> 1 df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
      2 print(df_merged)  
** lines skipped to keep the error message short **
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
 Left Join using pandas merge
If the column to merge on has the same name in the two dataframes, pass it to the on parameter. If they have different names, pass the respective names to left_on and right_on parameters.
df_merged = df_names.merge(df_portfolio, on='Symbol', how='left')
print(df_merged)Output:
        Name Symbol  Shares
0      Apple   APPL    50.0
1  Mircosoft   MSFT     NaN
2      Tesla   TSLA    20.0
3     Google   GOOG    50.0
4    Netflix   NFLX     NaNConclusion
The pandas merge() and join() functions are used to join dataframes but have different use-cases. If you want to perform database-style joins on dataframes prefer the merge() function as it gives you more flexibility on how to perform the merge. If you, instead, want to join dataframes specifically on their indexes use the join() function. 
The following table compares the two functions on the different combinations of joining dataframe on column or index.
| DataFrame Merge | join() | merge() | 
|---|---|---|
| On index | ✔️ | ✔️ | 
| Left on column, right on index | ✔️ | ✔️ | 
| Left on index, right on column | ❌ | ✔️ | 
| On column | ❌ | ✔️ | 
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having pandas version 1.0.5
Subscribe to our newsletter for more informative guides and tutorials. 
We do not spam and you can opt out any time.
Tutorials on combining data in pandas –
- Pandas – Join vs Merge
- Pandas – Merge DataFrames on Multiple Columns
- Understanding Joins in Pandas
- Append Rows to a Pandas DataFrame
- Concat DataFrames in Pandas


