In this tutorial, we’ll look at the difference between pandas join() and merge() functions and when exactly should you use them. If you’re looking for a refresher on the different types of joins, you can refer to Understanding Joins in Pandas
Difference between pandas join and merge
Both the functions are used to perform joins on pandas dataframes but they’re used in different scenarios. The join()
function is generally used to join dataframes on index whereas the merge()
function is a more versatile function that lets you join dataframes on indexes as well as columns.
You can, however, use the join() function to join the left dataframe on a column but the right dataframe must be joined on its index. So, if you want to perform database-style joins on dataframes use the more versatile merge() function.
To join dataframes using the join()
function, use the following syntax:
df_merged = df_left.join(df_right, how='left')
Note that, the join() function by default performs a join on index. If you want to join the left dataframe on a column, pass the column name to the on
parameter. But remember, the right dataframe must be joined on its index. Also note, that the join() function performs a left join by default.
To join dataframes using the merge()
function, use the following syntax:
df_merged = df_left.merge(df_right, on='Col1', how='inner')
Here, the on
parameter is used to pass the column to merge the dataframes on and the how
parameter is used to determine the kind of join to perform. The merge() function performs an inner join by default.
Examples
Let’s look at some example use-cases to illustrate the difference between the two.
Introductory ⭐
- Harvard University Data Science: Learn R Basics for Data Science
- Standford University Data Science: Introduction to Machine Learning
- UC Davis Data Science: Learn SQL Basics for Data Science
- IBM Data Science: Professional Certificate in Data Science
- IBM Data Analysis: Professional Certificate in Data Analytics
- Google Data Analysis: Professional Certificate in Data Analytics
- IBM Data Science: Professional Certificate in Python Data Science
- IBM Data Engineering Fundamentals: Python Basics for Data Science
Intermediate ⭐⭐⭐
- Harvard University Learning Python for Data Science: Introduction to Data Science with Python
- Harvard University Computer Science Courses: Using Python for Research
- IBM Python Data Science: Visualizing Data with Python
- DeepLearning.AI Data Science and Machine Learning: Deep Learning Specialization
Advanced ⭐⭐⭐⭐⭐
- UC San Diego Data Science: Python for Data Science
- UC San Diego Data Science: Probability and Statistics in Data Science using Python
- Google Data Analysis: Professional Certificate in Advanced Data Analytics
- MIT Statistics and Data Science: Machine Learning with Python - from Linear Models to Deep Learning
- MIT Statistics and Data Science: MicroMasters® Program in Statistics and Data Science
🔎 Find Data Science Programs 👨💻 111,889 already enrolled
Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.
1. Merge dataframes on index
Let’s merge two dataframes on their indexes using join() and merge(). For this, we’ll create two dataframes “df_names” and “df_portfolio”. Both having a common column “Symbol”.
import pandas as pd
# stock names
df_names = pd.DataFrame({
'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
'Shares': [20, 50, 50, 100]
})
The two dataframes look as follows:

Now, let’s set the “Symbol” column as the index for both the dataframes.
df_names.set_index('Symbol', inplace=True)
df_portfolio.set_index('Symbol', inplace=True)
This is how the dataframes look after setting the index:

Left Join using pandas join()
Now, let’s perform a left join on the two dataframes by merging them on their indexes using the join()
function.
df_merged = df_names.join(df_portfolio, how='left')
print(df_merged)
Output:
Name Shares
Symbol
APPL Apple 50.0
MSFT Mircosoft NaN
TSLA Tesla 20.0
GOOG Google 50.0
NFLX Netflix NaN
You can see the resulting dataframe with NaNs where a match was not found in the right dataframe, df_portfolio
.
Left Join using pandas merge()
Let’s perform the same left join on index as above but this time using the merge()
function.
df_merged = df_names.merge(df_portfolio, left_index=True, right_index=True, how='left')
print(df_merged)
Output:
Name Shares
Symbol
APPL Apple 50.0
MSFT Mircosoft NaN
TSLA Tesla 20.0
GOOG Google 50.0
NFLX Netflix NaN
You can see that we get the same result as we got with the join() function. Note that, the merge()
function required passing values for parameters left_index
and right_index
to specify that we wanted to perform a merge on the indexes.
2. Merge the left dataframe on column and right on index
Now, let’s merge the left dataframe on the column “Symbol” and the right dataframe on its index. First, we’ll recreate the two dataframes and then set the “Symbol” column as the index only for the right dataframe.
import pandas as pd
# stock names
df_names = pd.DataFrame({
'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
'Shares': [20, 50, 50, 100]
})
# set Symbol as axis for df_names
df_portfolio.set_index('Symbol', inplace=True)
This is how the two dataframes look:

Left Join using join()
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)
Output:
Name Symbol Shares
0 Apple APPL 50.0
1 Mircosoft MSFT NaN
2 Tesla TSLA 20.0
3 Google GOOG 50.0
4 Netflix NFLX NaN
You can see that we were able to merge the left dataframe on the column “Symbol” and the right datafrome on its index using the pandas join()
function.
Left Join using merge()
df_merged = df_names.merge(df_portfolio, left_on='Symbol', right_index=True, how='left')
print(df_merged)
Output:
Name Symbol Shares
0 Apple APPL 50.0
1 Mircosoft MSFT NaN
2 Tesla TSLA 20.0
3 Google GOOG 50.0
4 Netflix NFLX NaN
The pandas merge()
function was able to merge the left dataframe on the column “Symbol” and the right one on its index. Note that, we had to pass right_index=True
to indicate that the right dataframe should be merged on its index.
So far so good.
3. Merge the left dataframe on index and right on column
Now, let’s merge the left dataframe on its index and the right dataframe on the column “Symbol”. First, we’ll recreate the two dataframes and then set the “Symbol” column as the index only for the left dataframe.
import pandas as pd
# stock names
df_names = pd.DataFrame({
'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
'Shares': [20, 50, 50, 100]
})
# set Symbol as axis for df_names
df_names.set_index('Symbol', inplace=True)
This is how the two dataframes look:

Left Join using join()
If you try to join the left dataframe on index and the right dataframe on a column using join()
, it’ll result in an error.
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
2 print(df_merged)
** lines skipped to keep the error message short **
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
This happened because the join()
function requires the right dataframe to be merged on its index.
Left Join using merge()
The merge()
function allows you the flexibility to merge the left dataframe on index and the right one on a column.
df_merged = df_names.merge(df_portfolio, left_index=True, right_on='Symbol', how='left')
print(df_merged)
Output:
Name Symbol Shares
1.0 Apple APPL 50.0
NaN Mircosoft MSFT NaN
0.0 Tesla TSLA 20.0
2.0 Google GOOG 50.0
NaN Netflix NFLX NaN
Note that, we had to pass left_index=True
to indicate that the left dataframe should be merged on its index.
4. Merge dataframes on column
Now, let’s merge the two dataframes directly on the common column “Symbol”.
import pandas as pd
# stock names
df_names = pd.DataFrame({
'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
'Shares': [20, 50, 50, 100]
})
This is how the dataframes look:

Left join using join()
The pandas join()
function results in an error if you try to merge the two dataframes on a common column.
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
2 print(df_merged)
** lines skipped to keep the error message short **
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Left Join using pandas merge
If the column to merge on has the same name in the two dataframes, pass it to the on
parameter. If they have different names, pass the respective names to left_on
and right_on
parameters.
df_merged = df_names.merge(df_portfolio, on='Symbol', how='left')
print(df_merged)
Output:
Name Symbol Shares
0 Apple APPL 50.0
1 Mircosoft MSFT NaN
2 Tesla TSLA 20.0
3 Google GOOG 50.0
4 Netflix NFLX NaN
Conclusion
The pandas merge()
and join()
functions are used to join dataframes but have different use-cases. If you want to perform database-style joins on dataframes prefer the merge()
function as it gives you more flexibility on how to perform the merge. If you, instead, want to join dataframes specifically on their indexes use the join()
function.
The following table compares the two functions on the different combinations of joining dataframe on column or index.
DataFrame Merge | join() | merge() |
---|---|---|
On index | ✔️ | ✔️ |
Left on column, right on index | ✔️ | ✔️ |
Left on index, right on column | ❌ | ✔️ |
On column | ❌ | ✔️ |
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having pandas version 1.0.5
Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.
Tutorials on combining data in pandas –
- Pandas – Join vs Merge
- Pandas – Merge DataFrames on Multiple Columns
- Understanding Joins in Pandas
- Append Rows to a Pandas DataFrame
- Concat DataFrames in Pandas