In this tutorial, we’ll look at the difference between pandas join() and merge() functions and when exactly should you use them. If you’re looking for a refresher on the different types of joins, you can refer to Understanding Joins in Pandas
Difference between pandas join and merge
Both the functions are used to perform joins on pandas dataframes but they’re used in different scenarios. The join()
function is generally used to join dataframes on index whereas the merge()
function is a more versatile function that lets you join dataframes on indexes as well as columns.
You can, however, use the join() function to join the left dataframe on a column but the right dataframe must be joined on its index. So, if you want to perform database-style joins on dataframes use the more versatile merge() function.
To join dataframes using the join()
function, use the following syntax:
df_merged = df_left.join(df_right, how='left')
Note that, the join() function by default performs a join on index. If you want to join the left dataframe on a column, pass the column name to the on
parameter. But remember, the right dataframe must be joined on its index. Also note, that the join() function performs a left join by default.
To join dataframes using the merge()
function, use the following syntax:
df_merged = df_left.merge(df_right, on='Col1', how='inner')
Here, the on
parameter is used to pass the column to merge the dataframes on and the how
parameter is used to determine the kind of join to perform. The merge() function performs an inner join by default.
Examples
Let’s look at some example use-cases to illustrate the difference between the two.
1. Merge dataframes on index
Let’s merge two dataframes on their indexes using join() and merge(). For this, we’ll create two dataframes “df_names” and “df_portfolio”. Both having a common column “Symbol”.
import pandas as pd
# stock names
df_names = pd.DataFrame({
'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
'Shares': [20, 50, 50, 100]
})
The two dataframes look as follows:

Now, let’s set the “Symbol” column as the index for both the dataframes.
df_names.set_index('Symbol', inplace=True)
df_portfolio.set_index('Symbol', inplace=True)
This is how the dataframes look after setting the index:

Left Join using pandas join()
Now, let’s perform a left join on the two dataframes by merging them on their indexes using the join()
function.
df_merged = df_names.join(df_portfolio, how='left')
print(df_merged)
Output:
Name Shares
Symbol
APPL Apple 50.0
MSFT Mircosoft NaN
TSLA Tesla 20.0
GOOG Google 50.0
NFLX Netflix NaN
You can see the resulting dataframe with NaNs where a match was not found in the right dataframe, df_portfolio
.
Left Join using pandas merge()
Let’s perform the same left join on index as above but this time using the merge()
function.
df_merged = df_names.merge(df_portfolio, left_index=True, right_index=True, how='left')
print(df_merged)
Output:
Name Shares
Symbol
APPL Apple 50.0
MSFT Mircosoft NaN
TSLA Tesla 20.0
GOOG Google 50.0
NFLX Netflix NaN
You can see that we get the same result as we got with the join() function. Note that, the merge()
function required passing values for parameters left_index
and right_index
to specify that we wanted to perform a merge on the indexes.
2. Merge the left dataframe on column and right on index
Now, let’s merge the left dataframe on the column “Symbol” and the right dataframe on its index. First, we’ll recreate the two dataframes and then set the “Symbol” column as the index only for the right dataframe.
import pandas as pd
# stock names
df_names = pd.DataFrame({
'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
'Shares': [20, 50, 50, 100]
})
# set Symbol as axis for df_names
df_portfolio.set_index('Symbol', inplace=True)
This is how the two dataframes look:

Left Join using join()
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)
Output:
Name Symbol Shares
0 Apple APPL 50.0
1 Mircosoft MSFT NaN
2 Tesla TSLA 20.0
3 Google GOOG 50.0
4 Netflix NFLX NaN
You can see that we were able to merge the left dataframe on the column “Symbol” and the right datafrome on its index using the pandas join()
function.
Left Join using merge()
df_merged = df_names.merge(df_portfolio, left_on='Symbol', right_index=True, how='left')
print(df_merged)
Output:
Name Symbol Shares
0 Apple APPL 50.0
1 Mircosoft MSFT NaN
2 Tesla TSLA 20.0
3 Google GOOG 50.0
4 Netflix NFLX NaN
The pandas merge()
function was able to merge the left dataframe on the column “Symbol” and the right one on its index. Note that, we had to pass right_index=True
to indicate that the right dataframe should be merged on its index.
So far so good.
3. Merge the left dataframe on index and right on column
Now, let’s merge the left dataframe on its index and the right dataframe on the column “Symbol”. First, we’ll recreate the two dataframes and then set the “Symbol” column as the index only for the left dataframe.
import pandas as pd
# stock names
df_names = pd.DataFrame({
'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
'Shares': [20, 50, 50, 100]
})
# set Symbol as axis for df_names
df_names.set_index('Symbol', inplace=True)
This is how the two dataframes look:

Left Join using join()
If you try to join the left dataframe on index and the right dataframe on a column using join()
, it’ll result in an error.
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
2 print(df_merged)
** lines skipped to keep the error message short **
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
This happened because the join()
function requires the right dataframe to be merged on its index.
Left Join using merge()
The merge()
function allows you the flexibility to merge the left dataframe on index and the right one on a column.
df_merged = df_names.merge(df_portfolio, left_index=True, right_on='Symbol', how='left')
print(df_merged)
Output:
Name Symbol Shares
1.0 Apple APPL 50.0
NaN Mircosoft MSFT NaN
0.0 Tesla TSLA 20.0
2.0 Google GOOG 50.0
NaN Netflix NFLX NaN
Note that, we had to pass left_index=True
to indicate that the left dataframe should be merged on its index.
4. Merge dataframes on column
Now, let’s merge the two dataframes directly on the common column “Symbol”.
import pandas as pd
# stock names
df_names = pd.DataFrame({
'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'],
'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']
})
# sample porfolio
df_portfolio = pd.DataFrame({
'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'],
'Shares': [20, 50, 50, 100]
})
This is how the dataframes look:

Left join using join()
The pandas join()
function results in an error if you try to merge the two dataframes on a common column.
df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
print(df_merged)
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 df_merged = df_names.join(df_portfolio, on='Symbol', how='left')
2 print(df_merged)
** lines skipped to keep the error message short **
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Left Join using pandas merge
If the column to merge on has the same name in the two dataframes, pass it to the on
parameter. If they have different names, pass the respective names to left_on
and right_on
parameters.
df_merged = df_names.merge(df_portfolio, on='Symbol', how='left')
print(df_merged)
Output:
Name Symbol Shares
0 Apple APPL 50.0
1 Mircosoft MSFT NaN
2 Tesla TSLA 20.0
3 Google GOOG 50.0
4 Netflix NFLX NaN
Conclusion
The pandas merge()
and join()
functions are used to join dataframes but have different use-cases. If you want to perform database-style joins on dataframes prefer the merge()
function as it gives you more flexibility on how to perform the merge. If you, instead, want to join dataframes specifically on their indexes use the join()
function.
The following table compares the two functions on the different combinations of joining dataframe on column or index.
DataFrame Merge | join() | merge() |
---|---|---|
On index | ✔️ | ✔️ |
Left on column, right on index | ✔️ | ✔️ |
Left on index, right on column | ❌ | ✔️ |
On column | ❌ | ✔️ |
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having pandas version 1.0.5
Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.
Tutorials on combining data in pandas –
- Pandas – Join vs Merge
- Pandas – Merge DataFrames on Multiple Columns
- Understanding Joins in Pandas
- Append Rows to a Pandas DataFrame
- Concat DataFrames in Pandas