pandas dataframe deep and shallow copy

Pandas – Create DataFrame Copy

When working with dataframes in pandas, you may need to create a copy of a dataframe before performing your operations (for example, because you do not want to alter the original data). In this tutorial, we will look at how to copy a dataframe in pandas with the help of some examples.

How to create a dataframe copy?

You can use the pandas dataframe copy() method to create a copy of a dataframe. It creates a deep copy by default. The following is the syntax –

# create dataframe df's copy
df.copy(deep=True)

The copy() function takes a single parameter, deep, which can be either True or False.

  • When deep=True (the default), a new object is created with a copy of the original (calling) object’s data and indices. Changes made to this new object (the deep copy) will not be reflected in the original object.
  • On the other hand, when deep=False, a new object is created without copying the original (calling) object’s data and indices; only the references to the data and the indices are copied. Changes made to this new object (the shallow copy) will be reflected in the original object, unless pandas Copy-on-Write is enabled (the default from pandas 3.0), in which case the original is left unchanged.
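One way to see the difference between the two modes is to check whether a copy shares memory with the original. The following is a minimal sketch, not part of the pandas copy() API itself: it assumes NumPy is installed (it is a pandas dependency) and that the integer column is backed by a plain NumPy array. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Roll Number": [1, 2, 3],
    "Name": ["Dwight", "Jim", "Pam"]
})

deep_copy = df.copy()               # deep=True by default
shallow_copy = df.copy(deep=False)  # only references are copied

# compare the memory behind the integer column of each dataframe
original_values = df["Roll Number"].to_numpy()
deep_shares = np.shares_memory(
    original_values, deep_copy["Roll Number"].to_numpy()
)
shallow_shares = np.shares_memory(
    original_values, shallow_copy["Roll Number"].to_numpy()
)

print("deep copy shares memory with df:", deep_shares)
print("shallow copy shares memory with df:", shallow_shares)
```

The deep copy should report no shared memory, while the shallow copy should report that it points at the same underlying data as df.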

Examples

Let’s now look at some examples of using the above syntax to copy dataframes in Pandas.

First, we will create a dataframe that we will use throughout this tutorial.

import pandas as pd

# create pandas dataframe
df = pd.DataFrame({
    "Roll Number": [1, 2, 3],
    "Name": ["Dwight", "Jim", "Pam"]
})
# display the dataframe
df

Output:

   Roll Number    Name
0            1  Dwight
1            2     Jim
2            3     Pam

Here, we created a dataframe containing the “Roll Number” and “Name” information of three students in a class.

Example 1 – Create a deep copy of a pandas dataframe

Let’s create a deep copy of the dataframe created above using the pandas dataframe copy() function. Note that we do not need to pass any arguments as the copy() function creates a deep copy by default.

# create a deep copy
df1 = df.copy() # deep copy is created by default
# display the dataframe
df1

Output:

   Roll Number    Name
0            1  Dwight
1            2     Jim
2            3     Pam

We get the resulting dataframe, df1 (which is a deep copy of the dataframe df).

Since this dataframe is a deep copy, changes made to it will not be reflected in the original dataframe.

Let’s change the “Name” value in the first row (row index 0) from “Dwight” to “Tobi” and print out the dataframes df1 and df.

# make changes to df1
df1.loc[0, "Name"] = "Tobi"
# display df1
print(df1)
# display the original dataframe
print(df)

Output:

   Roll Number  Name
0            1  Tobi
1            2   Jim
2            3   Pam
   Roll Number    Name
0            1  Dwight
1            2     Jim
2            3     Pam

You can see that the update made to the dataframe df1 (the deep copy) did not have any effect on the dataframe df (the original dataframe).
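One caveat is worth keeping in mind. According to the pandas documentation, a deep copy copies the data but does not recursively copy Python objects stored inside it. The sketch below illustrates this with a hypothetical "Scores" column holding lists; the dataframe and names are assumptions made for this example and are not part of the tutorial's data.

```python
import pandas as pd

# hypothetical dataframe whose cells hold Python list objects
df_lists = pd.DataFrame({"Scores": [[90, 85], [70, 60]]})

# deep copy (deep=True by default)
deep = df_lists.copy()

# mutate the list object stored in the first cell of the original
df_lists.loc[0, "Scores"].append(100)

# both dataframes still reference the same list object
same_object = deep.loc[0, "Scores"] is df_lists.loc[0, "Scores"]
print(same_object)
print(deep.loc[0, "Scores"])
```

If your dataframe stores mutable objects such as lists or dictionaries, you may want to copy those objects separately (for example with Python's copy.deepcopy) before modifying them.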

Example 2 – Create a shallow copy of a pandas dataframe

Let’s now create a shallow copy of the dataframe df.

To create a shallow copy of a pandas dataframe, pass deep=False to the pandas dataframe copy() function.

# create a shallow copy
df2 = df.copy(deep=False)
# display the dataframe
df2

Output:

   Roll Number    Name
0            1  Dwight
1            2     Jim
2            3     Pam

We get the resulting dataframe, df2 (which is a shallow copy of the dataframe df).

Since this dataframe is a shallow copy, it shares its data with the original dataframe, so changes made to it will be reflected in the original (this holds when pandas Copy-on-Write is not enabled).

Let’s change the “Name” value in the first row (row index 0) from “Dwight” to “Tobi” and print out the dataframes df2 and df.

# make changes to df2
df2.loc[0, "Name"] = "Tobi"
# display df2
print(df2)
# display the original dataframe
print(df)

Output:

   Roll Number  Name
0            1  Tobi
1            2   Jim
2            3   Pam
   Roll Number  Name
0            1  Tobi
1            2   Jim
2            3   Pam

You can see that the update made to the dataframe df2 (the shallow copy) modifies the dataframe df (the original dataframe) as well.
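Whether a write to a shallow copy reaches the original depends on whether pandas Copy-on-Write is in effect, which in turn depends on your pandas version and settings. The following is a small sketch you could run to check the behaviour in your own environment; the variable original_changed is a name chosen for this example, and no particular output is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "Roll Number": [1, 2, 3],
    "Name": ["Dwight", "Jim", "Pam"]
})

# create a shallow copy and modify it
df2 = df.copy(deep=False)
df2.loc[0, "Name"] = "Tobi"

# True  -> the change reached the original (data is shared)
# False -> the original was left untouched (Copy-on-Write behaviour)
original_changed = bool(df.loc[0, "Name"] == "Tobi")

print("pandas version:", pd.__version__)
print("original changed:", original_changed)
```

If the check prints False, the shallow copy behaves like an independent dataframe on writes, and you cannot rely on it to update the original.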

Summary

In this tutorial, we looked at how to copy a dataframe in pandas. The following are the key takeaways –

  • Use the pandas dataframe copy() function to create a dataframe’s copy. It takes a single parameter deep.
    • Use deep=True (default value) to create a deep copy.
    • Use deep=False to create a shallow copy.
  • Changes made to the deep copy are not reflected in the original dataframe.
  • Changes made to the shallow copy are also reflected in the original dataframe, unless pandas Copy-on-Write is enabled (the default from pandas 3.0).

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.