When working with dataframes in pandas, you may need to create a copy of a dataframe before performing operations on it, for example, because you do not want to alter the original data. In this tutorial, we will look at how to copy a dataframe in pandas with the help of some examples.
How to create a dataframe copy?
You can use the pandas dataframe copy() function to create a copy of a dataframe. It creates a deep copy by default. The following is the syntax:
# create a copy of the dataframe df
df.copy(deep=True)
The copy() function takes a single parameter, deep, which has two possible values, True and False.
- When deep=True (the default), a new object is created with a copy of the original (calling) object's data and indices. Changes made to this new object (the deep copy) will not be reflected in the original object.
- On the other hand, when deep=False, a new object is created without copying the original (calling) object's data and indices; only the references to the data and the indices are copied. In pandas versions before 3.0 (without Copy-on-Write enabled), changes made to this new object (the shallow copy) are reflected in the original object. With Copy-on-Write, which is the default from pandas 3.0, the data is still shared at first but is copied the first time either object is modified, so such changes no longer propagate.
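As a rough way to see the difference described above, the following sketch checks whether a copy's underlying buffer overlaps the original's. It assumes NumPy is importable (pandas depends on it) and uses a small made-up dataframe with a single numeric column; all variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# hypothetical example data with a single numeric column
original = pd.DataFrame({"Roll Number": [1, 2, 3]})

deep_copy = original.copy()               # deep=True by default
shallow_copy = original.copy(deep=False)

# view the column values as NumPy arrays
original_values = original["Roll Number"].to_numpy()
deep_values = deep_copy["Roll Number"].to_numpy()
shallow_values = shallow_copy["Roll Number"].to_numpy()

# check whether each copy's buffer overlaps the original's buffer
deep_shares = np.shares_memory(original_values, deep_values)
shallow_shares = np.shares_memory(original_values, shallow_values)

print("deep copy shares memory with original:", deep_shares)
print("shallow copy shares memory with original:", shallow_shares)
```

A freshly created shallow copy is expected to share its buffer with the original, while a deep copy is expected to have its own. What happens once the shallow copy is modified depends on whether Copy-on-Write is active.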
Examples
Let’s now look at some examples of using the above syntax to copy dataframes in Pandas.
First, we will create a dataframe that we will use throughout this tutorial.
import pandas as pd

# create pandas dataframe
df = pd.DataFrame({
    "Roll Number": [1, 2, 3],
    "Name": ["Dwight", "Jim", "Pam"]
})

# display the dataframe
df
Output:
Here, we created a dataframe containing the “Roll Number” and “Name” information of three students in a class.
Example 1 – Create a deep copy of a pandas dataframe
Let’s create a deep copy of the dataframe created above using the pandas dataframe copy() function. Note that we do not need to pass any arguments, as the copy() function creates a deep copy by default.
# create a deep copy
df1 = df.copy()  # deep copy is created by default

# display the dataframe
df1
Output:
We get the resulting dataframe, df1 (which is a deep copy of the dataframe df).
Since this dataframe is a deep copy, changes made to it will not be reflected in the original dataframe.
Let’s change the “Name” value in the first row (row index 0) from “Dwight” to “Tobi” and print out the dataframes df1 and df.
# make changes to df1
df1.loc[0, "Name"] = "Tobi"

# display df1
print(df1)

# display the original dataframe
print(df)
Output:
   Roll Number  Name
0            1  Tobi
1            2   Jim
2            3   Pam
   Roll Number    Name
0            1  Dwight
1            2     Jim
2            3     Pam
You can see that the update made to the dataframe df1 (the deep copy) did not have any effect on the dataframe df (the original dataframe).
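For contrast, a plain assignment such as df_alias = df does not create a copy at all; it only binds a second name to the same object. The short sketch below illustrates this. The names df_alias and df_copy are hypothetical and not part of the examples in this tutorial.

```python
import pandas as pd

df = pd.DataFrame({
    "Roll Number": [1, 2, 3],
    "Name": ["Dwight", "Jim", "Pam"]
})

# plain assignment: both names refer to the same dataframe object
df_alias = df
# copy(): a separate dataframe object with its own data
df_copy = df.copy()

print(df_alias is df)   # True
print(df_copy is df)    # False

# a change made through the alias is a change to df itself
df_alias.loc[0, "Name"] = "Tobi"
print(df.loc[0, "Name"])        # Tobi
print(df_copy.loc[0, "Name"])   # Dwight
```

This is why copy() is the safer choice whenever the original data must stay untouched.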
Example 2 – Create a shallow copy of a pandas dataframe
Let’s now create a shallow copy of the dataframe df.
To create a shallow copy of a pandas dataframe, pass deep=False to the pandas dataframe copy() function.
# create a shallow copy
df2 = df.copy(deep=False)

# display the dataframe
df2
Output:
We get the resulting dataframe, df2 (which is a shallow copy of the dataframe df).
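Since this dataframe is a shallow copy, changes made to it will be reflected in the original dataframe, at least in pandas versions before 3.0 without Copy-on-Write enabled, which is the setup used for the output shown here.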
Let’s change the “Name” value in the first row (row index 0) from “Dwight” to “Tobi” and print out the dataframes df2 and df.
# make changes to df2
df2.loc[0, "Name"] = "Tobi"

# display df2
print(df2)

# display the original dataframe
print(df)
Output:
   Roll Number  Name
0            1  Tobi
1            2   Jim
2            3   Pam
   Roll Number  Name
0            1  Tobi
1            2   Jim
2            3   Pam
You can see that the update made to the dataframe df2 (the shallow copy) modifies the dataframe df (the original dataframe) as well.
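Whether a change to a shallow copy shows up in the original depends on the pandas version and on whether Copy-on-Write is enabled. The sketch below does not assume either behavior; it repeats the experiment above and reports what your installation does. The variable name propagated is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Roll Number": [1, 2, 3],
    "Name": ["Dwight", "Jim", "Pam"]
})

# create a shallow copy and modify it
df2 = df.copy(deep=False)
df2.loc[0, "Name"] = "Tobi"

# check whether the change reached the original dataframe
propagated = bool(df.loc[0, "Name"] == "Tobi")

print("pandas version:", pd.__version__)
if propagated:
    print("The change to the shallow copy is visible in the original.")
else:
    print("The original is unchanged (Copy-on-Write behavior).")
```

If your code must behave the same way across pandas versions, it is safer not to rely on a shallow copy propagating changes and to modify the original dataframe directly instead.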
Summary
In this tutorial, we looked at how to copy a dataframe in pandas. The following are the key takeaways –
- Use the pandas dataframe copy() function to create a dataframe’s copy. It takes a single parameter, deep.
  - Use deep=True (default value) to create a deep copy.
  - Use deep=False to create a shallow copy.
- Changes made to the deep copy are not reflected in the original dataframe.
- Changes made to the shallow copy are also reflected in the original dataframe in pandas versions before 3.0 (without Copy-on-Write enabled).