Save Pandas DataFrame to a Pickle File

Writing a dataframe to a pickle file instead of a CSV file can be very helpful, particularly if you want to preserve the state of the dataframe. When you load the pickle file back, you save time on data type transformations because the column data types are stored in the pickle file along with the data. In this tutorial, we will look at how to save a pandas dataframe to a pickle file.

You can use the pandas dataframe to_pickle() function to write a pandas dataframe to a pickle file. The following is the syntax:

df.to_pickle(file_name)

Here, file_name is the path (or file name) under which you want to save the dataframe, generally with a .pkl extension.
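
To make the syntax concrete, here is a minimal sketch that writes a small dataframe to a pickle file inside a temporary directory and reads it back. The dataframe, its column "a", and the file name example.pkl are illustrative assumptions; the .pkl extension is only a convention, and file_name can be a plain string or a pathlib.Path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# a tiny illustrative dataframe (hypothetical data, not from the tutorial)
df = pd.DataFrame({'a': [1, 2, 3]})

# write into a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    # file_name may be a string or a path-like object such as pathlib.Path
    file_name = Path(tmp_dir) / 'example.pkl'
    df.to_pickle(file_name)
    print(file_name.exists())  # True
    # read it back while the temporary file still exists
    df_loaded = pd.read_pickle(file_name)

print(df_loaded.equals(df))  # True
```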

Let’s look at an example of using the above syntax to save a dataframe as a pickle file. First, we will create a sample dataframe:

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Date': ['2021-04-01','2021-04-02','2021-04-03','2021-04-04','2021-04-05'],
    'Units Sold': [120, 123, 150, 160, 140]
})
# display the dataframe
print(df)

Output:

         Date  Units Sold
0  2021-04-01         120
1  2021-04-02         123
2  2021-04-03         150
3  2021-04-04         160
4  2021-04-05         140

We created a dataframe with two columns – “Date” and “Units Sold”. Let’s look at more information on the dataframe using the pandas dataframe info() function.

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Date        5 non-null      object
 1   Units Sold  5 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 208.0+ bytes

You can see the “Date” column is of type object. Since it's a temporal field, let's transform its data type to the datetime format using the pandas to_datetime() function.

# change data type to datetime
df['Date'] = pd.to_datetime(df['Date'])
# check data type (info() prints its summary and returns None)
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        5 non-null      datetime64[ns]
 1   Units Sold  5 non-null      int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 208.0 bytes

Now that we have transformed the dataframe, let’s save it to a pickle file locally.

# save dataframe to pickle file
df.to_pickle('sales_df.pkl')

The above dataframe has been saved as sales_df.pkl in the current working directory. We can read the pickle file back as a dataframe using the pandas read_pickle() function.

# read pickle file as dataframe
df_sales = pd.read_pickle('sales_df.pkl')
# display the dataframe
print(df_sales)

Output:

        Date  Units Sold
0 2021-04-01         120
1 2021-04-02         123
2 2021-04-03         150
3 2021-04-04         160
4 2021-04-05         140

Now let’s check the data types of the columns in this dataframe.

df_sales.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        5 non-null      datetime64[ns]
 1   Units Sold  5 non-null      int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 208.0 bytes

Notice that the “Date” column is still of type datetime64[ns], just as it was before saving.
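
Instead of eyeballing the info() output, you can also check the round trip programmatically. The sketch below rebuilds the tutorial's dataframe and uses pd.testing.assert_frame_equal(), which raises an AssertionError if the values, column names, index, or dtypes differ. It assumes you are fine with writing sales_df.pkl to the current working directory, as the tutorial does.

```python
import pandas as pd

# rebuild the tutorial's dataframe with "Date" already converted to datetime
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-04-01', '2021-04-02', '2021-04-03',
                            '2021-04-04', '2021-04-05']),
    'Units Sold': [120, 123, 150, 160, 140]
})

# save and reload, mirroring the steps above
df.to_pickle('sales_df.pkl')
df_sales = pd.read_pickle('sales_df.pkl')

# raises AssertionError if values, labels, or dtypes differ; silent otherwise
pd.testing.assert_frame_equal(df, df_sales)
print(df_sales.dtypes)
```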

An important thing to note is that pickle files retain the original state of the dataframe. That is, the column data types are preserved, so you don’t need to apply additional transformations after loading the data, as you would if you had saved it as a CSV file.
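
To see the difference with CSV directly, the sketch below saves the same dataframe both ways and compares the “Date” column after loading. The file names and the three-row sample are illustrative assumptions; read_csv() does not parse dates unless you ask it to (for example via its parse_dates parameter), so the column comes back as text.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-04-01', '2021-04-02', '2021-04-03']),
    'Units Sold': [120, 123, 150]
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / 'sales_df.csv'
    pkl_path = Path(tmp_dir) / 'sales_df.pkl'

    # index=False so the CSV round trip does not add an extra index column
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    df_csv = pd.read_csv(csv_path)
    df_pkl = pd.read_pickle(pkl_path)

csv_date_is_datetime = pd.api.types.is_datetime64_any_dtype(df_csv['Date'])
pkl_date_is_datetime = pd.api.types.is_datetime64_any_dtype(df_pkl['Date'])
print(csv_date_is_datetime)  # False: CSV stores dates as plain text
print(pkl_date_is_datetime)  # True: the pickle keeps the datetime dtype

# with CSV, the conversion has to be repeated after every load
df_csv['Date'] = pd.to_datetime(df_csv['Date'])
```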

For more on the pandas to_pickle() function, refer to its documentation.
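
One option covered in that documentation is compression. The sketch below writes a gzip-compressed pickle and reads it back; with the default compression='infer', both to_pickle() and read_pickle() pick gzip from the .gz suffix, so passing compression='gzip' explicitly is only for readability here. The file name sales_df.pkl.gz is an assumption for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({'Units Sold': [120, 123, 150, 160, 140]})

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = Path(tmp_dir) / 'sales_df.pkl.gz'
    # explicit here; compression='infer' would also choose gzip from ".gz"
    df.to_pickle(gz_path, compression='gzip')

    # a gzip file starts with the two magic bytes 0x1f 0x8b
    with open(gz_path, 'rb') as fh:
        is_gzip = fh.read(2) == b'\x1f\x8b'

    # read_pickle infers gzip from the file extension
    df_back = pd.read_pickle(gz_path)

print(is_gzip)  # True
print(df_back.equals(df))  # True
```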

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel having pandas version 1.0.5.

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.