Writing a dataframe to a pickle file instead of a CSV file can be very helpful, particularly if you want to preserve the state of the dataframe. When you load the pickle file back, you save time on data type transformations, since the data type information is already stored in the pickle file. In this tutorial, we will look at how to save a pandas dataframe to a pickle file.
How to save a dataframe to a pickle file?
You can use the pandas dataframe to_pickle() function to write a pandas dataframe to a pickle file. The following is the syntax:
df.to_pickle(file_name)
Here, file_name is the path of the file to which you want to save the dataframe (conventionally with a .pkl extension).
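Because file_name is just a file path, it can also point to a location outside the current working directory. The sketch below writes a small dataframe into a temporary directory (an assumption made so the example leaves no files behind), then reads it back with read_pickle to confirm that the round trip reproduces the original dataframe.

```python
import os
import tempfile

import pandas as pd

# a small dataframe with the same columns as the tutorial's sample data
df = pd.DataFrame({'Date': ['2021-04-01', '2021-04-02'], 'Units Sold': [120, 123]})

# write into a throwaway directory that is deleted when the block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, 'sales_df.pkl')
    df.to_pickle(file_name)
    file_exists = os.path.exists(file_name)
    # read the pickle back before the directory is cleaned up
    df_loaded = pd.read_pickle(file_name)

# raises an AssertionError if the loaded dataframe differs from the original
pd.testing.assert_frame_equal(df, df_loaded)
print(file_exists)
```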
Examples
Let’s look at an example of using the above syntax to save a dataframe as a pickle file. First, we will create a sample dataframe:
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Date': ['2021-04-01', '2021-04-02', '2021-04-03', '2021-04-04', '2021-04-05'],
    'Units Sold': [120, 123, 150, 160, 140]
})

# display the dataframe
print(df)
Output:
         Date  Units Sold
0  2021-04-01         120
1  2021-04-02         123
2  2021-04-03         150
3  2021-04-04         160
4  2021-04-05         140
We created a dataframe with two columns, “Date” and “Units Sold”. Let’s look at more information on the dataframe using the pandas dataframe info() function.
# info() prints its summary directly and returns None
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Date        5 non-null      object
 1   Units Sold  5 non-null      int64
dtypes: int64(1), object(1)
memory usage: 208.0+ bytes
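If you only need the column types, the dtypes attribute gives a more compact view than info(). It is a Series that maps each column name to its data type. The sketch below rebuilds the tutorial's sample dataframe to show it. The exact name reported for the string column may vary between pandas versions, so the checks only confirm that it is not yet a datetime type.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-04-01', '2021-04-02', '2021-04-03', '2021-04-04', '2021-04-05'],
    'Units Sold': [120, 123, 150, 160, 140],
})

# dtypes is a Series indexed by column name, holding each column's data type
print(df.dtypes)

# the "Date" strings have not been parsed as dates yet
print(pd.api.types.is_datetime64_any_dtype(df['Date']))
```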
You can see that the “Date” column is of type object. Since it’s a temporal field, let’s convert its data type to the datetime format using the pandas to_datetime() function.
# change data type to datetime
df['Date'] = pd.to_datetime(df['Date'])
# check data type
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Date        5 non-null      datetime64[ns]
 1   Units Sold  5 non-null      int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 208.0 bytes
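The to_datetime() call above lets pandas infer how the dates are laid out. When you already know the layout, you can make the parsing explicit with the format parameter. The sketch below assumes ISO-style YYYY-MM-DD strings like the sample data and uses a shortened copy of the dataframe for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-04-01', '2021-04-02', '2021-04-03'],
    'Units Sold': [120, 123, 150],
})

# state the expected layout explicitly: four-digit year, month, day
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

# the column now holds datetime values, so the .dt accessor works on it
print(pd.api.types.is_datetime64_any_dtype(df['Date']))
print(df['Date'].dt.day.tolist())
```

If a string does not match the given format, to_datetime raises an error by default rather than guessing. This makes unexpected date layouts easier to spot.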
Now that we have transformed the dataframe, let’s save it to a pickle file locally.
# save dataframe to pickle file
df.to_pickle('sales_df.pkl')
The above dataframe has been saved as sales_df.pkl in the current working directory. We can read the pickle file back as a dataframe using the pandas read_pickle() function.
# read pickle file as dataframe
df_sales = pd.read_pickle('sales_df.pkl')
# display the dataframe
print(df_sales)
Output:
        Date  Units Sold
0 2021-04-01         120
1 2021-04-02         123
2 2021-04-03         150
3 2021-04-04         160
4 2021-04-05         140
Now let’s check the data types of the columns in this dataframe.
# check the data types of the dataframe loaded from the pickle file
df_sales.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Date        5 non-null      datetime64[ns]
 1   Units Sold  5 non-null      int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 208.0 bytes
Notice that the “Date” column has kept its datetime data type after loading.
An important thing to note is that pickle files retain the original state of the dataframe. That is, the data types are preserved, and you don’t need to apply additional transformations after loading the data, as you would have to if you had saved it as a CSV file.
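The difference from CSV can be checked directly. The sketch below saves the same datetime-typed dataframe both ways, again in a temporary directory (an assumption to keep the example self-contained), and compares the type of the “Date” column after each round trip.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-04-01', '2021-04-02', '2021-04-03']),
    'Units Sold': [120, 123, 150],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'sales_df.csv')
    pkl_path = os.path.join(tmp_dir, 'sales_df.pkl')

    # CSV stores plain text, so the datetime type is not recorded
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # pickle stores the dataframe object itself, dtypes included
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

# "Date" comes back as text from the CSV but as datetime from the pickle
print(from_csv['Date'].dtype)
print(from_pickle['Date'].dtype)
```

With the CSV version, you would have to restore the type yourself. For example, you could pass parse_dates=['Date'] to read_csv() or call to_datetime() again after loading.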
For more on the pandas to_pickle() function, refer to its documentation.
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel having pandas version 1.0.5.