column of object type converted to datetime

Pandas – Convert String Column to datetime

If you have a column storing datetime information as string or object type, you might want to convert it into a datetime format. With the datetime format, you can perform a number of temporal operations on the column. For example, you can calculate the time delta between two dates, you can extract granular information like its month, year, etc. In this tutorial, we’ll look at how to convert a string or an object type column of a pandas dataframe to datetime.

You can use the pandas to_datetime() function to convert a string column to datetime. The following is the syntax:

df['Col'] = pd.to_datetime(df['Col'])

Here, “Col” is the column you want to convert to datetime format. The to_datetime() function also provides an argument format to specify the format of the string you want to convert. (See the examples below).

Let’s look at some examples of the using the to_datetime() function.

Suppose we’re given a column that has date values in string format and you want to convert them into datetime. For example, you have the daily sales data of an online shop.

import pandas as pd

# sales data of a shop
dates = ["2020-12-21", "2020-12-22", "2020-12-23", "2020-12-24", "2020-12-25", "2020-12-26"]
units_sold = [10, 8, 14, 21, 16, 12]

# create the dataframe
df = pd.DataFrame({
    "Purchase Date": dates,
    "Sales": units_sold
})

# print the dataframe
print(df)

Output:

  Purchase Date  Sales
0    2020-12-21     10
1    2020-12-22      8
2    2020-12-23     14
3    2020-12-24     21
4    2020-12-25     16
5    2020-12-26     12

Let’s examine the type of values in each column using pandas info() function.

df.info()

Output:

📚 Data Science Programs By Skill Level

Introductory

Intermediate ⭐⭐⭐

Advanced ⭐⭐⭐⭐⭐

🔎 Find Data Science Programs 👨‍💻 111,889 already enrolled

Disclaimer: Data Science Parichay is reader supported. When you purchase a course through a link on this site, we may earn a small commission at no additional cost to you. Earned commissions help support this website and its team of writers.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Purchase Date  6 non-null      object
 1   Sales          6 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 224.0+ bytes

From the above output you can see that the “Purchase Date” column is of object type. To convert it into datetime format, use the pandas to_datetime() function.

df['Purchase Date'] = pd.to_datetime(df['Purchase Date'])
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Purchase Date  6 non-null      datetime64[ns]
 1   Sales          6 non-null      int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 224.0 bytes

You can see that now the “Purchase Date” column is of datetime dtype.

You can also pass a format string to the to_datetime() function to parse the string correctly. For example, if you have the dates like “21122020”, that is, in the “DDMMYYYY” format.

import pandas as pd

# sales data of a shop
dates = ["21122020", "22122020", "23122020", "24122020", "25122020", "26122020"]
units_sold = [10, 8, 14, 21, 16, 12]

# create the dataframe
df = pd.DataFrame({
    "Purchase Date": dates,
    "Sales": units_sold
})

# print the dataframe
print(df)

Output:

  Purchase Date  Sales
0      21122020     10
1      22122020      8
2      23122020     14
3      24122020     21
4      25122020     16
5      26122020     12

The string values in the above format are difficult to parse by the to_datetime() function and results in a ParseError if you try to do so without a format string. To convert such specific formats, pass the respective format string to the format parameter. In our case, the format string is '%d%m%Y'

df['Purchase Date'] = pd.to_datetime(df['Purchase Date'], format='%d%m%Y')
print(df)

Output:

  Purchase Date  Sales
0    2020-12-21     10
1    2020-12-22      8
2    2020-12-23     14
3    2020-12-24     21
4    2020-12-25     16
5    2020-12-26     12

Let’s look at the type of the column values using the pandas info() function.

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Purchase Date  6 non-null      datetime64[ns]
 1   Sales          6 non-null      int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 224.0 bytes

You can see that the “Purchase Date” column was successfully converted to datetime.

Refer to the strftime documentation for different format string codes. And for more on the pandas to_datetime() function, refer to its documentation.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having pandas version 1.0.5


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.

Scroll to Top