Pandas – Split Column by Delimiter

Pandas dataframes are great for manipulating data. It might happen that you have a column containing delimited string values, for example, “A, B, C” and you want the values to be present in separate columns. In this tutorial, we will look at how to split a text column in a pandas dataframe into multiple columns by delimiter.

split pandas column into multiple columns

You can use the pandas Series.str.split() function to split strings in the column around a given separator/delimiter. It is similar to the python string split() function but applies to the entire dataframe column. The following is the syntax:

# df is a pandas dataframe
# default parameters pandas Series.str.split() function
df['Col'].str.split(pat, n=-1, expand=False)
# to split into multiple columns by delimiter
df['Col'].str.split(delimiter, expand=True)

Pass expand=True to split strings into separate columns. Note that it is False by default.

Use the parameter n to pass the number of splits you want. It is -1 by default to split by all the instances of the delimiter.

Let’s look at the usage of the above method with the help of some examples. First, we will create a dataframe that we will be using throughout this tutorial.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Address': ['4860 Sunset Boulevard,San Francisco,California',
                '3055 Paradise Lane,Salt Lake City,Utah',
                '682 Main Street,Detroit,Michigan',
                '9001 Cascade Road,Kansas City,Missouri']
})
# display the dataframe
df

Output:

Dataframe with a single text column containing comma separated values.

Here we created a dataframe df having a single column “Address”. Note that the strings in the “Address” column have a certain pattern to it. If we look closely, this column can be split into three columns – street name, city, and state.

Let’s go ahead and split this column.

Apply the pandas series str.split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also make sure to pass True to the expand parameter.

# split column into multiple columns by delimiter 
df['Address'].str.split(',', expand=True)

Output:

Columns resulting from the split of "Address" column

You can see that it results in three different columns. If you don’t pass expand=True, the function returns a single column (a pandas series) with the values resulting from the split inside a list.

# using default value for expand parameter
df['Address'].str.split(',')

Output:

0    [4860 Sunset Boulevard, San Francisco, Califor...
1           [3055 Paradise Lane, Salt Lake City, Utah]
2                [682 Main Street, Detroit,  Michigan]
3           [9001 Cascade Road, Kansas City, Missouri]
Name: Address, dtype: object

Here, the values in the text column have been split but this didn’t result in creation of separate columns. The split values are inside a list.

You can still split this column of lists into multiple columns but if your objective is to split a text column into multiple columns its better to pass expand=True to the pandas Series.str.split() function.

Let’s now add the three new columns resulting from the split to the dataframe df.

# split column and add new columns to df
df[['Street', 'City', 'State']] = df['Address'].str.split(',', expand=True)
# display the dataframe
df

Output:

Dataframe with additional three columns resulting from the split of the Address column

You can see that the columns resulting from the split have now been added to the dataframe df.

In the above examples, we split the text column “Address” on every occurrence of the delimiter “,”. That is, two splits resulting in three different column. You can also specify the number of splits you want. For example, let’s make only one split resulting in two columns.

# custom number of splits
df['Address'].str.split(',', n=1, expand=True)

Output:

Dataframe resulting from split of Address column with just one split.

Here, n=1 denotes that we want to make only one split. This split will happen on the first occurrence of the delimiter from the left. You can see that the city and state values are together in the second column.

Just like you used the pandas Series.str.split() function to split a column, you can use the pandas Series.str.cat() to concatenate values from multiple columns into a single column. Here’s an example –

# concat columns by separator
df['Street'].str.cat(df[['City', 'State']], sep=',')

Output:

0    4860 Sunset Boulevard,San Francisco,California
1            3055 Paradise Lane,Salt Lake City,Utah
2                 682 Main Street,Detroit, Michigan
3            9001 Cascade Road,Kansas City,Missouri
Name: Street, dtype: object

You can see that we get the same value as in the “Address” column.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having pandas version 1.0.5


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Leave a Reply

Your email address will not be published. Required fields are marked *