Skip to Content

Pandas – Split Column by Delimiter

Pandas dataframes are great for manipulating data. It might happen that you have a column containing delimited string values, for example, “A, B, C” and you want the values to be present in separate columns. In this tutorial, we will look at how to split a text column in a pandas dataframe into multiple columns by delimiter.

split pandas column into multiple columns

You can use the pandas Series.str.split() function to split strings in the column around a given separator/delimiter. It is similar to the python string split() function but applies to the entire dataframe column. The following is the syntax:

# df is a pandas dataframe
# default parameters pandas Series.str.split() function
df['Col'].str.split(pat, n=-1, expand=False)
# to split into multiple columns by delimiter
df['Col'].str.split(delimiter, expand=True)

Pass expand=True to split strings into separate columns. Note that it is False by default.

Use the parameter n to pass the number of splits you want. It is -1 by default to split by all the instances of the delimiter.

Let’s look at the usage of the above method with the help of some examples. First, we will create a dataframe that we will be using throughout this tutorial.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Address': ['4860 Sunset Boulevard,San Francisco,California',
                '3055 Paradise Lane,Salt Lake City,Utah',
                '682 Main Street,Detroit,Michigan',
                '9001 Cascade Road,Kansas City,Missouri']
})
# display the dataframe
df

Output:

Dataframe with a single text column containing comma separated values.

Here we created a dataframe df having a single column “Address”. Note that the strings in the “Address” column have a certain pattern to them. If we look closely, this column can be split into three columns – street name, city, and state.

Let’s go ahead and split this column.

Apply the pandas series str.split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.

# split column into multiple columns by delimiter 
df['Address'].str.split(',', expand=True)

Output:

Columns resulting from the split of "Address" column

You can see that it results in three different columns. If you don’t pass expand=True, the function returns a single column (a pandas series) with the values resulting from the split inside a list.

# using default value for expand parameter
df['Address'].str.split(',')

Output:

0    [4860 Sunset Boulevard, San Francisco, Califor...
1           [3055 Paradise Lane, Salt Lake City, Utah]
2                [682 Main Street, Detroit,  Michigan]
3           [9001 Cascade Road, Kansas City, Missouri]
Name: Address, dtype: object

Here, the values in the text column have been split but this didn’t result in the creation of separate columns. The split values are inside a list.

You can still split this column of lists into multiple columns but if your objective is to split a text column into multiple columns it’s better to pass expand=True to the pandas Series.str.split() function.

Let’s now add the three new columns resulting from the split to the dataframe df.

# split column and add new columns to df
df[['Street', 'City', 'State']] = df['Address'].str.split(',', expand=True)
# display the dataframe
df

Output:

Dataframe with additional three columns resulting from the split of the Address column

You can see that the columns resulting from the split have now been added to the dataframe df.

In the above examples, we split the text column “Address” on every occurrence of the delimiter “,”. That is two splits resulting in three different columns. You can also specify the number of splits you want. For example, let’s make only one split resulting in two columns.

# custom number of splits
df['Address'].str.split(',', n=1, expand=True)

Output:

Dataframe resulting from split of Address column with just one split.

Here, n=1 denotes that we want to make only one split. This split will happen on the first occurrence of the delimiter from the left. You can see that the city and state values are together in the second column.

Just like you used the pandas Series.str.split() function to split a column, you can use the pandas Series.str.cat() to concatenate values from multiple columns into a single column. Here’s an example –

# concat columns by separator
df['Street'].str.cat(df[['City', 'State']], sep=',')

Output:

0    4860 Sunset Boulevard,San Francisco,California
1            3055 Paradise Lane,Salt Lake City,Utah
2                 682 Main Street,Detroit, Michigan
3            9001 Cascade Road,Kansas City,Missouri
Name: Street, dtype: object

You can see that we get the same value as in the “Address” column.

Summary

In this tutorial, we looked at how to split a string column by a delimiter into multiple columns in pandas. The following are the key takeaways –

  • You can apply the string split() function to a pandas series via the .str accessor to split values in a string column by a delimiter, which you can specify using the sep parameter.
  • By default, the above function results in a list of values resulting from the split for each row. To split the values into multiple columns, pass expand=True as an argument.
  • Also, you can specify the number of splits to make using the n parameter.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a python (version 3.8.3) kernel having pandas version 1.0.5

You might also be interested in –


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.


Author

  • Piyush

    Piyush is a data scientist passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.