When working with a pandas dataframe, you may find yourself with a column whose values are lists that you’d rather have in separate columns. In this tutorial, we will look at how to split a pandas dataframe column of lists into multiple columns with the help of some examples.
How to create multiple columns from a pandas column of lists?
To split a pandas column of lists into multiple columns, apply the tolist() function to the column and pass the result to pd.DataFrame() to create a new dataframe. The following is the syntax.
import pandas as pd

# assuming 'Col' is the column you want to split
pd.DataFrame(df['Col'].tolist(), columns=['c1', 'c2', 'c3'])
The columns argument is optional. Use it to pass the names of the new columns resulting from the split as a list.
Let’s see it in action with the help of an example. First, let’s create a dataframe with a column having a list of values for each row.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'Values': [[1, 2, 3], [2, 0, 1], [3, 2, 0]]
})
# display the dataframe
df
Output:
Now, let’s split the column “Values” into multiple columns, one for each value in the list.
# new df from the column of lists
split_df = pd.DataFrame(df['Values'].tolist())
# display the resulting df
split_df
Output:
Here, we didn’t pass any column names, so pandas assigned the default integer labels (0, 1, 2). Let’s give specific column names to each of the new columns.
# new df from the column of lists
split_df = pd.DataFrame(df['Values'].tolist(), columns=['v1', 'v2', 'v3'])
# display the resulting df
split_df
Output:
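If you don’t know in advance how many values each list holds, you can split first and name the columns afterwards. The following is a minimal sketch of that idea. The "v" prefix and the f-string naming scheme are just assumptions for illustration, not something the method requires.

```python
import pandas as pd

# same example dataframe as above
df = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'Values': [[1, 2, 3], [2, 0, 1], [3, 2, 0]]
})

# split without names, so the columns get the default integer labels
split_df = pd.DataFrame(df['Values'].tolist())

# build one name per resulting column: v1, v2, ...
split_df.columns = [f'v{i + 1}' for i in range(split_df.shape[1])]

print(split_df)
```

This way the number of names always matches the number of columns produced by the split.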
You may also want to concatenate the resulting dataframe from the split to the original dataframe. For this, use the pandas concat() function.
# new df from the column of lists
split_df = pd.DataFrame(df['Values'].tolist(), columns=['v1', 'v2', 'v3'])
# concat df and split_df
df = pd.concat([df, split_df], axis=1)
# display df
df
Output:
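One thing to watch out for: concat() with axis=1 lines rows up by index label, and the dataframe created from tolist() gets a fresh 0, 1, 2, ... index. That matches the example above, but it may not match a dataframe that has been filtered or re-indexed. The sketch below uses a hypothetical dataframe with the index 10, 20, 30 and passes index=df.index so the rows still line up.

```python
import pandas as pd

# hypothetical dataframe with a non-default index
df = pd.DataFrame(
    {
        'Name': ['a', 'b', 'c'],
        'Values': [[1, 2, 3], [2, 0, 1], [3, 2, 0]]
    },
    index=[10, 20, 30]
)

# reuse the original index so each new row matches its source row
split_df = pd.DataFrame(
    df['Values'].tolist(),
    columns=['v1', 'v2', 'v3'],
    index=df.index
)

# concat df and split_df
result = pd.concat([df, split_df], axis=1)

print(result)
```

Without index=df.index, the two dataframes here would share no index labels, and the concatenated result would contain extra rows filled with NaN.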
You may also want to drop the column “Values” now that it has been split into three columns.
# drop Values
df = df.drop('Values', axis=1)
# display df
df
Output:
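If you need these steps often, you could wrap the split, concat, and drop into a single function. The following is one possible sketch. The function name split_list_column and its parameters are hypothetical, made up for this example, and not part of pandas.

```python
import pandas as pd

def split_list_column(df, column, new_names):
    """Hypothetical helper: return a copy of df with `column` split into new columns."""
    # new df from the column of lists, keeping the original index
    split_df = pd.DataFrame(df[column].tolist(), columns=new_names, index=df.index)
    # drop the original column and attach the new ones
    return pd.concat([df.drop(columns=column), split_df], axis=1)

df = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'Values': [[1, 2, 3], [2, 0, 1], [3, 2, 0]]
})

result = split_list_column(df, 'Values', ['v1', 'v2', 'v3'])

print(result)
```

The function returns a new dataframe and leaves the original df unchanged.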
Split column of lists of variable lengths
What would happen if you use the above method on a column which has lists of variable lengths?
Let’s see for ourselves.
# create a dataframe
df = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'Values': [[1, 2, 3], [2, 0], [3, 2, 0]]
})
# display the dataframe
df
Output:
The column “Values” has lists of different lengths.
# new df from the column of lists
split_df = pd.DataFrame(df['Values'].tolist())
# display the resulting df
split_df
Output:
If the lists in the column are of different lengths, the resulting dataframe will have as many columns as the longest list, with NaN in the positions where a shorter list has no value.
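A side effect worth knowing about: a column that receives a NaN is stored as floats, so the integers in it show up as 3.0 and 0.0. The sketch below checks this on the same data and, as an optional step that is not part of the method above, converts the column to the pandas nullable integer dtype 'Int64' to keep whole numbers.

```python
import pandas as pd

# dataframe with lists of different lengths
df = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'Values': [[1, 2, 3], [2, 0], [3, 2, 0]]
})

# new df from the column of lists
split_df = pd.DataFrame(df['Values'].tolist(), columns=['v1', 'v2', 'v3'])

# the short list leaves a gap in 'v3', so that column holds floats
dtype_before = split_df['v3'].dtype
print(split_df.dtypes)

# optional: switch to the nullable integer dtype to keep whole numbers
split_df['v3'] = split_df['v3'].astype('Int64')

print(split_df)
```

The missing value is still missing after the conversion. Only the way the remaining values are stored and displayed changes.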
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.