add new column to existing dataframe in pandas

Pandas – Add Column to Existing DataFrame

In this tutorial, we will look at how to add a column to an existing Pandas dataframe with the help of some examples.

When working with data in a pandas dataframe, it is handy to know how to add a new column to an existing dataframe. For example, suppose you have a dataframe of students with the columns “Name” and “Age”, and you want to add their heights as a new column.

There are multiple ways to add a new column in Pandas –

  1. Create a new column with a list of values.
  2. Use the pandas dataframe insert() function.
  3. Use the pandas dataframe assign() function.

Let’s now look at the above methods with the help of some examples. We will use the same use case for all three methods: you have a dataframe with some information on students in a university, with the “Name” and “Age” columns already present, and you want to add a “Height” column containing their respective heights.

import pandas as pd

# name and age data of some students
student_data = {
    "Name": ["Ram", "Steve", "Maria", "Hasan", "Emma"],
    "Age": [17, 18, 17, 19, 18]
}
# create a pandas dataframe
df = pd.DataFrame(student_data)

# display the dataframe
df

Output:

pandas dataframe containing name and age data of students

Create a new column by assigning it to a list of values

In this method, we directly initialize the column to the corresponding column values (for example, a list of heights). Use the following syntax to create a new column with a list of values –

df[new_column_name] = new_column_values # list or array-like

Let’s now apply the above syntax for our use case.

# name and age data of some students
student_data = {
    "Name": ["Ram", "Steve", "Maria", "Hasan", "Emma"],
    "Age": [17, 18, 17, 19, 18]
}
# create a pandas dataframe
df = pd.DataFrame(student_data)

# add new column for "Height" in cm
df["Height"] = [181, 178, 168, 165, 166]

# display the dataframe
df

Output:

pandas dataframe df with the new column "Height"

Here, we add the “Height” column to the dataframe df and assign a list of values to it. You can see the resulting column in the output.
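Direct assignment is not limited to lists. As a minimal sketch (the “Unit” and “Height_m” column names are illustrative assumptions, not part of the example above), a single scalar is broadcast to every row, and a new column can be computed from an existing one:

```python
import pandas as pd

# same student data as in the example above
df = pd.DataFrame({
    "Name": ["Ram", "Steve", "Maria", "Hasan", "Emma"],
    "Age": [17, 18, 17, 19, 18],
})

# add the "Height" column in cm from a list
df["Height"] = [181, 178, 168, 165, 166]

# a single scalar value is repeated for every row
# ("Unit" is a hypothetical column used only for illustration)
df["Unit"] = "cm"

# a new column computed from an existing column
# ("Height_m" is a hypothetical column name)
df["Height_m"] = df["Height"] / 100

print(df)
```

If a column with the given name already exists, the same syntax overwrites its values instead of adding a new column.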

Note that if the length of the list does not match the length of the dataframe, you’ll get an error.

# add new column for "Height" in cm
df["Height"] = [181, 178, 168]

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [13], in <module>
      1 # add new column for "Height" in cm
----> 2 df["Height"] = [181, 178, 168]

...
ValueError: Length of values (3) does not match length of index (5)

We get a ValueError because the list has three values while the dataframe’s index has five entries.
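The length check applies to plain lists and arrays. If you assign a pandas Series instead, pandas aligns the values on the index labels rather than on position, so rows without a matching label are filled with NaN. A small sketch, assuming heights are known only for the first three students (the partial_heights name is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Steve", "Maria", "Hasan", "Emma"],
    "Age": [17, 18, 17, 19, 18],
})

# heights for the rows labelled 0, 1 and 2 only
# (partial_heights is a hypothetical name for this illustration)
partial_heights = pd.Series([181, 178, 168], index=[0, 1, 2])

# no ValueError here: values are matched on index labels,
# and the rows labelled 3 and 4 get NaN
df["Height"] = partial_heights

print(df)
```

Because NaN is a floating-point value, the “Height” column is stored as float64 here rather than as integers.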

Use the pandas dataframe insert() function

You can also use the pandas dataframe insert() function to add a new column to an existing dataframe. It inserts a column into the dataframe at the specified location. The following is the syntax –

df.insert(loc, column, value, allow_duplicates=False)

The pandas dataframe insert() function takes the following arguments –

  • loc – (int) The zero-based position at which the new column is inserted. It must be between 0 and the current number of columns.
  • column – (str, num, or hashable object) The label (column name) for the inserted column.
  • value – (scalar, series, or array-like) The column values.
  • allow_duplicates – (bool) Optional argument. Determines whether a column can be inserted when a column with the same label already exists. It is False by default.

Let’s add the “Height” column at the end of our dataframe containing the “Name” and “Age” columns.

# name and age data of some students
student_data = {
    "Name": ["Ram", "Steve", "Maria", "Hasan", "Emma"],
    "Age": [17, 18, 17, 19, 18]
}
# create a pandas dataframe
df = pd.DataFrame(student_data)

# add new column for "Height" in cm
df.insert(2, "Height", [181, 178, 168, 165, 166])

# display the dataframe
df

Output:

pandas dataframe df with the new column "Height"

You can see that the resulting dataframe has a new column called “Height”. Note that we passed 2 as the position (loc = 2). Positions are zero-based, so “Height” becomes the third column, after “Name” and “Age”.

Note that the pandas dataframe insert() function modifies the dataframe in place.
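To illustrate the loc and allow_duplicates arguments, here is a short sketch that inserts “Height” between the existing columns and then tries to insert the same label again. The result and duplicate_rejected names are hypothetical helpers for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Steve", "Maria", "Hasan", "Emma"],
    "Age": [17, 18, 17, 19, 18],
})

# loc=1 places "Height" between "Name" and "Age"
result = df.insert(1, "Height", [181, 178, 168, 165, 166])

print(result)            # insert() works in place and returns None
print(list(df.columns))  # ['Name', 'Height', 'Age']

# with the default allow_duplicates=False, inserting a label
# that already exists raises a ValueError
duplicate_rejected = False
try:
    df.insert(0, "Height", [0, 0, 0, 0, 0])
except ValueError as err:
    duplicate_rejected = True
    print("ValueError:", err)
```

Since insert() returns None, avoid writing df = df.insert(...), which would replace the dataframe with None.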

Use the pandas dataframe assign() function

You can also use the pandas dataframe assign() function to add one or more new columns to a dataframe. The following is the syntax –

df.assign(new_column=new_column_values)

It returns a new dataframe with the new columns in addition to the existing columns. Note that the column name is passed as a keyword argument, so write it without quotes (that is, do not pass a string or integer expression as the column name).

Let’s now add a “Height” column to our dataframe of students containing the “Name” and “Age” columns.

# name and age data of some students
student_data = {
    "Name": ["Ram", "Steve", "Maria", "Hasan", "Emma"],
    "Age": [17, 18, 17, 19, 18]
}
# create a pandas dataframe
df = pd.DataFrame(student_data)

# add new column for "Height" in cm
df = df.assign(Height=[181, 178, 168, 165, 166])

# display the dataframe
df

Output:

pandas dataframe df with the new column "Height"

You can see that the resulting dataframe now has the “Height” column.

Note that the pandas dataframe assign() function does not modify the dataframe in place. Instead, it returns a new dataframe, which is why the result is assigned back to df in the example above.
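The following sketch shows this behaviour, along with two other ways to call assign(): passing a callable that computes a column from the dataframe, and unpacking a dictionary when the column label is not a valid Python identifier. The names df_with_height, df_full, df_labelled, “Height_m” and “Height (cm)” are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Steve", "Maria", "Hasan", "Emma"],
    "Age": [17, 18, 17, 19, 18],
})

heights = [181, 178, 168, 165, 166]

# assign() returns a new dataframe; df itself is left unchanged
df_with_height = df.assign(Height=heights)
print(list(df.columns))              # ['Name', 'Age']
print(list(df_with_height.columns))  # ['Name', 'Age', 'Height']

# several columns in one call; a callable receives the dataframe,
# including columns created earlier in the same call
df_full = df.assign(
    Height=heights,
    Height_m=lambda d: d["Height"] / 100,
)

# a label with spaces cannot be written as a keyword,
# so unpack a dictionary instead
df_labelled = df.assign(**{"Height (cm)": heights})

print(df_full)
print(df_labelled)
```

Returning a new dataframe makes assign() convenient in method chains, at the cost of creating a new object for each call.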

Summary

In this tutorial, we looked at some of the different methods to add a column to an existing dataframe in Pandas. The following are the key takeaways.

  1. You can create a new column directly, by initializing it to a list (or array-like data structure) of values.
  2. The pandas dataframe insert() function is used to insert (or create) a new column at a specific position in the dataframe. Pass the position, column name, and the column values as arguments. It modifies the dataframe in place.
  3. You can also use the pandas dataframe assign() function to add new columns to a dataframe. It does not modify the dataframe in place; instead, it returns a new dataframe with the added column(s).

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.