create a pandas column based on a condition

Pandas – Create Column based on a Condition

In this tutorial, we will look at how to create a pandas dataframe column whose values are based on a condition, with examples demonstrating each method.

Depending on the use case, you can use np.where(), a list comprehension, a custom function, or a mapping with a dictionary to create a column whose values are based on some condition.

The general idea is to first compute a list or a Series of values from the condition (one value per row) and then assign those values to the new column. Let’s look at these methods with the help of examples.
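As a minimal sketch of this general idea (using a small standalone frame with three of the sample rows defined below), the condition is evaluated once into a boolean Series, and that Series is then assigned as the new column. The `.loc` variant at the end is a swapped-in alternative not covered in this tutorial: it fills a default value and then overwrites only the rows where the condition holds. The `Status` column and its labels are illustrative.

```python
import pandas as pd

# small standalone frame using three rows of the sample data
df = pd.DataFrame({'Name': ['Siraj', 'Emma', 'Alex'], 'Age': [23, 17, 16]})

# step 1: evaluate the condition, giving one True/False value per row
mask = df['Age'] >= 18

# step 2: assign those values to the new column
df['Is_eligible'] = mask

# .loc variant: start every row with a default label,
# then overwrite only the rows where the condition is True
df['Status'] = 'Not eligible'
df.loc[mask, 'Status'] = 'Eligible'
print(df)
```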

First, we will create a sample dataframe that we will use throughout this tutorial:

import numpy as np
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})
# display the dataframe
print(df)

Output:

    Name  Age
0  Siraj   23
1   Emma   17
2   Alex   16
3   Maya   26
4  Lupin   21

We now have a dataframe containing the names and ages of some residents in a suburb. Let’s now create a column “Is_eligible” storing whether they are eligible to vote in the coming elections. Note that a person is eligible to vote if they are 18 years of age or older.

Pass the condition to the np.where() function, followed by the value to use where the condition is True and then the value to use where it is False.

# create a new column based on condition
df['Is_eligible'] = np.where(df['Age'] >= 18, True, False)
# display the dataframe
print(df)

Output:

    Name  Age  Is_eligible
0  Siraj   23         True
1   Emma   17        False
2   Alex   16        False
3   Maya   26         True
4  Lupin   21         True

You can see that for “Emma” and “Alex” we get False because they are below 18 years of age.
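np.where() is not limited to returning True and False. The following sketch (the columns `Voter_status` and `Voting_age` are hypothetical) returns string labels, and also shows that a branch can be a Series of the same length, in which case values are picked element by element.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})

# any pair of values can be returned, not only True/False
df['Voter_status'] = np.where(df['Age'] >= 18, 'Yes', 'No')

# a branch can also be an array/Series of the same length:
# keep the age for eligible voters, NaN for everyone else
df['Voting_age'] = np.where(df['Age'] >= 18, df['Age'], np.nan)
print(df)
```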

You can also use a list comprehension to fill column values based on a condition.

# create a new column based on condition
df['Is_eligible'] = [True if a >= 18 else False for a in df['Age']]
# display the dataframe
print(df)

Output:

    Name  Age  Is_eligible
0  Siraj   23         True
1   Emma   17        False
2   Alex   16        False
3   Maya   26         True
4  Lupin   21         True

We get the same result as above. In the list comprehension, we iterate through each value in the df['Age'] series and fill the list with True or False depending on the condition. Finally, we assign this list to the new column.
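Because a list comprehension is plain Python, it can read several columns at once. A minimal sketch, assuming a hypothetical `Message` column, pairs the two columns row by row with zip():

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})

# zip() yields (name, age) pairs row by row, so both the condition
# and the resulting value can use more than one column
df['Message'] = [
    f'{name} can vote' if age >= 18 else f'{name} cannot vote yet'
    for name, age in zip(df['Name'], df['Age'])
]
print(df['Message'].tolist())
```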

Using a function might be overkill for this particular example, but functions are useful for more complicated conditions and requirements.

# create a function
def is_eligible(age):
    if age >= 18:
        return True
    else:
        return False

# create a new column based on condition
df['Is_eligible'] = df['Age'].apply(is_eligible)
# display the dataframe
print(df)

Output:

    Name  Age  Is_eligible
0  Siraj   23         True
1   Emma   17        False
2   Alex   16        False
3   Maya   26         True
4  Lupin   21         True

We get the same result as the above two examples.
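Where the condition has more than two outcomes, a function with if/elif/else branches reads more clearly than nested expressions. The sketch below applies a function row-wise with DataFrame.apply(axis=1), so the function could also consult other columns such as Name; the age bands and the `Age_group` column are hypothetical, chosen only for illustration. Row-wise apply is convenient, but on large frames it is slower than vectorized options such as np.where().

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})

# hypothetical age bands, used only to illustrate a multi-branch condition
def age_group(row):
    if row['Age'] < 18:
        return 'Minor'
    elif row['Age'] < 25:
        return 'Young adult'
    else:
        return 'Adult'

# axis=1 passes each row (as a Series) to the function
df['Age_group'] = df.apply(age_group, axis=1)
print(df)
```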

For some simpler use cases, you can also use a dictionary mapping to fill column values. For example, now that we have the “Is_eligible” column, let’s create a new column that tells whether a person is an adult, assuming adulthood starts at 18, the same age as voting eligibility.

# create new column using dictionary mapping
df['Is_adult'] = df['Is_eligible'].map({True: 'Yes', False: 'No'})
# display the dataframe
print(df)

Output:

    Name  Age  Is_eligible Is_adult
0  Siraj   23         True      Yes
1   Emma   17        False       No
2   Alex   16        False       No
3   Maya   26         True      Yes
4  Lupin   21         True      Yes

Here, we used the pandas Series.map() method to map each value in the “Is_eligible” column to a value for the new column using a dictionary.
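One detail worth knowing about map() with a dictionary: values that have no key in the dictionary become NaN rather than staying unchanged. A small sketch, using a deliberately incomplete dictionary (the name `partial` is illustrative), shows that behavior and one way to supply a fallback with fillna():

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})
df['Is_eligible'] = df['Age'] >= 18

# deliberately incomplete dictionary: there is no entry for False
partial = {True: 'Yes'}

# map() turns every value missing from the dictionary into NaN
mapped = df['Is_eligible'].map(partial)

# fillna() supplies a fallback for those unmapped rows
df['Is_adult'] = mapped.fillna('No')
print(df)
```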

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel, numpy version 1.18.5, and pandas version 1.0.5.


Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
