In this tutorial, we will look at how to create a pandas dataframe column with values based on a condition. We will look at some examples to demonstrate the methods mentioned.
How do you create a new column based on a condition in pandas?
Depending upon the use case, you can use np.where(), a list comprehension, a custom function, or a mapping with a dictionary, etc. to create a column with values based on some condition.
The general idea is to first build a list or a series of values according to the condition and then assign those values to the new column. Let’s look at these methods with the help of examples.
First, we will create a sample dataframe that we will use throughout this tutorial:
import numpy as np
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})

# display the dataframe
print(df)
Output:
    Name  Age
0  Siraj   23
1   Emma   17
2   Alex   16
3   Maya   26
4  Lupin   21
We now have a dataframe containing the names and ages of some residents in a suburb. Let’s now create a column “Is_eligible” storing whether they are eligible to vote in the coming elections. Note that a person is eligible to vote if they are 18 or older.
1. Create column using np.where()
Pass the condition to the np.where() function, followed by the value you want if the condition evaluates to True and then the value you want if the condition doesn’t evaluate to True.
# create a new column based on condition
df['Is_eligible'] = np.where(df['Age'] >= 18, True, False)

# display the dataframe
print(df)
Output:
    Name  Age  Is_eligible
0  Siraj   23         True
1   Emma   17        False
2   Alex   16        False
3   Maya   26         True
4  Lupin   21         True
You can see that for “Emma” and “Alex” we get False because they are below 18 years of age.
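Since the comparison df['Age'] >= 18 already produces a boolean series, np.where() tends to be most useful when the two outcomes are values other than True and False. The following is a minimal sketch of that idea; the 'Status' column and its labels are assumptions made for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})

# the comparison on its own already gives a boolean series
df['Is_eligible'] = df['Age'] >= 18

# np.where() picks one of two values element-wise
# ('Status' and its labels are hypothetical, for illustration only)
df['Status'] = np.where(df['Age'] >= 18, 'Eligible', 'Not eligible')

# display the dataframe
print(df)
```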
2. Create column using list comprehension
You can also use a list comprehension to fill column values based on a condition.
# create a new column based on condition
df['Is_eligible'] = [True if a >= 18 else False for a in df['Age']]

# display the dataframe
print(df)
Output:
    Name  Age  Is_eligible
0  Siraj   23         True
1   Emma   17        False
2   Alex   16        False
3   Maya   26         True
4  Lupin   21         True
We get the same result as above. In the list comprehension, we iterate through each value in the df['Age'] series and then fill the list with True or False depending on the condition. Finally, we assign this list to the new column.
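A list comprehension can also read more than one column at a time by iterating over them with zip(). The sketch below assumes a hypothetical 'Is_registered' column and a hypothetical 'Can_vote' result column, neither of which appears in the sample dataframe above. It also shows that the expression a >= 18 can be used directly, since it already evaluates to True or False.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21],
    # hypothetical column, added only for this illustration
    'Is_registered': [True, True, False, False, True]
})

# a >= 18 is already True or False, so the if/else can be dropped
df['Is_eligible'] = [a >= 18 for a in df['Age']]

# zip() pairs up the values of two columns row by row
df['Can_vote'] = [a >= 18 and r for a, r in zip(df['Age'], df['Is_registered'])]

# display the dataframe
print(df)
```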
3. Create column using a function
Using a function might be overkill for this particular example, but you can use functions for more complicated conditions and requirements.
# create a function
def is_eligible(age):
    if age >= 18:
        return True
    else:
        return False

# create a new column based on condition
df['Is_eligible'] = df['Age'].apply(is_eligible)

# display the dataframe
print(df)
Output:
    Name  Age  Is_eligible
0  Siraj   23         True
1   Emma   17        False
2   Alex   16        False
3   Maya   26         True
4  Lupin   21         True
We get the same result as the above two examples.
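A function becomes more worthwhile once the condition has more than two outcomes. As a rough sketch, the function below sorts ages into three bands; the age_group() function, the 'Age_group' column, and the band boundaries are all hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})

# hypothetical age bands, chosen only for illustration
def age_group(age):
    if age < 18:
        return 'Minor'
    elif age < 25:
        return 'Young adult'
    else:
        return 'Adult'

# apply() calls the function once for each value in the column
df['Age_group'] = df['Age'].apply(age_group)

# display the dataframe
print(df)
```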
4. Create column using a dictionary mapping
For some simpler use cases, you can also use dictionary mapping to fill column values. For example, now that we have the “Is_eligible” column, let’s create a new column that tells whether a person is an adult or not, assuming the eligibility to be the same as the voting eligibility.
# create new column using dictionary mapping
df['Is_adult'] = df['Is_eligible'].map({True: 'Yes', False: 'No'})

# display the dataframe
print(df)
Output:
    Name  Age  Is_eligible Is_adult
0  Siraj   23         True      Yes
1   Emma   17        False       No
2   Alex   16        False       No
3   Maya   26         True      Yes
4  Lupin   21         True      Yes
Here, we used the pandas map() function to map values in one column with a dictionary to get values for the other column.
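One thing to keep in mind with map() is that values missing from the dictionary are mapped to NaN. The sketch below leaves False out of the dictionary on purpose to show this, then fills the gaps with fillna(). The intermediate name partial is an assumption made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Siraj', 'Emma', 'Alex', 'Maya', 'Lupin'],
    'Age': [23, 17, 16, 26, 21]
})
df['Is_eligible'] = df['Age'] >= 18

# the dictionary has no entry for False, so those rows become NaN
partial = df['Is_eligible'].map({True: 'Yes'})
print(partial)

# fillna() supplies a default for the unmapped values
df['Is_adult'] = partial.fillna('No')

# display the dataframe
print(df)
```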
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel, using numpy version 1.18.5 and pandas version 1.0.5.