In this tutorial, we will look at how to filter a dataframe in R based on one or more column values with the help of some examples.
How to filter a dataframe in R?
The dplyr package comes with a number of useful functions to work with a dataframe in R. You can use dplyr’s filter() function to filter a dataframe in R based on a condition.
Pass the dataframe and the condition as arguments. The following is the syntax –
filter(dataframe, condition)
It returns a dataframe containing only the rows for which the condition is TRUE. Rows where the condition evaluates to FALSE or NA are dropped.
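To make the role of the condition concrete, here is a minimal sketch. The dataframe toy_df and its columns id and value are hypothetical names used only for this illustration; they are not part of the tutorial's data. The sketch shows that the condition is a logical expression with one TRUE/FALSE value per row, and that filter() keeps the rows where it is TRUE.

```r
library(dplyr)

# Hypothetical toy dataframe, used only for this illustration
toy_df <- data.frame(
  id = c(1, 2, 3, 4),
  value = c(10, 25, 5, 40)
)

# The condition yields one TRUE/FALSE value per row
keep <- toy_df$value > 20
print(keep)

# filter() evaluates the same condition against the columns
# and returns only the rows where it is TRUE
result <- filter(toy_df, value > 20)
print(result)
```

Here the rows with id 2 and 4 are the ones that pass the condition.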
Let’s now look at some examples of using the above syntax to filter a dataframe in R.
First, we will create a dataframe that we will be using throughout this tutorial.
# create a dataframe
scores_df <- data.frame(
  "Name" = c("Jim", "Jim", "Pam", "Pam", "Andy", "Andy", "Howard", "Howard"),
  "Subject" = c("English", "Math", "English", "Math", "English", "Math", "English", "Math"),
  "Score" = c(81, 93, 91, 76, 95, 88, 73, 67)
)
# display the dataframe
print(scores_df)
Output:
    Name Subject Score
1    Jim English    81
2    Jim    Math    93
3    Pam English    91
4    Pam    Math    76
5   Andy English    95
6   Andy    Math    88
7 Howard English    73
8 Howard    Math    67
We now have a dataframe containing the scores of some students in different subjects in a high school examination. The above dataframe has columns – “Name”, “Subject”, and “Score”.
Before we can filter the above dataframe using the filter() function, we have to load the dplyr package.
library(dplyr)
Here, we used the library() function in R to load the dplyr package.
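Base R's stats package also has a function named filter(), so if dplyr is not attached, a plain filter() call resolves to that one instead. As a small sketch of one way to avoid any ambiguity, you can call the function through the dplyr namespace. The shortened dataframe below is an assumption made to keep the example self-contained.

```r
# Shortened version of the tutorial's data, for illustration only
scores_df <- data.frame(
  Name = c("Jim", "Jim", "Pam", "Pam"),
  Subject = c("English", "Math", "English", "Math"),
  Score = c(81, 93, 91, 76)
)

# Call filter() explicitly from dplyr, without attaching the package
math_scores <- dplyr::filter(scores_df, Subject == "Math")
print(math_scores)
```

This still requires dplyr to be installed; it only skips attaching it with library().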
Example 1 – Filter dataframe on a single condition
Let’s now filter the above dataframe so that we only get the scores for the subject “English”.
Here, we want the rows of scores_df where the value in the “Subject” column is “English”. Pass the dataframe and the condition to the filter() function.
# filter dataframe
english_scores <- filter(scores_df, Subject == "English")
# display the dataframe
print(english_scores)
Output:
    Name Subject Score
1    Jim English    81
2    Pam English    91
3   Andy English    95
4 Howard English    73
We get only the rows with scores for “English” from the above dataframe.
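The same filter is often written with the pipe operator %>%, which dplyr makes available when loaded. The sketch below is an alternative spelling of the example above, not a different technique: the pipe passes scores_df as the first argument to filter().

```r
library(dplyr)

scores_df <- data.frame(
  Name = c("Jim", "Jim", "Pam", "Pam", "Andy", "Andy", "Howard", "Howard"),
  Subject = c("English", "Math", "English", "Math", "English", "Math", "English", "Math"),
  Score = c(81, 93, 91, 76, 95, 88, 73, 67)
)

# scores_df %>% filter(cond) is equivalent to filter(scores_df, cond)
english_scores <- scores_df %>%
  filter(Subject == "English")
print(english_scores)
```

The piped form is mostly a matter of style; it becomes convenient when several operations are chained one after another.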
Example 2 – Filter dataframe on multiple conditions
You can also use the filter() function to filter a dataframe on multiple conditions in R. Pass each condition as a comma-separated argument.
Note that when you pass multiple comma-separated conditions to the filter() function, they are combined using & (logical AND).
Let’s look at an example. We’ll get the data for students who scored more than 90 in English. That is, we want to filter the above dataframe such that the “Subject” is “English” and the “Score” is greater than 90.
# filter dataframe on multiple conditions
english_high_scores <- filter(scores_df, Subject == "English", Score > 90)
# display the dataframe
print(english_high_scores)
Output:
  Name Subject Score
1  Pam English    91
2 Andy English    95
We get the rows for students who scored more than 90 in “English”.
If you don’t want to pass multiple conditions as comma-separated arguments, you can combine them first and then pass them as a single condition to the filter() function.
Let’s do the same thing as above – get data for students who scored more than 90 in English.
# filter dataframe on multiple conditions
english_high_scores <- filter(scores_df, Subject == "English" & Score > 90)
# display the dataframe
print(english_high_scores)
Output:
  Name Subject Score
1  Pam English    91
2 Andy English    95
We get the same result as above.
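If you would rather check the equivalence in code than compare the printed output by eye, a short sketch like the following can help. The variable names comma_version and and_version are made up for this illustration.

```r
library(dplyr)

scores_df <- data.frame(
  Name = c("Jim", "Jim", "Pam", "Pam", "Andy", "Andy", "Howard", "Howard"),
  Subject = c("English", "Math", "English", "Math", "English", "Math", "English", "Math"),
  Score = c(81, 93, 91, 76, 95, 88, 73, 67)
)

# Conditions passed as separate, comma-separated arguments
comma_version <- filter(scores_df, Subject == "English", Score > 90)

# Conditions combined into a single expression with &
and_version <- filter(scores_df, Subject == "English" & Score > 90)

# all.equal() returns TRUE when the two dataframes match
same_result <- isTRUE(all.equal(comma_version, and_version))
print(same_result)
```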
A good thing about combining conditions into a single condition is that you can also combine them using the | (or) logical operator.
For example, let’s now filter the above dataframe such that the “Subject” is “English” or the score is greater than 90. That is, even if just one of these two conditions is TRUE, we select that row.
# filter dataframe on multiple conditions
df <- filter(scores_df, Subject == "English" | Score > 90)
# display the dataframe
print(df)
Output:
    Name Subject Score
1    Jim English    81
2    Jim    Math    93
3    Pam English    91
4   Andy English    95
5 Howard English    73
We get the rows where “Subject” is “English” or “Score” is greater than 90.
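When & and | appear in the same condition, grouping matters, because R evaluates & before |. The sketch below extends the example with an extra, purely illustrative requirement (excluding “Jim”) to show how parentheses change the result. The variable names with_parens and without_parens are hypothetical.

```r
library(dplyr)

scores_df <- data.frame(
  Name = c("Jim", "Jim", "Pam", "Pam", "Andy", "Andy", "Howard", "Howard"),
  Subject = c("English", "Math", "English", "Math", "English", "Math", "English", "Math"),
  Score = c(81, 93, 91, 76, 95, 88, 73, 67)
)

# (English OR score above 90) AND not Jim
with_parens <- filter(scores_df, (Subject == "English" | Score > 90) & Name != "Jim")
print(with_parens)

# Without parentheses, & binds tighter than |, so this reads as:
# English OR (score above 90 AND not Jim)
without_parens <- filter(scores_df, Subject == "English" | Score > 90 & Name != "Jim")
print(without_parens)
```

The first call drops both of Jim's rows, while the second keeps Jim's English row because the name check only applies to the score condition. Adding parentheses explicitly is a cheap way to make the intended grouping obvious.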
Summary – Filter Dataframe in R
In this tutorial, we looked at how to filter a dataframe in R. The following is a short summary of the steps mentioned in this tutorial.
- Create a dataframe (skip this step if you already have a dataframe to operate on).
- Use the dplyr package’s filter() function to filter the dataframe on a condition. You can also filter the dataframe on multiple conditions: either pass the different conditions as comma-separated arguments or combine them first using logical operators and then pass a single condition to the filter() function.