R – Filter Dataframe Based on Column Value

In this tutorial, we will look at how to filter a dataframe in R based on one or more column values with the help of some examples.

How to filter a dataframe in R?

The dplyr package comes with a number of useful functions for working with dataframes in R. You can use its filter() function to filter a dataframe based on a condition.

Pass the dataframe and the condition as arguments. The following is the syntax –

filter(dataframe, condition)

It returns a dataframe containing only the rows for which the condition is TRUE.
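As a minimal sketch of what satisfying the condition means, the example below uses a small hypothetical dataframe, demo_df (not part of this tutorial's data), that contains one missing value. filter() keeps a row only when the condition evaluates to TRUE, so a row where the comparison yields NA is dropped along with the FALSE rows.

```r
# load dplyr so that filter() is available
library(dplyr)

# hypothetical dataframe for illustration only
demo_df = data.frame(
  "Item" = c("a", "b", "c"),
  "Value" = c(10, NA, 30)
)
# keep rows where Value is greater than 5
# the comparison NA > 5 gives NA, so the row for item "b" is dropped
kept = filter(demo_df, Value > 5)
print(kept)
```

If you need to keep rows with missing values, you would have to ask for them explicitly, for example with a condition such as is.na(Value).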

Let’s now look at some examples of using the above syntax to filter a dataframe in R.

First, we will create a dataframe that we will be using throughout this tutorial.

# create a dataframe
scores_df = data.frame(
  "Name"= c("Jim", "Jim", "Pam", "Pam", "Andy", "Andy", "Howard", "Howard"),
  "Subject"= c("English", "Math", "English", "Math", "English", "Math", "English", "Math"),
  "Score" = c(81, 93, 91, 76, 95, 88, 73, 67)
)
# display the dataframe
print(scores_df)

Output:

    Name Subject Score
1    Jim English    81
2    Jim    Math    93
3    Pam English    91
4    Pam    Math    76
5   Andy English    95
6   Andy    Math    88
7 Howard English    73
8 Howard    Math    67

We now have a dataframe containing the scores of some students in different subjects in a high school examination. The dataframe has three columns: “Name”, “Subject”, and “Score”.

Before we can filter the above dataframe using the filter() function, we have to load the dplyr package. If it is not installed yet, install it first with install.packages("dplyr").

library(dplyr)

Here, we used the library() function in R to load the dplyr package.
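Note that loading dplyr masks the filter() function from R's built-in stats package, which R reports in a message when library(dplyr) runs. If you would rather be explicit about which function is being called, one option is to use the package prefix, as in the sketch below. The dataframe small_df is a hypothetical stand-in used only for illustration.

```r
# hypothetical dataframe for illustration only
small_df = data.frame(
  "Subject" = c("English", "Math"),
  "Score" = c(81, 93)
)
# call dplyr's filter() explicitly through the package prefix
# this works even if library(dplyr) has not been called
math_rows = dplyr::filter(small_df, Subject == "Math")
print(math_rows)
```

The rest of this tutorial calls filter() without the prefix, which works as long as dplyr has been loaded.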

Example 1 – Filter dataframe on a single condition

Let’s now filter the above dataframe so that we only get the scores for the subject “English”.

Here, we want to filter the dataframe scores_df such that the value in the “Subject” column is “English”. Pass the dataframe and the condition to the filter() function.

# filter dataframe
english_scores = filter(scores_df, Subject == "English")
# display the dataframe
print(english_scores)

Output:

    Name Subject Score
1    Jim English    81
2    Pam English    91
3   Andy English    95
4 Howard English    73

We get only the rows with scores for “English” from the above dataframe.
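dplyr code is also commonly written with the pipe operator %>%, which passes the dataframe on its left as the first argument to the function on its right. The sketch below should be equivalent to the filter() call in this example. It recreates scores_df so that it can be run on its own.

```r
library(dplyr)

# recreate the tutorial's dataframe so this block is self-contained
scores_df = data.frame(
  "Name"= c("Jim", "Jim", "Pam", "Pam", "Andy", "Andy", "Howard", "Howard"),
  "Subject"= c("English", "Math", "English", "Math", "English", "Math", "English", "Math"),
  "Score" = c(81, 93, 91, 76, 95, 88, 73, 67)
)
# pipe the dataframe into filter() instead of passing it as an argument
english_scores = scores_df %>% filter(Subject == "English")
# display the dataframe
print(english_scores)
```

The pipe form is mostly a matter of style here. It tends to pay off when several dplyr operations are chained one after another.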

Example 2 – Filter dataframe on multiple conditions

You can also use the filter() function to filter a dataframe on multiple conditions in R. Pass each condition as a comma-separated argument.

Note that when you pass multiple comma-separated conditions to the filter() function, they are combined using the logical AND operator (&), so a row is kept only if it satisfies all of them.

Let’s look at an example. Suppose we want the data for students who scored more than 90 in English. That is, we want to filter the above dataframe such that the “Subject” is “English” and the “Score” is greater than 90.

# filter dataframe on multiple conditions
english_high_scores = filter(scores_df, Subject == "English", Score > 90)
# display the dataframe
print(english_high_scores)

Output:

  Name Subject Score
1  Pam English    91
2 Andy English    95

We get the rows for students who scored more than 90 in “English”.

If you don’t want to use multiple conditions as comma-separated arguments, you can combine them first and then pass them as a single condition to the filter() function.

Let’s do the same thing as above – get data for students who scored more than 90 in English.

# filter dataframe on multiple conditions
english_high_scores = filter(scores_df, Subject == "English" & Score > 90)
# display the dataframe
print(english_high_scores)

Output:

  Name Subject Score
1  Pam English    91
2 Andy English    95

We get the same result as above.
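If you want to check this programmatically rather than by eye, one option is to compare the two results with base R's identical() function. The sketch below recreates scores_df so that it runs on its own. The variable names with_comma and with_and are introduced here only for illustration.

```r
library(dplyr)

# recreate the tutorial's dataframe so this block is self-contained
scores_df = data.frame(
  "Name"= c("Jim", "Jim", "Pam", "Pam", "Andy", "Andy", "Howard", "Howard"),
  "Subject"= c("English", "Math", "English", "Math", "English", "Math", "English", "Math"),
  "Score" = c(81, 93, 91, 76, 95, 88, 73, 67)
)
# conditions passed as comma-separated arguments
with_comma = filter(scores_df, Subject == "English", Score > 90)
# conditions combined into a single expression using &
with_and = filter(scores_df, Subject == "English" & Score > 90)
# compare the two dataframes
print(identical(with_comma, with_and))
```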

An advantage of combining conditions into a single expression is that you can also use the | (or) logical operator, whereas comma-separated conditions are always combined using &.

For example, let’s now filter the above dataframe such that the “Subject” is “English” or the score is greater than 90. That is, even if just one of these two conditions is TRUE we select that row.

# filter dataframe, keeping rows where either condition is TRUE
english_or_high_scores = filter(scores_df, Subject == "English" | Score > 90)
# display the dataframe
print(english_or_high_scores)

Output:

    Name Subject Score
1    Jim English    81
2    Jim    Math    93
3    Pam English    91
4   Andy English    95
5 Howard English    73

We get the rows where “Subject” is “English” or “Score” is greater than 90.
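When several | conditions test the same column against different values, base R's %in% operator offers a more compact way to write them. The sketch below is an addition to the tutorial's examples: it keeps the rows for two of the students, and it recreates scores_df so that it runs on its own. The variable name jim_pam_scores is introduced here only for illustration.

```r
library(dplyr)

# recreate the tutorial's dataframe so this block is self-contained
scores_df = data.frame(
  "Name"= c("Jim", "Jim", "Pam", "Pam", "Andy", "Andy", "Howard", "Howard"),
  "Subject"= c("English", "Math", "English", "Math", "English", "Math", "English", "Math"),
  "Score" = c(81, 93, 91, 76, 95, 88, 73, 67)
)
# equivalent to the condition: Name == "Jim" | Name == "Pam"
jim_pam_scores = filter(scores_df, Name %in% c("Jim", "Pam"))
# display the dataframe
print(jim_pam_scores)
```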

Summary – Filter Dataframe in R

In this tutorial, we looked at how to filter a dataframe in R. The following is a short summary of the steps mentioned in this tutorial.

  1. Create a dataframe (skip this step if you already have a dataframe to operate on).
  2. Load the dplyr package using library(dplyr).
  3. Use dplyr’s filter() function to filter the dataframe on a condition. To filter on multiple conditions, either pass them as comma-separated arguments (which combines them using &) or combine them yourself using logical operators such as & and |, and then pass a single condition to the filter() function.

Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
