How to Remove Duplicate Rows in R? - Data Science Parichay

R comes with a number of useful functions and packages for working with and manipulating data in dataframes. In this tutorial, we will look at how to remove duplicate rows from a dataframe in R with the help of some examples.

Methods to remove duplicate rows from a dataframe in R

There are multiple ways to drop duplicate rows from a dataframe in R. For example, you can use the built-in unique() function or the distinct() function from the dplyr package.

Let’s look at these methods with the help of some examples.

First, we will create a dataframe that we will be using throughout this tutorial.

# create a dataframe
employees_df = data.frame(
  "Name"= c("Dwight", "Jim", "Dwight", "Angela"),
  "Age"= c(28, 26, 28, 29),
  "Department"= c("Sales", "Sales", "Sales", "Accounting"),
  "Salary" = c(81000, 78000, 81000, 72000)
)
# display the dataframe
print(employees_df)

Output:

    Name Age Department Salary
1 Dwight  28      Sales  81000
2    Jim  26      Sales  78000
3 Dwight  28      Sales  81000
4 Angela  29 Accounting  72000

We now have a dataframe containing information about some employees in an office. The dataframe has four columns – “Name”, “Age”, “Department”, and “Salary”.

Notice that rows 1 and 3 of the above dataframe are identical: the employee “Dwight” appears twice.
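Before removing anything, it can help to confirm which rows are duplicates. The following is a minimal sketch using base R’s duplicated() function, which this tutorial does not otherwise cover; the variable names dup_flags and n_dups are only illustrative.

```r
# Recreate the tutorial's dataframe so this snippet runs on its own
employees_df <- data.frame(
  Name = c("Dwight", "Jim", "Dwight", "Angela"),
  Age = c(28, 26, 28, 29),
  Department = c("Sales", "Sales", "Sales", "Accounting"),
  Salary = c(81000, 78000, 81000, 72000)
)

# duplicated() returns TRUE for every row that repeats an earlier row
dup_flags <- duplicated(employees_df)
print(dup_flags)

# count the duplicate rows
n_dups <- sum(dup_flags)
print(n_dups)

# display the duplicate rows themselves
print(employees_df[dup_flags, ])
```

Only the third row is flagged, because duplicated() marks the later repeat and not the first occurrence.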

Let’s now try to remove duplicates from the above dataframe.

Method 1 – Remove duplicates using `unique()`

You can use the built-in unique() function in R to remove duplicates from a dataframe. Pass the dataframe as an argument to the function.

# remove duplicate rows
new_df = unique(employees_df)
# display the dataframe
print(new_df)

Output:

    Name Age Department Salary
1 Dwight  28      Sales  81000
2    Jim  26      Sales  78000
4 Angela  29 Accounting  72000

Here, we remove the duplicate rows from the above dataframe and save the result to new_df. The unique() function keeps the first occurrence of each row and drops any later repeats, so the second “Dwight” row (row 3) is gone.

Notice that the rows in the resulting dataframe retain their row names from the original dataframe, which is why the numbering jumps from 2 to 4.
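If you would rather have the rows numbered consecutively after calling unique(), one option is to reset the row names. This is a small sketch, not part of the original method; it relies on the base R behaviour that assigning NULL to rownames() restores the default 1, 2, 3, … numbering.

```r
# Recreate the tutorial's dataframe so this snippet runs on its own
employees_df <- data.frame(
  Name = c("Dwight", "Jim", "Dwight", "Angela"),
  Age = c(28, 26, 28, 29),
  Department = c("Sales", "Sales", "Sales", "Accounting"),
  Salary = c(81000, 78000, 81000, 72000)
)

# remove duplicate rows; the row names 1, 2, 4 are carried over
new_df <- unique(employees_df)

# reset the row names to the default consecutive numbering
rownames(new_df) <- NULL

# display the dataframe
print(new_df)
```

Resetting is optional. Keeping the original row names can be useful when you want to trace each row back to its position in the original dataframe.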

Method 2 – Remove duplicates using `dplyr`’s `distinct()` function

You can also use the distinct() function from the dplyr package to remove duplicate rows.

To use the distinct() function, you first have to load the dplyr package with the library() function. If dplyr is not installed yet, install it once with install.packages("dplyr").

# load dplyr library
library("dplyr")

Now, let’s use the distinct() function to remove duplicates from our original dataframe employees_df.

# remove duplicate rows
new_df = distinct(employees_df)
# display the dataframe
print(new_df)

Output:

    Name Age Department Salary
1 Dwight  28      Sales  81000
2    Jim  26      Sales  78000
3 Angela  29 Accounting  72000

The resulting dataframe does not have any duplicate rows. Note that, unlike with unique(), the rows do not retain their row names from the original dataframe. They are renumbered consecutively starting from 1.
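distinct() can also look for duplicates in selected columns only, which goes slightly beyond what is shown above. The sketch below assumes dplyr is installed; the choice of the “Department” column and the names by_dept and dept_only are purely illustrative.

```r
# load dplyr library
library("dplyr")

# Recreate the tutorial's dataframe so this snippet runs on its own
employees_df <- data.frame(
  Name = c("Dwight", "Jim", "Dwight", "Angela"),
  Age = c(28, 26, 28, 29),
  Department = c("Sales", "Sales", "Sales", "Accounting"),
  Salary = c(81000, 78000, 81000, 72000)
)

# keep the first row for each department;
# .keep_all = TRUE retains the other columns as well
by_dept <- distinct(employees_df, Department, .keep_all = TRUE)
print(by_dept)

# without .keep_all, only the columns named in the call are returned
dept_only <- distinct(employees_df, Department)
print(dept_only)
```

With .keep_all = TRUE, the values in the remaining columns come from the first row found for each department, so the result depends on the order of the rows in the dataframe.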

Summary – Remove duplicate rows in R

In this tutorial, we looked at how to drop (or remove) duplicate rows from a dataframe in R. The following is a short summary of the steps mentioned in this tutorial.

Create a dataframe (skip this step if you already have a dataframe to operate on).
Pass the dataframe as an argument to one of the following functions to remove duplicate rows –
- The R built-in unique() function, which keeps the original row names.
- The dplyr package’s distinct() function, which renumbers the rows from 1.

Authors

Piyush Raj

Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.

Gottumukkala Sravan Kumar
