Sometimes your Excel spreadsheet gets away from you and you might find that it's in need of a good clean up. A particularly common problem is to encounter duplicate information in multiple cells or rows. So here are some easy ways to find and delete duplicate data in Excel to help fix it.
How to Find Duplicate Values in Excel
It is good practice to find and highlight duplicate values in Excel before you delete them. Deleting the values permanently removes them from the Excel spreadsheet, so highlighting them first gives you an opportunity to review the duplicates and make sure you don't need them.
In our example, we will use the following spreadsheet:
Our fictional user is tracking all the fountain pens she has inked up. The spreadsheet contains information about the pen and the color of ink currently in the pen. But, she suspects that she accidentally entered some of her pens more than once.
The easiest way to identify duplicate values is to use conditional formatting. To highlight duplicate values using conditional formatting follow these steps:
- Select the cells you want to check for duplicates.
- Under the Home tab, click on Conditional Formatting in the Styles group.
- Select Highlight Cells Rules > Duplicate Values.
- A popup box appears and allows you to choose a style format for the duplicate cells.
- Once finished, click OK.
This method highlights the duplicates in each column or row. As you can see below, any company, model, or ink color listed more than once appears highlighted.
For this example, we need to look at the Model column to find duplicate pens. We have two Student, 41, and Preppy pens. The 41s have different inks, so they are likely different pens. But the Student and the Preppy pens might be duplicates.
As you can see, this method isn't ideal.
It finds every single duplicate cell, but in this case we are only interested in finding duplicate rows. You can have many different pens from the same company with the same ink. But you are less likely to have more than one pen that is the same company, model, and ink color.
How to Highlight Duplicate Rows in Excel with a Custom Formula
Excel allows you to create custom conditional formatting with formulas. We can use this method to identify duplicate rows. Selecting this option is similar to how we selected the duplicate values above, except instead of selecting Highlight Cell Rules, select New Rule instead.
This brings up a popup menu. Set the Style to Classic, then open the next dropdown menu and choose Use a formula to determine which cells to format.
The formula we want to enter is:
COUNTIFS($A$2:$A$14,$A2,$B$2:$B$14,$B2,$C$2:$C$14,$C2)>1
Let's take a closer look at this formula.
Conditional formating formulas have to return an answer that is either true or false. If the answer is true, the formatting is applied. This equation is true when it counts more than one row with the same information.
To check the rows for duplicates, the formula first selects a column ($A$2:$A$14). We use the absolute location because we want all to use the same range of cells when evaluating rows for duplicates.
The next variable is the target that we are checking for a duplicate value ($A2). This time, the absolute location is used for the column, but not the row. This allows our formula to incrementally check each row.
We repeat this for each column in the row.
Once you enter the formula, don't forget to choose a formatting style. The default is no style. So, even if you do everything right, if you forget that step, you won't see the results.
Afterward, our table looked like this:
This method only highlights the entire rows that are duplicates. Once you've found all the duplicates in your Excel spreadsheet, you can decide if they need to be removed.
How to Remove Duplicates in Excel
Removing duplicates is easy in Excel. Under the Data tab, in the Data Tools group, you will find an option to Remove Duplicates. This tool is very handy because it allows you to select which columns you would like to check.
To use this tool, select your data and then click on Remove Duplicates.
When you do so, a popup appears that lets you choose which columns to check for duplicates. If you Select All columns, it only removes duplicate rows.
However, you do not have to choose all of the columns. For example, if I wanted to know which inks are currently in the pens, you could select Ink Colour and remove duplicate colors. This will leave the first pen that uses a new color and remove any subsequent entry.
Using Excel's Remove Duplicates tool automatically deletes the duplicate entries.
If you only want to temporarily remove duplicate entries, but not delete them, then you might want to filter your data instead.
How to Filter Out Duplicates in Excel
Filters allow you to manipulate how much of your data is shown, without actually removing any of the values. You can find the filter options under the Data tab in the Sort & Filter group.
There are many different ways to filter data in Excel. The example below only shows how to filter out duplicate values:
- Select the data you would like to filter through.
- Click Advanced in the Sort & Filter Group.
- In the popup, enable the option to show Unique Records Only.
- Click OK.
Filtering this data makes Excel hide any duplicate rows. So, in our example, the duplicate Student and Preppy pens were hidden.
But the duplicate data is not gone, it is just hidden from view. This is why our table is still formatted according to the rule we created earlier, despite no duplicate pens appearing.
In fact, if you look along the row numbers, you can see where the two rows are hidden. The rows jump from seven to nine and jump again between 12 and 14. The data is not gone, it is just out of sight.
Reviewing How to Remove Duplicates
The first step in cleaning up your data is identifying any duplicates. Removing duplicates in Excel permanently deletes the data. Highlighting them gives you an opportunity to evaluate the duplicates first. And if you're interested in duplicate rows, you need to use conditional formatting formulas to find them.
And finally, if you don't want to delete your duplicate data, consider using a filter instead. This hides them from view without actually removing any duplicate data from Excel.