Wednesday, September 19, 2012

Removing Duplicate Records

Based on email that I receive, Removing Duplicates in databases is one of the most popular and useful topics for Real-Life Excel professionals. Being able to do this, enables a user to Combine Databases with repetitive data into a properly structured format. It is also a great Troubleshooting Tool for eliminating unwanted duplicate entries.

The really good news is that doing this with the last three versions of Excel (2007, 2010, and the beta 2013) is Remarkably Easy. Here is all you have to do to complete this simple task:

1. Select the entire database (be careful not to select any self-generating key fields)
2. Go to the Data tab in the Data Tools group and click Remove Duplicates
3. Select the column(s) on which you want to base your removal and click OK
4. A message will pop up telling you how many (if any) values were found and removed

There are always little twists that can pop up in Excel World, of course. One such instance was raised by a reader earlier today, where he had a database in which he wanted to remove Duplicate Records based on the Absolute Value of a field. In other words, he wished to remove any record where field values had the same absolute value but different sign, e.g. 7 & -7, 11 & -11, etc.

Although there are several ways to approach this, one Straightforward Method is to:

1. Add an Additional Field and use the ABS( ) Absolute Value function.
2. Then it is a Stress-Free procedure to use steps #2-4 above to weed out the unwanted data

At first glance Removing Duplicates to maintain the Integrity of your data may seem like a challenging endeavor. As is the case with nearly everything in Excel, however, once you know how to do it, it seems so Simple, you laugh…

No comments: