Skip to content

Get Rid of the Duplicate Records

2009 January 14
tags: ,
by Shannon Lowder

Time after time I’ll get a table that has duplicates in it and I have to clear out all the duplicate entries while leaving a distinct set of records. There’s a couple of ways to solve the problem. Today I want to take you through my standard fix.

  1. Script the source table out to a temporary table.
  2. INSERT the DISTINCT records into the temporary table.
  3. TRUNCATE the original table.
  4. INSERT the records back into the source table from the temporary table.

Let’s run through these steps in AdventureWorks.

First, we’re going to set up a table that will be filled with duplicate records.

CREATE TABLE dbo.Contact(
Title NVARCHAR(8)
, FirstName NVARCHAR(50)
, MiddleName NVARCHAR(50)
, LastName NVARCHAR(50)
, Suffix NVARCHAR(10)
)

Next, we’re going to fill that table with two copies of all contacts in our Person.Contact table.

INSERT INTO dbo.Contact
SELECT
Title, FirstName, MiddleName, LastName, Suffix
FROM Person.Contact
UNION ALL
SELECT
Title, FirstName, MiddleName, LastName, Suffix
FROM Person.Contact

OK, in case you’re not familiar with how to check for dupes, here’s a quick way to check. Select all the columns that should be DISTINCT, then GROUP by those columns, and add the HAVING COUNT(*) > 1 clause. This will show you any records that are duplicated in your table.

SELECT
Title, FirstName, MiddleName, LastName, Suffix
FROM dbo.Contact
GROUP BY
Title, FirstName, MiddleName, LastName, Suffix
HAVING COUNT(*) > 1

I’m showing 19,907 records have been duplicated. Your mileage may vary.

Now, let’s create our temp table. My table of choice is a table variable. No need to create a table that will last longer than this operation, and since this is a one shot deal, I’m not very worried about optimizing the operation with indexes or anything too fancy.

DECLARE @Contact TABLE (
Title NVARCHAR(8)
, FirstName NVARCHAR(50)
, MiddleName NVARCHAR(50)
, LastName NVARCHAR(50)
, Suffix NVARCHAR(10)
)

Now, we INSERT the DISTINCT records into our table variable.

INSERT INTO @Contact
SELECT DISTINCT
Title, FirstName, MiddleName, LastName, Suffix
FROM dbo.Contact

Then we TRUNCATE the source table.

TRUNCATE TABLE dbo.Contact

Now sometimes you’ll have to keep records online while removing the duplicates, in which case you’ll have to come up with a way to delete the records that you keep records “online” while doing this delete. There are some ways to do it… can you think of any?

Finally, we want to insert the records back into the table.

INSERT INTO dbo.Contact
SELECT
Title, FirstName, MiddleName, LastName, Suffix
FROM @contact

Double check to make sure you got all the duplicates cleared up.

SELECT
Title, FirstName, MiddleName, LastName, Suffix
FROM dbo.Contact
GROUP BY
Title, FirstName, MiddleName, LastName, Suffix
HAVING COUNT(*) > 1

And that’s how you get rid of duplicates. I’d highly suggest applying a DISTINCT constraint to the table. I’d also suggest you consider a PRIMARY KEY, that way if you need to remove duplicates, you could remove them by ID, which would help you out if you needed to deal with duplicates while leaving your records “online”.

If you have any questions, please feel free to reach out. I’m here to help!

Leave a Reply

Note: You can use basic XHTML in your comments. Your email address will never be published.

Subscribe to this comment feed via RSS