Manually Deleting Stinks!

trash can / ゴミ箱

Go ahead and set up your TroubledTable by running the Create Troubled Table T-SQl Script.  Now take a look at your duplicate rows.

SELECT email, COUNT(*)
FROM troubledTable
GROUP BY email

You’ve got 200 rows that have been duplicated over and over.  Not a pretty sight.  Imagine how long it you would take you to retrieve each distinct row, and manually clear out the duplicate rows.

SELECT * FROM troubledtable
WHERE email = '[email protected]'


FirstName LastName email
Clarke Glass [email protected]
Clarke Glass [email protected]
Clarke Glass [email protected]
Clarke Glass [email protected]

You’d have to then construct two T-SQL statements for each duplicate (all 200 of them).

DELETE FROM troubledtable WHERE email = '[email protected]'

INSERT INTO troubledtable
(firstname, lastname, email)
('Clarke','Glass','[email protected]')

That would take a very long time!  Most of us would argue it would take too long, and look for another method of de-duping that data, right?  Now if you had only one or two dupes, you might still consider this method.  Let’s look at one more reason you might not want to do that.  Look at the execution plan for that delete.

That table scan could be an issue if your table is quite large (1 million+ rows).  While you could change the table scan into a index seek by adding an index, you’d then create an additional operation during the delete: an index update.  If your table is a high transaction volume table, you may not want to trade the performance loss of an index operation for the gain in performance on your look up.

You need to keep performance concerns in mind when developing your solutions.  That way you can have the best answer as well as the fastest answer!  Next time we’re going to take a look at using temp tables to get rid of those pesky dupes.  As always, if you have any questions, please let me know. I’m here to help!

By Shannon Lowder

Shannon Lowder is the Database Engineer you've been looking for! Look no further for expertise in: Business Analysis to gather the business requirements for the database; Database Architecting to design the logical design of the database; Database Development to actually build the objects needed by the business logic; finally, Database Administration to keep the database running in top form, and making sure there is a disaster recovery plan.


Leave a comment

Your email address will not be published. Required fields are marked *