Foreign Keys and Witchcraft
I’ve worked with hundreds databases in my career. I could probably design a handful of them from memory if asked. I have notes to remind me of lessons learned from most of them. When I start looking through that list one of the things I most often reference is the importance of Foreign keys. Why is that?
Because most of the time, all of the control over referential integrity has been moved up to the business logic or application layer. Developers will give me a handful of reasons why they didn’t add them to the database. Unfortunately many of the reasons can be proven false, once you start considering all the costs of not implementing them.
I’m creating this post, so I can refer to it later when developers start calling me a witch, and asking if they can burn me.
Myth #1: I can do updates quicker without foreign keys
This is probably the most common point I have to argue against. Developers think since they have to update the “parent” table first before they can update the “child” table, it will take more time. They usually fail to see the queries they have to write to make sure that ID exists in the “parent” table for every insert/update query they want to write.
Instead of having to write that validation query each time, why not let the database server handle that for you? MS SQL is a database management system. Let it manage those keys for you. Yes, you’ll have to trap an error in your code that handles the case where that primary key doesn’t already exist, but that’s what you’re doing now, more or less.
Myth #2: I won’t be able to import my files into the database with foreign keys
This is the second most common reason I receive from programmers on why they don’t want foreign keys. They want to read a file from another system (or client) directly into their dbms without considering the fact the data could be corrupt in some way. Instead they’ll import the data directly into their production tables, then try to clean it up.
How many of you built an SSIS package or DTS package to read data into your database, and got it right the first time? How many of those packages never received a bad file, a corrupt file?
What? None of you?
Of course you’ve had to handle errors on imports. That’s why you shouldn’t import directly to your production tables. Read the data into a workspace ( a disposable table). Read it in raw. Check for errors on that table. Once the data passes all your validation checks, then import it to your system.
I think perhaps the goal should be to never import bad data into the database. Catch it before it becomes an error.
Myth #3:Foreign key constraints cost you in performance
Another one I hear is “We don’t use foreign keys because they’ll cause the database to be slower.” And then I check the tables, and find an indexes covering every column (several times over).
Those indexes are costing you more than the foreign key constraint ever could. Every insert is having to be entered multiple times into your keys…that you’re using in your application layer to test data integrity.
If you’re having slow downs, you’ll often find the solution in letting the server do more for you, not less. If you limit your server by not letting it know there is a key relationship, the optimizer can’t take advantage of that knowledge.
Let’s take a quick look at an example of how a foreign key will actually help performance. Let’s do some counts.
select count(*) from table1 --659687, and it took logical reads 642 select count(*) from table2 --15204678, and it took logical reads 35574
Without a foreign key constraint between these tables, if we want to check the number of entries in table2 that match table1, we have to do a join. And we’ll have to include that join in every query we write, since the integrity cannot be confirmed without it.
select count(*) from table2 INNER JOIN table1 on table1.sharedKey =table2.sharedKey --14974478 /* Table 'table1'. Scan count 9, logical reads 705... Table 'table2'. Scan count 9, logical reads 35526... */
Now, if I add the foreign key constraint, we only have to count the records in table 2, since the server is now maintaining the integrity between the tables.
select count(*) from table2 --14974478, scan count9, logical reads 35574
Looks to me that 35574 is less than 36231 (705 + 35526). Smaller read counts are better than larger ones. Looks like adding a key made the query faster, rather than slower. Not even to mention that we can now improve the developers coding speed, since he or she no longer has to do a join to table1 each time.
Fact #1: It will be difficult to maintain consistent validation checks across all insert/update queries.
Since you don’t have the server maintaining the relationship between those tables, every time you write to the those tables, you’re going to have to do the validation yourself. That means that you will have to hope each developer does it the same way…or at least hope that they do it well enough to prevent bad data from getting into the tables. I’ve worked with marines before (OOOH-RAAA!), and they beat it into my head: “Hope is not a course of action!”
I’m sure I’ll find more notes on why foreign keys are not witchcraft. When I find them, I’ll update this post. If you have any reasons why developers should implement foreign keys, let me know! I want this article to be more complete. If you have arguments against it, I’ll listen. There may be some legitimate reasons to avoid them on occasion, but those I’ve seen are exceptions, not the rules.
Now, about those villagers with pitchforks… I think I’ll cast magic missile!