Colleagues at work and I have found horrible performance degradation when certain subqueries with joins appear inside an ‘if’ clause and are applied serially to several tables with more than 1,000 rows. This may sound obscure, but it is an obvious approach when looking for rows that satisfy a certain criterion across several linked, similar sets of tables. You might have an authors database with various printed-materials tables (tblarticles, tblarticledetails, tblbooks, tblbookdetails, tblblogs and tblblogdetails) that you want to search for a particular authorid.
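Applied to that hypothetical authors database, the serial ‘if exists’ pattern would look roughly like the sketch below. Only the table names come from the scenario; the @authorid parameter, the id_pk/id_fk key columns, and the placement of authorid on the details tables are assumptions for illustration.

```sql
-- Sketch of a lookup function that checks each printed-materials
-- pair of tables in turn. Column names here are assumed.
if exists (select 1 from tblarticles a
           inner join tblarticledetails ad on a.id_pk = ad.id_fk
           where ad.authorid = @authorid)
    return 1
if exists (select 1 from tblbooks b
           inner join tblbookdetails bd on b.id_pk = bd.id_fk
           where bd.authorid = @authorid)
    return 1
if exists (select 1 from tblblogs bl
           inner join tblblogdetails bld on bl.id_pk = bld.id_fk
           where bld.authorid = @authorid)
    return 1
return 0
```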
Here is an example, taken from a function, of the pattern you should avoid:
if exists(select 1 from tblsomething1 inner join tblsomethingelse1 on tblsomething1.id_pk = tblsomethingelse1.id_fk where tblsomething1.aColumn = 3) return 1
if exists(select 1 from tblsomething2 inner join tblsomethingelse2 on tblsomething2.id_pk = tblsomethingelse2.id_fk where tblsomething2.aColumn = 3) return 1
This query took about 40 seconds when the search through the first table (tblsomething1) found nothing; it was reasonably fast when the first ‘if exists’ did find a match. The problem could be the ‘if’ statement or perhaps the join itself. A colleague, James, suggested it may be a bug in the SQL engine’s query optimizer, and I am inclined to agree. This is a problem in both the Developer and Express editions, but we have not seen much on the internet about this bug.
Forcing the join to be a ‘hash join’ does provide speed increases, but that could bring up an entirely different set of problems down the road. We found that if we use something like:
if ((select count(id_pk) from tblsomething1 inner join tblsomethingelse1 on tblsomething1.id_pk = tblsomethingelse1.id_fk where tblsomething1.aColumn = 3) > 0) return 1
it took less than a second. This is strange, though: why would counting all the matching rows take less time than a simple existence check?
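For reference, the hash-join workaround mentioned earlier can be written with T-SQL’s join hint. This is a sketch on the same hypothetical tables, and as noted above, forcing a join strategy can cause other problems down the road:

```sql
-- Force a hash join for this one join via the INNER HASH JOIN hint.
if exists (select 1 from tblsomething1
           inner hash join tblsomethingelse1
               on tblsomething1.id_pk = tblsomethingelse1.id_fk
           where tblsomething1.aColumn = 3)
    return 1
```

Note that specifying a join hint in the FROM clause also enforces the join order for the query, which is part of why it can backfire as the data grows or changes.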
We haven’t started evaluating SQL Server 2008 yet but we will certainly be looking for this problem when we do.