How to determine where a SQL Server 2005 Index is needed

The right index in the right place can really boost your database performance, but applying too many indices creates nasty overhead on every insert, update, and delete. So how do you learn to spend your index resources where they're most needed?
Knowing your database schema is crucial, but in a complex database, deciding where to put an index can still be a very difficult task.
An article by Ian Stirk in January's MSDN Magazine really helped me. His code, combined with my knowledge of the schema, helped pinpoint the places where I should put indices. Look, especially, at STEP04 [Identify the missing (or incomplete) indexes (TOP 20), for ALL databases].
Highly recommended.
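If you want a taste of the approach before you read the article, SQL Server 2005 exposes its missing-index suggestions through dynamic management views. Here's a minimal sketch along those lines; it's my own simplified query, not Stirk's code, and the ranking formula is just one common way of weighing the suggestions:

SELECT TOP 20
	d.statement AS table_name,
	d.equality_columns,
	d.inequality_columns,
	d.included_columns,
	/* rough ranking: how often the index would have been used,
	   weighted by the optimizer's estimated cost and impact */
	s.avg_total_user_cost * s.avg_user_impact
		* (s.user_seeks + s.user_scans) AS estimated_improvement
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g
	ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s
	ON s.group_handle = g.index_group_handle
ORDER BY estimated_improvement DESC;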

SQL Server Performance and temporary tables

Yesterday I blogged about SQL Server performance, and I thought I'd continue in that vein today.
Temporary tables were a feature of SQL Server that I tended to avoid. I figured I should handle everything through code, using subqueries (with IN and EXISTS clauses) or even the new table variables (well… new since SQL Server 2000), but there are excellent efficiencies to be gained by using the tempdb database.
There are two types of temporary table: local (prefixed with #) and global (prefixed with ##). I haven't found much use for the latter, but the former is useful when a stored procedure performs repeated tasks with the same information. Instead of querying the data each time with IN or EXISTS, you can dump it into a temporary table once and then join to that table.
Here's an example using yesterday's AdventureWorks code. Let's say I'm writing a stored procedure that does a great deal of work on Products whose Product Number begins with FW. I could use this:

USE AdventureWorks ;
GO
CREATE TABLE #productids (productid int NOT NULL PRIMARY KEY);

/* gather the matching ProductIDs once, up front */
INSERT INTO #productids
SELECT ProductID
FROM Production.Product
WHERE ProductNumber LIKE 'FW%'
ORDER BY ProductID;

SELECT
/* or do something else with */
sod.SalesOrderID
FROM Sales.SalesOrderDetail sod
JOIN #productids p ON sod.ProductID = p.productid;

/* and now do something else with it */

/* but at the end don't forget to: */
DROP TABLE #productids;

This can really cut down on your execution time and you also benefit by making your code more modular and readable.
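For comparison, here is the same lookup written the way I used to do it, as an IN sub-query with no temp table (my own illustrative version; remember the temp table only pays off when #productids is reused several times in the procedure, so time both approaches on your own data):

SELECT SalesOrderID
FROM Sales.SalesOrderDetail
WHERE ProductID IN
	(SELECT ProductID
	 FROM Production.Product
	 WHERE ProductNumber LIKE 'FW%');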


SQL Server Performance and the IN clause

I enjoy using sub-queries with the IN clause because they fit in my head nicely. By that I mean I can translate a sub-query in my head with little effort, whereas a join requires a little more brain gymnastics. Of course, your mileage may vary.
Here’s an example taken from the Internet:

USE AdventureWorks ;
GO
SELECT DISTINCT c.LastName, c.FirstName
FROM Person.Contact c
JOIN HumanResources.Employee e ON e.ContactID = c.ContactID
WHERE e.EmployeeID IN
	(SELECT SalesPersonID
	 FROM Sales.SalesOrderHeader
	 WHERE SalesOrderID IN
		(SELECT SalesOrderID
		 FROM Sales.SalesOrderDetail
		 WHERE ProductID IN
			(SELECT ProductID
			 FROM Production.Product
			 WHERE ProductNumber LIKE 'FW%')));
GO

This appeals to my way of thinking: the ProductIDs matching a certain product number, the SalesOrderIDs in SalesOrderDetail that contain those products, and the SalesPersonIDs from SalesOrderHeader for those orders. The bucket brigade up the list of nested queries makes eminent sense.
Unfortunately, the SQL Server database engine isn't human and doesn't 'think' like me. Its inner workings, like those of all relational databases, are based on set theory from the abstract world of mathematics.
So if efficiency is what you're after, use joins. Even if I only turn one of the sub-queries above into a join, I get a performance boost:

USE AdventureWorks ;
GO
SELECT DISTINCT c.LastName, c.FirstName
FROM Person.Contact c
JOIN HumanResources.Employee e ON e.ContactID = c.ContactID
WHERE e.EmployeeID IN
	(SELECT SalesPersonID
	 FROM Sales.SalesOrderHeader
	 WHERE SalesOrderID IN
		(SELECT sod.SalesOrderID
		 FROM Sales.SalesOrderDetail sod
		 JOIN Production.Product p ON sod.ProductID = p.ProductID
		 WHERE p.ProductNumber LIKE 'FW%'));
GO

Here the innermost sub-query has been converted into a join. Try it for yourself with your own T-SQL code.
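If you take the idea all the way, every IN can become a join. Here's a sketch of the fully-joined version (my own rewrite, assuming the same AdventureWorks schema; compare the timings on your own system):

SELECT DISTINCT c.LastName, c.FirstName
FROM Person.Contact c
JOIN HumanResources.Employee e ON e.ContactID = c.ContactID
JOIN Sales.SalesOrderHeader soh ON soh.SalesPersonID = e.EmployeeID
JOIN Sales.SalesOrderDetail sod ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product p ON p.ProductID = sod.ProductID
WHERE p.ProductNumber LIKE 'FW%';

Note that the DISTINCT matters here: the joins can return the same employee once per matching order line, where the IN versions filtered instead.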
As I've heard said before: the best optimization is done using brain power rather than computing power.

Tuning SQL Server Queries

As everyone who has worked with a useful database for any length of time knows… they get bigger. And as they get bigger, it takes longer to squeeze information out of them. So what's the problem? You might think it's the fault of the computer… blame the system! Heck! I've done it. But if you are seeing noticeable degradation in your queries over a short period of time then, 9 times out of 10, you gots to fix da code, man.
And that takes experience and learning.
If you want to really learn the ins and outs of Transact-SQL (the Microsoft flavour of SQL in SQL Server), there's one book you should look at: Ken Henderson's The Guru's Guide to Transact-SQL. It is, hands down, the best book on practical SQL that I use. It never leaves my desk, and my only wish is that Ken would write an updated version. I would buy it right away.
But there are simple things you can do to get a good idea of where your problem areas are. The two settings I like to use are the statistics io and statistics time settings. These really help, especially the first one, to give you a clear idea of the processing 'cost' of a database query. The basic commands are:
set statistics io on
and
set statistics time on

Turning them off is easy: just substitute off for on. Once turned on, they stay on for your entire session. When you execute a query you'll get some additional processing information; in my opinion, the most important performance counter among them is the logical reads number. This is the number of 8K pages read from the data cache to produce your result. Of course, the fewer pages, the better.
It's also a useful value because it is independent of a lot of the 'background' noise, like other users and locking, that most of our queries have to deal with in the real database world. As long as you are returning the same rows of data with the same command, this value should remain very close to the same amount.
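For example, here's a minimal session using AdventureWorks (my own illustration; the exact counts you'll see depend on your data and cache):

USE AdventureWorks;
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
/* the Messages tab reports lines like:
   Table 'SalesOrderDetail'. Scan count ..., logical reads ... */
SELECT COUNT(*)
FROM Sales.SalesOrderDetail;
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;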
Try it.