A common scenario in database development is optimizing a query to make it perform better. Either, it's been identified as having poor performance or it's a new query that you have or can see multiple ways of achieving the same result.
So you set about comparing the two versions of the query to see which appears to perform best. This article aims to give some pointers as to how to fairly and consistently compare queries to provide a reliable basis for performance comparison.
1. What do I need to look at to see how a query is performing?
Core statistics things to be interested in include:
a) duration of the query (at the end of the day, everyone wants a query that runs quickly!)
b) number of reads
d) number of writes
These can be monitored using SQL Profiler. I will assume you are familiar with SQL Profiler; if not, a quick Google on it will provide you with plenty of guides on how to use it and set up traces on a database.
Another important factor is the execution plan. To view this in SQL Server 2005, with your query open in a tab in SQL Server Management Studio, go to "Query" in the top menu and select "Include Actual Execution Plan". In SQL Server 2000 Query Analyzer, go to "Query" and select "Show Execution Plan". When the query finishes running, the execution plan will be shown in a separate results tab.
I won't go into detail as to how to analyze the execution plan, or specifically how to tune a query as those are separate topics deserving their own articles. This article is here to try to help ensure you get a fair comparison of performance.
2. What things should I be aware of when comparing multiple versions of a query?
You need to take into account that SQL Server caches data and execution plans. Let's say you run a query for the first time and it takes 30 seconds to return the results. You run the query again. This time it takes 2 seconds. Was the first time it ran a one-off "glitch? No, more than likely not. The reason it runs quicker is because SQL Server caches data in memory and caches execution plans. So the second time it runs, relevant data is already available in the cache and is much quicker to return from memory. Likewise, SQL Server realises it has an execution plan for that query that was generated for the first time the query was run, and that it can reuse so it does.
To see what plans are in the cache, you can run the following script, substituting in your database name (compatible with both SQL Server 2000 and 2005):
SELECT c.sql, c.cacheobjtype, c.objtype, d.name AS DBName, c.usecounts AS NumberOfTimesUsed, c.*
FROM master..syscacheobjects c
JOIN master..sysdatabases d ON c.dbid = d.dbid
WHERE d.name = 'NameOfYourDatabase'
No imagine you have your two versions of a query to compare performance of: Query A and Query B.
You run Query A. Then you run Query B straight after. Query B may well run quicker, but it's not necessarily because it's more efficient is it will be gaining an unfair advantage by having data already in the SQL Server cache.
3. How do I ensure a fair comparison?
Between each run of a query, you should clear out the cache. This can be done using the script below (SQL 2000 and SQL 2005).
*** NOTE: Do not do this on your production server. Recommended you only do this on your development server! ***
-- Remove all plans from the cache
-- Clear down the data cache
Now you have a fair, level playing field for testing each query. You then have the obvious environmental factors to consider - i.e. try to make sure the load on the server by other processes is consistent as if you run query A when nothing else is happening on the server, and query B when there are 10 other processes hitting the databse hard you could skew the results.
It's a good idea to test each query a few times each, clearing the cache each time to take an average recording of the stats. Also, after the last clean run you can run the query straight away again without clearing the cache first to see what it's like when it has cached data/plan available.