g-spot
SQL query times out. Is it too complex?
The following SQL query times out for some reason. It doesn't look too complex, but perhaps the way it is structured means it cannot run successfully.
Basically, the query looks for EmailAddresses from one table, ensures they're not in another table, and also that they're not in the result of a subquery.
Doesn't sound too bad, but for some reason it times out. I know the constituent parts of the query (such as the subquery) work OK.
The COALESCE is in there because the EmailAddress fields may contain null values. The COLLATE clause is there because the databases for some reason use different collation settings.
SELECT TempBroadcast.EmailAddress, TempBroadcast.FirstName, TempBroadcast.LastName, TempBroadcast.FirstName + ' ' + TempBroadcast.LastName AS FullName
FROM TempBroadcast LEFT OUTER JOIN
EmailOptOut ON TempBroadcast.EmailAddress = EmailOptOut.EmailAddress collate SQL_Latin1_General_CP1_CI_AS
WHERE (EmailOptOut.EmailAddress IS NULL) AND TempBroadcast.EmailAddress NOT IN (
SELECT COALESCE(XXX.dbo.Quotes.EmailAddress, '') collate SQL_Latin1_General_CP1_CI_AS AS EmailAddress
FROM XXX.dbo.Quotes
UNION
SELECT COALESCE(XXX.dbo.EssentialClicks.EmailAddress, '') collate SQL_Latin1_General_CP1_CI_AS AS EmailAddress
FROM XXX.dbo.EssentialClicks
UNION
SELECT COALESCE(XXX.dbo.Customers.EmailAddress, '') collate SQL_Latin1_General_CP1_CI_AS AS EmailAddress
FROM XXX.dbo.Customers)
ASKER
Hi BN and ATL
I'm not sure what you mean by checking the execution plan. I'm running SQL Server 2008 Express, so I don't think I have access to that kind of information.
I'm fairly sure I don't have indexes set up on the email address columns, so I will do that first.
Thanks.
Have you checked the execution plan? This is the best way to see where the performance hit lies.
I haven't used SQL Server 2008 Express, but I would have thought you have the option to see the execution plan. In SQL Server 2005 Management Studio, you'd write your query and then in the menu bar go to: Query -> Include Actual Execution Plan
Then, when you run the query, you'll get an extra tab alongside the results containing a diagram of the execution plan. This will show you where the time is being spent. Things to look out for are Table Scans (bad): if you don't have indexes, you will get table scans, which are inefficient because every row in the table has to be read to look for a match. Index Scans are better, but generally speaking you want to be seeing Index Seeks.
So, adding an index on (e.g.) the email address column in each table, if you haven't already, should have a good impact, as it should result in an Index Scan/Seek instead of a Table Scan.
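If the graphical plan is awkward to read, a rough text-based cross-check is to switch on the session statistics options before running the query. This is only a sketch: the SELECT below is a cut-down stand-in for the real query, and it assumes the TempBroadcast table from the question exists in the current database.

```sql
-- Report logical reads per table and CPU/elapsed time in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for the query being investigated (replace with the real one).
SELECT COUNT(*)
FROM TempBroadcast
WHERE EmailAddress IS NOT NULL;

-- Switch the extra output off again for this session.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

A table that shows a very large number of logical reads compared with the others is usually the one being scanned, which gives a hint about where an index might help.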
ASKER
OK, thanks. I used the execution plan and ran a revised query that did not have any unions in the subquery (the subquery searched in just one table, "Quotes").
The vast majority of the time (76%) was spent on a Clustered Index Scan on the "Quotes" table (it's about 200,000 records).
The only index on the Quotes table is a primary key based on an ID field. Should I index the "EmailAddress" column?
Yes, index the EmailAddress column in the TempBroadcast, EmailOptOut and Quotes tables (and also in the other tables in the UNION when you add those back in).
A high % is not necessarily bad. It's all relative to how much time the actual query took, e.g. 76% of 16 ms is not a lot :)
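The indexing suggested above could be sketched as follows. The index names are made up for illustration, and the sketch assumes that all three tables live in the dbo schema and that EmailAddress is a type that can be an index key (for example varchar(255) rather than varchar(max) or text).

```sql
-- Hypothetical index names; the dbo schema is an assumption.
CREATE NONCLUSTERED INDEX IX_TempBroadcast_EmailAddress
    ON dbo.TempBroadcast (EmailAddress);

CREATE NONCLUSTERED INDEX IX_EmailOptOut_EmailAddress
    ON dbo.EmailOptOut (EmailAddress);

-- Run this one in the database that holds Quotes (XXX in the question).
CREATE NONCLUSTERED INDEX IX_Quotes_EmailAddress
    ON dbo.Quotes (EmailAddress);
```

After creating the indexes, re-running the query with the execution plan enabled should show whether the scans on these tables have been replaced by seeks.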
yes
ASKER
OK. All indexed.
So from 3+ minutes we're down to 2 seconds.
It took a little while to work out how to index the EmailAddress column in TempBroadcast, as this table is created on the fly by importing a CSV file into SQL Server using ASP.NET, and the index is created using SQL Server Management Objects (SMO) in ASP.NET.
Thanks.
ASKER
Weird thing is... I changed the records in the ImportTemp table and it all got bogged down again.
However, going with adathelad's suggestion of using the EXISTS clause sorted everything out.
The following code works perfectly every time:
SELECT TempBroadcast.EmailAddress, TempBroadcast.FirstName, TempBroadcast.LastName, TempBroadcast.FirstName + ' ' + TempBroadcast.LastName AS FullName
FROM TempBroadcast LEFT OUTER JOIN
EmailOptOut ON TempBroadcast.EmailAddress = EmailOptOut.EmailAddress collate SQL_Latin1_General_CP1_CI_AS
WHERE (EmailOptOut.EmailAddress IS NULL) AND NOT EXISTS (
SELECT COALESCE(XXX.dbo.Quotes.EmailAddress, '') collate SQL_Latin1_General_CP1_CI_AS
FROM XXX.dbo.Quotes
WHERE TempBroadcast.EmailAddress = XXX.dbo.Quotes.EmailAddress
UNION
SELECT COALESCE(XXX.dbo.EssentialClicks.EmailAddress, '') collate SQL_Latin1_General_CP1_CI_AS
FROM XXX.dbo.EssentialClicks
WHERE TempBroadcast.EmailAddress = XXX.dbo.EssentialClicks.EmailAddress
UNION
SELECT COALESCE(XXX.dbo.Customers.EmailAddress, '') collate SQL_Latin1_General_CP1_CI_AS
FROM XXX.dbo.Customers
WHERE TempBroadcast.EmailAddress = XXX.dbo.Customers.EmailAddress
)
You may also want to try using a NOT EXISTS clause instead of a NOT IN clause (see e.g. http://www.themssforum.com/SQLServerDev/EXISTS-INNER/), as I believe this could be more efficient (though it possibly depends on your exact environment, indexes, etc.).
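Apart from speed, the two forms also treat NULLs differently, which is relevant here because the EmailAddress columns may contain nulls. A minimal self-contained sketch, using made-up table variables and addresses rather than the real tables:

```sql
-- Hypothetical stand-ins for TempBroadcast and Quotes.
DECLARE @Broadcast TABLE (EmailAddress varchar(100) NOT NULL);
DECLARE @Quotes    TABLE (EmailAddress varchar(100) NULL);

INSERT INTO @Broadcast VALUES ('a@example.com'), ('b@example.com');
INSERT INTO @Quotes    VALUES ('a@example.com'), (NULL);

-- NOT IN: the NULL in @Quotes makes the comparison UNKNOWN for every
-- non-matching row, so this returns no rows at all.
SELECT b.EmailAddress
FROM @Broadcast AS b
WHERE b.EmailAddress NOT IN (SELECT q.EmailAddress FROM @Quotes AS q);

-- NOT EXISTS: the NULL row simply never matches, so this returns
-- b@example.com without needing COALESCE.
SELECT b.EmailAddress
FROM @Broadcast AS b
WHERE NOT EXISTS (
    SELECT 1
    FROM @Quotes AS q
    WHERE q.EmailAddress = b.EmailAddress
);
```

With NOT EXISTS the select list of the subquery is not evaluated for its values, so SELECT 1 is enough and the NULL handling that NOT IN required is no longer needed.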