g-spot

SQL query times out. Is it too complex?

The following SQL query times out for some reason. It doesn't look too complex, but perhaps the way it is structured means it cannot run successfully.

Basically, the query looks for EmailAddresses from one table, ensures they're not in another table, and also that they're not in the result of a subquery.

That doesn't sound too bad, but for some reason it times out. I know the constituent parts of the query (such as the subquery) work OK.

The COALESCE is in there because the EmailAddress fields may contain null values. The collate statement is there because the databases for some reason use a different collation setting.
SELECT TempBroadcast.EmailAddress,
       TempBroadcast.FirstName,
       TempBroadcast.LastName,
       TempBroadcast.FirstName + ' ' + TempBroadcast.LastName AS FullName
FROM TempBroadcast
LEFT OUTER JOIN EmailOptOut
    ON TempBroadcast.EmailAddress = EmailOptOut.EmailAddress COLLATE SQL_Latin1_General_CP1_CI_AS
WHERE (EmailOptOut.EmailAddress IS NULL)
  AND TempBroadcast.EmailAddress NOT IN (
    SELECT COALESCE(XXX.dbo.Quotes.EmailAddress, '') COLLATE SQL_Latin1_General_CP1_CI_AS AS EmailAddress
    FROM XXX.dbo.Quotes
    UNION
    SELECT COALESCE(XXX.dbo.EssentialClicks.EmailAddress, '') COLLATE SQL_Latin1_General_CP1_CI_AS AS EmailAddress
    FROM XXX.dbo.EssentialClicks
    UNION
    SELECT COALESCE(XXX.dbo.Customers.EmailAddress, '') COLLATE SQL_Latin1_General_CP1_CI_AS AS EmailAddress
    FROM XXX.dbo.Customers
  )

ASKER CERTIFIED SOLUTION
Blackninja2007

Check the execution plan for the query, see how that looks. Do you have suitable indexes? Look for where the time is being spent in the execution plan.

You may also want to try using a NOT EXISTS clause instead of a NOT IN clause (see e.g. http://www.themssforum.com/SQLServerDev/EXISTS-INNER/), as I believe this could be more efficient (possibly depending on your exact environment, indexes, etc.).
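
As a rough illustration of the NOT EXISTS form, here is a minimal sketch against just the Quotes table from the question. It has not been run against your schema: the aliases `tb` and `q` are mine, and the placement of the COLLATE clause is an assumption based on the original query.

```sql
-- Sketch only: single-table version of the check, using the Quotes table from the question.
-- NOT EXISTS only tests whether the subquery returns a row, so the select list can be
-- a constant and no COALESCE is needed to guard against NULL email addresses.
SELECT tb.EmailAddress, tb.FirstName, tb.LastName
FROM TempBroadcast AS tb
WHERE NOT EXISTS (
    SELECT 1
    FROM XXX.dbo.Quotes AS q
    -- COLLATE placement is an assumption; it resolves the collation mismatch between databases
    WHERE q.EmailAddress COLLATE SQL_Latin1_General_CP1_CI_AS = tb.EmailAddress
);
```

The other tables from the UNION can be handled the same way, each with its own correlated WHERE clause.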
g-spot

ASKER

Hi BN and ATL

I'm not sure what you mean by "check the execution plan". I'm running SQL Server 2008 Express, so I don't think I have access to that kind of information.

I'm fairly sure I don't have indexes set up on the email address columns, so I will do that first.

Thanks.
Have you checked the execution plan? This is the best way to see where the performance hit lies.
I haven't used SQL Server 2008 Express, but I would have thought you have the option to see the execution plan. In SQL Server 2005 Management Studio, you'd write your query and then in the toolbar go to: Query->Include Actual Execution Plan

Then when you run the query, you'll get an extra tab alongside the results containing a diagram of the execution plan, which shows where the time is being spent. The main thing to look out for is table scans, which are bad: if you don't have indexes, you will get table scans, and these are inefficient because every row in the table has to be read to look for a match. "Index Scans" are better, but generally speaking you want to be seeing "Index Seeks".
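
If the graphical plan turns out not to be available in your tools, the session settings below give similar information as text in the Messages tab. This is a sketch only: the email value is a made-up placeholder, and the SELECT is just a stand-in for whichever query you are investigating.

```sql
-- Report per-table reads and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in query against the Quotes table from the question.
SELECT COUNT(*)
FROM XXX.dbo.Quotes
WHERE EmailAddress = 'someone@example.com';  -- hypothetical value

-- Switch the reporting off again for this session.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

A large number of logical reads on one table is usually the text-mode hint that it is being scanned rather than seeked.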

So, adding an index to (e.g.) the email address column in each table, if you haven't already, should make a good impact, as it will result in an Index Scan/Seek instead of a Table Scan.
g-spot

ASKER

OK, thanks. I used the execution plan and ran a revised query that did not have any unions in the subquery (the subquery searched in just one table, "Quotes").

The vast majority of the time (76%) was spent on a Clustered Index Scan on the "Quotes" table (it's about 200,000 records).

The only index on the Quotes table is a primary key based on an ID field. Should I index the "EmailAddress" column?
Yes, index the EmailAddress column in the TempBroadcast, EmailOptOut and Quotes tables (and also in the other tables in the UNION when you add those back in).
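
Something along these lines should do it. The index names are just suggestions, I'm assuming the `dbo` schema for TempBroadcast and EmailOptOut, and I'm assuming EmailAddress is a bounded type such as varchar(255), since a varchar(max) column cannot be used as an index key.

```sql
-- Sketch only: one nonclustered index per table on the column used for matching.
-- Index names are hypothetical; adjust schema and database names to your setup.
CREATE NONCLUSTERED INDEX IX_TempBroadcast_EmailAddress
    ON dbo.TempBroadcast (EmailAddress);

CREATE NONCLUSTERED INDEX IX_EmailOptOut_EmailAddress
    ON dbo.EmailOptOut (EmailAddress);

CREATE NONCLUSTERED INDEX IX_Quotes_EmailAddress
    ON XXX.dbo.Quotes (EmailAddress);
```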

A high % is not necessarily bad. It's all relative to how long the actual query took, i.e. 76% of 16ms is not a lot :)
g-spot

ASKER

OK. All indexed.

So from 3+ minutes we're down to 2 seconds.

It took a little while to work out how to index the EmailAddress column in TempBroadcast, as this table is created on the fly by importing a CSV file into SQL Server using ASP.NET, and the index is created using SQL Management Objects in ASP.NET.

Thanks.
g-spot

ASKER

Weird thing is... I changed the records in the ImportTemp table and it all got bogged down again.

However, going with adathelad's suggestion of using the EXISTS clause sorted everything out.

The following code works perfectly every time:
SELECT TempBroadcast.EmailAddress,
       TempBroadcast.FirstName,
       TempBroadcast.LastName,
       TempBroadcast.FirstName + ' ' + TempBroadcast.LastName AS FullName
FROM TempBroadcast
LEFT OUTER JOIN EmailOptOut
    ON TempBroadcast.EmailAddress = EmailOptOut.EmailAddress COLLATE SQL_Latin1_General_CP1_CI_AS
WHERE (EmailOptOut.EmailAddress IS NULL)
  AND NOT EXISTS (
    SELECT COALESCE(XXX.dbo.Quotes.EmailAddress, '') COLLATE SQL_Latin1_General_CP1_CI_AS
    FROM XXX.dbo.Quotes
    WHERE TempBroadcast.EmailAddress = XXX.dbo.Quotes.EmailAddress
    UNION
    SELECT COALESCE(XXX.dbo.EssentialClicks.EmailAddress, '') COLLATE SQL_Latin1_General_CP1_CI_AS
    FROM XXX.dbo.EssentialClicks
    WHERE TempBroadcast.EmailAddress = XXX.dbo.EssentialClicks.EmailAddress
    UNION
    SELECT COALESCE(XXX.dbo.Customers.EmailAddress, '') COLLATE SQL_Latin1_General_CP1_CI_AS
    FROM XXX.dbo.Customers
    WHERE TempBroadcast.EmailAddress = XXX.dbo.Customers.EmailAddress
  )
