asked on

Stored Procedure Hangs but Query Analyzer Flies

This has stumped my boss and I all day today. My stored procedure is almost 700 lines so I'm not going to post the whole thing here. What I'm going to post are the main chunks of code that I'm executing throughout. Everything runs very smooth, and very quick, until I hit one part of the execution. Here is what I have running...

--I create a temp table that ends up holding a list of unique zip codes and then I run some small chunks of code to add in zips from multiple sources...

Elapsed Time: instant + a couple of seconds to add the zip codes

CREATE TABLE [dbo].[#tblJobGeos] ([zip_cd] [char] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,[count_table] [tinyint] NOT NULL) ON [PRIMARY]
CREATE UNIQUE CLUSTERED INDEX [tblJobGeos_5251_sx] ON [dbo].[#tblJobGeos]([zip_cd]) WITH IGNORE_DUP_KEY ON [PRIMARY]

--I then create a custom view of all the tables that I need for the zips selected (for this example I only need table 6 out of my 10 total tables)...

Elapsed Time: instant

CREATE VIEW vKB_Count_5251 AS SELECT * FROM kBase..tblCount6_AD

--I then create the first table of qualifying records...

Elapsed Time: 6 seconds

SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
INTO tblCount_5251_1_G1
FROM vKB_Count_5251 AS t1
WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
AND t1.demo_cd = 2000
AND t1.demo_value BETWEEN '18' AND '25'
AND EXISTS (SELECT *
FROM #tblJobGeos AS t3
WHERE t1.zip_cd = t3.zip_cd)

--I then create the second table of qualifying records but these records have to join to my first table in order to qualify...

Elapsed Time: 9-10 seconds

SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
INTO tblCount_5251_2_G1
FROM vKB_Count_5251 AS t1
JOIN tblCount_5251_1_G1 AS t2 On t1.pkey = t2.pKey
WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
AND t1.demo_cd = 400
AND t1.demo_value = 'F'
AND EXISTS (SELECT * FROM #tblJobGeos AS t3 WHERE t1.zip_cd = t3.zip_cd)

--I drop the first table since it no longer has all of my qualifying records...

Elapsed Time: instant

DROP TABLE tblCount_5251_1_G1

--I then reuse that first table name to once again find qualifying records (you'll notice the only difference here is that I have an OR statement for demo values)...

Elapsed Time: 5 minutes 30 seconds using Stored Procedure ***HERE IS MY ISSUE

Elapsed Time: 3 seconds when I print the code from the stored procedure and run it in Query Analyzer

SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
INTO tblCount_5251_1_G1
FROM vKB_Count_5251 AS t1
JOIN tblCount_5251_2_G1 AS t2 On t1.pkey = t2.pKey
WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
AND EXISTS (SELECT * FROM #tblJobGeos AS t3 WHERE t1.zip_cd = t3.zip_cd)
AND((t1.demo_cd = 1501 AND t1.demo_value = 'Y')
OR (t1.demo_cd = 1518 AND t1.demo_value = 'Y')
OR (t1.demo_cd = 1517 AND t1.demo_value = 'Y')
OR (t1.demo_cd = 1503 AND t1.demo_value = 'Y') )

--I then create a final table that will hold my uniqe records since I'm ignoring duplicate keys with my index...

Elapsed Time: instant

CREATE TABLE [dbo].[tblCount_5251_G1_Final] ([pKey] [int] NOT NULL , [hh_nbr] [int] NOT NULL , [zip_cd] [char] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) ON [PRIMARY]
CREATE UNIQUE INDEX [tblCount_5251_G1_Final_sx] ON [dbo].[tblCount_5251_G1_Final]([hh_nbr]) WITH IGNORE_DUP_KEY ON [PRIMARY]

--I then insert the records into the new table giving me a unique list of records...

Elapsed Time: 5-10 seconds

INSERT INTO tblCount_5251_G1_Final SELECT * FROM tblCount_5251_1_G1 AS t1

--I then can get my total count...

SELECT COUNT(*) AS Total FROM tblCount_5251_G1_Final AS t1

Can anyone tell me why Query Analyzer will run the OR statement in 3 seconds like it should but when the stored procedure fires that code it takes over 5 minutes?

I've seen articles on parameter sniffing but that didn't help as I'm only passing in one parameter that isn't used throughout the SP. I've tried to recompile, that didn't work. I've even dropped the original database and recreated everything from scratch, but no luck there.

There is one thing I've yet to try since I can't do it during business hours, but I do need to upgrade SQL Server 2000 with Service Pack 4. I'm going to try that this weekend, but I'm not even sure that is the issue.

Anyways, I hope I have enough info here. If not let me know what else is needed.

-JP

ASKER CERTIFIED SOLUTION

Lowfatspread

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

AaronAbend

then reuse that first table name

The parser might have a problem with this... Remember it compiles the sp. And it may get confused.

Can you just change the name and see what happens?

rw3admin

first run all the statements inside the proc at one go in QA, you might see same performance problems as through stored proc then I would say re-start SQL services your TempDB and transaction logs are getting big cause you are doing lots of Select Into...
second wherever you are creating a temp table do you have to define NOT NULL for columns?, if not try using explicit NULL for columns, although this would mean you have to sacrifice your unique clustered indexes

rw3admin

jmp09

ASKER

--so this
--SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
--INTO tblCount_5251_1_G1
--FROM vKB_Count_5251 AS t1
--JOIN tblCount_5251_2_G1 AS t2 On t1.pkey = t2.pKey
--WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
--AND EXISTS (SELECT * FROM #tblJobGeos AS t3 WHERE t1.zip_cd = t3.zip_cd)
--AND((t1.demo_cd = 1501 AND t1.demo_value = 'Y')
-- OR (t1.demo_cd = 1518 AND t1.demo_value = 'Y')
-- OR (t1.demo_cd = 1517 AND t1.demo_value = 'Y')
-- OR (t1.demo_cd = 1503 AND t1.demo_value = 'Y') )
--
--is DYNAMIC SQL ?

Yes, I have an EXEC(@SQL) running that code

--have you tried generating it as
--
--SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
--INTO tblCount_5251_1_G1
--FROM vKB_Count_5251 AS t1
--JOIN tblCount_5251_2_G1 AS t2 On t1.pkey = t2.pKey
--WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
--AND EXISTS (SELECT * FROM #tblJobGeos AS t3 WHERE t1.zip_cd = t3.zip_cd)
--and t1.demo_value = 'Y'
--AND t1.demo_cd in (1501, 1518 ,1517,1503)

No, because the demo value will not always be 'Y'

--if it is dynamic code then the execution in QA will be faster as all your objects are more stable and
--can be parsed more reliably, plus presumably the data is/was in memory when you re-executed it.

I understand that, but why do my first sets of dynamic code run in under 10 seconds each?

--creating and dropping tables within an SP isn't usually a good idea...
--are these physical tables or temporary ones?
--have you considered using table variables instead?
--do you create indexes on the tables?

Point noted.
These tables are created by the SELECT INTO using tblCount + job number + 1 or 2 (depending on which one is being replaced) + G# (represents a group number). I guess that makes them physical tables, but they are eventually dropped since I only end up with the _Final table.
Not sure what you are referring to about table variables.
I do not create indexes on the tblCount_5251_1_G1 or tblCount_5251_2_G1 tables. Only on the tblCount_5251_G1_Final table.

--how many rows on each table?
It varies depeding on how many geographic regions are queried. For this example I go from:

187,659 to 106,502 to 34,719 to my final of 19,569

jmp09

ASKER

--then reuse that first table name
--
--The parser might have a problem with this... Remember it compiles the sp. And it may get confused.
--
--Can you just change the name and see what happens?

I can try this, but I didn't think it was a problem since it works when I do not have any OR demo/demo values.

Scott Pletcher

First, don't see a need for the t2 table in the query, try removing it.

SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
INTO tblCount_5251_1_G1
FROM vKB_Count_5251 AS t1
--line deleted
WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
AND EXISTS (SELECT * FROM #tblJobGeos AS t3 WHERE t1.zip_cd = t3.zip_cd)
AND (t1.demo_cd IN (1501, 1518, 1517, 1503) AND t1.demo_value = 'Y'))

Next, try changing the EXISTS to an INNER JOIN. [Did them separately so you could evaluate which, if either, made a difference.]

SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
INTO tblCount_5251_1_G1
FROM vKB_Count_5251 AS t1
INNER JOIN #tblJobGeos AS t3 ON t1.zip_cd = t3.zip_cd
WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
AND (t1.demo_cd IN (1501, 1518, 1517, 1503) AND t1.demo_value = 'Y'))

jmp09

ASKER

--then I would say re-start --SQLservices your TempDB and transaction logs are getting big cause you are doing lots of Select Into...

I thought SELECT INTO is preferred since it does NOT log those transactions?!?!

--second wherever you are creating a temp table do you have to define NOT NULL for columns?, if not try using explicit NULL for columns, although this would mean you have to sacrifice your unique clustered indexes

I can try as I'm not indexing until I reach the final table.

jmp09

ASKER

--First, don't see a need for the t2 table in the query, try removing it.
--
--SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
--INTO tblCount_5251_1_G1
--FROM vKB_Count_5251 AS t1
--line deleted
--WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
--AND EXISTS (SELECT * FROM #tblJobGeos AS t3 WHERE t1.zip_cd = t3.zip_cd)
--AND (t1.demo_cd IN (1501, 1518, 1517, 1503) AND t1.demo_value = 'Y'))

It is needed. The reason I join these tables is because all of the demos and demo values have to apply to each record pulled.

--Next, try changing the EXISTS to an INNER JOIN. [Did them separately so you could evaluate which, if either, made a difference.]

I will try that later tonight or this weekend sometime and report back.

Scott Pletcher

The query doesn't reference T2 anywhere. But you would still need it if you need to drop rows from t1 that don't have a match in t2. Otherwise, I can't see how it's needed.

jmp09

ASKER

--The query doesn't reference T2 anywhere. But you would still need it if you need to drop rows from t1 that don't have a match in t2. Otherwise, I can't see how it's needed.

Maybe I should explain what I'm doing to see if it is needed...

The first table (tblCount_5251_1_G1) finds everyone in my geogrpahic area whose age is between 18 and 25...

SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
INTO tblCount_5251_1_G1
FROM vKB_Count_5251 AS t1
WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
AND t1.demo_cd = 2000
AND t1.demo_value BETWEEN '18' AND '25'
AND EXISTS (SELECT *
FROM #tblJobGeos AS t3
WHERE t1.zip_cd = t3.zip_cd)

The second table (tblCount_5251_2_G1) gets all the females in my geographic area but I join back to first table (tblCount_5251_1_G1) to make sure these females are between 18 and 25...

SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
INTO tblCount_5251_2_G1
FROM vKB_Count_5251 AS t1
JOIN tblCount_5251_1_G1 AS t2 On t1.pkey = t2.pKey
WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
AND t1.demo_cd = 400
AND t1.demo_value = 'F'
AND EXISTS (SELECT * FROM #tblJobGeos AS t3 WHERE t1.zip_cd = t3.zip_cd)

I then drop the first table (tblCount_5251_1_G1). I'm doing this to save space more than anything even though both tables will be dropped at the end of SP.

Then I recreate the first table getting all people who have Astrology OR Best Seller OR Sci Fi OR Romance reading interests. I'm joining to my second table because I want these people who have these reading interests to be females who are 18 to 25...

SELECT t1.pKey, t1.hh_nbr, t1.zip_cd
INTO tblCount_5251_1_G1
FROM vKB_Count_5251 AS t1
JOIN tblCount_5251_2_G1 AS t2 On t1.pkey = t2.pKey
WHERE t1.vp_flag IN ('A', 'B', 'C', 'D')
AND EXISTS (SELECT * FROM #tblJobGeos AS t3 WHERE t1.zip_cd = t3.zip_cd)
AND((t1.demo_cd = 1501 AND t1.demo_value = 'Y')
OR (t1.demo_cd = 1518 AND t1.demo_value = 'Y')
OR (t1.demo_cd = 1517 AND t1.demo_value = 'Y')
OR (t1.demo_cd = 1503 AND t1.demo_value = 'Y') )

I hope that makes a more sense on why I need that join.

Once again though, I'm still stumped on why my OR statement runs in 3 seconds in QA, but it takes over 5 minutes when I run it in the SP.

jmp09

ASKER

I should also mention that the final output has been correct on every count. My only issue is that it's taking 5-6 minutes when it should be done in 30-40 seconds.

jmp09

ASKER

Here's the update and the fix that we found:

We tried all the suggestions above and none of them really gave us any major time improvements. I also upgraded to Service Pack 4, but as expected that wasn't the cause of the problem. What we ended up doing was treating all of our OR demos like a normal AND demo. Instead of SELECT INTO for each one however, we looped through via our cursor and INSERTED INTO the same table for each different OR demo code. We then were able get a unique count at the end using that new table. The count ended up finishing in 34 seconds!!!

So what we ended up learning is that for some reason the SP did not like all of the OR statements, or had a harder time with the data the QA did, in our original statement.

Points are going to lowfatspread since his answer had the most suggestions and was a little more in depth compared to the others.

Thanks for your input everyone!