?
Solved

Slow query due to missing statistic - help on whether to rebuild stats

Posted on 2004-04-06
5
Medium Priority
?
735 Views
Last Modified: 2008-01-16
Situation: 8 million row table1 with 3 column concatentated clustered index (non-unique, not primary key):

table1 (c1, c2, c3, ...)

select count(*) from table1 where c1=1234
500,000 rows in 1 second

select count(*) from table1 where c1=2345
600,000 rows in 60 seconds

WHY? Plans were identical, but query analyzer indicated that the top query would take 98% of the time even though it was 60 times faster. Really weird, I know (though not for SQL Server).

Anyway, there was no statistic on C1, so I created one, and the problem went away - performance to sub-second for both queries and query optimizer showing 50% of time for both queries.

This 8 million row table is a small version of a table with 150 million rows. We are thinking of building the stat on the big table.

Can anyone suggest a reason this statistic helped? Do you need a statistic on the lead column of a clustered index? Any reason the performance was so slow on one but not the other?

Not a critical problem but a big puzzle, though the decision to add the statistic has to be made. One more thing - is it possible that the statistics were out of date but were not showing "out of date" in the query analyzer (all indexes are set to auto update statistics by the way).

Thanks

NOTE: I WILL AWARD THE POINTS ON THIS QUESTION BEFORE 1 PM TODAY AS A THANKS FOR ANY DECENT ADVICE I CAN GET!

0
Comment
Question by:AaronAbend
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
5 Comments
 
LVL 12

Assisted Solution

by:monosodiumg
monosodiumg earned 75 total points
ID: 10766679
The stats can only affect the query engine's plan. That means rebuilding your stats must have made a difference to the query plan. Maybe the pre-rebuild plan happened to be OK for c1=1234 but was poor for c1=2345, which could happen if there had beena lot of change for the c1=2345 data.

0
 
LVL 7

Accepted Solution

by:
ChrisFretwell earned 300 total points
ID: 10766794
Rebuilding your stats is a good thing if your data has changed or there is no stat available.

When the actual running percentage of time of the queries is that different from the plan as yours was, is a key indicator that statistics would help.
Pre stats: In the plan, sql prossibly thought that the data for query part 2 may still be cached, so faster. But after it ran the first one, it found this not the case and the second one ran differently.
Post stats: sql knew what the distribution of the data was so it knew more accurately the best way to approach searching for the data and was able to do so faster.

You dont need stats on fields you dont search by. But if you want the 'intellegence' that was built into determining best route, then SQL will perform best with complete information.

Think of it this way. You're planning on driving somewhere new, and you look at a map and pick the shortest map route - thinking it will be the fastest, then find out it takes you an extra 3 hours because the shortest route has major construction. But say you're planning the same trip and along with your map, you have a traffic report that says 'major delays on route b due to construction' then you have enough information to truely pick the fastest route. Simple analogy, but it gives you an idea.
0
 
LVL 10

Author Comment

by:AaronAbend
ID: 10767512
msg - I agree with your theory, and I just double checked - The plans for this query are exactly the same. It is the plans for more sophisticated queries that led to this analysis that are actually slower. These queries do perform the same, but a query with a join to another large table does not.  

Following up on some ideas in Chris's note, I checked the plan more carefully and noticed the estimated row count for the good query was 1 row while the bad query showed estimated row count of 500,000!

Order of queries had no impact. one particular constraint was always slower.

More Info: I build stats using default sample and NOTHING improved. I "sampled" the entire table and it fixed the problem. Ironically the query that returned the MOST rows (600K) ran FASTER.

Also, these queries are not inside procedures, so plans are regenerated on a regular basis (data changes constantly).

Bottom line: Whatever process determines estimated row count is seriously wrong. Any additional thoughts would be appreciated! Thanks!

0
 
LVL 10

Author Comment

by:AaronAbend
ID: 10767726
I just realized - the query plan that showed the higher row count generated the best plan because the data for the plan was essentially correct. The fact that there were more records being returned by that query was irrelevant.

I have found postings elsewhere that indicate that the processes that maintain stats are not always effective (http://www.sql-server-performance.com/statistics.asp)

I have awarded the points as I promised but more comments would be welcome.

Thanks,

Aaron
0
 
LVL 12

Expert Comment

by:monosodiumg
ID: 10776683
>Whatever process determines estimated row count is seriously wrong
Not necessarily. As I said, it could be that you have a very uneven distribution of index values in the data, leading, through bad luck, to a very poor sample for stats calculation.
The fact that rebuilding the stats over the full set eliminates the problem supports that idea.
For more details:
http://www.sql-server-performance.com/statistics.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/statquery.asp
http://www.winnetmag.com/SQLServer/Article/ArticleID/5660/5660.html
0

Featured Post

Optimize your web performance

What's in the eBook?
- Full list of reasons for poor performance
- Ultimate measures to speed things up
- Primary web monitoring types
- KPIs you should be monitoring in order to increase your ROI

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…
Suggested Courses

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question