Slow query due to missing statistic - help on whether to rebuild stats

Posted on 2004-04-06
Last Modified: 2008-01-16
Situation: 8 million row table1 with 3 column concatentated clustered index (non-unique, not primary key):

table1 (c1, c2, c3, ...)

select count(*) from table1 where c1=1234
500,000 rows in 1 second

select count(*) from table1 where c1=2345
600,000 rows in 60 seconds

WHY? Plans were identical, but query analyzer indicated that the top query would take 98% of the time even though it was 60 times faster. Really weird, I know (though not for SQL Server).

Anyway, there was no statistic on C1, so I created one, and the problem went away - performance to sub-second for both queries and query optimizer showing 50% of time for both queries.

This 8 million row table is a small version of a table with 150 million rows. We are thinking of building the stat on the big table.

Can anyone suggest a reason this statistic helped? Do you need a statistic on the lead column of a clustered index? Any reason the performance was so slow on one but not the other?

Not a critical problem but a big puzzle, though the decision to add the statistic has to be made. One more thing - is it possible that the statistics were out of date but were not showing "out of date" in the query analyzer (all indexes are set to auto update statistics by the way).



Question by:AaronAbend
  • 2
  • 2
LVL 12

Assisted Solution

monosodiumg earned 25 total points
ID: 10766679
The stats can only affect the query engine's plan. That means rebuilding your stats must have made a difference to the query plan. Maybe the pre-rebuild plan happened to be OK for c1=1234 but was poor for c1=2345, which could happen if there had beena lot of change for the c1=2345 data.


Accepted Solution

ChrisFretwell earned 100 total points
ID: 10766794
Rebuilding your stats is a good thing if your data has changed or there is no stat available.

When the actual running percentage of time of the queries is that different from the plan as yours was, is a key indicator that statistics would help.
Pre stats: In the plan, sql prossibly thought that the data for query part 2 may still be cached, so faster. But after it ran the first one, it found this not the case and the second one ran differently.
Post stats: sql knew what the distribution of the data was so it knew more accurately the best way to approach searching for the data and was able to do so faster.

You dont need stats on fields you dont search by. But if you want the 'intellegence' that was built into determining best route, then SQL will perform best with complete information.

Think of it this way. You're planning on driving somewhere new, and you look at a map and pick the shortest map route - thinking it will be the fastest, then find out it takes you an extra 3 hours because the shortest route has major construction. But say you're planning the same trip and along with your map, you have a traffic report that says 'major delays on route b due to construction' then you have enough information to truely pick the fastest route. Simple analogy, but it gives you an idea.
LVL 10

Author Comment

ID: 10767512
msg - I agree with your theory, and I just double checked - The plans for this query are exactly the same. It is the plans for more sophisticated queries that led to this analysis that are actually slower. These queries do perform the same, but a query with a join to another large table does not.  

Following up on some ideas in Chris's note, I checked the plan more carefully and noticed the estimated row count for the good query was 1 row while the bad query showed estimated row count of 500,000!

Order of queries had no impact. one particular constraint was always slower.

More Info: I build stats using default sample and NOTHING improved. I "sampled" the entire table and it fixed the problem. Ironically the query that returned the MOST rows (600K) ran FASTER.

Also, these queries are not inside procedures, so plans are regenerated on a regular basis (data changes constantly).

Bottom line: Whatever process determines estimated row count is seriously wrong. Any additional thoughts would be appreciated! Thanks!

LVL 10

Author Comment

ID: 10767726
I just realized - the query plan that showed the higher row count generated the best plan because the data for the plan was essentially correct. The fact that there were more records being returned by that query was irrelevant.

I have found postings elsewhere that indicate that the processes that maintain stats are not always effective (

I have awarded the points as I promised but more comments would be welcome.


LVL 12

Expert Comment

ID: 10776683
>Whatever process determines estimated row count is seriously wrong
Not necessarily. As I said, it could be that you have a very uneven distribution of index values in the data, leading, through bad luck, to a very poor sample for stats calculation.
The fact that rebuilding the stats over the full set eliminates the problem supports that idea.
For more details:

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

733 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question