• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 742
  • Last Modified:

Slow query due to missing statistic - help on whether to rebuild stats

Situation: 8 million row table1 with 3 column concatentated clustered index (non-unique, not primary key):

table1 (c1, c2, c3, ...)

select count(*) from table1 where c1=1234
500,000 rows in 1 second

select count(*) from table1 where c1=2345
600,000 rows in 60 seconds

WHY? Plans were identical, but query analyzer indicated that the top query would take 98% of the time even though it was 60 times faster. Really weird, I know (though not for SQL Server).

Anyway, there was no statistic on C1, so I created one, and the problem went away - performance to sub-second for both queries and query optimizer showing 50% of time for both queries.

This 8 million row table is a small version of a table with 150 million rows. We are thinking of building the stat on the big table.

Can anyone suggest a reason this statistic helped? Do you need a statistic on the lead column of a clustered index? Any reason the performance was so slow on one but not the other?

Not a critical problem but a big puzzle, though the decision to add the statistic has to be made. One more thing - is it possible that the statistics were out of date but were not showing "out of date" in the query analyzer (all indexes are set to auto update statistics by the way).



  • 2
  • 2
2 Solutions
The stats can only affect the query engine's plan. That means rebuilding your stats must have made a difference to the query plan. Maybe the pre-rebuild plan happened to be OK for c1=1234 but was poor for c1=2345, which could happen if there had beena lot of change for the c1=2345 data.

Rebuilding your stats is a good thing if your data has changed or there is no stat available.

When the actual running percentage of time of the queries is that different from the plan as yours was, is a key indicator that statistics would help.
Pre stats: In the plan, sql prossibly thought that the data for query part 2 may still be cached, so faster. But after it ran the first one, it found this not the case and the second one ran differently.
Post stats: sql knew what the distribution of the data was so it knew more accurately the best way to approach searching for the data and was able to do so faster.

You dont need stats on fields you dont search by. But if you want the 'intellegence' that was built into determining best route, then SQL will perform best with complete information.

Think of it this way. You're planning on driving somewhere new, and you look at a map and pick the shortest map route - thinking it will be the fastest, then find out it takes you an extra 3 hours because the shortest route has major construction. But say you're planning the same trip and along with your map, you have a traffic report that says 'major delays on route b due to construction' then you have enough information to truely pick the fastest route. Simple analogy, but it gives you an idea.
AaronAbendAuthor Commented:
msg - I agree with your theory, and I just double checked - The plans for this query are exactly the same. It is the plans for more sophisticated queries that led to this analysis that are actually slower. These queries do perform the same, but a query with a join to another large table does not.  

Following up on some ideas in Chris's note, I checked the plan more carefully and noticed the estimated row count for the good query was 1 row while the bad query showed estimated row count of 500,000!

Order of queries had no impact. one particular constraint was always slower.

More Info: I build stats using default sample and NOTHING improved. I "sampled" the entire table and it fixed the problem. Ironically the query that returned the MOST rows (600K) ran FASTER.

Also, these queries are not inside procedures, so plans are regenerated on a regular basis (data changes constantly).

Bottom line: Whatever process determines estimated row count is seriously wrong. Any additional thoughts would be appreciated! Thanks!

AaronAbendAuthor Commented:
I just realized - the query plan that showed the higher row count generated the best plan because the data for the plan was essentially correct. The fact that there were more records being returned by that query was irrelevant.

I have found postings elsewhere that indicate that the processes that maintain stats are not always effective (http://www.sql-server-performance.com/statistics.asp)

I have awarded the points as I promised but more comments would be welcome.


>Whatever process determines estimated row count is seriously wrong
Not necessarily. As I said, it could be that you have a very uneven distribution of index values in the data, leading, through bad luck, to a very poor sample for stats calculation.
The fact that rebuilding the stats over the full set eliminates the problem supports that idea.
For more details:
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now