Slow query due to missing statistic - help on whether to rebuild stats

Posted on 2004-04-06
Last Modified: 2008-01-16
Situation: 8 million row table1 with 3 column concatentated clustered index (non-unique, not primary key):

table1 (c1, c2, c3, ...)

select count(*) from table1 where c1=1234
500,000 rows in 1 second

select count(*) from table1 where c1=2345
600,000 rows in 60 seconds

WHY? Plans were identical, but query analyzer indicated that the top query would take 98% of the time even though it was 60 times faster. Really weird, I know (though not for SQL Server).

Anyway, there was no statistic on C1, so I created one, and the problem went away - performance to sub-second for both queries and query optimizer showing 50% of time for both queries.

This 8 million row table is a small version of a table with 150 million rows. We are thinking of building the stat on the big table.

Can anyone suggest a reason this statistic helped? Do you need a statistic on the lead column of a clustered index? Any reason the performance was so slow on one but not the other?

Not a critical problem but a big puzzle, though the decision to add the statistic has to be made. One more thing - is it possible that the statistics were out of date but were not showing "out of date" in the query analyzer (all indexes are set to auto update statistics by the way).



Question by:AaronAbend
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
LVL 12

Assisted Solution

monosodiumg earned 25 total points
ID: 10766679
The stats can only affect the query engine's plan. That means rebuilding your stats must have made a difference to the query plan. Maybe the pre-rebuild plan happened to be OK for c1=1234 but was poor for c1=2345, which could happen if there had beena lot of change for the c1=2345 data.


Accepted Solution

ChrisFretwell earned 100 total points
ID: 10766794
Rebuilding your stats is a good thing if your data has changed or there is no stat available.

When the actual running percentage of time of the queries is that different from the plan as yours was, is a key indicator that statistics would help.
Pre stats: In the plan, sql prossibly thought that the data for query part 2 may still be cached, so faster. But after it ran the first one, it found this not the case and the second one ran differently.
Post stats: sql knew what the distribution of the data was so it knew more accurately the best way to approach searching for the data and was able to do so faster.

You dont need stats on fields you dont search by. But if you want the 'intellegence' that was built into determining best route, then SQL will perform best with complete information.

Think of it this way. You're planning on driving somewhere new, and you look at a map and pick the shortest map route - thinking it will be the fastest, then find out it takes you an extra 3 hours because the shortest route has major construction. But say you're planning the same trip and along with your map, you have a traffic report that says 'major delays on route b due to construction' then you have enough information to truely pick the fastest route. Simple analogy, but it gives you an idea.
LVL 10

Author Comment

ID: 10767512
msg - I agree with your theory, and I just double checked - The plans for this query are exactly the same. It is the plans for more sophisticated queries that led to this analysis that are actually slower. These queries do perform the same, but a query with a join to another large table does not.  

Following up on some ideas in Chris's note, I checked the plan more carefully and noticed the estimated row count for the good query was 1 row while the bad query showed estimated row count of 500,000!

Order of queries had no impact. one particular constraint was always slower.

More Info: I build stats using default sample and NOTHING improved. I "sampled" the entire table and it fixed the problem. Ironically the query that returned the MOST rows (600K) ran FASTER.

Also, these queries are not inside procedures, so plans are regenerated on a regular basis (data changes constantly).

Bottom line: Whatever process determines estimated row count is seriously wrong. Any additional thoughts would be appreciated! Thanks!

LVL 10

Author Comment

ID: 10767726
I just realized - the query plan that showed the higher row count generated the best plan because the data for the plan was essentially correct. The fact that there were more records being returned by that query was irrelevant.

I have found postings elsewhere that indicate that the processes that maintain stats are not always effective (

I have awarded the points as I promised but more comments would be welcome.


LVL 12

Expert Comment

ID: 10776683
>Whatever process determines estimated row count is seriously wrong
Not necessarily. As I said, it could be that you have a very uneven distribution of index values in the data, leading, through bad luck, to a very poor sample for stats calculation.
The fact that rebuilding the stats over the full set eliminates the problem supports that idea.
For more details:

Featured Post

Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In the first part of this tutorial we will cover the prerequisites for installing SQL Server vNext on Linux.
A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

724 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question