Solved

Auto-Update Statistics versus Update Statistics - are they the same?

Posted on 2004-08-25
671 Views
Last Modified: 2008-02-01
If I run update statistics (with a sample) it samples a "representative sample" of the data.

If I turn autoupdate statistics on, does that simply run "update statistics" when the thresholds (roughly 20% of rows changed) are reached, OR does it sample only the changed data to determine how the stats should be modified?

The question is important because my client has been running with autostats OFF for some time, and the question is whether we should recalc statistics at 100% before turning autoupdate on.

I will award extra points if someone can tell me whether the 20% factor can be changed
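For reference, the two manual variants being compared look like this (table name is a placeholder):

```sql
-- Full scan: reads every row; most accurate, most expensive
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;

-- Sampled: reads only a "representative sample" of the rows
UPDATE STATISTICS dbo.BigTable WITH SAMPLE 10 PERCENT;
```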
Question by:AaronAbend
22 Comments
 
LVL 68

Expert Comment

by:Qlemo
ID: 11894479
AFAIK, autoupdate triggers an UPDATE STATISTICS that "only" rescans by sampling. If you rely on a complete scan, you should not use auto-update statistics.
 
LVL 10

Author Comment

by:AaronAbend
ID: 11894569
You mean - if I do a complete scan on a regular basis.  If I do a full scan today but do not plan on doing another one for a month, auto update should be turned on as far as I know.

 
LVL 34

Expert Comment

by:arbert
ID: 11895143
That internal factor that forces the update of the statistics can't be changed....

How much data is in the database?  
 
LVL 10

Author Comment

by:AaronAbend
ID: 11895718
Thanks Brett...

The database has a total of 1 billion rows - 3 tables are 200 million rows each (an issue I am trying to address separately). The bulk of the rest is distributed among six 80-million-row tables and a reasonable number of small tables. The DB is a bit overnormalized, but that is a long-term issue. The data has a useful life of 180 days max, so we have to purge on a regular basis.

They were defragging indexes every day. I have suggested they simply update stats on a regular basis rather than defragging (I'd be interested in your take on that).
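For context, the two maintenance operations being weighed against each other would look roughly like this in SQL Server 2000 syntax (database, table, and index names are placeholders):

```sql
-- Daily defrag of a table's clustered index (index id 1), done online
DBCC INDEXDEFRAG (MyDb, 'dbo.BigTable', 1);

-- The suggested alternative: refresh statistics on a schedule instead
UPDATE STATISTICS dbo.BigTable WITH SAMPLE 10 PERCENT;
```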

 
LVL 34

Expert Comment

by:arbert
ID: 11895737
dang!  I thought I had some big stuff....So how many updates take place on this data?  Is it pretty static like a data warehouse?
 
LVL 10

Author Comment

by:AaronAbend
ID: 11896278
Actually it is a lot of inserts and updates. After 180 days, virtually all of the data in the largest 6 tables turns over.

Most of the records have a startdate and an enddate (necessary given the nature of the business). The enddate goes in NULL and gets updated when the transaction is completed. Most of the updates are based on the primary key (the prior DBA put clustered indexes on all PKs, all of which are IDENTITY columns - I am recommending clustered indexes that support important queries, with nonclustered PK indexes). Deletes/purges are based on dates.

Can you validate (T or F is fine) the following statements (these guys are really nervous - I have only been consulting here about 2 weeks):

1. Turning autoupdate statistics ON at the database level has no immediate impact. The impact will be felt when the 20% threshold is reached for individual tables, at which time the stats will be updated

2. We can turn off autoupdate and stop that from happening at any time (I know the answer but they keep asking me!)

3. Is there a way to modify an index to turn off autoupdate (for selected tables)? Does MSSQL have an Alter Index... command?  I have not seen any documentation but it would be nice.

Thanks!

Aaron
 
LVL 34

Assisted Solution

by:arbert
arbert earned 250 total points
ID: 11896669
1.  True and false (lol).  You won't have any overhead until the 20% threshold kicks in.  HOWEVER, having the statistics constantly updated (theoretically) will give you the best query plans possible and should keep your query plans/run times consistent.

2.  True and false (lol again).  You will get rid of the overhead when the stats build, but you can also get lousy query plans.

3.  No, you can't.

Do you have partitioning on any of the above tables--that would help a lot in this situation.  You could also turn off the stats and update them every night.  We haven't really found an instance where the overhead of stats creation is a total killer, though (as long as you aren't doing a sample of 100%).

Brett
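For points 1 and 2, the database-level switch in question is the one below (database name is a placeholder):

```sql
-- Turn auto-update statistics on or off for the whole database
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS OFF;

-- Check the current setting (returns 1 when on, 0 when off)
SELECT DATABASEPROPERTYEX('MyDb', 'IsAutoUpdateStatistics');
```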
 
LVL 10

Author Comment

by:AaronAbend
ID: 11896724
So, the original question - is there any difference between running autoupdate and doing it manually when it is due (at 20%)? There is a feeling here that auto-update is smart because it samples the changed data, whereas a manual stats update is not smart since it looks at all of the data....

(my last "one more thing..")
 
LVL 68

Accepted Solution

by:
Qlemo earned 250 total points
ID: 11896861
It's not more than a feeling ... Autoupdate performs a normal sampling, ignoring the "changed bitmap".
 
LVL 10

Author Comment

by:AaronAbend
ID: 11896947
Clemens, I think you are correct. Do you have a reference on that statement handy? If so post it. Thanks to both of you.

 
LVL 34

Expert Comment

by:arbert
ID: 11897000
"because it samples the changed data, whereas a manual stats update is not smart since it looks at all of the data...."

That's totally false--it's a random sampling.  SQL Server doesn't know which data changed (it only tracks changes at the table level); it just knows the threshold has been hit...
 
LVL 34

Expert Comment

by:arbert
ID: 11897081
"That's totally false--it's a random sampling"  (look at profiler when both methods run and see how the stats are updated--that will show you the definitive answer ) :)
 
LVL 10

Author Comment

by:AaronAbend
ID: 11897093
Excellent. Thanks again.
 
LVL 68

Expert Comment

by:Qlemo
ID: 11910667
SQL Server DOES know which records were changed; it maintains a bitmap for that purpose, which is used for replication.
[Ken Henderson: The Guru's Guide to SQL Server Architecture and Internals]

Theoretically this bitmap could be used to update statistics, but this would lead to overemphasis of new data.
For example, if you have one million rows scanned and change 200,000 of them, those changed rows would have five times more influence than the other 800,000...

By the way: often there is no statistically relevant change when updating non-key attributes; in that case an update of statistics will not give you better query plans.
 
LVL 10

Author Comment

by:AaronAbend
ID: 11911849
I agree that SQL Server knows which records are changed, but why can't this information be used to modify the distributions in the statistics? There is absolutely no mathematical reason the new data could not be used to update statistics, assuming the original scan was 100%.

As for "often there is no statistically relevant change while updating non-key attributes," I disagree.  How many databases use current dates? Almost every database I have worked with has them, and they represent the new data; without statistics updates, new dates are not included in the statistics, right? (Correct me if I am wrong about that.)  And those dates are frequently used as constraints. In my specific situation, the most recent data is 10 times more important than the old data - and the stats do not seem to support its retrieval.

On one query I am working on the optimizer consistently chooses a plan that it predicted would be 5 times faster than my forced plan. In fact, my query was 8 times faster - the optimizer was off by a factor of 40. I am not at all sure this is a statistics problem since I generated full statistics for all tables and got the same results. I believe the real problem is in poor indexing (which is a much more common problem than bad stats if you ask me).
 
LVL 34

Expert Comment

by:arbert
ID: 11913837
" which is used for replication."


Exactly--for replication.....
 
LVL 68

Expert Comment

by:Qlemo
ID: 11914575
Statistics on compound indexes (containing more than one column) are stored as a histogram for the first column only. For the remaining columns, SQL Server stores only a density value ("how many rows will I get with the same column value?").
From the viewpoint of maintaining statistics it's better to have many single-column indexes than one "big" one. It's said that SQL Server can effectively combine index results even if a single index would not cover the query (it will calculate higher costs, though).
And: if an index would return more than 10% of the rows for a single key value, it will most likely not be used.

As for update statistics: I suggest switching off autoupdate and maintaining full-scan statistics manually every night or similar.
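A minimal sketch of that nightly approach (table name is a placeholder):

```sql
-- One-time: stop auto-update from touching this table's statistics
EXEC sp_autostats 'dbo.BigTable', 'OFF';

-- Nightly scheduled job step: rebuild the statistics from a full scan
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;
```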
 
LVL 10

Author Comment

by:AaronAbend
ID: 11941329

"3. Is there a way to modify an index to turn off autoupdate (for selected tables)? Does MSSQL have an Alter Index... command?  I have not seen any documentation but it would be nice. "

Correct answer is: yes - sp_autostats can do this (the syntax is not ALTER INDEX, but the effect is the same).
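A sketch of that procedure in use (table and index names are placeholders):

```sql
-- Show the auto-update setting for every statistic on a table
EXEC sp_autostats 'dbo.BigTable';

-- Turn auto-update off for one index's statistics only
EXEC sp_autostats 'dbo.BigTable', 'OFF', 'PK_BigTable';
```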

AA
 
LVL 34

Expert Comment

by:arbert
ID: 11948553
Sorry, I'm totally confused that you said index--the original question was about updating column statistics, not index statistics.  Why would you want to disable your index statistics? That's a little bit different than disabling "regular" column statistics
 
LVL 10

Author Comment

by:AaronAbend
ID: 11959813
Index statistics versus column statistics? As far as I know, the only difference is that some stats are not built to support indexes - they are usually built ad hoc when someone hits a column that does not have any indexing and SQL decides it needs to know more before it continues. But I regard them as the same for all practical purposes. Is that incorrect? Are there more subtleties than that? I am going to post this as a new question, by the way.
 
LVL 34

Expert Comment

by:arbert
ID: 11959939
Index statistics are always maintained by SQL Server--100% sampling (the statistic is the index)... Column statistics are based upon the sampling rate (unless you specify full scan) and aren't maintained (until, of course, the threshold is met and an auto-update kicks in)
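One way to see the two kinds side by side on a given table (names are placeholders; auto-created column statistics show up with a _WA_Sys prefix):

```sql
-- List all statistics on a table, index-based and column-based alike
EXEC sp_helpstats 'dbo.BigTable', 'ALL';

-- Inspect the histogram, density, and sampled-row count behind one statistic
DBCC SHOW_STATISTICS ('dbo.BigTable', 'PK_BigTable');
```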
 
LVL 34

Expert Comment

by:arbert
ID: 11959972
I'm sure you already saw this link:

http://www.sql-server-performance.com/statistics.asp