• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 701

Auto-Update Statistics versus Update Statistics - are they the same?

If I run update statistics (with a sample) it samples a "representative sample" of the data.

If I turn auto-update statistics on, does that simply run UPDATE STATISTICS when the threshold (roughly 20% of rows changed) is reached, OR does it sample the changed data to determine how the stats should be modified?

The question is important because my client has been running with autostats OFF for some time, and the question is whether we should recalc statistics at 100% before turning autoupdate on.

I will award extra points if someone can tell me whether the 20% factor can be changed.
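For reference, the two manual forms being compared look like this in T-SQL (the table name is hypothetical):

```sql
-- Sampled update: reads only a "representative sample" of the data.
UPDATE STATISTICS dbo.Orders WITH SAMPLE 20 PERCENT;

-- Full scan: reads every row, giving the most accurate histogram.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```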
Asked by: AaronAbend
2 Solutions
 
Qlemo (Batchelor, Developer and EE Topic Advisor) Commented:
AFAIK, autoupdate triggers an UPDATE STATISTICS which does "only" a rescan by sampling. If you perform a complete scan, you should not use auto-update statistics.
 
AaronAbend (Author) Commented:
You mean - if I do a complete scan on a regular basis. If I do a full scan today but do not plan on doing another one for a month, auto-update should be turned on, as far as I know.
 
arbert Commented:
That internal factor that forces the update of the statistics can't be changed.

How much data is in the database?
 
AaronAbend (Author) Commented:
Thanks Brett...

The database has a total of 1 billion rows - 3 tables are 200 million rows each (an issue I am trying to address separately). The bulk of the rest is distributed among six 80-million-row tables and a reasonable number of small tables. The DB is a bit over-normalized, but that is a long-term issue. The data has a useful life of 180 days max, so we have to purge on a regular basis.

They were defragging indexes every day. I have suggested they simply update stats on a regular basis rather than defragging (I'd be interested in your take on that).
 
arbert Commented:
Dang! I thought I had some big stuff... So how many updates take place on this data? Is it pretty static, like a data warehouse?
 
AaronAbend (Author) Commented:
Actually, it is a lot of inserts and updates. After 180 days, virtually all of the data in the largest 6 tables turns over.

Most of the records have a startdate and an enddate (necessary given the nature of the business). The enddate goes in as NULL and gets updated when the transaction is completed. Most of the updates are based on the primary key (the prior DBA put clustered indexes on all PKs, all of which are IDENTITY columns - I am recommending clustered indexes that support important queries and nonclustered PK indexes). Deletes/purges are based on dates.
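The re-indexing recommendation above could be sketched as follows (table, constraint, and column names are hypothetical):

```sql
-- Drop the existing clustered PK on the IDENTITY column...
ALTER TABLE dbo.Txn DROP CONSTRAINT PK_Txn;

-- ...cluster on the date column that important queries and purges filter on...
CREATE CLUSTERED INDEX IX_Txn_StartDate ON dbo.Txn (StartDate);

-- ...and re-add the primary key as a nonclustered index.
ALTER TABLE dbo.Txn
    ADD CONSTRAINT PK_Txn PRIMARY KEY NONCLUSTERED (TxnID);
```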

Can you validate (T or F is fine) the following statements (these guys are really nervous - I have only been consulting here about two weeks):

1. Turning autoupdate statistics ON at the database level has no immediate impact. The impact will be felt when the 20% threshold is reached for individual tables, at which time the stats will be updated.

2. We can turn off autoupdate and stop that from happening at any time (I know the answer but they keep asking me!)

3. Is there a way to modify an index to turn off autoupdate (for selected tables)? Does MSSQL have an Alter Index... command?  I have not seen any documentation but it would be nice.

Thanks!

Aaron
 
arbert Commented:
1. True and false (lol). You won't have any overhead until the 20% threshold kicks in. HOWEVER, having the statistics constantly updated will (theoretically) give you the best query plans possible and should keep your query plans/run times consistent.

2. True and false (lol again). You will get rid of the overhead when the stats build, but you can also get lousy query plans.

3.  No, you can't.

Do you have partitioning on any of the above tables? That would help in this situation a lot. You could also turn off the stats and update them every night. We haven't really found an instance where the overhead of stats creation is a total killer, though (as long as you aren't doing a sample of 100%).

Brett
 
AaronAbend (Author) Commented:
So, the original question: is there any difference between running autoupdate and doing it manually when it is due (at 20%)? There is a feeling here that auto-update is smart because it samples the changed data, whereas a manual stats update is not smart, since it looks at all of the data....

(my last "one more thing...")
 
Qlemo (Batchelor, Developer and EE Topic Advisor) Commented:
It's not more than a feeling... Autoupdate performs a normal sampling, ignoring the "changed" bitmap.
 
AaronAbend (Author) Commented:
Clemens, I think you are correct. Do you have a reference for that statement handy? If so, post it. Thanks to both of you.
 
arbert Commented:
"because it samples the changed data, whereas a manual stats update is not smart since it looks at all of the data...."

That's totally false--it's a random sampling. SQL Server doesn't know what data changed (just at the table level); it just knows the threshold has been hit...
 
arbert Commented:
"That's totally false--it's a random sampling" (look at Profiler when both methods run and see how the stats are updated--that will show you the definitive answer) :)
 
AaronAbend (Author) Commented:
Excellent. Thanks again.
 
Qlemo (Batchelor, Developer and EE Topic Advisor) Commented:
SQL Server DOES know which records were changed; it maintains a bitmap for that purpose, which is used for replication.
[Ken Henderson: The Guru's Guide to SQL Server Architecture and Internals]

Theoretically this bitmap could be used to update statistics, but this would lead to overemphasis of new data.
For example, if you have one million rows scanned and change 200,000 of them, those changed rows would have a five-times-higher influence than the other 800,000...

By the way: often there is no statistically relevant change when updating non-key attributes; in this case an update of statistics will not give you better query plans.
 
AaronAbend (Author) Commented:
I agree that SQL knows which records are changed, but why can't this information be used to modify the distributions in statistics? There is absolutely no mathematical reason that the new data could not be used to update statistics, assuming the original scan was 100%.

As for "often there is no statistically relevant change when updating non-key attributes", I disagree. How many databases use current dates? Almost every database I have worked with has them, and they represent the new data - and without updates of the statistics, new dates are not included in the statistics, right? (Correct me if I am wrong about that.) And those dates are frequently used as constraints. In my specific situation, the most recent data is 10 times more important than the old data, and the stats do not support its retrieval, it seems.

On one query I am working on, the optimizer consistently chooses a plan that it predicted would be 5 times faster than my forced plan. In fact, my query was 8 times faster - the optimizer was off by a factor of 40. I am not at all sure this is a statistics problem, since I generated full statistics for all tables and got the same results. I believe the real problem is poor indexing (which is a much more common problem than bad stats, if you ask me).
 
arbert Commented:
"which is used for replication."

Exactly--for replication...
 
Qlemo (Batchelor, Developer and EE Topic Advisor) Commented:
Statistics on compound indexes (containing more than one column) are stored as a histogram for the first column only. For the remaining columns, SQL Server stores only a density value ("how many rows will I get for the same column value?").
From the viewpoint of maintaining statistics, it's better to have many single-column indexes than a single "big" one. It's said that SQL Server can effectively combine index results even if one index alone would not cover the query (it will calculate higher costs, though).
And: if an index will return more than 10% of the table for a single key value, that index will most likely not be used.

Regarding update statistics: I suggest switching off autoupdate and maintaining full-scan statistics manually every night or so.
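A minimal sketch of that nightly routine, assuming the database-level setting is used (database and table names are hypothetical):

```sql
-- Disable automatic statistics updates for the whole database...
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS OFF;

-- ...then, in a nightly scheduled job, rebuild full-scan statistics
-- on each large table.
UPDATE STATISTICS dbo.Txn WITH FULLSCAN;
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```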
 
AaronAbend (Author) Commented:

"3. Is there a way to modify an index to turn off autoupdate (for selected tables)? Does MSSQL have an Alter Index... command? I have not seen any documentation but it would be nice."

Correct answer is: yes - sp_autostats can do this... (the syntax is not ALTER INDEX, but the effect is the same)

AA
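The sp_autostats usage referred to above looks roughly like this (the table name is hypothetical):

```sql
-- Show the current auto-update setting for each statistic on the table.
EXEC sp_autostats 'dbo.Txn';

-- Turn auto-update off for every statistic on that table only.
EXEC sp_autostats 'dbo.Txn', 'OFF';
```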
 
arbert Commented:
Sorry, I'm totally confused that you said index--the original question was talking about updating column statistics, not index statistics. Why would you want to disable your index statistics? That's a little bit different than disabling "regular" column statistics.
 
AaronAbend (Author) Commented:
Index statistics versus column statistics? As far as I know, the only difference is that some stats are not built to support indexes - they are usually built ad hoc when someone hits a column that does not have any indexing and SQL decides it needs to know more before it continues. But I regard them as the same for all practical purposes. Is that incorrect? Are there more subtleties than that? I am going to post this as a new question, by the way.
 
arbert Commented:
Index statistics are always maintained by SQL Server--100% sampling (it is the index)... Column statistics are based upon the sampling rate (unless you specify full scan) and aren't maintained (until, of course, the threshold is met and an auto update kicks in).
 
arbert Commented:
I'm sure you already saw this link:

http://www.sql-server-performance.com/statistics.asp
Question has a verified solution.
