Solved

select statman killing performance

Posted on 2008-10-04
20
3,543 Views
Last Modified: 2011-10-19
My problem emerges after a maintenance plan runs.  I think the plan blows away the statistics or something, but not every time it runs (only sometimes--maybe every tenth time).  The maintenance plan runs once a month.

The issue affects only one stored procedure.  The procedure is fairly normal, but it does contain a cursor, uses the GOTO statement, and has a single temp table.

I read that I should remove all temp tables and dynamic SQL from the stored procedure.  I did this (except for the one temp table I couldn't get rid of) and it didn't work.  The only thing that seems to work is time, i.e., after about three days everything goes back to normal.  The query goes from taking 300 MINUTES to run back down to its normal 30 minutes.

already 24 hours into this :-(
Question by:tomandlis
20 Comments
 

Author Comment

by:tomandlis
Sorry, I forgot to include what I see in SQL Profiler.  I see the following type of message repeatedly on about five or six tables.  There seems to be no pattern to the tables that StatMan is updating statistics for.  All the tables are well structured: they have primary keys, clustered indexes, regular indexes, constraints, etc.

--***********************
--***********************
SELECT StatMan([SC0],[SC1]) FROM (SELECT TOP 100 PERCENT [i] AS [SC0],[j] AS [SC1] FROM [dbo].[clusterYD] WITH
(READUNCOMMITTED,SAMPLE 1.000000e+002 PERCENT)  ORDER BY [SC0],[SC1]) AS _MS_UPDSTATS_TBL
--***********************
--***********************
SELECT StatMan([SC0],[SC1],[SC2]) FROM (SELECT TOP 100 PERCENT [i] AS [SC0],[distance] AS [SC1],[j] AS [SC2] FROM
[dbo].[clusterYD] WITH (READUNCOMMITTED,SAMPLE 1.000000e+002 PERCENT)  ORDER BY [SC0],[SC1],[SC2]) AS _MS_UPDSTATS_TBL
--***********************
--***********************
SELECT StatMan([SC0]) FROM (SELECT TOP 100 PERCENT [i] AS [SC0] FROM [dbo].[clusterYV] WITH (READUNCOMMITTED,SAMPLE
1.000000e+002 PERCENT)  ORDER BY [SC0]) AS _MS_UPDSTATS_TBL
--***********************
--***********************
etc...
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
Could you try to add the WITH RECOMPILE option to the stored procedure?
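For reference, it goes on the procedure definition itself, roughly like this (the procedure name and body are placeholders, not your actual proc):

      ALTER PROCEDURE dbo.usp_CacheReports      -- hypothetical name, for illustration only
          @runDate datetime
      WITH RECOMPILE                            -- forces a fresh plan on every execution
      AS
      BEGIN
          SET NOCOUNT ON;
          -- ... existing body unchanged ...
      END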
 
LVL 21

Expert Comment

by:mastoo
If that doesn't fix it, I would capture the "slow" execution plan, and then in a few days compare it to the fast plan.  Along with looking at the difference between them, I would guess the slow plan is going to show some area with a dramatic difference between estimated and actual row count.  This can give you insight into what is going wrong and what can be done to fix it.
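One way to capture the two plans for comparison (a sketch; the proc name and the LIKE filter are just assumed ways of finding it in the cache):

      -- Actual plan (with estimated vs. actual row counts) for a test run:
      SET STATISTICS XML ON;
      EXEC dbo.usp_CacheReports;   -- hypothetical proc name
      SET STATISTICS XML OFF;

      -- Or pull the currently cached plan so it can be saved and compared later:
      SELECT qs.creation_time, qs.execution_count, qp.query_plan
      FROM sys.dm_exec_query_stats AS qs
      CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
      CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
      WHERE st.text LIKE '%usp_CacheReports%';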
 

Author Comment

by:tomandlis
The WITH RECOMPILE looks very promising.  I'm running the test right now.  I saw this article and it describes my sproc almost exactly.  It looks like SQL has cached a bad query plan.  See the link below for more info:

http://articles.techrepublic.com.com/5100-10878_11-5662581.html
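If a cached bad plan really is the issue, it can also be evicted without changing the proc -- a sketch using the generic tools (names are placeholders, not specific to this proc):

      -- Mark the proc (or a table it touches) for recompilation on next use:
      EXEC sp_recompile N'dbo.usp_CacheReports';   -- hypothetical proc name
      EXEC sp_recompile N'dbo.clusterYD';          -- flushes plans that reference this table

      -- Heavier hammer: clear the entire plan cache (affects the whole instance):
      DBCC FREEPROCCACHE;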
 

Author Comment

by:tomandlis
Sorry, WITH RECOMPILE didn't work and I'm looking for new suggestions.  Procedure time is still 300 minutes (versus 30).

I did see something in the profiler that might be a clue.  It appears that WITH RECOMPILE only recompiles the procedure itself (not the statistics on the underlying tables).  It appears that SQL is gathering stats on almost every table in the query.  Thus I am seeing the lines shown below in SQL Profiler.

Profiler is repeatedly showing SQL Server sampling the tables 'clusterYD', 'clusterYNN', 'clusterYV'.  These tables are what I would call 'table variables' more than tables.  That is, they describe a data structure but only abstractly.  By 'abstract', I mean sometimes when I run this procedure those tables might have tens of thousands of records and other times they might have just a few hundred.  I am guessing that this is the source of the problem, i.e., the statistics are being saved for the case where there were just a few hundred, and now I'm running the case where there are tens of thousands, so the stats are invalid.

I am guessing these underlying tables need to have their stats recompiled, but WITH RECOMPILE doesn't accomplish that.

I am going to try blowing away all the tables that StatMan is sampling and recreating them from scratch.  There is probably an easier way to do this (see the sketch after the trace output below), but it's all I can think of.


SELECT StatMan([SC0],[SC1]) FROM (SELECT TOP 100 PERCENT [i] AS [SC0],[j] AS [SC1] FROM [dbo].[clusterYD] WITH (READUNCOMMITTED,SAMPLE
1.000000e+002 PERCENT)  ORDER BY [SC0],[SC1]) AS _MS_UPDSTATS_TBL

SELECT StatMan([SC0],[SC1]) FROM (SELECT TOP 100 PERCENT [j] AS [SC0],[i] AS [SC1] FROM [dbo].[clusterYD] WITH (READUNCOMMITTED,SAMPLE
1.000000e+002 PERCENT)  ORDER BY [SC0],[SC1]) AS _MS_UPDSTATS_TBL

SELECT StatMan([SC0],[SC1],[SC2]) FROM (SELECT TOP 100 PERCENT [i] AS [SC0],[distance] AS [SC1],[j] AS [SC2] FROM [dbo].[clusterYD] WITH
(READUNCOMMITTED,SAMPLE 1.000000e+002 PERCENT)  ORDER BY [SC0],[SC1],[SC2]) AS _MS_UPDSTATS_TBL

SELECT StatMan([SC0],[SC1],[SC2]) FROM (SELECT TOP 100 PERCENT [distance] AS [SC0],[i] AS [SC1],[j] AS [SC2] FROM [dbo].[clusterYD] WITH
(READUNCOMMITTED,SAMPLE 1.000000e+002 PERCENT)  ORDER BY [SC0],[SC1],[SC2]) AS _MS_UPDSTATS_TBL

SELECT StatMan([SC0]) FROM (SELECT TOP 100 PERCENT [i] AS [SC0] FROM [dbo].[clusterYNN] WITH (READUNCOMMITTED,SAMPLE 1.000000e+002 PERCENT)  
ORDER BY [SC0]) AS _MS_UPDSTATS_TBL
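Re the "easier way" mentioned above: rather than dropping and recreating the tables, their statistics can simply be refreshed after they are loaded.  A rough sketch against the table names from the trace (not tested against the actual proc):

      -- Refresh stats on the worktables right after they are (re)populated,
      -- so the optimiser sees current row counts instead of stale ones:
      UPDATE STATISTICS dbo.clusterYD  WITH FULLSCAN;
      UPDATE STATISTICS dbo.clusterYNN WITH FULLSCAN;
      UPDATE STATISTICS dbo.clusterYV  WITH FULLSCAN;

      -- Or refresh everything in the database in one pass:
      EXEC sp_updatestats;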
 
LVL 51

Expert Comment

by:Mark Wills
Interesting query... the READUNCOMMITTED is certainly interesting, and so is the use of TABLESAMPLE.  If you want a random selection you can also use NEWID().

I think it only really works on 2005 (and upwards) because of the way table statistics are now calculated, so you might need to check the compatibility level...

I can imagine that after an optimisation it will take a while to get a random sample set.  So, I'm not sure that the recompile will help in this case...

From Books Online:

You can use TABLESAMPLE to quickly return a sample from a large table when either of the following conditions is true:
  - The sample does not have to be a truly random sample at the level of individual rows.
  - Rows on individual pages of the table are not correlated with other rows on the same page.

And that last point is what is probably hurting....
 

Author Comment

by:tomandlis
Well, it appears that my attempt to hardcode individual tables (as opposed to using a generic table structure) is failing.  Profiler is still loaded with StatMan statements during execution.

mark_wills: I am selecting a random sample of rows.  The query I am using is shown below; would you please explain how I would rewrite it?

      select top 3 val
      from dbo.[clusterYVstat_history]
      order by newid()
 
LVL 51

Expert Comment

by:Mark Wills
Well, there are examples under TABLESAMPLE (http://technet.microsoft.com/en-us/library/ms189108(SQL.90).aspx), which does it a bit differently.

A while back, after some frustration and investigation on a very large table, I found this link on MSDN; have a good look at http://msdn.microsoft.com/en-us/library/cc441928.aspx, particularly the stats down the bottom comparing the different techniques.
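For comparison, the two approaches look roughly like this on your table (the 10 percent figure is only an example; TABLESAMPLE samples whole pages, so it will not return an exact row count and can occasionally return none):

      -- Fully random, but forces a scan and sort of the whole table:
      SELECT TOP 3 val
      FROM dbo.clusterYVstat_history
      ORDER BY NEWID();

      -- Page-level sampling, much cheaper on large tables, but only roughly random:
      SELECT TOP 3 val
      FROM dbo.clusterYVstat_history TABLESAMPLE (10 PERCENT)
      ORDER BY NEWID();   -- optional shuffle within the sampled pages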


 

Author Comment

by:tomandlis

OK, some more information.  I plugged in TABLESAMPLE and got rid of the NEWID() to select a random sample of rows, but it appears that isn't going to do the trick, as StatMan continues to fill the Profiler trace.  FYI, here is a description of how to use TABLESAMPLE:

http://technet.microsoft.com/en-us/library/ms189108(SQL.90).aspx

Hardcoding all the temp tables didn't do it either (they aren't really 'temp' tables, they are just the abstract tables that I referred to earlier).  Hardcoding the tables just resulted in StatMan statements running on the new tables.

I also tried getting rid of the cursor in the sproc.  I switched to a 'cursorless' cursor setup, which is described at the links below.  This didn't work either.

http://www.eggheadcafe.com/PrintSearchContent.asp?LINKID=529
http://www.eggheadcafe.com/articles/20010823.asp
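For anyone following along, the kind of "cursorless" loop usually meant by that phrase is roughly this shape (a generic sketch with a placeholder key column, not the actual proc):

      -- Generic cursor-free loop: walk the rows by key with a WHILE loop.
      DECLARE @id int;

      SELECT @id = MIN(i) FROM dbo.clusterYD;          -- placeholder key column

      WHILE @id IS NOT NULL
      BEGIN
          -- ... process the row identified by @id ...

          SELECT @id = MIN(i) FROM dbo.clusterYD WHERE i > @id;
      END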


I do have some information that might be helpful, however.  I checked an old trace I had of the proc (from when it was running correctly) and there is NO DIFFERENCE.  That is, the old trace was filled up with StatMan statements too, and yet it ran in 30 minutes versus the 300 it takes now.  This leads me to believe there is something else as the root cause, i.e., it isn't StatMan.

As I mentioned at the beginning of this thread, the performance issue began after a DB backup.  I am going to try to repeat the DB backup process and see if that does anything.  If that doesn't work, I'm clueless.
 
LVL 51

Expert Comment

by:Mark Wills
Can we see what StatMan is doing?

 
LVL 51

Expert Comment

by:Mark Wills
And I should also apologise, because I thought that the code in your first post was part of a stored procedure, not output from Profiler....

Can we please see what you are actually doing?
 

Author Comment

by:tomandlis
Well, as quickly as it started, it has stopped.  The sproc is part of a nightly job, and after one week of running at 300 minutes the execution time suddenly dropped back down to 30 minutes.  I'll try to do some log forensics and let you know what happened.
 
LVL 51

Expert Comment

by:Mark Wills
Still don't want to share that proc?  By the looks of the outputs it is (obviously) updating statistics, and traditionally I see that associated with auto-update-statistics and/or the update-statistics step of a maintenance plan.  My original thought was that you were trying to replicate that activity in your stored proc...

But if it is a nightly procedure, then I would like to see what you are running.  I.e., do you have statistics plans in place, creating collections, updating, dropping, using DBCC or T-SQL or the system stored procedures, are you checking STATS_DATE, etc.?
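(For example, STATS_DATE can be checked against one of the worktables like this -- a sketch using the 2005 catalog views:)

      -- When were the statistics on a given table last updated?
      SELECT s.name AS stats_name,
             STATS_DATE(s.object_id, s.stats_id) AS last_updated
      FROM sys.stats AS s
      WHERE s.object_id = OBJECT_ID('dbo.clusterYD');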

AngelIII and mastoo might be able to continue, but I cannot really advise one way or the other unless I know what is in that stored procedure.

 

Author Comment

by:tomandlis
Mark, thanks for your help with this.  I'd share the proc, but it would likely just confuse the matter.  It's not doing anything to update SQL stats--it is merely caching some business reports during off-peak hours so as to save processor power during peak hours.  The reports are statistical in nature (they contain a cluster analysis), but the cluster analysis in the reports and SQL Server's intrinsic statistics are entirely unrelated.

I tried to analyze the logs, but I can see no differences between the good runs (30-minute runs) and the bad runs (300-minute runs).  I'm not really sure what I should be looking for, but I think I'm supposed to be looking in the 'reads' column for excessive reads.  Is that right?
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 500 total points
Then I would suggest you have auto-stats turned on, and the first few times after the "clean" it is continually re-examining the statistics, given the nature of the data being accessed, and having a small fit while trying to do auto-stats updates.  It would also explain why you are seeing StatMan, though I would expect to see it in normal operation as well.

Slow or long elapsed time is not necessarily disk IO alone; there can be many factors, and you really need to gather some statistics during that "heavy" period to see what all the resources are doing.  Having said that, if there are huge numbers of IO operations during that period then yes, it is a worthy indicator, since generally speaking IO is the slowest part of the system (but that is all it is unless backed by stats).

Could it be coincidental that the clean-up is periodic (i.e., monthly) and the SP is reporting stats that may take into account month-end type reporting/calculations/bulk processing (etc.)?

Might still be worth having a quick look at that SP...

I would suggest you need to look at CPU activity and memory activity as well as IO activity.  There are some good articles and guides on MSDN, such as:
http://technet.microsoft.com/en-au/library/ms166575(SQL.90).aspx and http://technet.microsoft.com/en-au/library/ms187830(SQL.90).aspx  and http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx  (going from "lightest" to "heaviest").

Performance can be a big thing, so it's best to read up a little and get some perfmon stats.  There are even "Health Check" tools from Microsoft to help you:
http://www.microsoft.com/downloads/details.aspx?familyid=eedd10d6-75f7-4763-86de-d2347b8b5f89&displaylang=en
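As a lightweight first pass, while the proc is in its slow phase the 2005 DMVs can show what the session is actually waiting on and consuming (a sketch, not a substitute for the perfmon work above):

      -- While the job is running slowly, see what it is waiting on and consuming:
      SELECT r.session_id, r.status, r.wait_type, r.wait_time,
             r.cpu_time, r.logical_reads, r.reads, r.writes,
             st.text AS running_sql
      FROM sys.dm_exec_requests AS r
      CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
      WHERE r.session_id > 50;      -- skip system sessions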





 

Accepted Solution

by:tomandlis
tomandlis earned 0 total points
Mark, thanks again.  I think I can rule out coincidence as the problem persists for days and then suddenly stops.  I have autostats turned on already.

I think it must be that the stats on the tables are blown away during the maintenance process.  Immediately after the process, I run the proc with one set of parameters/data, and that builds the table stats in such a way that the tables run very slowly with a different set of parameters/data.

The problem self-corrects as long as auto-stats is turned on, because running the proc for a while with different sets of data/parameters triggers stat updates, which results in quicker runs.

Workaround: I hardcoded three different procs, each dealing with its own set of parameters and data.  I also hardcoded each proc to its own set of calculation tables.
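In outline, that workaround amounts to a thin dispatcher over specialised procs, so each one keeps a plan (and calculation tables) suited to its own data volume -- roughly like this (all names here are illustrative, not the real procs):

      -- Dispatcher: pick the proc that was compiled for this class of workload.
      CREATE PROCEDURE dbo.usp_CacheReports_Run        -- hypothetical names throughout
          @mode char(1)      -- 'S' = small, 'M' = medium, 'L' = large data set
      AS
      BEGIN
          IF @mode = 'S'      EXEC dbo.usp_CacheReports_Small;
          ELSE IF @mode = 'M' EXEC dbo.usp_CacheReports_Medium;
          ELSE                EXEC dbo.usp_CacheReports_Large;
      END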
 
LVL 51

Expert Comment

by:Mark Wills
Just a thought, but there are considerations for auto-stats, and you might want to manage it a bit differently...

Might be worth considering specifically / deliberately updating those statistics, and possibly now turning off auto-stats.
Then use a regular procedure to update "known" statistics / query plans. Auto stats are great when building or sampling a database, but not as desirable when there is some "database maturity" in terms of size and distribution of keys...  

Also, in 2005 there is a new feature: the ALTER DATABASE option AUTO_UPDATE_STATISTICS_ASYNC, meaning the stats update fires off in the background while the query that would have caused the auto-update keeps using the original (old) query plan.  And on large tables, statistics only really become "out of date" once roughly 20% of the rows have changed.
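The option itself is a one-liner (database name is a placeholder):

      -- Let queries keep using the old plan while stats refresh in the background;
      -- requires AUTO_UPDATE_STATISTICS to be ON as well:
      ALTER DATABASE MyReportsDb SET AUTO_UPDATE_STATISTICS_ASYNC ON;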

So, if there is reasonable and well-distributed activity, it possibly doesn't make too much difference.  If there are lots of additions, then so long as they reflect the same type of pattern the query plans are probably OK; if there are lots of deletes and changes, then maybe index defrags are required.

Because you notice it in an SP after a full optimise, it could be the query parameters themselves.  If the parameter values of the SP change at time of running, then it is likely that the query plan was optimised using a different statistical basis, and therefore invokes the "heavy" auto-stats.  So, changing stats to ASYNC mode will help, but in reality you probably already have an optimised plan which the query optimiser is ignoring because the parameters have changed...  This is often the case when a stored procedure can be run for a user-requested period, or, if null, runs for a default period.  That flexibility is great, but you really need to have the stored procedure use a "known" plan regardless of parameters.  It is called parameter sniffing.  Yep, there is a term for it.  Sometimes the recompile will help (hence the previous suggestions), but it really does not address the cardinality of the (key) values used with the parameter (because its real use is largely unknown).

Which is probably the reason why your workaround is working for you...  So, it might not be a workaround so much as "better" stored procedure practice to avoid parameter sniffing.
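For anyone who hits this later, two common 2005-era ways to get a "known" plan regardless of the parameter values -- sketched on a hypothetical proc and table, not the asker's:

      CREATE PROCEDURE dbo.usp_Report                  -- hypothetical, for illustration only
          @startDate datetime
      AS
      BEGIN
          -- 1) Local-variable trick: the optimiser plans on average density
          --    instead of sniffing the first caller's @startDate.
          DECLARE @d datetime;
          SET @d = @startDate;

          SELECT COUNT(*)
          FROM dbo.ReportQueue                         -- hypothetical table
          WHERE RunDate >= @d;

          -- 2) Alternative: pin the value the plan is built for.
          -- SELECT ... FROM dbo.ReportQueue WHERE RunDate >= @startDate
          -- OPTION (OPTIMIZE FOR (@startDate = '20080101'));
      END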
 

Author Comment

by:tomandlis
Thanks again, Mark.  You are correct about the parameter sniffing.  The proc runs entirely differently when its parameters change--both internally (e.g., which if-else branches execute) and in the amount of data it pushes around in the tables.
 

Author Comment

by:tomandlis
FYI, after more analysis I found the problem had nothing to do with StatMan or SQL Server statistics and was entirely due to a poor design decision by yours truly.  I tried to repurpose some code, and it had the unintended consequence of adding many runs to the queue.  The flaw caused the process to run 10,000 times instead of about 300 times.  Once I corrected the design flaw and unloaded the queue, the process went back to its usual run times.  How embarrassing.
 
LVL 51

Expert Comment

by:Mark Wills
And how very gallant of you to share and enlighten!!

The vast majority of performance issues are code/application based (see below), and the most gratifying thing is that you not only found the problem but could also fix it.  Well done.
[Attached image: performance-gains.jpg]
