Update statistics and CPU

I have an 11.9.2 database (about 200 GB) running on HP-UX 11.0 with 4 CPUs. Update statistics for this database takes about 16 hours.
Looking at some monitoring, I notice that while update statistics runs, 1 of the 4 CPUs is 100% busy while the other 3 do absolutely nothing. The Sybase server is configured for parallel querying, and that works, too.

My question now is: how come update statistics doesn't use the other CPUs? Is that not possible, or is there another configuration setting (Sybase or Unix) necessary?

Joe Woodhouse, Principal Consultant, commented:
Update statistics is single-threaded by default - any one update statistics job will only ever run on one ASE engine.

However, even in ASE 11.9.2 it is possible to force parallelism using the "with consumers = N" clause, though this can only be used with the "named column" form of update statistics, i.e.:

update statistics table (col1, col2)
with consumers = 4

This usually won't be the most effective form of update stats. (It also requires that parallel processing be enabled in the usual way, i.e. "number of worker processes", "max parallel degree" and "max scan parallel degree" must all be set higher than 1.)
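For reference, those parallel-processing settings are changed with sp_configure. A minimal sketch (the values here are illustrative, not recommendations, and depending on your ASE version some may require a server restart to take effect):

```sql
-- Illustrative parallel-processing configuration; run as sa.
-- Worker processes are the pool that "consumers" are drawn from.
sp_configure "number of worker processes", 8
go
sp_configure "max parallel degree", 4
go
sp_configure "max scan parallel degree", 4
go
```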

If you want to bring down the end-to-end time of your update statistics job, you can run multiple update stats commands on different tables concurrently. The useful limit here is probably no more than the number of engines. Note you say that your server has four CPUs, but as you probably know that isn't the same as having four ASE engines, which you must configure separately. This will scale almost linearly, assuming you can divide the workload evenly between all engines.

Another related idea is to have a policy of running update statistics, say, once per week on any one table, but rather than running it for all tables in the same window of opportunity, run it on one seventh of the tables each night. Assuming you can divide the work up evenly, this should also scale fairly linearly.

Both approaches can of course be combined, so in a perfect world (all tables are the same size), you should be able to run four concurrent update statistics streams on one seventh of your tables each night, for an approximate 28 times improvement in end to end time. That would take you down to about 35 minutes each night, again assuming all tables are the same size and the work can be divided up evenly.

If you're interested, I have a shell script written in ksh that will do this for you - perform any arbitrary table-level operation in an arbitrary number of concurrent streams. It's useful for table-level dbcc's, reorgs, and so on.
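A minimal sketch of what such a script can look like (plain POSIX sh rather than ksh here; the server name and isql invocation are placeholders you would adapt to your site):

```shell
#!/bin/sh
# Run "update statistics" over a list of tables in concurrent streams.
# MAXSTREAMS caps how many run at once (ideally <= number of ASE engines).
MAXSTREAMS=${MAXSTREAMS:-4}

update_stats_one() {
    # One stream: update statistics for a single table.
    # Hypothetical isql invocation -- adjust server, login and
    # password handling for your environment.
    isql -Usa -SMYSERVER <<EOF
update statistics $1
go
EOF
}

run_streams() {
    # Launch one background job per table, pausing whenever
    # MAXSTREAMS jobs are in flight.
    running=0
    for t in "$@"; do
        update_stats_one "$t" &
        running=$((running + 1))
        if [ "$running" -ge "$MAXSTREAMS" ]; then
            wait
            running=0
        fi
    done
    wait    # wait for the final partial batch
}

# Example: run_streams titles authors salesdetail stores
```

The same skeleton works for any table-level operation (dbcc's, reorgs): only the body of update_stats_one changes.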

You should also be aware that in more recent ASE versions it's possible to run update statistics specifying a sample percentage. This doesn't do quite the same amount of work (it won't update the density, for example), but most of the time you really don't need to read every single row to get a pretty good feel for what the data looks like. I've had a lot of success with 25% sampling (on large tables), which isn't four times faster but is certainly more than twice as fast. Combining this with the earlier approaches would have you doing your update stats in maybe 20 minutes a day!
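On versions that support it, sampling looks like this (the table name is hypothetical; check the exact clause against your version's documentation):

```sql
-- Read roughly a quarter of the rows instead of scanning them all.
update statistics big_table with sampling = 25 percent
go
```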

I should also mention that update statistics will make use of large I/O buffer pools, if they are correctly configured, the data isn't too fragmented, and the data hasn't already been loaded into smaller buffer pools in cache.
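Large I/O pools are carved out of a named cache with sp_poolconfig; a sketch (cache name and sizes are illustrative):

```sql
-- Create a 32 MB pool of 16K buffers in the default data cache
-- (on a 2K-page server, 16K buffers give 8-page large I/O).
sp_poolconfig "default data cache", "32M", "16K"
go
```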

Good luck!

ahoor (Author) commented:
Thanks so far. This may be useful, though the largest table in the database is about 20 times as big as the second largest. Still, if I run update statistics on the big table in one job and on the other tables in a simultaneous job, the total running time equals the runtime of the job on the big table, which is about 10-12 hours, so I still win 4-6 hours.

I have indeed configured the ASE to use all 4 cpu's (max online engines = 4).

Still one question: if I run 2 simultaneous jobs on different tables, can a problem occur with the log segment or tempdb? I would say not, but is that true?
Joe Woodhouse, Principal Consultant, commented:
You can carry out the same divide & conquer approach even within a table. You could run multiple update statistics jobs, one per index, on the main table concurrently - effectively giving you parallelism on the main table. You might then consider running 3 streams on the big table, and one stream for everything else, assuming the big table has at least three indexes.

Failing that, you *can* run the "with consumers" clause (again, specifying three consumers) if you specify a column list that is equal to the indexed column(s).
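For example (table, index, and column names hypothetical), the per-index and forced-parallel forms would look like:

```sql
-- One stream per index on the big table; run these concurrently
-- from separate sessions.
update statistics big_table idx_cust
go
update statistics big_table idx_date
go

-- Or force parallelism on one index by naming exactly its
-- indexed column(s) and adding the consumers clause.
update statistics big_table (cust_id)
with consumers = 3
go
```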

You shouldn't have problems with the log - the vast majority of what update statistics does is read-only, it only writes when it is updating the relatively few rows in systabstats and sysstatistics.

As for tempdb, this depends on what you are running. If you are asking for update stats on unindexed columns, or update all statistics, there could be some worktable use, otherwise it reads from the indexes.

We should mention for completeness that upgrading to a later ASE release would give you the greatest speedups here. 8->

Good luck!

ahoor (Author) commented:
Thanks again, I'll try it this weekend and evaluate what happened.

I know, Sybase 12.5, we were going to upgrade a year ago but you know it goes. Too many projects in too little time, and the first project to be dropped is upgrading Sybase. Hopefully next year.
ahoor (Author) commented:
Ok, it worked.
The first batch now took 14 and a half hours; for 3 of those hours another batch ran simultaneously. Last week the total was 16 and a half hours, so that's 2 hours in the pocket. I will try to divide the batches further so I can win more time, but I'll figure that out myself.
Joe Woodhouse, Principal Consultant, commented:
You're welcome, glad we could improve on the timings for you.