
Solved

Update statistics and CPU

Posted on 2004-11-04
Medium Priority
1,726 Views
Last Modified: 2008-02-01
I have an 11.9.2 database (about 200 GB) running on HP-UX 11.0 with 4 CPUs. Update statistics for this database takes about 16 hours.
Looking at some monitoring, I notice that while update statistics runs, one of the four CPUs is 100% busy while the other three are doing absolutely nothing. The Sybase server is configured for parallel querying, and that works, too.

My question now is: how come update statistics doesn't use the other CPUs? Is that not possible, or is there another configuration setting, Sybase or Unix, that is necessary?
Question by:ahoor
6 Comments
 
LVL 24

Accepted Solution

by: Joe Woodhouse earned 800 total points
ID: 12492292
Update statistics is single-threaded by default - any one update statistics job will only ever run on one ASE engine.

However, even in ASE 11.9.2 it is possible to force parallelism using the "with consumers = N" clause, though this can only be used with the "named column" form of update statistics, i.e.:

update statistics table (col1, col2)
with consumers = 4

This usually won't be the most effective form of update stats. (It also requires that parallel processing be enabled in the usual way, i.e. "number of worker processes", "max parallel degree" and "max scan parallel degree" must all be set higher than 1.)
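
For completeness, here's roughly what the whole thing looks like end to end (table and column names are placeholders, and note that in 11.9.2 changing "number of worker processes" may require a server restart):

sp_configure "number of worker processes", 10
go
sp_configure "max parallel degree", 4
go
sp_configure "max scan parallel degree", 4
go

update statistics mytable (col1, col2)
with consumers = 4
go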

If you want to bring down the end-to-end time of your update statistics job, you can run multiple update stats commands on different tables concurrently. The useful limit here is probably not more than the number of engines. Note you say your server has four CPUs, but as you probably know that isn't the same as having four ASE engines, which you must configure separately. This will scale almost linearly, assuming you can divide the workload evenly between all engines.
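
As a sketch of what I mean (table names made up), you'd split the tables across several scripts and run each script in its own isql session at the same time:

-- stream1.sql, run as: isql -Usa -SYOURSERVER -i stream1.sql
update statistics table_a
go
update statistics table_b
go

-- stream2.sql, started concurrently in a second isql session
update statistics table_c
go
update statistics table_d
go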

Another related idea is to have a policy of running update statistics, say, once per week on any one table, but rather than running it for all tables in the same window of opportunity, run it on one seventh of the tables each night. Assuming you can divide the work up evenly, this should also scale fairly linearly.
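
One way to generate each night's list (only a sketch - it uses the object id as an arbitrary way of splitting the user tables into sevenths):

select "update statistics " + name
from sysobjects
where type = "U"
and id % 7 = datepart(dw, getdate()) % 7
go

Spool the output to a file and feed it back into isql each night.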

Both approaches can of course be combined, so in a perfect world (all tables are the same size), you should be able to run four concurrent update statistics streams on one seventh of your tables each night, for an approximate 28 times improvement in end to end time. That would take you down to about 35 minutes each night, again assuming all tables are the same size and the work can be divided up evenly.

If you're interested, I have a shell script written in ksh that will do this for you - perform any arbitrary table-level operation in an arbitrary number of concurrent streams. It's useful for table-level dbcc's, reorgs, and so on.

You should also be aware that starting in ASE 12.5.0.3, it's possible to run update statistics specifying a sample percentage. This doesn't quite do the same amount of work (it won't update the density, for example), but most of the time you really don't need to read every single row to still get a pretty good feel for what the data looks like. I've had a lot of success with 25% sampling (on large tables), which isn't four times faster but is certainly more than twice as fast. Combining this with the earlier approaches would have you doing your update stats in maybe 20 minutes a day!
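
The syntax for that (placeholder table name) is simply:

update statistics bigtable
with sampling = 25 percent
go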

I should also mention that update statistics will make use of large I/O buffer pools, if they are correctly configured, the data isn't too fragmented, and the data hasn't already been loaded into smaller buffer pools in cache.
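
For example, on a 2K-page server you could carve a 16K large-I/O pool out of the default data cache (the size here is purely illustrative):

sp_poolconfig "default data cache", "50M", "16K"
go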

Good luck!
 
LVL 3

Author Comment

by:ahoor
ID: 12493860
Thanks so far. This may be useful, though the largest table in the database is about 20 times as big as the second-largest table. Still, if I run update statistics on the big table in one job and on the other tables in a simultaneous job, the total running time equals the runtime of the job on the big table, which is about 10-12 hours, so I still win 4-6 hours.

I have indeed configured the ASE to use all 4 CPUs (max online engines = 4).

One more question: if I run 2 simultaneous jobs on different tables, can a problem occur with the logsegment or tempdb? I would say not, but is that true?
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 12497659
You can carry out the same divide & conquer approach even within a table. You could run multiple update statistics jobs, one per index, on the main table concurrently - effectively giving you parallelism on the main table. You might then consider running 3 streams on the big table, and one stream for everything else, assuming the big table has at least three indexes.
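
Assuming the big table has three indexes, the three streams would each run the per-index form of the command, one per isql session (index names made up):

update statistics bigtable bigtable_idx1
go
-- second session:
update statistics bigtable bigtable_idx2
go
-- third session:
update statistics bigtable bigtable_idx3
go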

Failing that, you *can* run the "with consumers" clause (again, specifying three consumers) if you specify a column list that is equal to the indexed column(s).
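
So if, say, the big table had an index on (col1, col2), this would be legal because the column list exactly matches the indexed columns:

update statistics bigtable (col1, col2)
with consumers = 3
go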

You shouldn't have problems with the log - the vast majority of what update statistics does is read-only; it only writes when updating the relatively few rows in systabstats and sysstatistics.

As for tempdb, this depends on what you are running. If you are asking for update stats on unindexed columns, or update all statistics, there could be some worktable use, otherwise it reads from the indexes.
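
To make the distinction concrete (names are placeholders):

update statistics mytable               -- leading index columns; reads the indexes
go
update statistics mytable (plain_col)   -- unindexed column; scans and sorts via tempdb
go
update all statistics mytable           -- every column; can use worktables
go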

We should mention for completeness that upgrading to 12.5.0.3 or higher would give you the greatest speedups here. 8->

Good luck!
 
LVL 3

Author Comment

by:ahoor
ID: 12502600
Thanks again, I'll try it this weekend and evaluate what happened.

I know, Sybase 12.5... we were going to upgrade a year ago, but you know how it goes. Too many projects in too little time, and the first project to be dropped is upgrading Sybase. Hopefully next year.
 
LVL 3

Author Comment

by:ahoor
ID: 12523225
Ok, it worked.
The first batch now took 14 and a half hours, and for 3 of those hours another batch ran simultaneously. Last week the total was 16 and a half hours, so that's 2 hours in the pocket. I will try to divide the batches up even more so I can win more time, but I'll figure that out myself.
Thanks.
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 12527677
You're welcome, glad we could improve on the timings for you.
