Solved

Sybase Cache Management and Query Performance

Posted on 2011-02-24
32
2,288 Views
Last Modified: 2012-05-11
I have a Sybase ASE database - 15GB RAM 6 COREs of IA64, running Red Hat Linux 2.4 (64Bit) Kernel, RAID 1 disk 2GB Fibre Channel Disks (bit old now), Single database approx 30GB, with 10GB of Shared Memory most of that data is not usually access - and should be in a DataMart. I am activey archiving data - reducing to 2 years of history - this will reduce the table sizes by 2/3rds. System is used to create Orders and do basic account management, Names / Address and Order Inquiries.

I have an order table with 3mil rows and order lines with 4mil rows and a few other related tables similar row counts used for inquiry purposes. I have allocated these tables (1GB and 500MB) their own significant caches and separate ones for their indexes (500MB).

Now the issue: we see a small number of  queries that are _very_slow - 20 - 30 seconds accessing data that should be in cache.  If I re-run the query (we have a system to trap slow SQL statements) directly the query is _always_ very fast - have checked for Deadlocks - none.

I assumed it was a cache issue, hence allocated a significant cache for the order related tables, The queries all use indexes - no table scans.

Just wondered if any one has had a similar experience. Trying to get some consistent performance. Could be a DISK IO issue, could be data is rolling out of cache - going to use the Relaxed LRU mode for the caches.
0
Comment
Question by:kyleitvss
  • 19
  • 13
32 Comments
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
sp_sysmon has a pretty clear cache management section. It has enough information to guide most cache tuning exercises. See if you can get an sp_sysmon output running during one of these slow queries and post the whole output here.

Many sites specifically "pre-warm" the cache to avoid this sort of thing - perhaps you could pre-run these slow queries yourself?

Still, it'd be better to figure out what's actually happening and if there's a cache issue. Can we also have the output of sp_monitorconfig "all" so we can check some of the "low-hanging fruit"?
0
 

Author Comment

by:kyleitvss
Comment Utility
Hi Joe -  "pre-warm" the cache - that is what I  have been testing - and it works -. Since the post I have done a "reorg rebuild" on all the indexes for the affected tables.

Also have run sysmon for an hour (yesterday) - output attached

What I think I am seeing is the data 'rolls out' of the cache after a short while. I am assuming there is a high turn over - hence I increase the cache sizes for the table and the indexes and gave them exclusive names caches.
sp-sysmon.txt
sp-monitorconfig-all-sales.xls
0
 

Author Comment

by:kyleitvss
Comment Utility
Here is the monitorconfig-all file as a tab delimited file sp-monitorconfig-all-sales.csv
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
That's a lot of data caches. Any particular reason you split them up like that? Can we get the output of sp_helpcache and sp_cacheconfig too please?

These days I tend to think multiple named caches is not a great idea. The original motivation for this was to spread spinlock contention since each cache was protected by a single spinlock. However since it's been possible to create cache partitions (each with its own spinlock) the spinlock argument no longer holds as a reason for multiple caches.

Remember any one object can only use one cache, so when you create multiple caches you are making the total cache available to any object smaller...

You also seem to have a data cache called procedure cache which is puzzling me. It also doesn't seem to be used.

It would also be useful to see the sp_configure values for "global async prefetch limit", "open objects", "open indexes" and "procedure cache" please.

0
 

Author Comment

by:kyleitvss
Comment Utility
Sybase Engineer set up the originals and I added a few more. I'll do some research on your comments though - re the number

Files attached
sp-helpcache.csv
sp-cacheconfig.csv
sp-configure-values.csv
0
 
LVL 24

Accepted Solution

by:
Joe Woodhouse earned 500 total points
Comment Utility
There are two motivations for named caches - spinlock contention (as above, no longer a motivation, use cache partitions instead) and nailing hot objects into memory.

This is a tricky business requiring ongoing maintenance. Broadly speaking we want to see the utilisation of each cache (as reported by sp_sysmon or MDA tables) match the proportion of total data cache given to that cache.

ie. if default data cache shows it is getting 90% of the total cache requests then it should have 90% of the total data cache memory.

The danger as above with "too many" caches is that it reduces the cache available to any object. Unless there are very specific tuning things going on it is almost always better to have no more than two or three named caches: one for the transaction logs in your user databases, optionally one for tempdb (hard to say whether this is a good idea or not without benchmarking), and default data cache for everything else. Use an appropriate number of cache partitions (the smallest power of 2 closest to your number of online engines seems to work well) but otherwise trust the ASE cache management to do a good job of keeping frequently used pages in memory.

Based on a combination of sp_sysmon, sp_monitorconfig and sp_configure outputs, it looks to me like you need to raise some values:

procedure cache size - small increase, maybe 5%
open objects - increase to 1000
open indexes - increase to 600
global async prefetch limit - decrease to 8

That last is not intuitively obvious. Asynchronous prefetch or APF reserves a percentage of all data cache pools to do a kind of speculative prefetch. sp_sysmon reports on how often this was attempted at all, and how often of those attempts was it actually useful. Your sp_sysmon output shows APF requests are a tiny fraction of all data cache requests and yet APF has reserved 10% of data cache for these. Per the principle above (make the allocation proportional to the utilisation) we should decrease it. I'm being conservative and decreasing to 8% for now, but I suspect 5% is a better number.  
0
 

Author Comment

by:kyleitvss
Comment Utility
Bit confused with some of the numbers

Open Objects is already set to a Run Value of 2000
Open Indexed is set to a  Run Value of  750
   
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
So they are. Ignore those recommendations. I misread the columns I think. :)
0
 

Author Comment

by:kyleitvss
Comment Utility
Ok I'll consolidate the number of caches down - should I have

1. Data
2. Index
3. Log

And i have one for tempdb already
0
 

Author Comment

by:kyleitvss
Comment Utility
Found these:

execute sp_countmetadata "open objects"
execute sp_countmetadata "open indexes"
      
1      There are 2300 user objects in all database(s), requiring 4633 Kbytes of memory. The 'open objects' configuration parameter is currently set to 3000.

2      There are 1512 user indexes in all database(s), requiring 2718 Kbytes of memory. The 'open indexes' configuration parameter is currently set to 2000.
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
Unless you have specific benchmark results proving performance improves with an index cache vs without, I don't see the point. If it's a page you need often, it will tend to remain in cache anyway.

Log cache is a good idea. Make sure you create it as a log_only cache. It should be small - 100Mb is plenty for most sites unless you have extremely heavy trigger use and replication - and don't forget to bind syslogs in your user databases to it. (Requires single user mode per database binding.) 90% of its memory should be in a "two-page" pool - you seem to have a 2Kb page server, so that means 90Mb of 4Kb pool and 10Mb of 2K.

Tempdb cache is also something I'll only do after benchmarking proves it helps.

You should have a repeatable benchmark you can run before making any changes at all so you can prove this helped. Pretty much the only way to make cache change tests repeatable is to always start with the same cache state - ie a cold cache, ie an ASE reboot prior to each benchmark run.

I am not guaranteeing you these changes will help!! :) We can't know without testing them.
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
Also metadata caches seem ok, no need for any changes.
0
 

Author Comment

by:kyleitvss
Comment Utility
Ok I will have a go at this evening
0
 

Author Comment

by:kyleitvss
Comment Utility
Also Thank you for your help in reviewing my question
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
No problem, that's why we sign up to EE. :)

Remember you can't drop a cache while objects are bound to it, and you can't bind system tables (like syslogs) unless their database is in single-user mode. Good luck!
0
 

Author Comment

by:kyleitvss
Comment Utility
Have updated the 1st server and things are a lot better - will do the second server Sunday Night.

I as suspecting now that there is an physical I/O issue to consider too. Over all performance is good - but queries are still taking way to long - and they are fine once the data is actually in the cache.

I'll Run a sysmon for a few hours
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
FYI a few hours may be too long for any one sp_sysmon as it uses integer counters which overflow with large samples. If it does, try it with 30 or even 20 minute samples.

If you can identify a good sample period when the server was as its busiest and performance was at its worst, you're welcome to post the entire sysmon output here as an attachment. I'll see if anything (not just cache management) leaps out at me as needing attention.

Glad to hear you're seeing good results so far.

Re. I/Os you'll need to ask some questions at the O/S and storage layers to get a good feel for what's going on there. In my experience you have to start looking into obscure things like firmware updates for HBA cards connected to the SAN etc etc etc. Usually way beyond my competence! :)
0
 

Author Comment

by:kyleitvss
Comment Utility
Attached is full 20 min sysmon output.

I did some reading and saw a comment re  Address Lock Contention and Merge Lock Requests

  Address Lock Contention        4602.7           6.7     8284838     81.9 %

  Merge Lock Requests             per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ----------
    Network Buffer Merge Locks                                                  
      Granted with no wait         3297.7           4.8     5935828      41.7 %
      Granted after wait           4602.7           6.7     8284801      58.3
        
        

sysmon-28022011---20mins.txt
0
 

Author Comment

by:kyleitvss
Comment Utility
This information would suggest there are no I/O issues?

  I/Os Delayed by
    Disk I/O Structures               n/a           n/a           0       n/a  
    Server Config Limit               n/a           n/a           0       n/a  
    Engine Config Limit               n/a           n/a           0       n/a  
    Operating System Limit            n/a           n/a           0       n/a  


  Total Requested Disk I/Os         452.1           0.7      813711            

  Completed Disk I/O's
    Engine 0                         45.9           0.1       82564      10.1 %
    Engine 1                         33.7           0.0       60696       7.5 %
    Engine 2                         32.4           0.0       58366       7.2 %
    Engine 3                         50.6           0.1       91152      11.2 %
    Engine 4                         58.7           0.1      105643      13.0 %
    Engine 5                         42.2           0.1       75962       9.3 %
    Engine 6                         61.1           0.1      110040      13.5 %
    Engine 7                         39.6           0.1       71287       8.8 %
    Engine 8                         43.6           0.1       78560       9.7 %
    Engine 9                         44.0           0.1       79207       9.7 %
  -------------------------  ------------  ------------  ----------            
  Total Completed I/Os              451.9           0.7      813477
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
The main things that stick out for me:

As you say, Address Lock Contention, but also Exclusive Address locks. These are both about locks on index pages blocking things needing to be done to data pages. The standard cure is datapages locking. Run sp_object_stats for 20-30 minutes during a busy period and it will help identify tables. Good candidates are small tables with high contention.

With your data caches, you still have one called "procedure cache" which appears to be doing nothing at all. You can get that memory back by deleting it.

Data_cache1 is looking after 25% of the total cache requests and hitting almost all of them in cache. You could possibly even make this smaller but you don't seem memory bound in your main default data cache so you can probably leave it alone. Maybe try switching it to "relaxed" mode since you're hitting close to 100% hit rate in it.

dwh_data_cache1 is doing almost nothing. I hope it is small. Perhaps it isn't needed at all.

Your index_cache is look after only 3% of the work. I hope it doesn't have more than 5% of the total cache memory. It could also be switched to "relaxed", but I would probably just delete it.

Your default data cache is showing 14% spinlock contention even with these named caches. I would set up cache partitions - try starting with 4. More importantly you have no large I/O capability here and this is likely hurting you. Try converting 25% of the 2Kb pool to 16Kb. Perhaps add 10Mb of 4Kb pool too.

Your log cache doesn't have a 4Kb pool. This is hurting you. Make sure your databases have had their syslogs tables bound to this cache. Switch the cache to both "relaxed" and "log only".

Your tempdb cache has no large I/O. This will also hurt. Switch one third of its 2Kb pool to 16Kb.


Nothing else screams for attention; fix the above and see what that does for you. Good luck!

0
 

Author Comment

by:kyleitvss
Comment Utility
And this that we are not waiting for the network

Network I/Os Delayed              0.0           0.0           0       0.0 %
0
 

Author Comment

by:kyleitvss
Comment Utility
I reduced the number of Worker Threads - they where 15 - I reduced them to 5 and the

Max parallel degree from 4 to 2
Max scan parallel degree from 4 to 2

re -ran sysmon and Address Lock Contention reduced to 17.2%

    Address Lock Contention         168.8          11.0      303827      17.2 %

That is a good thing?
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
I/Os Delayed simply means did you hit any configurable limits on how many outstanding I/Os can be open at once. You need O/S level stats to say for sure there were no queue waits on disk. sp_sysmon can't tell us that. It can imply it - look at Disk I/O Checks, Checks Returning I/O: 26.8% of the time ASE asked if a disk I/O was done yet, the answer was no. This suggests (but does not prove) that ASE thinks the disks are slow.

Network I/Os Delayed unfortunately doesn't mean no network waits, just no events within ASE itself that delayed network activity. Again, you need O/S level stats for this.

Reducing Address Lock Contention is indeed a very good thing, but you may have hurt more than you helped by constraining parallel query. You'd want to benchmark that before and after to see. I still think sp_object_stats is a good thing to try under load. Do not rush immediately to datarows locking! Try datapages cautiously first on the smallest half of the top 20 most-contended tables.
0
 

Author Comment

by:kyleitvss
Comment Utility
OK - have updated the caches as suggested - other that adding partitions to Default Cache - will need to re-start the server later tonight for that one. I'll re-run sysmon for 30 mins now and then again tomorrow after updating the defaullt cache.

I did bind the syslog to the syslog cache - I deleted the procedure cache - and added the memory to the syslog_cache,  I'll delete the Index Cache tonight and add the memory to data cache 1 ? or Default ?
0
 

Author Comment

by:kyleitvss
Comment Utility
Most of our locking is already DataPages - few Datarows - Datarows does not seem to add much value.

Checks Returning I/O: 26.8% of the time ASE asked if a disk I/O was done yet, the answer was no. This suggests (but does not prove) that ASE thinks the disks are slow.

I too  think to disks are slow - if I run a dd if=/dev/raw/raw33 of=/dev/null bs=1024kb - the server gets very slow - am looking to use pcp - a Reh Hat performance tool along with my SGI service engineer.
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
I'd say default, data_cache1 is doing fine for cache misses, default data cache is the one that needs it the most.
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
Probably questions on anything other than data caches should go into a new question topic, BTW. EE prefer new questions not be added. My bad - I like looking at sysmons. :)
0
 

Author Comment

by:kyleitvss
Comment Utility
Will run sp_object_stats now...
0
 

Author Comment

by:kyleitvss
Comment Utility
Probably questions on anything other than data caches should go into a new question topic, BTW. EE prefer new questions not be added. My bad - I like looking at sysmons. :)

Ok - I'll leave that point (DISK I/O) for another question.

I am sure the changes are making a difference - however I am starting to really question the I/O as the root cause.
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
Caching ultimately is just a hack for dealing with slow I/O. If all your disks were enterprise SSDs you wouldn't need data cache at all. So yes, in a way it is impossible to sensibly discuss caching without talking about disk - good caching tuning can mask disk problems, for instance.
0
 

Author Comment

by:kyleitvss
Comment Utility
sp_object_stats:

DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.         
1 row(s) affected.         
DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.         
----------------------------         
No contention on any tables!         
0
 

Author Comment

by:kyleitvss
Comment Utility
Just to complete this - we have checked the storage and found that the Disk Array is operating at Max capacity and needs to be updated.

So thanks for all your help - its was very useful and will be put to good use when we get a new storage system
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Suggested Solutions

Marketing can be an uncomfortable undertaking, especially if your material is technology based. Luckily, we’ve compiled some simple and (relatively) painless tips to put an end to your trepidation and start your path to success.
In this article, you will read about the trends across the human resources departments for the upcoming year. Some of them include improving employee experience, adopting new technologies, using HR software to its full extent, and integrating artifi…
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now