asked on

Sybase - Slow response, possible network problem?

Hi,

My customer has been complaining of slow response time. Looking at the sysmon report, we noticed that when Engine CPU Busy is high (80-99), the context switch network packet sent count is also high (7000-11000, 65-84%). On the times when engine CPU busy is low (< 50%), the network packet sent count is also lower (< 5000, ). The cache searches misses is mostly at < 1%.

There is a similar question posted earlier (Q_21722680.html). But I want to make sure that the solution recommended there is also applicable to my problem.

tcp no delay (from 1 to 0) - is there any memory increase with this change? Is it safe to change it now and just wait for the ASE recycle for it to take effect?

increase default packet size - currently, it is set to 512. When you say, the client has to be changed also, where is that? At ODBC? Do I also have to increase the 'additional network memory' ? It is currently set to 100352 bytes.

Lastly, is there anything else at sysmon that I can check for this slow response?

BTW, this is for ASE 12.5.3, Windows 2000 Server.

Thanks.

Joe Woodhouse

There's no memory increase with toggling "tcp no delay".

The "default network packet size" is misnamed. It says "reserve this much memory for use by network connections, and connect using this packet size for any Sybase to Sybase server to server communication". You *must* set a network packet size from the client side. In ODBC this is under the "Advanced Tab", where each number is a multiple of 512 bytes. (So a setting of "4" = 2048.)

You don't need to set "additional network memory" if you're increasing "default network packet size". That's used for when "max network packet size" is larger than default.

Mind you, I suspect all you're doing here is treating a symptom. Sybase Engine CPU up at 99% is unusually high - what's making this ASE work so hard? Since you're on ASE 12.5.3 you have access to the "monitor" or "MDA" tables (you have to install them with a supplied script) which are a bit tricky to use but can tell you all sorts of things like which objects are getting the most I/O, which procedures are causing you the most load, etc.

Or put another way, playing with packet sizes at best will deal with a bad situation better. The best way to do some work faster is to do less work! Maybe you have some code that can be rewritten. Maybe you have some named cache bindings that no longer make sense. Maybe there are some simple sp_configure parameters that are set wrong...

sp_sysmon, sp_object_stats and the MDA tables are where I'd begin with this...

maria_rossi

ASKER

Thanks for the reponse.

Couple more questions:
1. on MDA, how do I find out which SP's are used the most? Would running select from monProcessProcedures repeatedly, work?

2. also on MDA, at monCachedProcedures , there are several occurences of same SP's with different compile dates. Why are there several copies cached and when are they replaced?

3. at sp_sysmon, under Cache Management, what is the difference between 'Buffers Grabbed' and 'Cache Search Misses'? I thought they would match or be close.

4. We will be increasing our max memory from 512 to 1.5 GB, so have memory to play with. We have only a 'default data cache' at 330MB. Would increasing this or creating a new data cache (then bind tables to it) help? However, the context switch Cache Search Misses has been consistently < 0.1%. That's why I did not think of looking at the data cache.

Thanks again.
Maria

Joe Woodhouse

Sorry for taking a while to respond.

1) Some MDA tables report on a cumulative total since ASE was last booted, but most just give a total since the last time they were queried. Running that SELECT repeatedly would just show you what happened in that interval, which may or may not be what you're after. A better answer might be to run it (say) once a day for a week to build up a picture of what gets run routinely. It won't capture those big end of month/quarter/year reports, but it will see most other things.

2) A "compile" date doesn't mean what it sounds like - this means the date a query plan was generated and loaded into procedure cache, not the date the procedure was created. This is because ASE procedure cache is not re-entrant - a query plan is not shared between processes. If two processes both run the same procedure, they will use their own private copy of a query plan in cache. If a plan is already in cache and unused, it will be reused, otherwise a new query plan generated ( = "compiled") and loaded into cache. What you're seeing are the results of this.

3) A "buffer grab" is when a page is inserted into the data cache MRU-LRU chain (usually at the MRU end, but there are some cache replacement strategies that replace in the middle of the chain instead). This is the case even when a page was already in cache, but by virtue of being reused, has been moved to the head of the queue again. This scenario is *not* a "cache miss" - since what we wanted was in cache - but it *is* a "buffer grab" since a buffer (page) was grabbed and inserted (probably at the Most Recently Used end).

4) While "cache search misses" in context switches is important, the only good measure of whether you're hurting for memory or not is in the Data Cache section, specifically "cache misses per second". This is what shows how hard your physical disks are working due to not enough memory for cache. Data cache at only 330Mb strikes me as very small unless your databases are also quite small. The magic number to look for is somewhere in the range of 100-125 I/Os per second per physical disk - so if you have a RAID group of 5 disks, then cache misses per second of over 500 implies too much I/O and that you're memory bound. It is possible to be in this state even though cache hits (as a percentage) look very high!

I would think that almost certainly it would be better to give the memory to default data cache rather than play with named caches. Named caches are the luxury of "lots of memory" and even with 1.5Gb I don't think you qualify. 8-> Perhaps one exception would be a cache dedicated to transaction logs only (and create it with the "logonly" option), as this can be relatively small. ie. for one database without replication or triggers, even 20Mb is probably enough. For several databases where there's plenty of replication and/or triggers, I'd want 50Mb. This is a complex topic and if in doubt, just add memory to default data cache - it is never wrong to do so.

Good luck!

JMattias

Hi,

A few more questions.
1. Have you checked to see if there is enough space left in the databasefiles?
2. Do you have a queue to the processor even if the CPU isn't heavily loaded?
3. How many user connections?
4. How are the physical disks working?

Regards
/Mattias

maria_rossi

ASKER

Hi,

Sorry for not responding soon. I am stil working on the fallout from a Sev 1 problem last Monday (I'll get to that in a little bit).

First, Joe, thanks for the explanation on those items, especially on the cache. Unfortunately, I read your email after I have made the config changes this past weekend (more on this)

Second, to answer Mattias' questions:
1. all the databases have 30% or more free space
2. I don't know. How do I check that?
3. There are 200 user connections, consistently about 50% used
4. Not sure where to look (sorry) sysmon shows the DB devices have I/O's

Now, for my problem last Monday. Over the weekend, we did a couple of config changes as follows:
1. added memory to sybase
2. increased default data cache (to 600MB) and created 8 KB I/O pool for default
4. created new named data cache (called mil_data_cache, 400 MB) and created 8KB I/O pool for this also
5. bound 'table1' to mil_data_cache. table1 size is about 4GB

Even though all these changes are dynamic, we recycled the ASE anyway.

Initial tests showed all is ok, even though the 1 SP that we've been testing seemed to improved only slightly. After the test, I queried monCachedObject and saw 'table1' at both default data cache and mil_data_cache. Is this normal??? I would have expected 'table1' to be only at the mil_data_cache where it was bound. Anyway, I did not do anything about that. Left the server for the weekend.

Monday, the plant is open and they complained of 'extremely' slow response. Queries that used to take seconds are now taking 5 mins. Noticed that the Engine-I/O busy was always high, between 50-92%. So, we backed-out the changes and response time went back to normal and the Engine-I/O Busy also went down to < 10%.

I think, processes were doing a lot of disk I/O. But why? and which ones? Unfortunately, I was not able to capture the culprit queries or tables. The 'table1' was not used as much on Monday, becuase 1 of the mfg lines using that table was down for maintenance.

Any ideas on what went wrong? Was it the new named data cache? If yes, why? Where else should I have checked? Does anyone have a quick query to determine the 'heaviest' query or stored procedure or the table that is hit most? I just started looking at MDA tables, so I am not familiar with it.

Interestingly, I made similar changes on another server, ie add memory to ASE and increase the default data cache (but did not create a new named data cache), and this server is ok, no problems so far.

Sorry for the many questions.

Any other tips on what to look for in this problem would be greatly appreciated.

Thanks a lot.
Maria

ASKER CERTIFIED SOLUTION

Joe Woodhouse

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

maria_rossi

ASKER

Thanks for the response.

We're going to give it a try again, this time without the named data cache. Just increase the memory and the default data cache, then create a 8Kb or 16 Kb I/O pool. I learned something from this problem stay away from named data cache :). In my case, the table that was bound to the named DC was huge, only 10% would have fit.

I am still unclear what caused the Engine I/O to go up?

Maria

Joe Woodhouse

Named caches aren't always bad, and can be very good when used well. But you need plenty of memory to use them most times, and it's usually logs, indexes or small lookup tables that benefit the most from them.

Unclear why Engine I/O went up. If you'd had plenty of memory to begin with I'd say it was because of the named cache - but you didn't, so I'm not really sure...

maria_rossi

ASKER

Thanks for all the tips.

Joe Woodhouse

Did we actually fix the problem? 8-) How'd it go without the named cache?

maria_rossi

ASKER

Hi,
Sorry for the delayed response. I did not know that this will be locked after 7 days of PAQ.
Anyway, without the named data cache, it went well. We just added the space to the default data cache. We did encounter a problem on a different issue, after creating a 16KB buffer pool. I posted a new question on that.
Thanks again for your help.
Maria

maria_rossi

ASKER

Hi,
Am sorry, I apologize. I did not know there was a difference between reopen and unlock. I meant, unlock, as I wanted to post a reply. So, pls re-accept Joe_Woodhouse's answer.
Thanks.
Maria