Link to home
Start Free TrialLog in
Avatar of paulqna
paulqnaFlag for Netherlands

asked on

Howto get rid of oracle "message from client" wait time?

Case: Websphere talking to Oracle server (both unix) at its busiest both not more then 20% CPU in use each.

What parameter should we tweak/tune to get rid of the "76807.81 Total Waited" in 2,5 hours?
call     count       cpu    elapsed       disk      query    current        rows
 
------- ------  -------- ---------- ---------- ---------- ----------  ----------
 
Parse      202      0.09       0.10          0          0          0           0
 
Execute 100525     43.63      44.31         47      55742     757408       50858
 
Fetch    77564    206.83     211.67       3273    6049646         91      304311
 
------- ------  -------- ---------- ---------- ---------- ----------  ----------
 
total   178291    250.55     256.09       3320    6105388     757499      355169
 
 
 
Misses in library cache during parse: 48
 
Misses in library cache during execute: 6
 
 
 
Elapsed times include waiting on following events:
 
  Event waited on                             Times   Max. Wait  Total Waited
 
  ----------------------------------------   Waited  ----------  ------------
 
  SQL*Net message to client                  144677        0.00          0.21
 
  SQL*Net message from client                144675       29.43      76807.81
 
  log file sync                                 838        0.02          1.62
 
  latch free                                    134        0.02          0.48
 
  buffer busy waits                              51        0.02          0.25
 
  db file sequential read                      1659        0.02          0.25
 
  SQL*Net more data to client                   298        0.00          0.01
 
  SQL*Net more data from client                1182        0.05          0.08
 
  direct path write (lob)                        68        0.00          0.00
 
  SQL*Net break/reset to client                  74        0.00          0.03
 
  db file scattered read                        678        0.05          2.35
 
  log file switch completion                      2        0.00          0.01
 
  log buffer space                               11        0.01          0.08

Open in new window

Avatar of Sean Stuber
Sean Stuber

The message is just what it sounds like.

The database is waiting on the client.
The most common cause of it is "people time" or "think time"

If I connect to the database, pull some data back to my screen and then spend 10 minutes reading it you'll accumulate 600 seconds of SQL*Net message from client, because the database is idle and is simply waiting on me to do something.

It can also happen if the client is slow to do processing.
And it can happen in loops, where you pull one row from the database do something to it then pull the next row and do something to it.  Each one action might be fast, but if you do it 144 thousand times the accumulation of all those tiny pauses can be significant.

Given your "to" and "from" wait counts it looks like your application probably works in a loop like that.
The database is able to send data to your app very quickly, so even though you sent data from the db to the client 144K times, your client then paused to do something with that data, most of the time that pause was tiny, with a few spikes, one of those spikes was 29 seconds.  So, in general the client processed each trip quickly but it did it 144675 times.

If you can transmit your data in one large chunk or at least a few large chunks that will help.    Each individual wait will be longer but you won't have to do thousands and thousands of handshaking steps.
Even better, check what the client side of the application is doing.  Is there any processing that could be left on the database?  Far too many applications try to pull data out under the mistaken assumption they can "control" the operation on the client side better when all that really accomplishes is adding an extra performance bottleneck, extra handshaking steps and an extra break point.
Avatar of paulqna

ASKER

Ok, but on the application side we see the same waiting times in this 2,5 hour timespan, why are they (websphere and oracle) waiting for eachother?

It does confirm the 80% CPU idle time at both ends...

Is it the sqlnet/tcp handshaking 144K times?
it could be network lag,  eliminating that or at least reducing the number of handshakes will help.

There is no "parameter" to adjust, it's simple mathematics of number of steps.  Reduce the steps and you'll be much faster.
Avatar of paulqna

ASKER

We have just tested on a test environment...
 - same job,
 - same job data
 - slower servers! (both websphere and oracle)
 - same os version
 - same webphere version
 - same application version
 - same oracle version

But there it takes only 30 minutes... CPU load again 20% both on websphere and oracle...

Same result in the stats, 80% "wait for client" again...

Copying files directly between the servers is as fast as should be...

I'm out of options... any clue...?
if you were doing the exact same work in both environments then it must be your network lag
Looks like  you are using "connection pooling". is it?
If you have connection pools, the connections will stay open and idle, the idle time will add to the SQL*Net message from client event.

This wait event you see is not as that harmful as such and with a connection pool, I dont think it can be avoided as well.
yes, good catch, that's another great way to artificially inflate the number
wow sean, still awake? (lol)
ha, just sat down a little bit ago to do some catchup, I've got a few questions open I need to try to wrap up or at least point them in the right direction.  I really am cutting back on my EE time, I swear!  :)
Avatar of paulqna

ASKER

I use the connection pooling from websphere.

However I don't understand while both servers have 80% idle time, while waiting for eachother, this "artificially inflate" is harmless and unavoidable...?
Yes, the idle time resulted by the "idle" connections in the pool are harmless. In this case it is "just" a number indicating the idle time.

But you need to make sure that your network has a good response time as well.
Avatar of paulqna

ASKER

I hope all contributors agree that none of the comments have explained how to resolve the issue and I hereby close the question.
ASKER CERTIFIED SOLUTION
Avatar of Sean Stuber
Sean Stuber

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of paulqna

ASKER

One can't always get what one wants.

My conclusion is that this question is not appropriate for answering via a forum.

Thanks to the experts for the thoughts and time spent on this one.

Case closed.
I suggest rather than giving a C to me alone, you should split the points to everyone, I suggest a higher grade too

Split because others contributed to the answer.
Higher grade because, while you didn't get the answer you wanted, we did answer the question.
If you think a penalty grade is in order, please explain how we failed.
Avatar of paulqna

ASKER

The answer to this initial question:
"What parameter should we tweak/tune to get rid of the "76807.81 Total Waited" in 2,5 hours?"

Is:
Install oracle bugfix for the CBO in Oracle 9i