paulqna
asked on
Howto get rid of oracle "message from client" wait time?
Case: Websphere talking to Oracle server (both unix) at its busiest both not more then 20% CPU in use each.
What parameter should we tweak/tune to get rid of the "76807.81 Total Waited" in 2,5 hours?
What parameter should we tweak/tune to get rid of the "76807.81 Total Waited" in 2,5 hours?
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 202 0.09 0.10 0 0 0 0
Execute 100525 43.63 44.31 47 55742 757408 50858
Fetch 77564 206.83 211.67 3273 6049646 91 304311
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 178291 250.55 256.09 3320 6105388 757499 355169
Misses in library cache during parse: 48
Misses in library cache during execute: 6
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 144677 0.00 0.21
SQL*Net message from client 144675 29.43 76807.81
log file sync 838 0.02 1.62
latch free 134 0.02 0.48
buffer busy waits 51 0.02 0.25
db file sequential read 1659 0.02 0.25
SQL*Net more data to client 298 0.00 0.01
SQL*Net more data from client 1182 0.05 0.08
direct path write (lob) 68 0.00 0.00
SQL*Net break/reset to client 74 0.00 0.03
db file scattered read 678 0.05 2.35
log file switch completion 2 0.00 0.01
log buffer space 11 0.01 0.08
Even better, check what the client side of the application is doing. Is there any processing that could be left on the database? Far too many applications try to pull data out under the mistaken assumption they can "control" the operation on the client side better when all that really accomplishes is adding an extra performance bottleneck, extra handshaking steps and an extra break point.
ASKER
Ok, but on the application side we see the same waiting times in this 2,5 hour timespan, why are they (websphere and oracle) waiting for eachother?
It does confirm the 80% CPU idle time at both ends...
Is it the sqlnet/tcp handshaking 144K times?
It does confirm the 80% CPU idle time at both ends...
Is it the sqlnet/tcp handshaking 144K times?
it could be network lag, eliminating that or at least reducing the number of handshakes will help.
There is no "parameter" to adjust, it's simple mathematics of number of steps. Reduce the steps and you'll be much faster.
There is no "parameter" to adjust, it's simple mathematics of number of steps. Reduce the steps and you'll be much faster.
ASKER
We have just tested on a test environment...
- same job,
- same job data
- slower servers! (both websphere and oracle)
- same os version
- same webphere version
- same application version
- same oracle version
But there it takes only 30 minutes... CPU load again 20% both on websphere and oracle...
Same result in the stats, 80% "wait for client" again...
Copying files directly between the servers is as fast as should be...
I'm out of options... any clue...?
- same job,
- same job data
- slower servers! (both websphere and oracle)
- same os version
- same webphere version
- same application version
- same oracle version
But there it takes only 30 minutes... CPU load again 20% both on websphere and oracle...
Same result in the stats, 80% "wait for client" again...
Copying files directly between the servers is as fast as should be...
I'm out of options... any clue...?
if you were doing the exact same work in both environments then it must be your network lag
Looks like you are using "connection pooling". is it?
If you have connection pools, the connections will stay open and idle, the idle time will add to the SQL*Net message from client event.
This wait event you see is not as that harmful as such and with a connection pool, I dont think it can be avoided as well.
If you have connection pools, the connections will stay open and idle, the idle time will add to the SQL*Net message from client event.
This wait event you see is not as that harmful as such and with a connection pool, I dont think it can be avoided as well.
yes, good catch, that's another great way to artificially inflate the number
wow sean, still awake? (lol)
ha, just sat down a little bit ago to do some catchup, I've got a few questions open I need to try to wrap up or at least point them in the right direction. I really am cutting back on my EE time, I swear! :)
ASKER
I use the connection pooling from websphere.
However I don't understand while both servers have 80% idle time, while waiting for eachother, this "artificially inflate" is harmless and unavoidable...?
However I don't understand while both servers have 80% idle time, while waiting for eachother, this "artificially inflate" is harmless and unavoidable...?
Yes, the idle time resulted by the "idle" connections in the pool are harmless. In this case it is "just" a number indicating the idle time.
But you need to make sure that your network has a good response time as well.
But you need to make sure that your network has a good response time as well.
ASKER
I hope all contributors agree that none of the comments have explained how to resolve the issue and I hereby close the question.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
One can't always get what one wants.
My conclusion is that this question is not appropriate for answering via a forum.
Thanks to the experts for the thoughts and time spent on this one.
Case closed.
My conclusion is that this question is not appropriate for answering via a forum.
Thanks to the experts for the thoughts and time spent on this one.
Case closed.
I suggest rather than giving a C to me alone, you should split the points to everyone, I suggest a higher grade too
Split because others contributed to the answer.
Higher grade because, while you didn't get the answer you wanted, we did answer the question.
If you think a penalty grade is in order, please explain how we failed.
Split because others contributed to the answer.
Higher grade because, while you didn't get the answer you wanted, we did answer the question.
If you think a penalty grade is in order, please explain how we failed.
ASKER
The answer to this initial question:
"What parameter should we tweak/tune to get rid of the "76807.81 Total Waited" in 2,5 hours?"
Is:
Install oracle bugfix for the CBO in Oracle 9i
"What parameter should we tweak/tune to get rid of the "76807.81 Total Waited" in 2,5 hours?"
Is:
Install oracle bugfix for the CBO in Oracle 9i
The database is waiting on the client.
The most common cause of it is "people time" or "think time"
If I connect to the database, pull some data back to my screen and then spend 10 minutes reading it you'll accumulate 600 seconds of SQL*Net message from client, because the database is idle and is simply waiting on me to do something.
It can also happen if the client is slow to do processing.
And it can happen in loops, where you pull one row from the database do something to it then pull the next row and do something to it. Each one action might be fast, but if you do it 144 thousand times the accumulation of all those tiny pauses can be significant.
Given your "to" and "from" wait counts it looks like your application probably works in a loop like that.
The database is able to send data to your app very quickly, so even though you sent data from the db to the client 144K times, your client then paused to do something with that data, most of the time that pause was tiny, with a few spikes, one of those spikes was 29 seconds. So, in general the client processed each trip quickly but it did it 144675 times.
If you can transmit your data in one large chunk or at least a few large chunks that will help. Each individual wait will be longer but you won't have to do thousands and thousands of handshaking steps.