Solved

Oracle DB performance impact due to swapping

Posted on 2013-06-14
5
517 Views
Last Modified: 2013-06-20
Hi Gurus,

I have an 11202 DB running on RHEL 5.5 with 32G RAM and 16 CPUs,recently one of the our prod db started seeing slowness and was not even able to log on to the DB server.
 

DB server has two db's with SGA_MAX_SIZE=12G and PGA 1,602.9 MB ,no AMM its all manual.

Top Output during slowness:

top - 11:11:42 up 107 days,  9:47,  2 users,  load average: 30.43, 27.08, 18.62
Tasks: 937 total,   3 running, 934 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.7%us, 10.5%sy,  0.0%ni, 42.3%id, 32.9%wa,  0.6%hi,  0.9%si,  0.0%st
Mem:  32947876k total, 24643800k used,  8304076k free,   393672k buffers
Swap: 33554424k total, 10501052k used, 23053372k free, 14716436k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6461 oracle    15   0 12.0g 562m 559m S 24.7  1.7   0:00.62 oraclestarprd (LOCAL=N
 6524 oracle    15   0 12.0g 374m 360m S 22.8  1.2   0:13.90 oraclestarprd (LOCAL=N
28371 oracle    15   0 12.0g 1.3g 1.3g S 18.9  4.2   0:15.18 oraclestarprd (LOCAL=N
 7501 oracle    15   0 12.0g 250m 245m S 16.9  0.8   0:10.40 oraclestarprd (LOCAL=N

Open in new window


vmstat o/p:

zzz ***Mon Jun 10 11:00:32 EDT 2013
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2 19 10867280 11304740 392508 13840024    2    2    27     8    0    0  6  1 93  0  0
 2 27 10867112 11287496 392504 13840040  760 3336   808  3485 3651 5119  4 11 37 47  0
 2 21 10867016 11296792 392504 13842724  368 5664  2427  6513 3203 4555  4  8 36 52  0
zzz ***Mon Jun 10 11:01:07 EDT 2013
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4 19 10797688 11266820 392176 13874740    2    2    27     8    0    0  6  1 93  0  0
 1 19 10794504 11248248 392188 13875948 9556    0  9586   280 2446 8382  6 11 53 31  0
24 23 10784216 11243788 392180 13880396 9912    0 10076   204 3259 10334 22 11 33 34  0
zzz ***Mon Jun 10 11:01:42 EDT 2013
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3 17 10680348 10977856 392216 13969504    2    2    27     8    0    0  6  1 93  0  0
 6 16 10676020 10972060 392216 13972020 7556    0  7571   120 2189 8740 15 10 37 38  0
12 21 10673760 10944348 392212 13973796 7340    0  7344   232 2709 9836 16  9 42 32  0
zzz ***Mon Jun 10 11:02:16 EDT 2013	
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 5 21 10569892 10812588 392256 14059872    2    2    27     8    0    0  6  1 93  0  0
 1 20 10566772 10778756 392268 14063688 8048    8  8071   354 5512 10531 15 11 36 38  0
 1 20 10563268 10772288 392264 14068272 7720    8  7720   288 4976 9670  3  8 50 38  0

Open in new window


What is swapping exactly and What causes swapping ?How can i identify which process or query is causing curently or in the past ?

Does AWR report  identify sql statements causing swapping?

How can swapping be minimize.

Thanks
Syed
0
Comment
Question by:monto1
  • 3
  • 2
5 Comments
 
LVL 15

Expert Comment

by:Franck Pachot
ID: 39252727
Hi,
Basically, when there is not enough physical memory for all processes, then the OS will swap some processes memory to disk in order to free some physical memory.
When the process is active again, ot has to read its memory from disk.

The 'top' output is difficult to interpret here. You have free physical memory ('free' is immediatly free, 'buffers' and 'cached' can be reclaimed).
And you have a high load average but still a lot of CPU available.

When you have the pid of the Oracle processes, the you can check V$PROCESS and V$SESSION to check them. But when you already can't log to the system, that not easy...

AWR report has no information about sessions. You have SQL statements, but nothing that directly concerns memory. However, from the AWR SQL sections, the most cosuming ones are probably those that uses a lot of memory as well.

So I'm really not sure you experience swapping issue. And what puzzles me is that you say you have 16 CPU. How can use use less than 25% of CPU with a load average higher than the number of CPU ? Please give full information about CPU, CORES, virtualization,...


Regards,
Franck.
0
 

Author Comment

by:monto1
ID: 39253359
Thanks Sir,All i saw were oracle user processes at the time of slowness as top processes.



$ cat /proc/cpuinfo | egrep "processor" |wc -l
16

$ cat /proc/cpuinfo | egrep "cores"
cpu cores       : 4

Trying to get to the root cause of the slowness.What query could have caused the damaged.



Thanks
0
 
LVL 15

Expert Comment

by:Franck Pachot
ID: 39254130
The number of physical CPU is given by cat /proc/cpuinfo | egrep "physical id" |wc -l
and the number of cores by cat /proc/cpuinfo | egrep "core id" |wc -l

>> What query could have caused the damaged.
From AWR I think that the SQL ordered by CPU will give you that
you can post the AWR report if you want.
0
 

Author Comment

by:monto1
ID: 39254330
$ cat /proc/cpuinfo | egrep "physical id" |wc -l
16
$ cat /proc/cpuinfo | egrep "core id" |wc -l
16
$

This is the for the same issue that i had created a thread "http://www.experts-exchange.com/Database/Oracle/Q_28150392.html".

I had created a SR and Oracle said that the root cause is swapping ,we have failed-over to another node in the veritas cluster with bigger RAM(64GB) and i increased the Shared_pool to 4G from 2G along with session_cached_cursors=100, since then we are not seeing any concurrency issues atleast for last 3 days where as this was happening everyday earlier.

Thanks
0
 
LVL 15

Accepted Solution

by:
Franck Pachot earned 500 total points
ID: 39255480
Ok. Then probably increasing shared pool has helped. And maybe having more RAM has helped as well.

About swapping, maybe linux experts can give some other information.

From what is there (the top output), I can't say that you experienced a high swapping issue.

But the numbers here may be misleading: 'top' show averages, samples over time, but you have an activity of very quick events (huge rate of connections - creating processes and allocating memory). The statistics shown by 'top' don't catch that.
I see the same about CPU: CPU usage seems to be low but load average is far over available CPU.

It's very difficult to catch relevant information about sessions coming and going quickly. The same in AWR. If statements come and go from shared pool quickly they won't be catched by AWR.

To have a clear idea about swapping, I usually use 'pmap' on all processes and add memory (we need pmap so that we don't add hared memory several times). But if you have a lot of process creation and memory allocation activity you can't catch something consistent.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Suggested Solutions

Why doesn't the Oracle optimizer use my index? Querying too much data Most Oracle developers know that an index is useful when you can use it to restrict your result set to a small number of the total rows in a table. So, the obvious sideā€¦
From implementing a password expiration date, to datatype conversions and file export options, these are some useful settings I've found in Jasper Server.
This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
Via a live example, show how to restore a database from backup after a simulated disk failure using RMAN.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now