apunkabollywood
asked on
Need Urgent Help on Performance issue having Oracle DB on RHEL 5.2
The situation is critical today as an application. All the application users are disappointed by the performances and are not able to work with it. The application is not stable:
- the performance is very low.
- the system is very slow and freezing all the time
- some time application crashing or user not able to login
We have three server:
1. Prod Server(RHEL 5.2)
2. App Prod server(RHEL 5.2)
3. DB Server(RHEL 5.2)
Things tried:
1. Raise up memory on application system
2. Kernel parameters changes
3. Increasing the swap space on Application Server
4. vm.nr_hugepages setting to be changed on prod server
5. Set up the memlock user limit
6. Upgrade omnivision version on all 3 servers to 7.21
7. Clear the Alert Log and diagnostic files on the Preprod server
8. Extend disk space from 3.4 GB to 10 GB on Prod DB server
Linux platform with Oracle databases mounted
still not able to diagnose the root cause and the troubleshooting continues. Till then the temporary workaround is to bounce the DB and reboot the server after 10-15 days. Need your help in:
1. Server performance needs to be looked into.
2. Network Communication between (Prod DB) and (Prod Apps) needs to be checked.
Need your help to diagnose the ROOT CAUSE and resolve this issue.
Pls check the logs attached with this case:
recommendations.txt
- the performance is very low.
- the system is very slow and freezing all the time
- some time application crashing or user not able to login
We have three server:
1. Prod Server(RHEL 5.2)
2. App Prod server(RHEL 5.2)
3. DB Server(RHEL 5.2)
Things tried:
1. Raise up memory on application system
2. Kernel parameters changes
3. Increasing the swap space on Application Server
4. vm.nr_hugepages setting to be changed on prod server
5. Set up the memlock user limit
6. Upgrade omnivision version on all 3 servers to 7.21
7. Clear the Alert Log and diagnostic files on the Preprod server
8. Extend disk space from 3.4 GB to 10 GB on Prod DB server
Linux platform with Oracle databases mounted
still not able to diagnose the root cause and the troubleshooting continues. Till then the temporary workaround is to bounce the DB and reboot the server after 10-15 days. Need your help in:
1. Server performance needs to be looked into.
2. Network Communication between (Prod DB) and (Prod Apps) needs to be checked.
Need your help to diagnose the ROOT CAUSE and resolve this issue.
Pls check the logs attached with this case:
recommendations.txt
Things to try:
1. Run an AWR report (awrrpt.sql)
2. Some systems run AWR reports on a schedule. If you have an AWR report from a period where performance was good, use awrddrpt.sql to compare the AWR reports to find the differences.
3. A high db file sequential read in the AWR report = poor/missing indexes, and/or outdated db stats
4. Is fragmentation a problem? Any table that has a fragmentation count that is more than 5% of the # of records in the table will impact performance
select * from all_tables where chain_cnt > 0
5. What is the "Out Of Order Ratio %" for your largest tables? Run this, replacing <your_table_name> with the name of each of your largest tables.
If the value of "out of order ratio %" > 20 % you need to follow the document 1085012.1 Reording Columns in Oracle For Demantra Performance
1. Run an AWR report (awrrpt.sql)
2. Some systems run AWR reports on a schedule. If you have an AWR report from a period where performance was good, use awrddrpt.sql to compare the AWR reports to find the differences.
3. A high db file sequential read in the AWR report = poor/missing indexes, and/or outdated db stats
4. Is fragmentation a problem? Any table that has a fragmentation count that is more than 5% of the # of records in the table will impact performance
select * from all_tables where chain_cnt > 0
5. What is the "Out Of Order Ratio %" for your largest tables? Run this, replacing <your_table_name> with the name of each of your largest tables.
select (round(((select count(*) as cnt
from (select item_id, location_id, sales_date, relative_fno, block_number, row_number, data_row,
(lag(data_row) over(partition by relative_fno, block_number order by row_number)) as prev_data_row
from (select item_id, location_id, sales_date
,relative_fno, block_number, row_number
,(dense_rank() over(partition by relative_fno, block_number order by item_id, location_id, sales_date)) as data_row
from (select item_id, location_id, sales_date
,dbms_rowid.rowid_relative_fno(rowid) relative_fno
,dbms_rowid.rowid_block_number(rowid) as block_number
,dbms_rowid.rowid_row_number(rowid) as row_number
from <your_table_name>
) c
) b
) a
where data_row != prev_data_row
and data_row != prev_data_row + 1) / (select count(*) from sales_data)),3)*100) as "out of order ratio %"
from dual;
If the value of "out of order ratio %" > 20 % you need to follow the document 1085012.1 Reording Columns in Oracle For Demantra Performance
Oracle Tuning Guide can be found at the following location:
https://access.redhat.com/sites/default/files/oracle_tuning_guide_1-1_11-28-2007.pdf
https://access.redhat.com/sites/default/files/oracle_tuning_guide_1-1_11-28-2007.pdf
ASKER
Please check the below facts from DB server and suggest are these right or wrong:
Page cache :
pagecahe 100 (Current)
Default
1 10 15
DB Server setting
ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 1000baseT/Full
Supports auto-negotiation: No
Advertised link modes: Not reported
Advertised auto-negotiation: No
Speed: Unknown! (10000)
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: uag
Wake-on: d
Link detected: yes
Prod Server settings
ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: yes
On DB Server
less /proc/meminfo
MemTotal: 32960172 kB
MemFree: 9254140 kB
Buffers: 1471736 kB
Cached: 399360 kB
SwapCached: 0 kB
Active: 1626732 kB
Inactive: 468776 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 32960172 kB
LowFree: 9254140 kB
SwapTotal: 25165816 kB
SwapFree: 25165816 kB
Dirty: 48 kB
Writeback: 0 kB
AnonPages: 244324 kB
Mapped: 55356 kB
Slab: 135700 kB
PageTables: 13804 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 36400972 kB
Committed_AS: 648460 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 276452 kB
VmallocChunk: 34359461623 kB
HugePages_Total: 5122
HugePages_Free: 49
HugePages_Rsvd: 48
Hugepagesize: 2048 kB
oracle : PPRVDB1 ]> ulimit -l
30000000
less /proc/sys/vm/swappiness
60
Request you all to plss suggest kernel paramneters or oracle tuning how to check and increase performance of DB server
Thanks
Page cache :
pagecahe 100 (Current)
Default
1 10 15
DB Server setting
ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 1000baseT/Full
Supports auto-negotiation: No
Advertised link modes: Not reported
Advertised auto-negotiation: No
Speed: Unknown! (10000)
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: uag
Wake-on: d
Link detected: yes
Prod Server settings
ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: yes
On DB Server
less /proc/meminfo
MemTotal: 32960172 kB
MemFree: 9254140 kB
Buffers: 1471736 kB
Cached: 399360 kB
SwapCached: 0 kB
Active: 1626732 kB
Inactive: 468776 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 32960172 kB
LowFree: 9254140 kB
SwapTotal: 25165816 kB
SwapFree: 25165816 kB
Dirty: 48 kB
Writeback: 0 kB
AnonPages: 244324 kB
Mapped: 55356 kB
Slab: 135700 kB
PageTables: 13804 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 36400972 kB
Committed_AS: 648460 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 276452 kB
VmallocChunk: 34359461623 kB
HugePages_Total: 5122
HugePages_Free: 49
HugePages_Rsvd: 48
Hugepagesize: 2048 kB
oracle : PPRVDB1 ]> ulimit -l
30000000
less /proc/sys/vm/swappiness
60
Request you all to plss suggest kernel paramneters or oracle tuning how to check and increase performance of DB server
Thanks
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thank you so much Sdstuber and Bigschmuh for you valuable advice - I have collected AWR report from DB server - but not much good to analyse that kindly look and give your suggestion accordingly.attached with this comment
report.doc
report.doc
Appart from the 3 SQL starting with "SELECT oo.object_id FROM nc_ob..." shown in "SQL ordered by gets" which query plan should be reviewed as they are issuing about 16300 gets per execution, I don't see any real problem in the report.
Although, the "Operating system statistics" clearly shows that :
-Too much CPU on this db server is available (IDLE_TIME is huge against others)
-IO wait time looks acceptable
On the application server, did you check the CPU and swap levels ?
Although, the "Operating system statistics" clearly shows that :
-Too much CPU on this db server is available (IDLE_TIME is huge against others)
-IO wait time looks acceptable
On the application server, did you check the CPU and swap levels ?
ASKER
Here is the output: App Server
var/log/syslog/message.log | grep error
snmpd[18889]: error on subcontainer '' insert (-1)
snmpd[32587]: error on subcontainer '' insert (-1)
Jul 22 06:19:43 appserver snmpd[3431]: warning: /etc/hosts.allow, line 8: can't verify hostname: getaddrinfo(appserver, AF_INET) failed
Jul 22 06:20:08 appserver snmpd[3431]: warning: /etc/hosts.allow, line 8: can't verify hostname: getaddrinfo(appserver) failed
Jul 22 06:20:10 appserver snmpd[3431]: warning: /etc/hosts.allow, line 8: can't verify hostname: getaddrinfo(appserver) failed
Kernel Logs:
Jun 2 03:23:27 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jun 5 03:14:17 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jun 8 03:15:33 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jun 21 11:44:27 appserver kernel: omv_mng[11997]: segfault at 0000000000000000 rip 0000000008082478 rsp 00000000ffee93a4 error 4
Jul 29 03:21:50 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jul 29 03:21:52 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
kernel: Kernel logging (proc) stopped.
kernel: Kernel logging (proc) stopped.
nel: Kernel logging (proc) stopped.
nel: Kernel logging (proc) stopped.
nel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x 1ff
sce kernel: Kernel logging (proc) stopped.
sce kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x 1ff
sce kernel: Kernel logging (proc) stopped.
sce kernel: Kernel logging (proc) stopped.
acpiphp: Slot [260] registered
acpiphp: Slot [261] registered
acpiphp: Slot [262] registered
acpiphp: Slot [263] registered
VMware memory control driver initialized
e1000: eth0: e1000_set_tso: TSO is Enabled
e1000: eth1: e1000_set_tso: TSO is Enabled
e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
e1000: eth1: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
target0:0:2: Beginning Domain Validation
target0:0:2: Domain Validation skipping write tests
target0:0:2: Ending Domain Validation
target0:0:2: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
SCSI device sdc: 20971520 512-byte hdwr sectors (10737 MB)
sdc: Write Protect is off
sdc: cache data unavailable
sdc: assuming drive cache: write through
SCSI device sdc: 20971520 512-byte hdwr sectors (10737 MB)
sdc: Write Protect is off
sdc: cache data unavailable
sdc: assuming drive cache: write through
sdc: unknown partition table
sd 0:0:2:0: Attached scsi disk sdc
sd 0:0:2:0: Attached scsi generic sg2 type 0
SCSI device sdc: 20971520 512-byte hdwr sectors (10737 MB)
sdc: Write Protect is off
sdc: cache data unavailable
sdc: assuming drive cache: write through
sdc: sdc1
SCSI device sdc: 20971520 512-byte hdwr sectors (10737 MB)
sdc: Write Protect is off
sdc: cache data unavailable
sdc: assuming drive cache: write through
sdc: sdc1
free -g
total used free shared buffers cached
Mem: 7 1 6 0 0 0
-/+ buffers/cache: 0 6
Swap: 3 1 2
Linux 2.6.18-164.2.1.el5 (app-server)
avg-cpu: %user %nice %system %iowait %steal %idle
0.35 0.00 0.69 0.39 0.00 98.57
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 6.49 98.49 120.63 152738948 187075862
sda1 0.00 0.02 0.00 31564 906
sda2 6.49 98.47 120.63 152705504 187074956
sdb 1.04 21.40 13.14 33182340 20371901
dm-0 2.66 9.03 20.34 14011266 31549472
dm-1 0.82 7.54 6.02 11698282 9341416
dm-2 0.00 0.00 0.00 3838 2124
dm-3 0.00 0.00 0.00 6594 6552
dm-4 2.94 9.02 22.61 13988026 35061568
dm-5 0.91 4.62 5.46 7157226 8466880
dm-6 1.86 0.20 3.71 304350 5759042
dm-7 0.00 0.01 0.00 10666 2252
dm-8 4.62 20.24 16.76 31389168 25986920
dm-9 0.11 2.96 0.23 4593410 351992
dm-10 0.00 0.00 0.01 4744 9266
dm-11 0.19 6.55 0.43 10154618 665896
dm-12 10.42 38.29 45.06 59377344 69871576
dm-13 1.97 13.72 11.80 21283098 18293405
dm-14 0.28 7.67 1.34 11889810 2078496
avg-cpu: %user %nice %system %iowait %steal %idle
1.95 0.00 1.95 4.88 0.00 91.22
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 24.51 1505.88 0.00 1536 0
sda1 0.00 0.00 0.00 0 0
sda2 24.51 1505.88 0.00 1536 0
sdb 0.00 0.00 0.00 0 0
dm-0 1.96 23.53 0.00 24 0
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 0.00 0.00 0.00 0 0
dm-4 0.00 0.00 0.00 0 0
dm-5 27.45 1443.14 0.00 1472 0
dm-6 0.00 0.00 0.00 0 0
dm-7 0.00 0.00 0.00 0 0
dm-8 0.00 0.00 0.00 0 0
dm-9 0.00 0.00 0.00 0 0
dm-10 0.00 0.00 0.00 0 0
dm-11 0.00 0.00 0.00 0 0
dm-12 0.00 0.00 0.00 0 0
APP.pdf
var/log/syslog/message.log
snmpd[18889]: error on subcontainer '' insert (-1)
snmpd[32587]: error on subcontainer '' insert (-1)
Jul 22 06:19:43 appserver snmpd[3431]: warning: /etc/hosts.allow, line 8: can't verify hostname: getaddrinfo(appserver, AF_INET) failed
Jul 22 06:20:08 appserver snmpd[3431]: warning: /etc/hosts.allow, line 8: can't verify hostname: getaddrinfo(appserver) failed
Jul 22 06:20:10 appserver snmpd[3431]: warning: /etc/hosts.allow, line 8: can't verify hostname: getaddrinfo(appserver) failed
Kernel Logs:
Jun 2 03:23:27 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jun 5 03:14:17 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jun 8 03:15:33 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jun 21 11:44:27 appserver kernel: omv_mng[11997]: segfault at 0000000000000000 rip 0000000008082478 rsp 00000000ffee93a4 error 4
Jul 29 03:21:50 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jul 29 03:21:52 appserver kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
kernel: Kernel logging (proc) stopped.
kernel: Kernel logging (proc) stopped.
nel: Kernel logging (proc) stopped.
nel: Kernel logging (proc) stopped.
nel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x
sce kernel: Kernel logging (proc) stopped.
sce kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x
sce kernel: Kernel logging (proc) stopped.
sce kernel: Kernel logging (proc) stopped.
acpiphp: Slot [260] registered
acpiphp: Slot [261] registered
acpiphp: Slot [262] registered
acpiphp: Slot [263] registered
VMware memory control driver initialized
e1000: eth0: e1000_set_tso: TSO is Enabled
e1000: eth1: e1000_set_tso: TSO is Enabled
e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
e1000: eth1: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
target0:0:2: Beginning Domain Validation
target0:0:2: Domain Validation skipping write tests
target0:0:2: Ending Domain Validation
target0:0:2: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
SCSI device sdc: 20971520 512-byte hdwr sectors (10737 MB)
sdc: Write Protect is off
sdc: cache data unavailable
sdc: assuming drive cache: write through
SCSI device sdc: 20971520 512-byte hdwr sectors (10737 MB)
sdc: Write Protect is off
sdc: cache data unavailable
sdc: assuming drive cache: write through
sdc: unknown partition table
sd 0:0:2:0: Attached scsi disk sdc
sd 0:0:2:0: Attached scsi generic sg2 type 0
SCSI device sdc: 20971520 512-byte hdwr sectors (10737 MB)
sdc: Write Protect is off
sdc: cache data unavailable
sdc: assuming drive cache: write through
sdc: sdc1
SCSI device sdc: 20971520 512-byte hdwr sectors (10737 MB)
sdc: Write Protect is off
sdc: cache data unavailable
sdc: assuming drive cache: write through
sdc: sdc1
free -g
total used free shared buffers cached
Mem: 7 1 6 0 0 0
-/+ buffers/cache: 0 6
Swap: 3 1 2
Linux 2.6.18-164.2.1.el5 (app-server)
avg-cpu: %user %nice %system %iowait %steal %idle
0.35 0.00 0.69 0.39 0.00 98.57
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 6.49 98.49 120.63 152738948 187075862
sda1 0.00 0.02 0.00 31564 906
sda2 6.49 98.47 120.63 152705504 187074956
sdb 1.04 21.40 13.14 33182340 20371901
dm-0 2.66 9.03 20.34 14011266 31549472
dm-1 0.82 7.54 6.02 11698282 9341416
dm-2 0.00 0.00 0.00 3838 2124
dm-3 0.00 0.00 0.00 6594 6552
dm-4 2.94 9.02 22.61 13988026 35061568
dm-5 0.91 4.62 5.46 7157226 8466880
dm-6 1.86 0.20 3.71 304350 5759042
dm-7 0.00 0.01 0.00 10666 2252
dm-8 4.62 20.24 16.76 31389168 25986920
dm-9 0.11 2.96 0.23 4593410 351992
dm-10 0.00 0.00 0.01 4744 9266
dm-11 0.19 6.55 0.43 10154618 665896
dm-12 10.42 38.29 45.06 59377344 69871576
dm-13 1.97 13.72 11.80 21283098 18293405
dm-14 0.28 7.67 1.34 11889810 2078496
avg-cpu: %user %nice %system %iowait %steal %idle
1.95 0.00 1.95 4.88 0.00 91.22
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 24.51 1505.88 0.00 1536 0
sda1 0.00 0.00 0.00 0 0
sda2 24.51 1505.88 0.00 1536 0
sdb 0.00 0.00 0.00 0 0
dm-0 1.96 23.53 0.00 24 0
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 0.00 0.00 0.00 0 0
dm-4 0.00 0.00 0.00 0 0
dm-5 27.45 1443.14 0.00 1472 0
dm-6 0.00 0.00 0.00 0 0
dm-7 0.00 0.00 0.00 0 0
dm-8 0.00 0.00 0.00 0 0
dm-9 0.00 0.00 0.00 0 0
dm-10 0.00 0.00 0.00 0 0
dm-11 0.00 0.00 0.00 0 0
dm-12 0.00 0.00 0.00 0 0
APP.pdf
The server looks like a vm, is the esx host busy?
ASKER
yes its a VM not a hardware server..
ASKER
One more thing want to add:
Is it possible CPU in the reports and logs shows is diffrent from CPU utilization shows of VM Machines on V Center --it shows they are utilizing 100% cpu ? :(
Is it possible CPU in the reports and logs shows is diffrent from CPU utilization shows of VM Machines on V Center --it shows they are utilizing 100% cpu ? :(
Do you have VMWare tools installed.
You may suffer from a CPU over-provisioning situation.
When a VM need 4 vCPU, it waits for 4 fully available cores before sending control to the VM.
For example, a db server with 4 vCPU using less than 20% cpu charge would get a real latency bonus by running one 1 vCPU only.
I googled this "5 Big Considerations for vCPU Provisioning" to get you some feedback about this.
When a VM need 4 vCPU, it waits for 4 fully available cores before sending control to the VM.
For example, a db server with 4 vCPU using less than 20% cpu charge would get a real latency bonus by running one 1 vCPU only.
I googled this "5 Big Considerations for vCPU Provisioning" to get you some feedback about this.
Few things;
Are you also running eBusiness Suite?
Have you recently upgraded from 10g to 11g?
If yes then;
Verify your hidden parameters (They start with "_". There are five parameters that can affect your performance to a dead stop speed. Out of all of them _sort_elimiation_cost_rati o parameter has biggest impact on performance.
_b_tree_bitmap_plans boolean FALSE
_fast_full_scan_enabled boolean FALSE
_index_join_enabled boolean FALSE
_like_with_bind_as_equalit y boolean TRUE
_optimizer_autostats_job boolean FALSE
_sort_elimination_cost_rat io integer 5
_system_trig_enabled boolean TRUE
Are you also running eBusiness Suite?
Have you recently upgraded from 10g to 11g?
If yes then;
Verify your hidden parameters (They start with "_". There are five parameters that can affect your performance to a dead stop speed. Out of all of them _sort_elimiation_cost_rati
_b_tree_bitmap_plans boolean FALSE
_fast_full_scan_enabled boolean FALSE
_index_join_enabled boolean FALSE
_like_with_bind_as_equalit
_optimizer_autostats_job boolean FALSE
_sort_elimination_cost_rat
_system_trig_enabled boolean TRUE
Start at the top of the SQL lists and work your way down.
Here's a few of the big hitters.
guyt7vgmcqbxh - took over an hour to finish - can this be optimized? or as suggested above, run at a different time?
c52vystffk6mf - this ran 11,093 times, each one was fast, but cumulatively it consumed about 1/8 of all cpu - can this be run fewer times? or, can it be tuned to be lighter weight? Knock a tenth of a second off it and you'll get 10 minutes back
9rrat0ynrdpbt - this consumed 70% of all Gets in your 6 hour window and took over 18 minutes.
With a 6 hour window it's hard to say exactly what was bad when. The three candidates I listed above might not be a problem at all. If they ran sequentially and finished in the first couple hour and then people had problems after them then they working on them won't help.
Try generating a report for a much smaller window. 10-15 minutes usually works best.
Here's a few of the big hitters.
guyt7vgmcqbxh - took over an hour to finish - can this be optimized? or as suggested above, run at a different time?
c52vystffk6mf - this ran 11,093 times, each one was fast, but cumulatively it consumed about 1/8 of all cpu - can this be run fewer times? or, can it be tuned to be lighter weight? Knock a tenth of a second off it and you'll get 10 minutes back
9rrat0ynrdpbt - this consumed 70% of all Gets in your 6 hour window and took over 18 minutes.
With a 6 hour window it's hard to say exactly what was bad when. The three candidates I listed above might not be a problem at all. If they ran sequentially and finished in the first couple hour and then people had problems after them then they working on them won't help.
Try generating a report for a much smaller window. 10-15 minutes usually works best.
ASKER
Hi All,
Thanks all for your valuable advice:
@bigscumuh - I am planing to increase no of cpu from 4 to 8 but still not sure is this the only problem because vm showing 100% on monitoring tool on vmware but inside cpu utilization okay as u have seen logs.
@ Mike_omDBA - I am checking all status of cpu and memory continous from past 3 weeks but all three cpu utilization same 100% as per vm monitoring tools
@ kparikh001 - no its a new build 6 mnth before installed
@ sdstuber - will pull out and paste it soon
Some more points to look:
1. Some strange error on weblogic :
<Apr 2, 2012 3:05:24 PM CEST> <Error> <Server> <BEA-002608> <The ListenThread because of an error: java.lang.OutOfMemoryError : Java heap space
java.lang.OutOfMemoryError : Java heap space
2. No of active user max by 5
Thanks all for your valuable advice:
@bigscumuh - I am planing to increase no of cpu from 4 to 8 but still not sure is this the only problem because vm showing 100% on monitoring tool on vmware but inside cpu utilization okay as u have seen logs.
@ Mike_omDBA - I am checking all status of cpu and memory continous from past 3 weeks but all three cpu utilization same 100% as per vm monitoring tools
@ kparikh001 - no its a new build 6 mnth before installed
@ sdstuber - will pull out and paste it soon
Some more points to look:
1. Some strange error on weblogic :
<Apr 2, 2012 3:05:24 PM CEST> <Error> <Server> <BEA-002608> <The ListenThread because of an error: java.lang.OutOfMemoryError
java.lang.OutOfMemoryError
2. No of active user max by 5
I was suggesting to DOWNGRADE vCPU for the dbserver from 4 to 1 to avoid the latency ... not to increase it to 8 !!!
Pls check to make sure VMWare tools are installed, I didn't see your response
ASKER
Yes they vmware tools are installed
ASKER
Please check the lates hour report for DB servers and advice something to improve performance after upgarding 2 to 8 vcpus cpu utlization decrease to 70% but still not upto the mark
AWR-1-hour.doc
AWR-1-hour.doc
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
@sdstuber:Perhaps there is no index on CUSTOMER_NAME and it's doing FTS. :p
Start at the top of the SQL lists and work your way down.
delete from PRV_DISCREPANCY_RESULTS where CUSTOMER_NAME = :1
why is that so expensive?
The rest all points to un-optimized SQL.
Again, try setting parameter _sort_elimination_cost_rat io integer 5
This is a dynamic parameter and help your query find better execution plan.
This is a dynamic parameter and help your query find better execution plan.
ASKER
Hello SDBUSTER , Please explain me it in details and @Mike I didnt understand your point ? please explain
I don't have any details to explain.
Look at the awr reports you posted. In them there are lists of the top queries by time, by cpu, by io etc. I've already identified a few of them. Pick one and examine it. Why does take so long?
MikeOM_DBA took a guess at one of them, maybe you don't have an index. awr reports are only summaries we can't tell you specifically what is a problem or why. You need to do some digging on your end and post what you find.
what are the explain plans for the expensive operations? for the ones that execute multiple times, do they need to? confirm that the expensive ones are actually a problem. Did they run when people were having problems or not? Also, if they were slow, are they culprit, or the victim of something else?
Look at the awr reports you posted. In them there are lists of the top queries by time, by cpu, by io etc. I've already identified a few of them. Pick one and examine it. Why does take so long?
MikeOM_DBA took a guess at one of them, maybe you don't have an index. awr reports are only summaries we can't tell you specifically what is a problem or why. You need to do some digging on your end and post what you find.
what are the explain plans for the expensive operations? for the ones that execute multiple times, do they need to? confirm that the expensive ones are actually a problem. Did they run when people were having problems or not? Also, if they were slow, are they culprit, or the victim of something else?
ASKER
Thnak u all
Have you been collecting performance stats using ps -eaf/atop/iostat/vmstat/sa
Is the database the bottleneck? I'm not sure what tools oracle has to monitor table performance... it might be you need to look at index performance.
If it's not the database then look at memory/disk/network atop can record the stats over a number of days and you can replay them to find hotspots.
Performance is one of those things that can be quite subjective. You need to be collecting data and acting on the data, not just blindly adding memory/disk.
We will need more information, you can post ps output or attach atop files (if you have them) and we might be able to have a look.