Solved

Urgent Oracle 11g DBMS_ALERT CPU spike issue on production server

Posted on 2011-03-18
29
2,038 Views
Last Modified: 2013-11-11
This morning we performed an upgrade of a production system from Oracle 9i to Oracle 11g and are now seeing a HUGE performance hit from the following SQL commands:

INSERT INTO DBMS_ALERT_INFO VALUES (UPPER(:B2 ), :B1 , 'N', NULL)

begin DBMS_ALERT.Register('DISPINQ');
DBMS_ALERT.Register('CFL');
DBMS_ALERT.Register('COM1');
DBMS_ALERT.Register('COM2');
DBMS_ALERT.Register('COM3');
DBMS_ALERT.Register('COM4');
DBMS_ALERT.Register('CDEL');
DBMS_ALERT.Register('NODEL');
DBMS_ALERT.Register('DISPACK');
DBMS_ALERT.Register('E911');
DBMS_ALERT.Register('911OFF');
DBMS_ALERT.Register('COMINUSE');
DBMS_ALERT.Register('DISPOFF');
DBMS_ALERT.Register('COMOUT');
DBMS_ALERT.Register('TRAFTEXT');
DBMS_ALERT.Register('BLOT');
DBMS_ALERT.Register('MASTNAME');
DBMS_ALERT.Register('CF');
DBMS_ALERT.Register('CS');
DBMS_ALERT.Register('AS');
DBMS_ALERT.Register('US');
DBMS_ALERT.Register('EV');
DBMS_ALERT.Register('CD');
DBMS_ALERT.Register('ALEG');
DBMS_ALERT.Register('MASTCASE');
DBMS_ALERT.Register('NCRS');
DBMS_ALERT.Register('SHUT');
DBMS_ALERT.Register('SAPRS');
DBMS_ALERT.Register('UU');
DBMS_ALERT.Register('LRSP');
DBMS_ALERT.Register('FH_UPDATE');
DBMS_ALERT.Register('FH_INSERT');
DBMS_ALERT.Register('FH_DELETE');
DBMS_ALERT.Register('FU');
DBMS_ALERT.WaitAny(:Name, :Message, :Status, :TimeOut); end;

Before the migration, we had no problem with 20 or so users using the same application on 9i.  Now, with just 5 or 6 connected, the CPU on the server spikes at 100%.
0
Comment
Question by:CodySystems
  • 16
  • 6
  • 2
  • +2
29 Comments
 
LVL 2

Expert Comment

by:niaz
ID: 35166682
Have you tried updating stats after upgrade.
0
 

Author Comment

by:CodySystems
ID: 35166855
We did an export / import of the data from 9i to 11g.  What stats are you referring to?
0
 
LVL 2

Expert Comment

by:niaz
ID: 35167406
0
 

Author Comment

by:CodySystems
ID: 35167965
Could you explain what the optimizer statistics have to do with a DBMS_ALERT.register event causing the CPU to spike to 100% on the server?
0
 

Author Comment

by:CodySystems
ID: 35167996
Here is a screenshot illustrating the performance hit of these two SQL statements compared to the next most costly SQL.

oracle screenshot
0
 
LVL 16

Expert Comment

by:rolutola
ID: 35179979
In whose session is this proc running? Were certain objects registered for alerts in 9i?
If you identify the affected session, then there is a remove method of the procedure.

R.
0
 

Author Comment

by:CodySystems
ID: 35180654
The application is a content management application that we use to keep track of calls for service and whatnot.  Anytime anyone opens the application, it registers the above DBMS_ALERTS.  While the application is open, it periodically re-registers.  The problem isn't isolated with just one session or one user.  It happens whenever the application is used.

There have been no changes in the application between the 9i and 11g upgrade.  We did an export of the data and imported into 11g and immediately started seeing CPU issues on the server when users started to use the application.  A roll-back is not an option so we have to get it working on 11g.  I read a decent amount about the DBMS_ALERT package and have a good understanding of how it works but I am not familiar how to troubleshoot it for performance issues.   If you can guide me through the process or at least give me some SQL to run or some things to check, that would be very helpful.  This is our first experience with 11g.
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 35180831
give me some SQL to run or some things to check

Follow niaz's and rotula's suggestion.
0
 

Author Comment

by:CodySystems
ID: 35181001
I don't understand what Rotula is asking.  I provided some additional information about the situation to try and answer his question about sessions.  Naiz made a blanket suggestion but didn't provide an explanation as to what it had to do with the DBMS_ALERT.register package.  This is a production system and I am new to 11g so I am not going to just run stuff without at least an explanation of how it is related to my problem.
0
 
LVL 16

Expert Comment

by:rolutola
ID: 35181227
I agree that rollback is not an option. We've all been there.

I strongly suspect that your application logs onto the database as a single user each time any application user (context user) runs the application. So the Alert must be set in the application user session.

Check your v$session and check for unique schemaname or something like that to show you the user the application is logging in as.

Then you may try to use the remove method of the DBMS_ALERT proc.

Hope this is clearer.

R.
0
 

Author Comment

by:CodySystems
ID: 35182270
Yes the DBMS_ALERT.register is being executed by the user's session.  The application uses DBMS_ALERTS to trigger certain events like checking tables for updates.

If I remove the alerts for these sessions, the application will not work correctly because it will no longer be registered to receive alerts.  What I am trying to understand is why it worked fine in 9i and does not in 11g.
0
 

Author Comment

by:CodySystems
ID: 35182521
There are roughly 500 or so rows in the DBMS_ALERT_INFO table.  
0
 
LVL 16

Expert Comment

by:rolutola
ID: 35187093
To cover all aspwcts, it is good practice to ensure all objects are VALID. HAve you done this?
Also, update statistics as already suggested.

Next, what makes you think what you're seeing is a 'performance hit'? are the users reporting unusual performance experience? There may not be a problem in fact.

R.
0
What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

 

Author Comment

by:CodySystems
ID: 35189382
Yes all objects are valid.  I have made sure that everything compiles properly.  Statistics have been updated as well.

On the old Oracle 9i server.  We had a single 2.4 Ghz processor, 1 GB of RAM and everyone (roughly 20 users) could use the program with no problems.  Average CPU usage on the server was between 2-15% with Oracle using about 500 MB of RAM.

On 11g, we have 4 3.6 GHz processors and 4 GB of RAM and only 3 users can use the application before it spikes to 100% across all four procesoors.  RAM usage is about 800 MB.

As you can see from the screenshots above, the two DBMS_ALERT SQL statements that are running are using a HUGE amount of CPU time.  Even with only 3 users in the system, they are complaining about the application "hanging" and generally being unresponsive at times.
0
 
LVL 16

Expert Comment

by:rolutola
ID: 35191844
I wonder if it's worth raising a tar with support on this since it is production and impacting on your users so profoundly.

R.
0
 

Author Comment

by:CodySystems
ID: 35207773
I created a TAR with support and have been working with them since Friday to try and resolve this.  I've spent close to 20 hours on the phone and just getting someone knowledgable who isn't going to tell me to do the same stuff over and over is what I need.  I've requested a Tier 3 performance tuning tech but haven't gotten a call yet.
0
 
LVL 2

Expert Comment

by:LeeMiller
ID: 35285842
Bug 9437010 from Metalink
0
 

Author Comment

by:CodySystems
ID: 35286002
The Oracle tech actually cited:  

Bug 6904068: HIGH CPU UTILIZATION DURING MUTEX WAIT EVENTS (10GR2 CURSORS)

He said that a patch was made available for most major operating systems EXCEPT for windows and that there is no time frame for this being resolved in Windows oracle.

I checked into Bug 9437010 and the Oracle tech working on the problem said that it was unrelated because we were not querying X$KGLOB.

Does anyone else have any input on a work around or potential SQL redesign for the SQL we are running.  This is a production system and we are dead in the water.
0
 
LVL 16

Expert Comment

by:rolutola
ID: 35291864
Clone to Linux platform.

Did you change machines when you upgraded to 11g?
Is there an option to go back to your 9i and then rebuild the new machines to Linux, and migrate the database to the new 11g on the Linux box?

If you didn't migrate to new machines then this process will be more involving.

R.
0
 

Author Comment

by:CodySystems
ID: 35320592
Cloning to Linux is not an option because our products do not run on Linux.  This production system is an internal use system, however we are a software vendor and this environment represents a testing bed for our products before they are released to production for our clients.

All of our clients utilize Windows based servers and Linux is simply not an option.

We did migrate to a new "machine".  Both our old system and new system are VMs.  The old VM has been archived, but now we have several weeks of data in 11g and a mandate from management that we are not to roll back to 9i.

Does anyone have specific experence with dealing with this issue or DBMS_ALERTS in general.  Perhaps there are additional things we can do programtically to adress the problem.  
0
 
LVL 16

Expert Comment

by:rolutola
ID: 35322736
This sounds like a major issue because of the lack of solution. I'm sorry I haven't experienced this issue so I can't help any further.
R.
0
 

Author Comment

by:CodySystems
ID: 35323533
The reason why rolling back to 9i cannot be a solution is because 9i does not install and is not supported on Windows 2008 server.  Clients are purchasing new hardware and OEM manufacturers are no longer including Windows 2003 pre-installed on them.  Furthermore, Oracle has stopped releasing CPU updates for Oracle 9i and be choosing to continue to force this database platform on clients, we also close the road for CPU updates if they are required to address security concerns.  We have been agressively pursuing this issue with Oracle but have gotten little to no support from them in the form of a patch to correct the issue on the Windows platform.
0
 

Author Comment

by:CodySystems
ID: 35483597
Oracle still has not released a patch for this issue.  Do any of the other Oracle DBAs on this site have experience with performance tuning as it relates specifically to DBMS_ALERTS?
0
 

Accepted Solution

by:
CodySystems earned 0 total points
ID: 35816187
Oracle has not released a patch for this issue and is really dragging their feet.  We did some hard tracing in the source code of the components we use and found a bug that included the DBMS_ALERT.register block at the beginning of this post to run on a timer every few seconds.  This created incredible UL contention on the DBMS_ALERT_INFO system table which then caused the CPU to spike with wait events every time the DMBS_ALERT.signal was invoked.  

In the end, we hired an Oracle performance tuning guy to help us sort it out.  
0
 

Author Comment

by:CodySystems
ID: 35816200
So in the end, there is definately a bug with oracle that made the problem worse since this was not an issue with 9i, but there was also some stuff we could do on our end at the component level that impacted the issue as well.
0
 

Author Closing Comment

by:CodySystems
ID: 35865608
We eventually figured this out on our own.  The "experts" that assisted with troubleshooting the issue were few and far between and did not provide much in terms of suggestions towards helping diagnose the issue. Our overall experience was poor at best with what was clearly an intensely difficult issue for us.
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

CCModeler offers a way to enter basic information like entities, attributes and relationships and export them as yEd or erviz diagram. It also can import existing Access or SQL Server tables with relationships.
Creating and Managing Databases with phpMyAdmin in cPanel.
This video shows how to copy a database user from one database to another user DBMS_METADATA.  It also shows how to copy a user's permissions and discusses password hash differences between Oracle 10g and 11g.
This video shows how to copy an entire tablespace from one database to another database using Transportable Tablespace functionality.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now