Link to home
Start Free TrialLog in
Avatar of Keithburnham
Keithburnham

asked on

After 3 errors in Oracle Enterprise Manager database access is very slow in certain cases

After an event last week on our Windows 2003 Oracle 10g server (10.2.0.1) we are experiencing very slow retrieval of certain data. Searches appear to be fine, but going into cases in our application can take 20- 30 secs instead of near instantaneous. The errors were as follows:

 Alert Log Generic Alert Log Error ORA-error stack ORA-error stack (07445[ACCESS_VIOLATION]) logged in E:\ORACLE\PRODUCT\10.1.0\ADMIN\ORCL\BDUMP\alert_orcl.log. 17-Nov-2010 15:48:15

 Alert Log Generic Alert Log Error ORA-error stack (00600[12761], 00604, 04030) logged in E:\ORACLE\PRODUCT\10.1.0\ADMIN\ORCL\BDUMP\alert_orcl.log. 17-Nov-2010 15:48:15

 Alert Log Generic Alert Log Error ORA-error stack (04030, 00600[12761], 00604, 04030) logged in E:\ORACLE\PRODUCT\10.1.0\ADMIN\ORCL\BDUMP\alert_orcl.log. 17-Nov-2010

Checking the ORA-errors it appears to be a memory or OS problem, but we haven't experience them since that time. Restarts of the server and addition of memory / disk checks have not helped things at all. I am an Oracle noob, so my analysis of the problem centered around trying to find errors in the database (which I'm told is very unlikely in Oracle).

Is there a sure fire way of error checking in this situation and do my errors look to be something fixable? This system is soon to be decomissioned and I am loathed to start up any more Oracle support as the system will be defuncy by the beginning of the new year. Many thanks in advance.
Avatar of jocave
jocave
Flag of United States of America image

It is exceptionally unlikely that a one-time memory error is causing performance problems a week later and following a restart.

It sounds like there is a single query that is slow and that the slowness is relatively new.  Do you know what SQL statement is slow?  If so, do you know how to generate the query plan for this statement?

What edition of Oracle are you using (standard or enterprise)?  Are you licensed to use the AWR?  If not, do you have Statspack installed?
Avatar of Keithburnham
Keithburnham

ASKER

What edition of Oracle are you using (standard or enterprise)?  Are you licensed to use the AWR?  If not, do you have Statspack installed?

It's enterprise edition. I have no idea what AWR is I'm afraid, is there a way I can check? I don't know about the Statspack either so I'm not much help for you.

The SQL statement could be this one:

Action Run SQL Tuning Advisor on the SQL statement with SQL_ID "gt9jtvd37nqar".
 SQL TextUPDATE DIARY SET CLASSTORUN=:1, DATETIMEDUE=:2, BYDIARY=:3, TITLE=:4, BODY=:5, S...
SQL IDgt9jtvd37nqar
 
 
Rationale SQL statement with SQL_ID "gt9jtvd37nqar" was executed 1947804 times and had an average elapsed time of 0.000048 seconds.

That didn't seem too bad to me, but I don't know what it's supposed to look like (again, I am a total noob for Oracle as you can tell).

As far as it being related to the memory message, it has been slow ever since that event (and yes it just could be a massive coincidence really).

thanks for the quick response!
AWR is the Automatic Workload Repository.  It is always installed in 10.2 but if you have not licensed the performance and tuning packs, you cannot access the data legally.

In your original message, you indicated that searches were slow.  I would tend to assume that a search would be doing primarily queries, not updates like the SQL statement you posted.  
As I said in the opening statement, searches appear to be the one function that operates fine; it's the opening of cases (which pulls back data from various tables) that appeared to be slow.

AWR I will have to look at, as Oracle was purchased in 2005 by the software provider.
Sorry, you're right.  I would still expect that opening a case would primarily involve issuing SELECT statements, though, not updates.  

Is this a client/ server application?  Or a three tier application?
It has Weblogic at the front end talking straight to a seperate Oracle server and that's it I think. There was a seperate workflow server that gets used for certain automation tasks but not on standard case retrieval.

It should be select statements I'd imagine, just grabbing data from the various areas to display on screen.
OK, but the end users are using a web browser to access the application, right?  That's going to make tracing the session to figure out the SQL statements that are slow problematic.

What does

SELECT dbms_stats.get_stats_history_retention()
  FROM dual

Open in new window


return?  Potentially, since opening cases was running well, you could try reverting the statistics that the optimizer uses back a day when performance was good (which sounds like it was the 17th, correct?).  Reverting the statistics has the potential to change query plans throughout the system so it is a bit dangerous from that standpoint.  But since we'd just be restoring old statistics, the risk that some plan will go catastrophically bad would be relatively low.  Is there a test system that has a similar data volume that we could test the effect of reverting statistics on?
I've got this in VMware so have an entire copy of the whole sstem to play with (what did we used to do?!?!?).

That statment returned 31 as a value.

What would I need to do as described? again, thanks for this as I've had no luck trying to search myself on the subject.
OK, cool.

The simplest thing that could possibly work would be to run

BEGIN
  dbms_stats.restore_schema_stats( <<schema name>>, systimestamp - interval '7' day, false, false );
END;

Open in new window


for each schema that the application uses (I'm guessing that there is probably only one schema that owns all the tables).  This will restore the object statistics in that schema to their state on the morning of the 15th.  This will invalidate all the query plans in the system, so the system will have to do some hard parsing initially.  But ideally, after warming up the statement cache again, you'll see performance similar to what you had on the 15th.
ERROR at line 2:
ORA-06550: line 2, column 36:
PLS-00201: identifier 'OCEAN_CALMS' must be declared
ORA-06550: line 2, column 3:
PL/SQL: Statement ignored

Getting this message. Do I simply remove <<schema name>> and replace with our schema name, or is it supposed to be in quotes or something similar?
It needs to be a quoted identifier, i.e.

BEGIN
  dbms_stats.restore_schema_stats( 'SCOTT', systimestamp - interval '7' day, false, false );
END;

Open in new window


to restore the statistics for the SCOTT schema.
AH yes, that did it. It appears to have ran successfully (PL/SQL procedure successfully completed) but the times are still quite high.

Oddly, with just me performing one case retrieval I see 30% CPU usage for the oracle service in the test environment. It then drops to nothing as soon as the case is retrieved.

The test Oracle server has 4GB RAM and 4 x CPUs (Xeon 5550s) and should be damn quick. This has got me totally stumped.
Also, I saw this appear on the test server an hour ago:

~Errors in file e:\oracle\product\10.1.0\admin\orcl\bdump\orcl_cjq0_3556.trc:~ORA-00604: error occurred at recursive SQL level 1~ORA-04031: unable to allocate 16 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","kglhin: temp")~
OK.  In the test environment, can you drill in to see what SQL statements are using so much CPU?  
SELECT DISTINCT TASKID FROM TASKINDEX WHERE TASKID IN (SELECT TASKID FROM TASKINDEX WHERE KEYNAME='wf-app-code' AND KEYVALUE='OM' AND TASKID IN (SELECT TASKID FROM TASKINDEX WHERE KEYNAME='a-case-number' AND KEYVALUE='70497' ))

This appears to be the biggest offender.
ACtually maybe this one as well, although this doesn't use as much time according to the report.
OK.  Then in SQL*Plus, can you run

SELECT /*+ gather_plan_statistics */ DISTINCT TASKID 
  FROM TASKINDEX 
 WHERE TASKID IN (
    SELECT TASKID 
      FROM TASKINDEX 
     WHERE KEYNAME='wf-app-code' 
       AND KEYVALUE='OM' 
       AND TASKID IN (
          SELECT TASKID 
            FROM TASKINDEX 
           WHERE KEYNAME='a-case-number' 
             AND KEYVALUE='70497' ));

SELECT *
  FROM TABLE( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));

Open in new window


and post the output of the second statement?
 
ERROR at line 2:
ORA-00942: table or view does not exist

Apparently, it believes that the TASKINDEX table/view is wrong. Odd considering it's mentioned in the SQL statement I posted.
I've just checked and TASKINDEX appears in a different schema called OCEAN_CATALYST. Would that make a difference to the SQL statement above.
Are you logged in as the owner of the TaskIndex table?  If not, you'll probably have to prefix the table name with the schema name, i.e.

SELECT /*+ gather_plan_statistics */ DISTINCT TASKID 
  FROM Ocean_Calms.TASKINDEX 
 WHERE TASKID IN (
    SELECT TASKID 
      FROM Ocean_Calms.TASKINDEX 
     WHERE KEYNAME='wf-app-code' 
       AND KEYVALUE='OM' 
       AND TASKID IN (
          SELECT TASKID 
            FROM Ocean_Calms.TASKINDEX 
           WHERE KEYNAME='a-case-number' 
             AND KEYVALUE='70497' ));

Open in new window

I think this is what you mean:

SQL_ID 31wrjurs8dsxd, child number 0  
-------------------------------------  
SELECT /*+ gather_plan_statistics */ DISTINCT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE TASKID IN (  
SELECT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE KEYNAME='wf-app-code' AND KEYVALUE='OM'  
AND TASKID IN ( SELECT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE  
KEYNAME='a-case-number' AND KEYVALUE='70497' ))  
   
Plan hash value: 1933957753  
   
--------------------------------------------------------------------------------------------------------------------  
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |  
--------------------------------------------------------------------------------------------------------------------  
| 1 | HASH UNIQUE | | 1 | 1 | 20 |00:00:40.97 | 15M|  
|* 2 | TABLE ACCESS BY INDEX ROWID | TASKINDEX | 1 | 1 | 40 |00:00:27.21 | 15M|  
| 3 | NESTED LOOPS | | 1 | 1 | 32M|00:00:00.11 | 4670K|  
| 4 | NESTED LOOPS | | 1 | 1 | 1551K|00:00:07.76 | 1567K|  
|* 5 | TABLE ACCESS BY INDEX ROWID| TASKINDEX | 1 | 1 | 775K|00:00:00.78 | 10289 |  
|* 6 | INDEX RANGE SCAN | IND_TASKINDEX_KEYVALUE | 1 | 1 | 775K|00:00:00.01 | 2020 |  
|* 7 | INDEX RANGE SCAN | IND_TASKINDEX_TASKID | 775K| 2 | 1551K|00:00:02.35 | 1556K|  
|* 8 | INDEX RANGE SCAN | IND_TASKINDEX_KEYVALUE | 1551K| 1 | 31M|00:00:04.35 | 3103K|  
--------------------------------------------------------------------------------------------------------------------  
   
Predicate Information (identified by operation id):  
---------------------------------------------------  
ASKER CERTIFIED SOLUTION
Avatar of jocave
jocave
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
That is the one! It is enormously quicker now. Many, many thanks for your help today.

I take it that 7 day change won't affect actual data?

Correct, the 7 day change won't affect data, just statistics.

One problem that you may have, however, is that by default, there is a nightly job that runs to compute statistics on tables where the statistics are stale.  It's not clear why the statistics on that table got so out of whack.  Hopefully, it was a one-time hiccup so that if statistics are gathered on TaskIndex table & indexes, the query plan won't go south again.  You could lock the statistics for that table to prevent Oracle from gathering statistics again

BEGIN
  dbms_stats.lock_table_stats( 'OCEAN_CATALYST', 'TASKINDEX' );
END;

Open in new window


or lock the statistics in the entire schema

BEGIN
  dbms_stats.lock_schema_stats( 'OCEAN_CATALYST');
END;

Open in new window


That's not something I would normally suggest doing, though.  Most of the time, gathering fresh statistics is good for performance.  If this is an end-of-life application, though, it should be pretty safe.