Keithburnham
asked on
After 3 errors in Oracle Enterprise Manager database access is very slow in certain cases
After an event last week on our Windows 2003 Oracle 10g server (10.2.0.1) we are experiencing very slow retrieval of certain data. Searches appear to be fine, but going into cases in our application can take 20- 30 secs instead of near instantaneous. The errors were as follows:
Alert Log Generic Alert Log Error ORA-error stack ORA-error stack (07445[ACCESS_VIOLATION]) logged in E:\ORACLE\PRODUCT\10.1.0\A DMIN\ORCL\ BDUMP\aler t_orcl.log . 17-Nov-2010 15:48:15
Alert Log Generic Alert Log Error ORA-error stack (00600[12761], 00604, 04030) logged in E:\ORACLE\PRODUCT\10.1.0\A DMIN\ORCL\ BDUMP\aler t_orcl.log . 17-Nov-2010 15:48:15
Alert Log Generic Alert Log Error ORA-error stack (04030, 00600[12761], 00604, 04030) logged in E:\ORACLE\PRODUCT\10.1.0\A DMIN\ORCL\ BDUMP\aler t_orcl.log . 17-Nov-2010
Checking the ORA-errors it appears to be a memory or OS problem, but we haven't experience them since that time. Restarts of the server and addition of memory / disk checks have not helped things at all. I am an Oracle noob, so my analysis of the problem centered around trying to find errors in the database (which I'm told is very unlikely in Oracle).
Is there a sure fire way of error checking in this situation and do my errors look to be something fixable? This system is soon to be decomissioned and I am loathed to start up any more Oracle support as the system will be defuncy by the beginning of the new year. Many thanks in advance.
Alert Log Generic Alert Log Error ORA-error stack ORA-error stack (07445[ACCESS_VIOLATION]) logged in E:\ORACLE\PRODUCT\10.1.0\A
Alert Log Generic Alert Log Error ORA-error stack (00600[12761], 00604, 04030) logged in E:\ORACLE\PRODUCT\10.1.0\A
Alert Log Generic Alert Log Error ORA-error stack (04030, 00600[12761], 00604, 04030) logged in E:\ORACLE\PRODUCT\10.1.0\A
Checking the ORA-errors it appears to be a memory or OS problem, but we haven't experience them since that time. Restarts of the server and addition of memory / disk checks have not helped things at all. I am an Oracle noob, so my analysis of the problem centered around trying to find errors in the database (which I'm told is very unlikely in Oracle).
Is there a sure fire way of error checking in this situation and do my errors look to be something fixable? This system is soon to be decomissioned and I am loathed to start up any more Oracle support as the system will be defuncy by the beginning of the new year. Many thanks in advance.
ASKER
What edition of Oracle are you using (standard or enterprise)? Are you licensed to use the AWR? If not, do you have Statspack installed?
It's enterprise edition. I have no idea what AWR is I'm afraid, is there a way I can check? I don't know about the Statspack either so I'm not much help for you.
The SQL statement could be this one:
Action Run SQL Tuning Advisor on the SQL statement with SQL_ID "gt9jtvd37nqar".
SQL TextUPDATE DIARY SET CLASSTORUN=:1, DATETIMEDUE=:2, BYDIARY=:3, TITLE=:4, BODY=:5, S...
SQL IDgt9jtvd37nqar
Rationale SQL statement with SQL_ID "gt9jtvd37nqar" was executed 1947804 times and had an average elapsed time of 0.000048 seconds.
That didn't seem too bad to me, but I don't know what it's supposed to look like (again, I am a total noob for Oracle as you can tell).
As far as it being related to the memory message, it has been slow ever since that event (and yes it just could be a massive coincidence really).
thanks for the quick response!
It's enterprise edition. I have no idea what AWR is I'm afraid, is there a way I can check? I don't know about the Statspack either so I'm not much help for you.
The SQL statement could be this one:
Action Run SQL Tuning Advisor on the SQL statement with SQL_ID "gt9jtvd37nqar".
SQL TextUPDATE DIARY SET CLASSTORUN=:1, DATETIMEDUE=:2, BYDIARY=:3, TITLE=:4, BODY=:5, S...
SQL IDgt9jtvd37nqar
Rationale SQL statement with SQL_ID "gt9jtvd37nqar" was executed 1947804 times and had an average elapsed time of 0.000048 seconds.
That didn't seem too bad to me, but I don't know what it's supposed to look like (again, I am a total noob for Oracle as you can tell).
As far as it being related to the memory message, it has been slow ever since that event (and yes it just could be a massive coincidence really).
thanks for the quick response!
AWR is the Automatic Workload Repository. It is always installed in 10.2 but if you have not licensed the performance and tuning packs, you cannot access the data legally.
In your original message, you indicated that searches were slow. I would tend to assume that a search would be doing primarily queries, not updates like the SQL statement you posted.
In your original message, you indicated that searches were slow. I would tend to assume that a search would be doing primarily queries, not updates like the SQL statement you posted.
ASKER
As I said in the opening statement, searches appear to be the one function that operates fine; it's the opening of cases (which pulls back data from various tables) that appeared to be slow.
AWR I will have to look at, as Oracle was purchased in 2005 by the software provider.
AWR I will have to look at, as Oracle was purchased in 2005 by the software provider.
Sorry, you're right. I would still expect that opening a case would primarily involve issuing SELECT statements, though, not updates.
Is this a client/ server application? Or a three tier application?
Is this a client/ server application? Or a three tier application?
ASKER
It has Weblogic at the front end talking straight to a seperate Oracle server and that's it I think. There was a seperate workflow server that gets used for certain automation tasks but not on standard case retrieval.
It should be select statements I'd imagine, just grabbing data from the various areas to display on screen.
It should be select statements I'd imagine, just grabbing data from the various areas to display on screen.
OK, but the end users are using a web browser to access the application, right? That's going to make tracing the session to figure out the SQL statements that are slow problematic.
What does
return? Potentially, since opening cases was running well, you could try reverting the statistics that the optimizer uses back a day when performance was good (which sounds like it was the 17th, correct?). Reverting the statistics has the potential to change query plans throughout the system so it is a bit dangerous from that standpoint. But since we'd just be restoring old statistics, the risk that some plan will go catastrophically bad would be relatively low. Is there a test system that has a similar data volume that we could test the effect of reverting statistics on?
What does
SELECT dbms_stats.get_stats_history_retention()
FROM dual
return? Potentially, since opening cases was running well, you could try reverting the statistics that the optimizer uses back a day when performance was good (which sounds like it was the 17th, correct?). Reverting the statistics has the potential to change query plans throughout the system so it is a bit dangerous from that standpoint. But since we'd just be restoring old statistics, the risk that some plan will go catastrophically bad would be relatively low. Is there a test system that has a similar data volume that we could test the effect of reverting statistics on?
ASKER
I've got this in VMware so have an entire copy of the whole sstem to play with (what did we used to do?!?!?).
That statment returned 31 as a value.
What would I need to do as described? again, thanks for this as I've had no luck trying to search myself on the subject.
That statment returned 31 as a value.
What would I need to do as described? again, thanks for this as I've had no luck trying to search myself on the subject.
OK, cool.
The simplest thing that could possibly work would be to run
for each schema that the application uses (I'm guessing that there is probably only one schema that owns all the tables). This will restore the object statistics in that schema to their state on the morning of the 15th. This will invalidate all the query plans in the system, so the system will have to do some hard parsing initially. But ideally, after warming up the statement cache again, you'll see performance similar to what you had on the 15th.
The simplest thing that could possibly work would be to run
BEGIN
dbms_stats.restore_schema_stats( <<schema name>>, systimestamp - interval '7' day, false, false );
END;
for each schema that the application uses (I'm guessing that there is probably only one schema that owns all the tables). This will restore the object statistics in that schema to their state on the morning of the 15th. This will invalidate all the query plans in the system, so the system will have to do some hard parsing initially. But ideally, after warming up the statement cache again, you'll see performance similar to what you had on the 15th.
ASKER
ERROR at line 2:
ORA-06550: line 2, column 36:
PLS-00201: identifier 'OCEAN_CALMS' must be declared
ORA-06550: line 2, column 3:
PL/SQL: Statement ignored
Getting this message. Do I simply remove <<schema name>> and replace with our schema name, or is it supposed to be in quotes or something similar?
ORA-06550: line 2, column 36:
PLS-00201: identifier 'OCEAN_CALMS' must be declared
ORA-06550: line 2, column 3:
PL/SQL: Statement ignored
Getting this message. Do I simply remove <<schema name>> and replace with our schema name, or is it supposed to be in quotes or something similar?
It needs to be a quoted identifier, i.e.
to restore the statistics for the SCOTT schema.
BEGIN
dbms_stats.restore_schema_stats( 'SCOTT', systimestamp - interval '7' day, false, false );
END;
to restore the statistics for the SCOTT schema.
ASKER
AH yes, that did it. It appears to have ran successfully (PL/SQL procedure successfully completed) but the times are still quite high.
Oddly, with just me performing one case retrieval I see 30% CPU usage for the oracle service in the test environment. It then drops to nothing as soon as the case is retrieved.
The test Oracle server has 4GB RAM and 4 x CPUs (Xeon 5550s) and should be damn quick. This has got me totally stumped.
Oddly, with just me performing one case retrieval I see 30% CPU usage for the oracle service in the test environment. It then drops to nothing as soon as the case is retrieved.
The test Oracle server has 4GB RAM and 4 x CPUs (Xeon 5550s) and should be damn quick. This has got me totally stumped.
ASKER
Also, I saw this appear on the test server an hour ago:
~Errors in file e:\oracle\product\10.1.0\a dmin\orcl\ bdump\orcl _cjq0_3556 .trc:~ORA- 00604: error occurred at recursive SQL level 1~ORA-04031: unable to allocate 16 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","kglhin: temp")~
~Errors in file e:\oracle\product\10.1.0\a
OK. In the test environment, can you drill in to see what SQL statements are using so much CPU?
ASKER
SELECT DISTINCT TASKID FROM TASKINDEX WHERE TASKID IN (SELECT TASKID FROM TASKINDEX WHERE KEYNAME='wf-app-code' AND KEYVALUE='OM' AND TASKID IN (SELECT TASKID FROM TASKINDEX WHERE KEYNAME='a-case-number' AND KEYVALUE='70497' ))
This appears to be the biggest offender.
This appears to be the biggest offender.
ASKER
ACtually maybe this one as well, although this doesn't use as much time according to the report.
OK. Then in SQL*Plus, can you run
and post the output of the second statement?
SELECT /*+ gather_plan_statistics */ DISTINCT TASKID
FROM TASKINDEX
WHERE TASKID IN (
SELECT TASKID
FROM TASKINDEX
WHERE KEYNAME='wf-app-code'
AND KEYVALUE='OM'
AND TASKID IN (
SELECT TASKID
FROM TASKINDEX
WHERE KEYNAME='a-case-number'
AND KEYVALUE='70497' ));
SELECT *
FROM TABLE( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));
and post the output of the second statement?
ASKER
ERROR at line 2:
ORA-00942: table or view does not exist
Apparently, it believes that the TASKINDEX table/view is wrong. Odd considering it's mentioned in the SQL statement I posted.
ORA-00942: table or view does not exist
Apparently, it believes that the TASKINDEX table/view is wrong. Odd considering it's mentioned in the SQL statement I posted.
ASKER
I've just checked and TASKINDEX appears in a different schema called OCEAN_CATALYST. Would that make a difference to the SQL statement above.
Are you logged in as the owner of the TaskIndex table? If not, you'll probably have to prefix the table name with the schema name, i.e.
SELECT /*+ gather_plan_statistics */ DISTINCT TASKID
FROM Ocean_Calms.TASKINDEX
WHERE TASKID IN (
SELECT TASKID
FROM Ocean_Calms.TASKINDEX
WHERE KEYNAME='wf-app-code'
AND KEYVALUE='OM'
AND TASKID IN (
SELECT TASKID
FROM Ocean_Calms.TASKINDEX
WHERE KEYNAME='a-case-number'
AND KEYVALUE='70497' ));
ASKER
I think this is what you mean:
SQL_ID 31wrjurs8dsxd, child number 0
-------------------------- ---------- -
SELECT /*+ gather_plan_statistics */ DISTINCT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE TASKID IN (
SELECT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE KEYNAME='wf-app-code' AND KEYVALUE='OM'
AND TASKID IN ( SELECT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE
KEYNAME='a-case-number' AND KEYVALUE='70497' ))
Plan hash value: 1933957753
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
| 1 | HASH UNIQUE | | 1 | 1 | 20 |00:00:40.97 | 15M|
|* 2 | TABLE ACCESS BY INDEX ROWID | TASKINDEX | 1 | 1 | 40 |00:00:27.21 | 15M|
| 3 | NESTED LOOPS | | 1 | 1 | 32M|00:00:00.11 | 4670K|
| 4 | NESTED LOOPS | | 1 | 1 | 1551K|00:00:07.76 | 1567K|
|* 5 | TABLE ACCESS BY INDEX ROWID| TASKINDEX | 1 | 1 | 775K|00:00:00.78 | 10289 |
|* 6 | INDEX RANGE SCAN | IND_TASKINDEX_KEYVALUE | 1 | 1 | 775K|00:00:00.01 | 2020 |
|* 7 | INDEX RANGE SCAN | IND_TASKINDEX_TASKID | 775K| 2 | 1551K|00:00:02.35 | 1556K|
|* 8 | INDEX RANGE SCAN | IND_TASKINDEX_KEYVALUE | 1551K| 1 | 31M|00:00:04.35 | 3103K|
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
Predicate Information (identified by operation id):
-------------------------- ---------- ---------- -----
SQL_ID 31wrjurs8dsxd, child number 0
--------------------------
SELECT /*+ gather_plan_statistics */ DISTINCT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE TASKID IN (
SELECT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE KEYNAME='wf-app-code' AND KEYVALUE='OM'
AND TASKID IN ( SELECT TASKID FROM OCEAN_CATALYST.TASKINDEX WHERE
KEYNAME='a-case-number' AND KEYVALUE='70497' ))
Plan hash value: 1933957753
--------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------
| 1 | HASH UNIQUE | | 1 | 1 | 20 |00:00:40.97 | 15M|
|* 2 | TABLE ACCESS BY INDEX ROWID | TASKINDEX | 1 | 1 | 40 |00:00:27.21 | 15M|
| 3 | NESTED LOOPS | | 1 | 1 | 32M|00:00:00.11 | 4670K|
| 4 | NESTED LOOPS | | 1 | 1 | 1551K|00:00:07.76 | 1567K|
|* 5 | TABLE ACCESS BY INDEX ROWID| TASKINDEX | 1 | 1 | 775K|00:00:00.78 | 10289 |
|* 6 | INDEX RANGE SCAN | IND_TASKINDEX_KEYVALUE | 1 | 1 | 775K|00:00:00.01 | 2020 |
|* 7 | INDEX RANGE SCAN | IND_TASKINDEX_TASKID | 775K| 2 | 1551K|00:00:02.35 | 1556K|
|* 8 | INDEX RANGE SCAN | IND_TASKINDEX_KEYVALUE | 1551K| 1 | 31M|00:00:04.35 | 3103K|
--------------------------
Predicate Information (identified by operation id):
--------------------------
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
That is the one! It is enormously quicker now. Many, many thanks for your help today.
I take it that 7 day change won't affect actual data?
I take it that 7 day change won't affect actual data?
Correct, the 7 day change won't affect data, just statistics.
One problem that you may have, however, is that by default, there is a nightly job that runs to compute statistics on tables where the statistics are stale. It's not clear why the statistics on that table got so out of whack. Hopefully, it was a one-time hiccup so that if statistics are gathered on TaskIndex table & indexes, the query plan won't go south again. You could lock the statistics for that table to prevent Oracle from gathering statistics again
or lock the statistics in the entire schema
That's not something I would normally suggest doing, though. Most of the time, gathering fresh statistics is good for performance. If this is an end-of-life application, though, it should be pretty safe.
One problem that you may have, however, is that by default, there is a nightly job that runs to compute statistics on tables where the statistics are stale. It's not clear why the statistics on that table got so out of whack. Hopefully, it was a one-time hiccup so that if statistics are gathered on TaskIndex table & indexes, the query plan won't go south again. You could lock the statistics for that table to prevent Oracle from gathering statistics again
BEGIN
dbms_stats.lock_table_stats( 'OCEAN_CATALYST', 'TASKINDEX' );
END;
or lock the statistics in the entire schema
BEGIN
dbms_stats.lock_schema_stats( 'OCEAN_CATALYST');
END;
That's not something I would normally suggest doing, though. Most of the time, gathering fresh statistics is good for performance. If this is an end-of-life application, though, it should be pretty safe.
It sounds like there is a single query that is slow and that the slowness is relatively new. Do you know what SQL statement is slow? If so, do you know how to generate the query plan for this statement?
What edition of Oracle are you using (standard or enterprise)? Are you licensed to use the AWR? If not, do you have Statspack installed?