Solved

Oracle queries running quickly, then running slowly

Posted on 2006-11-28
500 Views
Last Modified: 2011-10-03
Hi, Folks.

I have an application that uses Oracle as the backend database.  I ran a series of tests this morning and the average runtime was 1.9 seconds.  I then refreshed Oracle's statistics for the schema and now the same tests are taking over 3 minutes.  Can anyone help me understand this behaviour please?  The two tests are exactly the same, the only change was that I ran these commands:
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(ownname => '&schema', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, degree => DBMS_STATS.AUTO_DEGREE, cascade => DBMS_STATS.AUTO_CASCADE);

EXECUTE DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

The tests only read data, they never write data.  Between each test iteration I issue this command:
ALTER SYSTEM FLUSH BUFFER_CACHE

to reduce the impact of data caching.

I'm at a complete loss.  We're trying to performance test our application and results changing dramatically like this really throw a wrench in the works.

Cheers, Max

EDIT: I am running the tests on the same hardware each time.  Nobody else is using the test servers.  When I check the CPU usage on the database server (Windows) oracle.exe seems to be hovering around 7-8%
Question by:HibsMax
18 Comments
 
LVL 34

Accepted Solution

by:
Mark Geerlings earned 250 total points
If there were no statistics at all the first time, then the optimizer had to use the older "rule-based" approach.  After you calculated statistics, the optimizer most likely used the newer, more complex "cost-based" approach, which is usually better.  But SQL statements should be written differently for the two different optimizers.  If your SQL statements were written by people familiar with Oracle's older, rule-based optimizer (or by people not so familiar with Oracle), it is possible that they do not work so well with the cost-based optimizer.
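
If you want to confirm which mode is actually in play, a quick check (assuming you can read v$parameter) is:

-- in SQL*Plus:
SHOW PARAMETER optimizer_mode

-- or from any SQL client:
SELECT name, value FROM v$parameter WHERE name = 'optimizer_mode';

Under the CHOOSE setting, Oracle uses the cost-based optimizer only when statistics exist, which would explain behaviour flipping after a gather.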
 
LVL 34

Expert Comment

by:Mark Geerlings
A more important metric than CPU usage is disk I/O.  And one big advantage of Windows (that I really miss in the Linux world!) is the Windows Performance Monitor and its ability to indicate disk I/O.  Set up your Windows Performance Monitor to simultaneously display:
1. CPU usage (per CPU)
2. page file usage
3. disk I/O per disk (assuming that you have multiple disks, that are not joined in one big RAID5 set - RAID5 is terrible for the Windows swap file and for the Oracle temp tablespace).
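
If you want the same picture from Oracle's side, the v$ views give a rough per-datafile breakdown (the counters are cumulative since instance startup):

-- physical reads/writes and timings per datafile
SELECT d.name, f.phyrds, f.phywrts, f.readtim, f.writetim
  FROM v$filestat f, v$datafile d
 WHERE f.file# = d.file#
 ORDER BY f.phyrds DESC;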
 

Author Comment

by:HibsMax
Thanks, Mark.

I should have stated that there were statistics the first time around.  Pre-test steps include:
1. load data
2. gather statistics

The SQL queries are not all generated by people familiar with any specific database platform.  In fact, most of the queries are configured by Kodo (a JDO implementation).

The fact that the queries have been shown to run quicker shows me that there is nothing wrong with the actual queries.  I'm convinced this is an Oracle issue (which also includes how we are using Oracle).  I saw similar results when doing some reports testing a few months ago.  Tests run great, gather statistics, tests run sloooooooooooooow.  If query A runs in 10 seconds once, it should always run in 10s (or thereabouts) if the environment is unchanged from test to test.

Cheers, Max
 
LVL 34

Expert Comment

by:Mark Geerlings
Do the queries include "bind variables" (which Oracle is optimized for)?  Or, do the queries contain "hard-coded" values coming from Kodo?  If they are "hard-coded", and if the values are different for different tests, then the cost-based optimizer could very easily choose a different path each time.
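
For illustration, the difference looks like this in SQL*Plus (the table and column names here are made up):

-- hard-coded literal: the optimizer sees, and optimizes for, this exact value
SELECT * FROM orders WHERE customer_id = 42;

-- bind variable: one shared cursor that can be reused for any value
VARIABLE cust NUMBER
EXECUTE :cust := 42
SELECT * FROM orders WHERE customer_id = :cust;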
 

Author Comment

by:HibsMax
The queries appear to use bind variables.  When I switch on Kodo tracing I see the queries displayed with ? placeholders and then a list of parameters after the query.
 

Author Comment

by:HibsMax
Oh yeah, I copied the queries and ran them manually in SQL*Plus, and they performed just as badly there.
 
LVL 34

Expert Comment

by:Mark Geerlings
Have you used Oracle's "trace" utility (with "tkprof") or a different tool (like: TOAD) to examine the "explain plan" for the queries to see if they are using indexes as you expect?
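
A minimal version of that workflow, if it helps (the query shown is only a placeholder - substitute one of your slow statements):

-- option 1: trace the session, run the query, then format the trace
-- file from udump with: tkprof <tracefile> report.txt
ALTER SESSION SET SQL_TRACE = TRUE;

-- option 2 (9iR2 and up): display the plan directly
EXPLAIN PLAN FOR SELECT * FROM orders WHERE customer_id = :cust;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);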
 

Author Comment

by:HibsMax
I haven't gone to that level yet this time, but I have done so in the past.

I guess what concerns me the most is how the query times can change so dramatically when effectively nothing has changed......except for the statistics that the optimizer has at its disposal, of course.

As it happens I am using old code to establish a baseline.  The new code doesn't exhibit this freaky behaviour so it's not a big issue - I just need to be able to try and explain it. :)
 
LVL 34

Expert Comment

by:Mark Geerlings
One of the disadvantages of Oracle's cost-based optimizer is the huge effect that table statistics (or the lack thereof!) can have on query performance.  Sometimes "optimizer hints" in queries can help in cases where the optimizer makes bad choices by default, but I'm not a big fan of optimizer hints, since there is no guarantee that they will be used.  Another option (if you get a query to work efficiently at least once first!) is to save a "stored outline" and force Oracle to use that stored outline for subsequent executions.  But this only helps if your queries are textually identical (bind variable values may vary).
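
A sketch of the stored-outline approach, assuming you have the CREATE ANY OUTLINE privilege (the outline, category and query names here are invented):

-- capture the good plan once
CREATE OR REPLACE OUTLINE orders_fast
  FOR CATEGORY perf_test
  ON SELECT * FROM orders WHERE customer_id = :cust;

-- then have sessions pick up outlines from that category
ALTER SESSION SET use_stored_outlines = perf_test;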
 
LVL 47

Assisted Solution

by:schwertner
schwertner earned 250 total points
There are some other elements.
You have to consider the role of the database buffer cache: if you flush it before the second test, all the data has to be read from disk into RAM again.

But reading your intelligent explanation, I am sure you didn't make this mistake.

The element of SQL processing that causes the delay you experience could be the so-called "hard parse" of the SQL statement.
It seems that after you refresh the statistics, Oracle decides not to reuse the parsed SQL already placed in the Shared Pool.
Instead, in addition to the semantic analysis, it begins a new round of optimizing the query - sometimes the CBO creates and analyzes 400 or more candidate plans.  After that it chooses the "best" execution plan and stores it in the Shared Pool.
It is very likely that you are observing the overhead of this new optimization pass.
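
You can watch for this: the parse counters in v$sysstat are cumulative since startup, so compare them before and after a test run - a jump in hard parses right after regathering statistics would confirm it:

SELECT name, value
  FROM v$sysstat
 WHERE name IN ('parse count (total)', 'parse count (hard)');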
 
LVL 6

Expert Comment

by:cjard
> guess what concerns me the most is how the query times can change so dramatically when effectively nothing has changed......except for the statistics that the optimizer has at its disposal of course

to a cost-based optimizer, statistics are everything - it has no other way to guess how many rows are likely to come from which tables, where those rows sit on disk, how it should access the data, or how it should join it. If the data is loosely sorted, nested loops may be better than a hash join; if a major part of the table is used, a full scan beats index indirection..

if you knew the hoops Oracle jumps through under the hood to get data efficiently, you'd better appreciate why "statistics" are far more than "effectively nothing".. more like "effectively everything". An analogy: someone re-sorts all your CDs into alphabetical order of artist surname rather than disc title - you'd never find anything with your old search method of "alphabetical by disc title"; Bat Out of Hell is now under M for Meat Loaf, three shelves down. The CDs haven't changed, and the new person can find them really quickly using their search method; it's just that your old query can't, and you may resort to doing a full shelf scan.. That's the impact of gathering the statistics.
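
if you want to see what the optimizer is actually working from, the gathered numbers are sitting in the dictionary (table-level only, but a quick sanity check):

-- what the CBO "knows" about your tables right now
SELECT table_name, num_rows, blocks, avg_row_len, last_analyzed
  FROM user_tables
 ORDER BY table_name;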

Have you considered looking at the plans for the queries and re-writing them?
 

Author Comment

by:HibsMax
Thanks guys.

I guess I didn't do a 100% great job of explaining the whole process.  I understand that statistics are important.  The whole process has been going on for a couple of weeks so I cannot fully document every step I took but you can consider it to be something like this:

1. load test data
2. gather statistics
3. run test with baseline code (flush cache between individual test cycles)
4. run test with revised code (flush cache between individual test cycles)
5. repeat step 4 X times

The data was only loaded once.  The stats were gathered once.  No more data was loaded into the database.

After running several tests with a different codebase (all Java code, no database code e.g. stored procs, functions, triggers, etc. - we have none of those) I decided to repeat the statistics gathering process, which for me is:
    DBMS_STATS.DELETE_SCHEMA_STATS
    DBMS_STATS.GATHER_SCHEMA_STATS
    DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO
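
Spelled out as runnable commands, that sequence is roughly this (same parameters as in my original question; note that AUTO_SAMPLE_SIZE samples the data, so two gathers over identical data are not guaranteed to produce identical statistics):

EXECUTE DBMS_STATS.DELETE_SCHEMA_STATS(ownname => '&schema');
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(ownname => '&schema', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, degree => DBMS_STATS.AUTO_DEGREE, cascade => DBMS_STATS.AUTO_CASCADE);
EXECUTE DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;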

It was remiss of me to miss out that information when asking the original question.  You see, I've always been running with statistics in the database.  The only changes that are made in the test environment are:
1. application code changes
2. collecting statistics *again* <- that's the bit I should have said the first time.

I don't know what Oracle does when it gathers statistics, but my assumption was that if I gathered stats once, ran some tests, and then gathered the stats again, they would be the same because the underlying data has not changed.  But maybe my assumption is incorrect.

Cheers, Max
 
LVL 34

Expert Comment

by:Mark Geerlings
The statistics should be the same each time, if the only thing that happened in between was running some queries, and there were no inserts, updates or deletes in the tables being queried.

Flushing the cache, though, could be contributing to some of the variation you are seeing, and may or may not give you a good comparison to a production environment (where the cache is usually not flushed).
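
One way to verify that directly is to snapshot the statistics before and after a regather and compare - the column-level numbers (and histogram bucket counts in particular) are where differences tend to show up:

-- the column-level figures the CBO uses for selectivity estimates
SELECT table_name, column_name, num_distinct, density, num_buckets
  FROM user_tab_columns
 ORDER BY table_name, column_name;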
 

Author Comment

by:HibsMax
I agree with your last statement, Mark.  We're not trying to replicate a production environment, we're trying to replicate a worst case scenario by guaranteeing that the data has to be retrieved from disk instead of the cache.  This is part of our performance testing effort.

I also agree with your first statement and that's what confuses me.

I mentioned I had a similar problem before when testing some report queries.  This is probably going to be hard to swallow, but it really did happen.  As with this query, the underlying data was never changed.  We had a query that we were trying to optimize.  We ran it a few times and each time it completed in about 5 minutes.  We tried different queries and kept seeing roughly the same performance.  I "refreshed" the statistics using the above process (DELETE, GATHER, FLUSH) and I swear that the same query that had been completing in about 5 minutes then had to be killed after 15 hours.  I'm not making this up.  That really screwed with my head.  Since roughly the same thing has happened twice, I'm beginning to think that there is something else going on.  I don't have the results now, but I recall in the past:
1. gather stats
2. run explain plan on query
3. gather stats
4. run explain plan query
5. compare costs from steps (2) and (4) - they were wildly different.

I will do the exact same with this query and report back my findings.....
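
For the record, the way I pull the cost out is roughly this (the query text is a placeholder for one of ours):

EXPLAIN PLAN SET STATEMENT_ID = 'after_gather' FOR
  SELECT * FROM orders WHERE customer_id = :cust;

-- the id = 0 row carries the total cost for the statement
SELECT statement_id, cost FROM plan_table WHERE id = 0;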
 

Author Comment

by:HibsMax
Here are the results of my testing.  The costs of the queries below don't differ by much, but they do differ.  The report queries I was testing with before are much more complicated than the simple query I used this time.  The purpose of this latest test was not to show a large difference in the stats, just a difference.

DELETE STATS
    Run query, COST [1] => 7964
GATHER STATS
    Run query, COST [2] => 1472
DELETE STATS
    Run query, COST [3] => 7964
GATHER STATS
    Run query, COST [4] => 1428
 
LVL 47

Expert Comment

by:schwertner
Be aware that Oracle has an internal mechanism to detect WHEN statistics have been gathered, and it will hard parse the SQL if the CBO is provided with new statistics.
And a hard parse is a long-running process and a "killer" of applications that do not use bind variables.

Also, if you have experience with the Export utility, you will know for sure that Oracle knows when the statistics on a table are old (when there have been many changes to the table since the last statistics gather).

Conclusion: the usage of the statistics is not always a blind process; nevertheless it is not as sophisticated as we would like.
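
You can see those recorded changes yourself, assuming the tables have MONITORING enabled (flush the in-memory counters first, as in your own procedure):

EXECUTE DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- DML activity Oracle has tracked since the last statistics gather
SELECT table_name, inserts, updates, deletes, timestamp
  FROM user_tab_modifications;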
 
LVL 1

Expert Comment

by:Computer101
Forced accept.

Computer101
EE Admin
