bhunger

asked on

Oracle DB Slows After Datapump Until Next Reboot

We have a Dell R710 running Oracle Standard Edition Two on two Xeon X5687 processors and 192 GB of dual-rank (12x16 GB) DDR3 running at 1333 MHz. This is a standard database running on Solaris 10, supporting a website with a setup similar to LAMP, except that we use Oracle 11g rather than MySQL. We measure responsiveness of the whole platform using a service called Monitis. It times the process of going to the website, hitting the back-end database, and returning results to the web viewer.

Our problem is that, after we run a job like Data Pump or do a large insert, the performance of the machine deteriorates noticeably and stays on that new, poorer performance track for hours or days at a time UNTIL the next system reboot.

We've noticed that the performance can be rectified immediately with a full system restart. On Wednesday night, for example, the system, which had been running very slowly, snapped back to great performance metrics after a reboot. It stayed this way until 10 PM the next evening, when the nightly Data Pump job ran. Responsiveness crashed dramatically at the outset of the process and, while it improved a lot once the process finished, it has established a new, lower plateau of performance that has persisted over the last 24 hours.

We've ruled out any kind of networking issues and are positive that the problem resides in the Oracle database server.

Is there an issue with memory leaking, or with too much RAM for Oracle 11g?  This problem kicked in on Feb 11 when we 1) brought system RAM up to 192 GB and 2) swapped out the Xeon E5520s for X5687s.
gheist

It means you either have an oversized SGA or an oversized AMM maximum.
bhunger

ASKER

Thanks! I'm not sure what an oversized AMM is?
Geert G
Check the swap size and I/O before and after, and also the memory usage.

Make sure the max memory target is lower than the physically available memory.
AMM is backed by /dev/shm (Google can feed you more marketing papers on how good it is).
You mentioned a "large insert".
After such activity, recalculate the statistics:

#########################################
##  SCHEMA                             ##
#########################################


execute DBMS_STATS.DELETE_SCHEMA_STATS (ownname =>'SYSTEM', cascade => true);

execute DBMS_STATS.GATHER_SCHEMA_STATS (ownname =>'BSI', cascade => true);


!!!!  WITH HISTOGRAMS:  !!!!!!

 SELECT systimestamp FROM dual;

execute DBMS_STATS.GATHER_SCHEMA_STATS ( -
        ownname          => 'BSI', -
        options          => 'GATHER', -
        estimate_percent => 100, -
        method_opt       => 'for all columns size auto', -
        cascade          => true, -
        degree           => 5 -
      );



#########################################
##  DATABASE                           ##
#########################################

 
execute DBMS_STATS.GATHER_DATABASE_STATS (cascade => true);
   

After statistics recalculation flush shared pool:

ALTER SYSTEM FLUSH SHARED_POOL;
bhunger

ASKER

Update:  Last night, I swapped out the 192 GB of 2R DDR3 memory for the 96 GB of 2R DDR3 memory.  I also removed the Xeon X5687 CPUs and replaced them with the old Xeon E5520 CPUs.  The response time, which had averaged 4,500 ms over the past 10 days, dropped to a much steadier, flat 1,400 ms.  We've run some large jobs today and, so far, benchmarks have returned to normal after they have completed.

The real test is tonight's 10:00 PM EST Data Pump job.  This, being the one that has consistently locked up the resources of the server, should be an interesting challenge.  If it goes well, then it was the CPUs or RAM that were causing problems.  Today's performance, though, was a big relief.  I appreciate the suggestions and have forwarded them to my DBA.  I'll update with the post-Data Pump situation.
I don't think this command is a good idea:
 execute DBMS_STATS.DELETE_SCHEMA_STATS (ownname =>'SYSTEM', cascade => true);

But the suggestion to recalculate statistics, at least on the tables affected by the Data Pump job, is a good one.  I doubt that recalculating statistics on the entire schema is necessary.  I would recommend running DBMS_STATS.GATHER_TABLE_STATS on just the tables that had large numbers of records added, then setting up a separate job to run weekly or monthly to recalculate statistics on the entire schema(s), as sketched below.

These statistics can have a big impact on performance.
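
A minimal sketch of that table-level approach, assuming the loaded tables live in the BSI schema; ORDERS and the job name WEEKLY_BSI_STATS are purely placeholders:

-- Refresh statistics only on the tables the nightly Data Pump job loaded:
execute DBMS_STATS.GATHER_TABLE_STATS (ownname => 'BSI', tabname => 'ORDERS', cascade => true);

-- Separate recurring job for the whole schema (the schedule is just an example):
BEGIN
  DBMS_SCHEDULER.CREATE_JOB (
    job_name        => 'WEEKLY_BSI_STATS',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.GATHER_SCHEMA_STATS(ownname => ''BSI'', cascade => TRUE); END;',
    repeat_interval => 'FREQ=WEEKLY; BYDAY=SUN; BYHOUR=3',
    enabled         => TRUE);
END;
/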
The command
execute DBMS_STATS.DELETE_SCHEMA_STATS (ownname =>'SYSTEM', cascade => true);

is only a sample of a schema statistics recalculation. Only a sample! Do not use it verbatim.

There are also many other useful kinds of statistics that need to be calculated.
I keep a big file with different samples, and they really help to increase performance.
Do not forget to flush the shared pool.
If you don't use Exadata, please skip the last one.

#########################################
##  SYSTEM                             ##
#########################################

execute DBMS_STATS.GATHER_SYSTEM_STATS (gathering_mode => 'NOWORKLOAD');

#########################################
##  DICTIONARY                         ##
#########################################

 
EXECUTE DBMS_STATS.GATHER_dictionary_stats;

As with any upgrade, it is strongly advisable to re-gather both the dictionary stats and the table stats once the upgrade has completed.

#########################################
##  EXADATA                            ##
#########################################

EXEC DBMS_STATS.GATHER_SYSTEM_STATS('EXADATA');
If you have many indexes, the Data Pump import will be slower.
Data Pump does not import the indexes themselves; it imports the index DDL, and the indexes are created from scratch, so their statistics get lost.
Try to use the parallel option of Data Pump (for both Export and Import).
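
A minimal sketch of that parallel option; the connect string, directory object, dump file, and schema names are placeholders, and the %U wildcard lets Data Pump spread the work across multiple dump files for the parallel workers:

expdp system/<password> SCHEMAS=BSI DIRECTORY=DATA_PUMP_DIR DUMPFILE=bsi_%U.dmp LOGFILE=bsi_exp.log PARALLEL=4
impdp system/<password> SCHEMAS=BSI DIRECTORY=DATA_PUMP_DIR DUMPFILE=bsi_%U.dmp LOGFILE=bsi_imp.log PARALLEL=4

Note: as far as I recall, PARALLEL values above 1 are only honored on Enterprise Edition, so check your edition first.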

Also observe the size of the SGA components.
Do you use some kind of "in memory" DB?
Given the 192 GB of RAM, you can follow Mark Geerlings' advice and place some important tables directly in memory.
Mark is one of our best experts on these topics.
Do you use Exadata or some kind of flash cards? Smart Flash Cache and Smart Scan are interesting features in those environments.
What values do you have for SGA_MAX_SIZE or SGA_TARGET?  It may be helpful if you could post the contents of your spfile here (excluding the first and last lines that include non-printable characters) so we could review all of your parameters.  If these are not set correctly for: your actual RAM; your database size; your number of users; and your application, the performance of Oracle may be far worse than what is possible.
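
A hedged aside: rather than trimming non-printable characters out of the binary spfile by hand, you can dump it to a plain-text pfile and post that, or query the key parameters directly (the pfile path below is only an example):

-- In SQL*Plus, connected AS SYSDBA:
CREATE PFILE='/tmp/init_billdb_readable.ora' FROM SPFILE;

-- Or check the key settings without dumping anything:
show parameter sga_max_size
show parameter sga_target
show parameter memory_target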
>We measure responsiveness of the whole platform using a service called Monitis.

Does it give the same response figures every time?

I have come across developers with code like this:
// drunk, fix when sober
if name_of_day(today) in ('Monday', 'Tuesday', 'Thursday') then
  for x := 1 to random(1000) do 
    sleep(random(500));



How sure are you of this Monitis?
If you empty and fill up lots of tables...

calculate the statistics when the tables are full,
then lock them.

That's also a possibility for certain tables.
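
A minimal sketch of that gather-then-lock idea, with placeholder schema and table names:

-- Gather statistics while the table is fully populated, then lock them so later
-- empty-and-refill cycles don't leave the optimizer with misleading numbers:
execute DBMS_STATS.GATHER_TABLE_STATS (ownname => 'BSI', tabname => 'STAGING_TABLE', cascade => true);
execute DBMS_STATS.LOCK_TABLE_STATS (ownname => 'BSI', tabname => 'STAGING_TABLE');

-- If the data profile changes later, unlock and re-gather:
execute DBMS_STATS.UNLOCK_TABLE_STATS (ownname => 'BSI', tabname => 'STAGING_TABLE');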
bhunger

ASKER

Mark,

Here's some of the information you referenced:

>cat init.ora
billdb.__db_cache_size=5872025600
billdb.__java_pool_size=33554432
billdb.__large_pool_size=33554432
billdb.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
billdb.__pga_aggregate_target=3657433088
billdb.__sga_target=6845104128
billdb.__shared_io_pool_size=0
billdb.__shared_pool_size=771751936
billdb.__streams_pool_size=67108864
*.audit_file_dest='/u01/app/oracle/admin/billdb/adump'
*.audit_trail='db'
*.compatible='11.2.0.0.0'
*.control_files='/u01/app/oracle/oradata/billdb/control01.ctl','/u01/app/oracle/oradata/billdb/control02.ctl','/u03/oradata/billdb/control03.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='billdb'
*.db_recovery_file_dest_size=536870912000
*.db_recovery_file_dest='/dbbackup/flash_recovery_area'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=billdbXDB)'
*.log_archive_dest_10='LOCATION=USE_DB_RECOVERY_FILE_DEST'
*.log_archive_dest_1='LOCATION=/u02/oradata/billdb/archivelog'
*.memory_target=10485760000
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.undo_tablespace='UNDOTBS1'
Well, you're not overallocating memory :)

When is the automatic gathering of statistics scheduled?
Does performance improve after that stats job has finished?

If the stats job runs at the same time as the big imports, you might get very odd performance.

Move the auto stats gathering to just after the import and see if it makes a difference.
https://docs.oracle.com/cd/B28359_01/server.111/b28274/stats.htm#CIHDGIID
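
One hedged way to do that on the 11g defaults: either open the maintenance window later than the 10 PM import, or disable the automatic task and gather stats from your own post-import job. The window and client names below are the 11g defaults; verify them in DBA_SCHEDULER_WINDOWS and DBA_AUTOTASK_CLIENT before relying on them.

-- Option 1: open Thursday's maintenance window at 11:30 PM instead of the default 10 PM:
execute DBMS_SCHEDULER.SET_ATTRIBUTE ('SYS.THURSDAY_WINDOW', 'repeat_interval', 'freq=daily;byday=THU;byhour=23;byminute=30;bysecond=0');

-- Option 2: switch off the automatic optimizer stats task and schedule your own job after the import:
BEGIN
  DBMS_AUTO_TASK_ADMIN.DISABLE (
    client_name => 'auto optimizer stats collection',
    operation   => NULL,
    window_name => NULL);
END;
/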
With 192 GB of RAM to work with, your key memory values look way too small to me.  You have approximately:
MEMORY_TARGET        = 10 GB
SGA_TARGET           = 6.5 GB
PGA Aggregate Target = 3.5 GB
DB Cache size        = 5.5 GB

Is this a dedicated Oracle database server?  If so, your values are way too low.  Or does this server also have to do other things, like run the web server or other programs?  Even if it does, you may still be able to increase your Oracle memory values somewhat and see a possibly dramatic performance improvement for Oracle.
bhunger

ASKER

We moved the RAM down to 96 GB (DDR3, 2R, running at 1066 MHz) on Thursday night.

Yes, this DB server only serves as the LAMP (with Oracle instead of MySQL) back end.  Nothing more.  There's a Red Hat web server in front of it on the subnet.

Makes sense, but what about resources being slowly diminished after each Data Pump run?

I'm not surprised that performance gets slower after each Data Pump run, since your total memory for Oracle is quite low compared to the total memory in the server, even now that it is reduced to 96 GB.  Each Data Pump run will cause the new data to take up space in Oracle's cache, and if that is not the data that gets queried the most, the database will have to do more work on queries, reading the requested data off disk instead of finding it already available in RAM.

With 96 GB of RAM, I would set your MEMORY_TARGET to at least 60 GB.  You may be able to set it higher, like 70, 80, or even 90 GB, but you will need to watch for swapfile activity: you don't want to force your O/S to use swap, it should just use physical RAM.  You will need to shut down and restart your database after changing the MEMORY_TARGET value in your spfile.  Then, since this is Oracle's "automatic" memory management setting, you will need to let the system run for some hours of normal use to see how Oracle decides to use this memory efficiently.  The "billdb.__..." values at the top of your spfile show the sizes Oracle chose when it had only 10 GB in total.
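
A minimal sketch of that change; the 60G figure is just the suggestion above, not a verified value for this server, and MEMORY_MAX_TARGET is static, so both changes go to the spfile and take effect at the restart:

ALTER SYSTEM SET memory_max_target = 60G SCOPE=SPFILE;
ALTER SYSTEM SET memory_target     = 60G SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
STARTUP

-- Afterwards, watch swap activity at the O/S level (e.g. vmstat on Solaris) before going any higher.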
bhunger

ASKER

Mark, very good point.  Makes complete sense, and that would explain the "stairstep" slowing down.  Question, though: even if we increase it to, say, 60 GB, will it ever clear itself up, or will it trend the same way, only much more slowly?  We'll make the change and I'll report back.
I don't know the size of your database, the types of queries that are common in your application, or whether they reference the data that was recently loaded with Data Pump, so it is hard for me to guarantee that you won't see this stair-step slowdown again.  But I suspect that your performance will be dramatically better if you allow Oracle to use the majority of the RAM in your server efficiently.  So you may or may not get performance slowdowns after running Data Pump in the future, or they may be small enough that you don't notice.  Another possibility to consider, if you run the Data Pump job(s) in a particular overnight window and most queries run outside the time when Data Pump is active, is to use two different Resource Manager plans in your database: one to optimize queries and the other to optimize the performance of Data Pump.  This isn't beginner-level Oracle tuning, but it is an option.
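
A minimal sketch of switching between two such plans around the nightly window; the plan names are placeholders, and creating the plans themselves (with DBMS_RESOURCE_MANAGER) is a separate exercise:

-- Just before the 10 PM Data Pump job:
ALTER SYSTEM SET resource_manager_plan = 'DATAPUMP_PLAN';

-- Once the import finishes, switch back to the plan tuned for the web queries:
ALTER SYSTEM SET resource_manager_plan = 'DAYTIME_PLAN';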

With 60 GB now instead of 10 GB for Oracle, you may get *MUCH* more than a 6x increase in performance!  For Oracle, I like to raise the amount of memory that Oracle can use to the highest level possible without forcing the O/S to swap memory.  This may take some trial and error once you get beyond 80-90% of the physical RAM used by Oracle.  This is based on my experience with Oracle on 64-bit Linux in recent years.  I have no Solaris experience, but I expect Oracle there to be very similar to Oracle on Linux.

Keep in mind, one of the big advantages of Oracle is that an Oracle database is *VERY* tunable for: different server hardware (number of CPUs, amount of RAM, etc.); different storage systems (NAS, SAN, SSD, RAID levels, etc.); size of the database; number of users; type of application (transaction-processing, reporting, a hybrid, etc.).

But, one of the disadvantages of Oracle is that you have to tune it, or your performance will be far less than what is possible on your combination of those things.

Don't assume that Oracle's default values are best for your system.
Avatar of bhunger

ASKER

Great insights, Mark.  Thanks for the tips.  We'll make the recommended changes and I'll let you know how it turns out.
To one's great surprise, a big SGA doesn't speed up the performance :) . To an even greater surprise, only 15% of the RAM is used for data; 85% of the SGA's RAM is used for internal tables and the control data structures of the instance :) . The usage and maintenance of these control structures, to an astronomically great surprise, slow down the productivity of the instance :) .
Given this big amount of RAM, you can also rely on the internal UNIX (Linux) speed-up: the OS itself buffers disk blocks in RAM, so when they are reused they are read from RAM instead of from disk.
I disagree with these two comments from schwertner:
1. "big SGA doesn't speed up the performance"
2. "only 15% of the RAM is used for data"

For #1, in my 25 years of experience with Oracle, usually a larger SGA results in faster performance, as long as the O/S does not need to swap memory to the swapfile.
For #2, that depends on the size of the SGA.  If it is very small, like just 1 or 2 GB, then that may be true.  But even with your MEMORY_TARGET of 10 GB, you had about 50% of that used for your data, since your DB cache size was 5.5 GB.  A little bit of that 5.5 GB is used for internal structures, but the majority of it would be your application's data.
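
If it helps to see how the SGA is actually being divided on this instance, these standard dynamic performance views give the live breakdown:

-- Current size of each SGA component:
SELECT component, ROUND(current_size/1024/1024) AS mb
  FROM v$sga_dynamic_components
 ORDER BY current_size DESC;

-- Finer-grained view, including what is inside the shared pool:
SELECT pool, name, ROUND(bytes/1024/1024) AS mb
  FROM v$sgastat
 ORDER BY bytes DESC;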
Hello Mark,

Things don't always turn out as expected.
I thought the same way before 2014, when I read this article from Arup Nanda.
You know who Arup Nanda is.
Don't forget that Oracle "eats" a very big amount of RAM without necessarily speeding up the performance.
I have read some articles, and the conclusion was (as far as I recall) that only about 1/6 of the SGA (I don't recall whether it was the db buffer cache or the whole SGA) is used to keep real data.


http://arup.blogspot.de/2011/04/can-i-fit-80mb-database-completely-in.html

Can I Fit a 80MB Database Completely in a 80MB Buffer Cache?

NO!
This is in the Series "100 Things You Probably Didn't Know
.....


Takeaways

From the above discussion you saw how a table with just two blocks populated fills up the buffer cache with 20 buffers. Imagine a normal database with, say 10000 filled blocks (8KX10,000 = 80M). It might easily fill 200,000 buffers. With a 8K block size that amounts to 8 K X 200 K = 1600M, or about 1.6 GB of buffer cache.
bhunger

ASKER

Update:  On Tuesday night, we doubled the MEMORY_TARGET to 20 GB.  We would've added more but decided that, since it's a live production system, we'd inch slowly up to the 50 GB goal.

It clearly helped the "stairstep" slowdown tendency (latency increasing 300 ms after each Data Pump run and staying there).

Response time is now flatter, at about 1900-2100 ms, which is well within the acceptable range.

We'll add more again this week, but the flatter, less spiky response times have brought down our collective blood pressure considerably.

Thanks everyone.  I'll update with any interesting developments.
bhunger

ASKER

Thanks, guys.  The latency has flattened out at 2,000 ms.  We're going to let things settle down for three or four weeks and then move the memory up again to 40 or 50 GB.

Thanks for your great insights.

Bill
Just curious: did you ever adjust your SGA size up to 50% (or more) of the physical RAM?  And if yes, did you see an additional performance improvement?
bhunger

ASKER

Mark, thanks for touching base. We did increase the SGA by 15 GB. We wanted to step it up incrementally because it's a production machine.

This made a dramatic difference in performance. So much so that we did not increase it further.

Since the end of February, when we made the adjustment, it has been much steadier, with a response rate of about 2,100 ms. The only time we get performance hits is during our weekly load process.

Most importantly, though, the Data Pump process no longer diminishes performance as it did before.

Thanks again for your help, Bill