Solved

Redo Buffer Problem

Posted on 2004-04-13
I have three redo log files of 100 MB each in our Oracle system, of which only one group is active at a time. Whenever I query v$sysstat I get the value 6 for 'redo buffer allocation retries' and 9 for 'redo log space requests'. What could be the possible reason? Do I need to increase the log buffer parameter, or is it some other problem? I encountered the same problem last week, when the values were quite high; I shut down and restarted the database and the problem was resolved. Do I need to modify the log_checkpoint_interval parameter?

Awaiting your comments on the same.
Question by:pradiptam
 
Accepted Solution by: schwertner
redo buffer allocation retries : 6
             
Total number of retries necessary to allocate space in the redo buffer. Retries  
are needed either because the redo writer has fallen behind or because an event
such as a log switch is occurring.

This means that LOG SWITCHES occur too often. To avoid this, increase the size of the REDO LOGS.
You can also investigate the alert<SID>.log file to see the history of this event and how often it occurs.
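
If you prefer a query to reading the alert log, a sketch like the following against V$LOG_HISTORY (available regardless of ARCHIVELOG mode) groups log switches by hour; the date format is only illustrative:

select to_char(first_time, 'YYYY-MM-DD HH24') as hour,
       count(*)                               as log_switches
  from v$log_history
 group by to_char(first_time, 'YYYY-MM-DD HH24')
 order by 1;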

Assisted Solution by: baonguyen1
Hi

The "redo buffer allocation retries" indicates the total number of retries necessary to allocate space in the redo buffer.  Retries are needed either because the redo writer has
fallen behind or because an event such as a log switch is occurring. In other words, the log buffer is too small and LGWR tries to flush so frequently that contention is generated. This can also be caused by a db_block_buffers too small

YOu can try to make the DBWR more infrequent by adjusting FAST_START_IO_TARGET (8i) or FAST_START_MTTR_TARGET (9i). Keep in mind that the performance may inscrease but recoverbility could be affected.
Or you can enlarge your redo files to make checkpoiting more infrequent on log switches. However, the redo log file is 100M each and I dont think this is the case.
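
As a rough sketch on 9i only (the value of 300 seconds is just an illustration, not a recommendation), the MTTR target can be changed dynamically:

ALTER SYSTEM SET FAST_START_MTTR_TARGET = 300;  -- 9i; on 8i the analogous parameter is FAST_START_IO_TARGET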

Attached here is the CHECKPOINT TUNING AND ERROR HANDLING note from Oracle:

1.  What is a Checkpoint?

A Checkpoint is a database event which synchronizes the modified data blocks in memory with the datafiles on disk.  It offers Oracle the means for ensuring the consistency of data modified by transactions.  The mechanism of writing modified blocks on disk in Oracle is not synchronized with the commit of the corresponding transactions.
A checkpoint has two purposes: (1) to establish data consistency, and (2) to enable faster database recovery.   How is recovery faster?  Because all database changes up to the checkpoint have been recorded in the datafiles, making it unnecessary to apply redo log entries prior to the checkpoint. The checkpoint must ensure that all the modified buffers in the cache are really written to the corresponding datafiles to avoid the loss of data
which may occur with a crash (instance or disk failure).
 

Oracle writes the dirty buffers to disk only on certain conditions:

  - A shadow process must scan more than one-quarter of the DB_BLOCK_BUFFERS
     parameter.
  - Every three seconds.
  - When a checkpoint is produced.

A checkpoint is realized on five types of events:

  - At each switch of the redo log files.
  - When the delay for LOG_CHECKPOINT_TIMEOUT is reached.
  - When the size in bytes corresponding to :
     (LOG_CHECKPOINT_INTERVAL* size of IO OS blocks)
     is written on the current redo log file.
  -  Directly by the ALTER SYSTEM SWITCH LOGFILE command.
  - Directly with the ALTER SYSTEM CHECKPOINT command.
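
The last two events in this list can also be triggered manually on demand, for example:

  ALTER SYSTEM SWITCH LOGFILE;   -- forces a log switch, which in turn triggers a checkpoint
  ALTER SYSTEM CHECKPOINT;       -- forces a checkpoint without switching the log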
 

During a checkpoint the following occurs:
 -  The database writer (DBWR) writes all modified database
    blocks in the buffer cache back to datafiles,
 -  Log writer (LGWR) updates both the controlfile and
    the datafiles to indicate when the last checkpoint
    occurred (SCN)
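
If you want to see the recorded checkpoint SCNs yourself, both the controlfile and the datafile headers expose them through standard dictionary views:

  SELECT checkpoint_change# FROM v$database;                 -- last checkpoint SCN recorded in the controlfile
  SELECT file#, checkpoint_change# FROM v$datafile_header;   -- checkpoint SCN recorded in each datafile header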


2. The checkpoint process
 
Please note that the CHECKPOINT_PROCESS parameter is obsolete starting with Oracle8 and now
the CKPT process is always started as part of the background processes.

The CHECKPOINT_PROCESS init.ora parameter determines whether or not the
optional CKPT background process will be started to perform LGWR's task during
checkpoint operations of updating the datafile headers.  LGWR is then free to
perform its primary function of flushing the redo log buffer to the online redo
logs.
The CKPT process can improve performance significantly and decrease the amount
of time users have to wait for a checkpoint operation to complete.  The
overhead associated with starting another background process is not
significant when compared to the performance benefit to be gained by enabling
CKPT, therefore, Oracle recommends always enabling the checkpoint process (CKPT).
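
To confirm that CKPT is actually running (from Oracle8 onwards it always should be), a quick check against V$BGPROCESS works; a non-zero PADDR indicates the process has been started:

  SELECT paddr, name, description
    FROM v$bgprocess
   WHERE name = 'CKPT';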



3. Checkpoints and Performance

Checkpoints present a tuning dilemma for the Database Administrator.  Frequent
checkpoints will enable faster recovery, but can cause performance
degradation. How then should the DBA address this?
Depending on the number of datafiles in a database, a checkpoint can be a
highly resource intensive operation, since all datafile headers are frozen
during the checkpoint.  There is a performance trade-off regarding frequency
of checkpoints.  More frequent checkpoints enable faster database recovery
after a crash.  This is why some customer sites which have a very low
tolerance for unscheduled system downtime will often choose this option.
However, the performance degradation of frequent checkpoints may not justify
this philosophy in many cases. Let's assume the database is up and running 95%
of the time, and unavailable 5% of the time from infrequent instance crashes
or hardware failures requiring database recovery.  For most customer sites, it
makes more sense to tune for the 95% case rather than the rare 5% downtime.

This bulletin assumes that performance is your number one priority and so
recommendations are made accordingly. Therefore, your goal is to minimize the frequency
of checkpoints through tuning.

Tuning checkpoints involves four key initialization parameters

           -  CHECKPOINT_PROCESS
           -  LOG_CHECKPOINT_INTERVAL
           -  LOG_CHECKPOINT_TIMEOUT
           -  LOG_CHECKPOINTS_TO_ALERT

These parameters are discussed in detail below.

Recommendations are also given for handling "checkpoint not complete" messages
found in the alert log, which indicate a need to tune redo logs and
checkpoints.


4. Redo logs and Checkpoint

A checkpoint occurs at every log switch.  If a previous checkpoint is already
in progress, the checkpoint forced by the log switch will override the current
checkpoint.
This necessitates well-sized redo logs to avoid unnecessary checkpoints as a
result of frequent log switches.  The alert log is a valuable tool for
monitoring the rate that log switches occur, and subsequently, checkpoints
occur.  Oracle recommends sizing the online redo logs such that switches occur
no more than once per hour. The following is an example of quick log switches
from the alert log:
 

Fri May 16 17:15:43 1997
Thread 1 advanced to log sequence 1272
  Current log# 3 seq# 1272 mem# 0: /prod1/oradata/logs/redologs03.log
Thread 1 advanced to log sequence 1273
  Current log# 1 seq# 1273 mem# 0: /prod1/oradata/logs/redologs01.log
Fri May 16 17:17:25 1997
Thread 1 advanced to log sequence 1274
  Current log# 2 seq# 1274 mem# 0: /prod1/oradata/logs/redologs02.log
Thread 1 advanced to log sequence 1275
  Current log# 3 seq# 1275 mem# 0: /prod1/oradata/logs/redologs03.log
Fri May 16 17:20:51 1997
Thread 1 advanced to log sequence 1276
  Current log# 1 seq# 1276 mem# 0: /prod1/oradata/logs/redologs01.log
 

If the redo logs switch every 3 minutes, you will see performance degradation.
This indicates the redo logs are not sized large enough to efficiently handle
the transaction load. In that case, increase the size of the redo log files.
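
To see how large the current groups actually are, and their status, a simple query against V$LOG is enough:

  SELECT group#, bytes/1024/1024 AS size_mb, members, status
    FROM v$log;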

5. Instance parameters related to the checkpoint process

LOG_CHECKPOINT_INTERVAL
The LOG_CHECKPOINT_INTERVAL init.ora parameter controls how often a checkpoint
operation will be performed based upon the number of operating system blocks
that have been written to the redo log.  If this value is larger than the size
of the redo log, then the checkpoint will only occur when Oracle performs a
log switch from one group to another, which is preferred.

NOTE: Starting with Oracle 8.1, LOG_CHECKPOINT_INTERVAL will be interpreted
to mean that the incremental checkpoint should not lag the tail of the
log by more than log_checkpoint_interval number of redo blocks.

On most Unix systems the operating system block size is 512 bytes.  This means
that setting LOG_CHECKPOINT_INTERVAL to a value of 10,000 (the default
setting), causes a checkpoint to occur after 5,120,000 (5M) bytes are written
to the redo log.  If the size of your redo log is 20M, you are taking 4
checkpoints for each log.
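
As a worked example for the 100 MB logs mentioned in the question (and assuming 512-byte operating system blocks): 100 MB is 104,857,600 bytes, i.e. 204,800 blocks, so LOG_CHECKPOINT_INTERVAL would have to be set above 204,800 for checkpoints to occur only at log switch.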

LOG_CHECKPOINT_INTERVAL influences when a checkpoint occurs, which means
careful attention should be given to the setting of this parameter, keeping it
updated as the size of the redo log files is changed.  The checkpoint
frequency is one of the factors which impacts the time required for the
database to recover from an unexpected failure.  Longer intervals between
checkpoints mean that if the system crashes, more time will be needed for the
database to recover.  Shorter checkpoint intervals mean that the database will
recover more quickly, at the expense of increased resource utilization during
the checkpoint operation.

This parameter also impacts the time required to complete a database recovery
operation during the roll forward phase of recovery.  The actual recovery time
is dependent upon this time, and other factors, such as the type of failure
(instance or system crash, media failure, etc.), and the number of archived
redo logs which need to be applied.

LOG_CHECKPOINT_TIMEOUT
The LOG_CHECKPOINT_TIMEOUT init.ora parameter controls how often a checkpoint
will be performed based on the number of seconds that have passed since the
last checkpoint.  

NOTE: Starting with Oracle 8.1, LOG_CHECKPOINT_TIMEOUT will be interpreted
to mean that the incremental checkpoint should be at the log position
where the tail of the log was LOG_CHECKPOINT_TIMEOUT seconds ago.

Checkpoint frequency impacts the time required for the
database to recover from an unexpected failure.  Longer intervals between
checkpoints mean that more time will be required during database recovery.
Oracle recommends using LOG_CHECKPOINT_INTERVAL to control the checkpoint
interval rather than LOG_CHECKPOINT_TIMEOUT, which will initiate a checkpoint
every "n" seconds, regardless of the transaction frequency.  This can cause
unnecessary checkpoints in cases where transaction volumes vary.  Unnecessary
checkpoints must be avoided whenever possible for optimal performance.

It is a misconception that setting LOG_CHECKPOINT_TIMEOUT to a given value
will initiate a log switch at that interval, enabling a recovery window used
for a stand-by database configuration.  Log switches cause a checkpoint, but a
checkpoint does not cause a log switch.  The only ways to influence log switches are
to force one manually with ALTER SYSTEM SWITCH LOGFILE, or to resize the redo logs so
that switches occur more frequently.  Switch frequency is driven by the amount of redo
written (operating system blocks), not by a timed interval.

Sizing of the online redo logs is critical for performance and recovery.

See additional sections below on redo logs and checkpoints.

LOG_CHECKPOINTS_TO_ALERT:
The LOG_CHECKPOINTS_TO_ALERT init.ora parameter, when set to a value of TRUE,
allows you to log checkpoint start and stop times in the alert log.  This is
very helpful in determining if checkpoints are occurring at the optimal
frequency and gives a chronological view of checkpoints and other database
activities occurring in the background.
See <Note:76713.1> for more detail on how these instance parameters can influence the checkpoint.
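
Putting the recommendations above together, an illustrative init.ora fragment might look like the following (the values are examples only and must be adapted to your own redo log sizes):

  # illustrative init.ora fragment - example values only
  log_checkpoint_interval  = 999999999   # larger than the redo log size in OS blocks => checkpoint only at log switch
  log_checkpoint_timeout   = 0           # disable time-based checkpoints
  log_checkpoints_to_alert = true        # record checkpoint begin/end in the alert log
  # checkpoint_process is obsolete from Oracle8 onwards; CKPT is always started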


6. Understanding Checkpoint Error messages  ("Cannot allocate new log" and "Checkpoint not complete")

Sometimes, you can see in your alert.log file, the following corresponding
messages:
  Thread 1 advanced to log sequence 248
    Current log# 2 seq# 248 mem# 0: /prod1/oradata/logs/redologs02.log
  Thread 1 cannot allocate new log, sequence 249
  Checkpoint not complete
 

This message indicates that Oracle wants to reuse a redo log file, but the
corresponding checkpoint has not yet completed.  In this case, Oracle
must wait until the checkpoint is completely realized. This situation is encountered particularly when transactional activity is heavy.

 This situation may also be checked by tracing two statistics in the
BSTAT/ESTAT report.txt file.  The two statistics are:

  - Background checkpoint started.
  - Background checkpoint completed.

These two statistics should not differ by more than one.  If they do,
your database is waiting on checkpoints.  LGWR is unable to continue
writing the next transactions until the checkpoints complete.
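
Outside of a bstat/estat run, the same counters can be read directly from V$SYSSTAT (statistic names as in 8i/9i):

  SELECT name, value
    FROM v$sysstat
   WHERE name IN ('background checkpoints started',
                  'background checkpoints completed');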

Three reasons may explain this difference:

- A checkpoint frequency which is too high.
- Checkpoints which are starting but not completing.
- A DBWR which writes too slowly.

The number of checkpoints completed and started as indicated by
these statistics should be weighed against the duration of the bstat/estat
report.  Keep in mind the goal of only one log switch per hour, which ideally
should equate to one checkpoint per hour as well.

The way to resolve incomplete checkpoints is through tuning checkpoints and
logs:

1) Give the checkpoint process more time to cycle through the logs
       -  add more redo log groups
       -  increase the size of the redo logs
2) Reduce the frequency of checkpoints
        - increase LOG_CHECKPOINT_INTERVAL
        - increase size of online redo logs
3) Improve the efficiency of checkpoints by enabling the CKPT process with CHECKPOINT_PROCESS=TRUE (Oracle7 only; from Oracle8 onwards CKPT is always enabled)
4) Set LOG_CHECKPOINT_TIMEOUT = 0.  This disables the checkpointing based on
     time interval.
5) Another means of solving this error is for DBWR to quickly write the dirty
buffers on disk.  The parameter linked to this task is:

  DB_BLOCK_CHECKPOINT_BATCH.

 
DB_BLOCK_CHECKPOINT_BATCH specifies the number of blocks dedicated, within the DBWR
batch size, to writing checkpoint blocks.  When you want to accelerate
the checkpoints, it is necessary to increase this value.

7. Oracle Release Information

The CKPT process is optional in lower versions of Oracle7, but is mandatory in
Oracle8.
In versions 7.0 - 7.3.2, the CKPT is an optional background process which is
enabled by setting CHECKPOINT_PROCESS=TRUE in init.ora.

In versions 7.3.3 and 7.3.4, the CKPT process will be started automatically
regardless of the CHECKPOINT_PROCESS setting if either of the following
conditions exist:

 - a large value for DB_FILES (50 or higher)
 - a large value for DB_BLOCK_BUFFERS (10,000 or higher)

In version 8.0.3 and higher, the CKPT process is always enabled.  Attempting
to set CHECKPOINT_PROCESS in the init.ora will give the following error:

 LRM-00101 "unknown parameter name 'checkpoint_process'"

Starting with Oracle8i, Oracle Corporation recommends that Enterprise
Edition users who were using incremental checkpoints in an earlier release
use fast-start checkpointing in Oracle8i. In fast-start checkpointing, the
FAST_START_IO_TARGET parameter replaces DB_FILE_MAX_DIRTY_TARGET.
FAST_START_IO_TARGET specifies the number of I/Os that should be needed during
crash or instance recovery. When you set this parameter, DBWR writes dirty buffers out
more aggressively, so that the number of blocks that must be processed during recovery
stays below the value specified in the parameter.

So in Oracle8i the incremental checkpoint position should not lag the tail of the
log by more than LOG_CHECKPOINT_INTERVAL operating system blocks. Both
LOG_CHECKPOINT_INTERVAL and FAST_START_IO_TARGET are taken into account
to determine how far behind the end of the redo stream the checkpoint position can really be.

In Oracle9i FAST_START_MTTR_TARGET is the preferred method of specifying
how far the checkpoint position should be behind the tail of the redo stream. However,
LOG_CHECKPOINT_INTERVAL is still supported if needed. It functions as per the Oracle8i
behaviour above.

FAST_START_MTTR_TARGET enables you to specify the number of seconds the database
takes to perform crash recovery of a single instance. FAST_START_MTTR_TARGET
can be overridden by either FAST_START_IO_TARGET or LOG_CHECKPOINT_INTERVAL.
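
On 9i, one way to see how the instance is tracking against this target is V$INSTANCE_RECOVERY (column names as of 9i; TARGET_MTTR and ESTIMATED_MTTR do not exist in earlier releases):

  SELECT target_mttr, estimated_mttr, recovery_estimated_ios
    FROM v$instance_recovery;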



8. Using Statspack to determine Checkpointing problems


Statspack snapshots can be taken every 15 minutes or so. These reports gather useful
information about the number of checkpoints started and completed and the number
of database buffers written during checkpointing for that window of time. They also contain
statistics about redo activity. Gathering and comparing these snapshot reports gives you
a complete picture of checkpointing performance at different periods of time.

Another important thing to watch in the Statspack report is the following wait events;
they can be a good indication of problems with redo log throughput and checkpointing:

log file switch (checkpoint incomplete)
log file switch (archiving needed)
log file switch/archive
log file switch (clearing log file)
log file switch completion
log switch/archive
log file sync


If one or more of the above wait events is repeated frequently
with considerable values, then you need to take action, such as adding more
online redo log files, increasing their sizes, and/or modifying the checkpointing parameters.
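
Between Statspack snapshots, the cumulative figures for these waits can also be pulled from V$SYSTEM_EVENT, for example:

  SELECT event, total_waits, time_waited
    FROM v$system_event
   WHERE event LIKE 'log file switch%'
      OR event = 'log file sync';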




Expert Comment by: schwertner
Redo logs can be increased by dropping REDO log groups and recreating them:

To drop an online redo log group, you must have the ALTER DATABASE system
privilege. Before dropping an online redo log group, consider the following
restrictions and precautions:
  - An instance requires at least two groups of online redo log files, regardless of
    the number of members in the groups. (A group is one or more members.)
  - You can drop an online redo log group only if it is inactive. If you need to drop
    the current group, first force a log switch to occur.
  - Make sure an online redo log group is archived (if archiving is enabled) before
    dropping it. To see whether this has happened, use the V$LOG view.

SELECT GROUP#, ARCHIVED, STATUS FROM V$LOG;

   GROUP# ARC STATUS
--------- --- ----------------
        1 YES ACTIVE
        2 NO  CURRENT
        3 YES INACTIVE
        4 YES INACTIVE
Drop an online redo log group with the SQL statement ALTER DATABASE with the
DROP LOGFILE clause.
The following statement drops redo log group number 3:
ALTER DATABASE DROP LOGFILE GROUP 3;
When an online redo log group is dropped from the database, and you are not
using the Oracle Managed Files feature, the operating system files are not deleted.
When an online redo log member is dropped from the database, the operating
system file is not deleted from disk. Rather, the control files of the associated
database are updated to drop the member from the database structure. After
dropping an online redo log file, make sure that the drop completed successfully,
and then use the appropriate operating system command to delete the dropped
online redo log file.
To drop a member of an active group, you must first force a log switch.
Forcing Log Switches
A log switch occurs when LGWR stops writing to one online redo log group and
starts writing to another. By default, a log switch occurs automatically when the
current online redo log file group fills.
You can force a log switch to make the currently active group inactive and available
for online redo log maintenance operations. For example, you want to drop the
currently active group, but are not able to do so until the group is inactive. You may
also wish to force a log switch if the currently active group needs to be archived at a
specific time before the members of the group are completely filled. This option is
useful in configurations with large online redo log files that take a long time to fill.
To force a log switch, you must have the ALTER SYSTEM privilege. Use the ALTER
SYSTEM statement with the SWITCH LOGFILE clause.
The following statement forces a log switch:
ALTER SYSTEM SWITCH LOGFILE;


Creating Online Redo Log Groups


To create a new group of online redo log files, use the SQL statement ALTER
DATABASE with the ADD LOGFILE clause.
The following statement adds a new group of redo logs to the database:
ALTER DATABASE
ADD LOGFILE ('/oracle/dbs/log1c.rdo', '/oracle/dbs/log2c.rdo') SIZE 500K;
Assisted Solution by: seazodiac
@pradiptam
Try the following queries
and adjust the LOG_BUFFER parameter accordingly (increase log_buffer incrementally, a bit at a time):


prompt REDO BUFFER ALLOCATION RETRIES RATIO < 1%
prompt else increase LOG_BUFFER
column "REDO ALLOCATION RETRIES RATIO" format 999.9999

select a.value/(b.value+0.0001)*100 "REDO ALLOCATION RETRIES RATIO"
  from v$sysstat a, v$sysstat b
 where a.name = 'redo buffer allocation retries'
   and b.name = 'redo writes';


prompt REDO BUFFER SPACE REQUEST RATIO < 0.02%
prompt else increase LOG_BUFFER
column "REDO SPACE REQUEST RATIO" format 999.9999

select a.value/(b.value+0.0001)*100 "REDO SPACE REQUEST RATIO"
  from v$sysstat a, v$sysstat b
 where a.name = 'redo log space requests'
   and b.name = 'redo entries';
