Alert Log shows Thread 1 cannot allocate new log
Posted on 2014-01-31
Linux x64 Virtual Machine
I have a production database that keeps spitting out this message:
Thread 1 cannot allocate new log, sequence xxx.
I know what causes the message to trigger - a redo log has filled and the database goes to use the next one in the cycle but it's currently marked as active, so it has to make sure that all protected blocks are written to disk so it can reuse the online log space.
I have been researching this error message all over the place. More often than not, it appears to be reported by people who are having very fast log switches and DBWn can't keep up with the furious pace of the log switches.
The general consensus seems to be "add more log files or increase the size of redo logs until the message goes away" (Tom Kyte has said this in several replies on asktom).
However, that's not my issue. I started with 3 log group, 2 members per group at 50 MB.
I run archive log backups every hour. RMAN does a log switch automatically when an archive log backup occurs. Average redo generation appears to be between 30 - 40 MB / hr - so my logs rarely fill and do an automatic switch.
Following the advice (although it didn't make much sense to me) I added a 4th, then a 5th and eventually a 6th redo log group. Still the message appears.
Checking v$log, it is not unusual to see the current log marked as CURRENT and all the other 5 marked as ACTIVE. If I understand this correctly, a redo log is marked as current as long as some data block it is protecting is still in memory and hasn't been written to disk yet.
If my understanding is correct, this can mean (6 log groups, swapping once per hour) that I can have dirty blocks in memory for 5-6 hours that haven't been written to disk.
If I manually issue ALTER SYSTEM CHECKPOINT and check v$log again, all 5 CURRENT logs become INACTIVE.
I started mucking around trying to get checkpoints to occur more frequently automatically. Since this database is Standard Edition I can't use FAST_START_MTTR_TARGET (which seems to be the way to do it). I've been playing with LOG_CHECKPOINT_INTERVAL but can't seem to get Oracle to checkpoint more frequently. I started it with a value of 2500 (OS block size is 4K - 2500 blocks is 10MB - thought that should trigger the switch). I've lowered it all the way down to 1000 at the moment and still am not seeing any change.
The wait every hour on the log switch is only 2-3 seconds which I guess isn't the end of the world (I'd rather it went away though), but I'm concerned about unsaved dirty blocks.
I am seeing in the alert log that it will say that it's starting a checkpoint for a particular marker .... but then it won't come up as complete for anywhere from 20 to 40 minutes for that particular checkpoint. It's almost like it starts but then doesn't have the priority to complete it. However if I manually execute a checkpoint, it tidies everything up in seconds.
The VM administrator is seeing nothing server side that concerns him. No excesssive CPU or disk disk activity other than when expected (during backups etc).
Other possible solutions I have seen for this also indicate enabling async io or increasing DBWR's - if you have the former, the latter is best left alone. I do not have that enable at the moment but I wonder if it is even relevant since I do not have frequent log switches going on.
So, after all of the above I guess my question is twofold.
* Do I understand what CURRENT in the v$log status means (do I have dirty blocks in the buffer cache sitting there for hours) ?
* Is there a way to make Oracle checkpoint more frequently and set my redo logs back to INACTIVE more frequently.
Thanks in advance for any insight into what I've got going on here.