asked on

Oracle refuses to load db

While removing rows from a 19.2M-row table in Oracle 8.1.6 yesterday, in batches of about 300k rows at a time, the system suddenly froze and has been unresponsive since. We've tried many different things we've found online and sometimes we'll get past one error message only to get to another. I don't expect this to be an easy one, so 500 points. Plus, it's for a very large website with about a million hits per day, so urgency it's quite urgent; I'm sure I'm getting cussed out a million times a day.

I expect this to be multi-step. If you can get me past one error message, I'm likely to get another.

And there is no cold backup.

Current error situation:

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01245: offline file 21 will be lost if RESETLOGS is done
ORA-01110: data file 21: '/u07/oradata/msg/big_rbs01.dbf'

SQL> recover database until cancel;
ORA-00279: change 6898137434 generated at 03/25/2004 11:49:04 needed for thread
1
ORA-00289: suggestion : /export/oracle/product/8.1.6/dbs/arch1_1.dbf
ORA-00280: change 6898137434 for thread 1 is in sequence #1

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

There's no such thing as arch1_1.dbf on this system, so what happens when I hit <RET> is predictable:

ORA-00308: cannot open archived log
'/export/oracle/product/8.1.6/dbs/arch1_1.dbf'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u02/oradata/msg/system01.dbf'

And let me backtrack just a bit to be sure I'm covering as many questions as possible right upfront.

oracle@oramsg<msg>$ sqlplus internal

SQL*Plus: Release 8.1.6.0.0 - Production on Thu Mar 25 12:39:44 2004

(c) Copyright 1999 Oracle Corporation. All rights reserved.

Connected to:
Oracle8i Enterprise Edition Release 8.1.6.1.0 - Production
With the Partitioning option
JServer Release 8.1.6.1.0 - Production

SQL> startup mount;
ORA-01081: cannot start already-running ORACLE - shut it down first
SQL> shutdown
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 634404848 bytes
Fixed Size 69616 bytes
Variable Size 141750272 bytes
Database Buffers 491520000 bytes
Redo Buffers 1064960 bytes
Database mounted.
SQL>

Looks peachy, eh?

SQL> select * from msg_user where userid=4591117;
select * from msg_user where userid=4591117
*
ERROR at line 1:
ORA-01219: database not open: queries allowed on fixed tables/views only

Okay, not peachy.

And if I try to log in as the correct user instead of internal:

oracle@oramsg<msg>$ sqlplus message@msg

SQL*Plus: Release 8.1.6.0.0 - Production on Thu Mar 25 12:42:19 2004

(c) Copyright 1999 Oracle Corporation. All rights reserved.

Enter password:
ERROR:
ORA-01033: ORACLE initialization or shutdown in progress

Enter user-name:

This is running on a Sun/Solaris box and if we try to use the W2K Enterprise Manager, we get the same 01033 error.

SOLUTION

baonguyen1

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Bob_Zumbrunnen

ASKER

Unfortunately a backup isn't a viable option. A problem we've had for months but have ignored because it's only an inconvenience is very slow throughput between this box and our new development box, which is the only place we've got enough free space to make a copy.

This db is about 40 gig in size and transfer rates are running about 250KB/s despite both NICs being locked at 100/FD.

Backing up simply isn't possible right now, at it'll probably take over a week to finish.

Bob_Zumbrunnen

ASKER

I was able to get into the Enterprise Manager well enough to see ControlFiles, and found what appeared to be valid archive names, so I tried one.

SQL> recover database using backup controlfile until cancel
ORA-00279: change 6898137435 generated at 03/25/2004 11:49:04 needed for thread
1
ORA-00289: suggestion : /export/oracle/product/8.1.6/dbs/arch1_1.dbf
ORA-00280: change 6898137435 for thread 1 is in sequence #1

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/export/oracle/admin/msg/arch/arch_49237.arc
ORA-00310: archived log contains sequence 49237; sequence 1 required
ORA-00334: archived log: '/export/oracle/admin/msg/arch/arch_49237.arc'

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01157: cannot identify/lock data file 21 - see DBWR trace file
ORA-01110: data file 21: '/u07/oradata/msg/big_rbs01.dbf'

Subsequent attempts at recover database using backup controlfile until cancel don't prompt for a log file. They just return the same 1547,1157,1110 errors.

SOLUTION

stmontgo

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

rajnadimpalli

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

baonguyen1

I'm not sure why you dont have the archived file as expected. You can use the below query to find the sequence# that the restore requires:

select * from v$log_history where sequence#=1;

If it there then then the log file has been archived and you have to apply the required file. If not, query:

select * from v$log;

Then apply the log file with the sequence# desired

Bob_Zumbrunnen

ASKER

Stmontgo: A dba from the company we bought the site from is having a look. When he's done, I'll try what you're suggesting. Was already steeling myself for that, since there are over 40k log files.

rajnadimpalli: At this point, I haven't a clue what we've done in the past 38 hours. Only a snapshot of where we're at right now. The reason we're trying "alter database open resetlogs;" is this:

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

and

SQL> alter database open noresetlogs;
alter database open noresetlogs
*
ERROR at line 1:
ORA-01588: must use RESETLOGS option for database open

resetlogs does this:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01190: controlfile or data file 21 is from before the last RESETLOGS
ORA-01110: data file 21: '/u07/oradata/msg/big_rbs01.dbf'

Bob_Zumbrunnen

ASKER

Baonguyen1:

SQL> select * from v$log_history where sequence#=1;

no rows selected

SQL>
SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS
---------- ---------- ---------- ---------- ---------- --- ----------------
FIRST_CHANGE# FIRST_TIM
------------- ---------
1 1 0 52428800 2 YES UNUSED
0

2 1 1 20971520 2 NO CURRENT
6898137433 25-MAR-04

4 1 0 31457280 1 YES UNUSED
0

ASKER CERTIFIED SOLUTION

seazodiac

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Bob_Zumbrunnen

ASKER

I'm comfortably beyond panic (4 hours sleep in what's now 39 hours helps), and always have a towel handy. :)

I honestly don't know if it's in ARCHIVELOG mode or not, or how to check. I'm actually very good with W2K and SQL Server, but am quite dangerous in Solaris/Oracle. As the foregoing attests.

Here's the latest from the dba that's working on it:

Bob,

I can finish the media recover part. But, when I try to open the
database, I got ora-00704. I will take a nother look after lunch.

Trying not to step on his toes while he's working, but since he's at lunch....

oracle@oramsg<msg>$ sqlplus message@msg

SQL*Plus: Release 8.1.6.0.0 - Production on Thu Mar 25 16:09:11 2004

(c) Copyright 1999 Oracle Corporation. All rights reserved.

Enter password:
ERROR:
ORA-01034: ORACLE not available

Enter user-name: internal
Connected to an idle instance.

SQL> startup mount;
ORACLE instance started.

Total System Global Area 634404848 bytes
Fixed Size 69616 bytes
Variable Size 141750272 bytes
Database Buffers 491520000 bytes
Redo Buffers 1064960 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [3668], [1], [2], [98219], [98219],
[4], [], []

SQL>

seazodiac

ok, after reading a few posts above, I think you are in ARCHIVELOG mode:

but here is how to verify:

after you mount the database:

SQL>archive log list;

Bob_Zumbrunnen

ASKER

Survey says "Yes". Thanks. One question down, so many more to go.

Connected to:
Oracle8i Enterprise Edition Release 8.1.6.1.0 - Production
With the Partitioning option
JServer Release 8.1.6.1.0 - Production

SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /export/oracle/admin/msg/arch/
Oldest online log sequence 0
Next log sequence to archive 1
Current log sequence 1

seazodiac

yes, you are right. More questions are under way;

1. do you have a FULL database backup?
2. do you have all the archivelog files since that FULL database backup?

if you answer both YES , then you are in pretty good shape

Bob_Zumbrunnen

ASKER

Side note: Since the system is set to email me any errors, I know that the guy who's working on it was working on creating a new control file, which I was trying to do last night with more generous log space. I get the feeling he did successfully get the new control file made. I never could. Saved the old one as a text file, edited it, then spent an hour or so last night addressing errors it was reporting, but never got them all.

stmontgo

you need to fake out the control file by issuing
recover database until cancel using backup controlfile

this ignores the SCN in the control and allows you

> supply the on line redo logs

perform incomplete media reocvery which allows you to open resetlogs

the ora 600 appears to indicate that the control file was recreated, did you recreate it?

seazodiac

That's my hunch too although there are other possibilities.

BOB: one of your above comments--->SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01190: controlfile or data file 21 is from before the last RESETLOGS
ORA-01110: data file 21: '/u07/oradata/msg/big_rbs01.dbf'

My other questions are:

Did you try to restore database files and yet failed?

Did you use old control files from backup?

Bob_Zumbrunnen

ASKER

The only full backup I've got is on tape from May 16, 2003, when I acquired the site. I've got the tapes but don't own a tape drive. Guess what's been added to my shopping list! This is way to scary to risk ever running without a more solid safety net in the future. We're talking 40 gig, representing about 9 years worth of messages on one of the better-known stock market message boards.

I really don't know the status of the archivelog files. There are a LOT of them.

For all I know, there could be a full backup on the system's hard drives. But I don't know. Hmmmm.... Come to think of it, the site was entering "Molasses Mode" for an hour and a half on a daily basis shortly after we got it. Same time every day. And it turned out that a big backup job running during the day was the culprit.

I'm inclined to say the answer to both of your questions is "Yes" and that the guy dinking with the machine has an idea where these are if they're needed since this used to be his baby.

Bob_Zumbrunnen

ASKER

@Stmontgo: I believe he did successfully create a new controlfile within the past hour.

He just asked me to reboot the server, so hopefully we're making progress. I'll also encourage him to catch up on this thread and, if necessary, determine an appropriate points distribution when/if this problem gets solved. A lot of very helpful, knowledgeable people here.

seazodiac

@Bob: just out of curiosity, and it's very important,

Did your DBA do a full backup right before He starts the recovery effort so that you can always come back to the original state of database?

Bob_Zumbrunnen

ASKER

@seazodiac: I seriously doubt it. The system is far too large for him to have been able to have backed it up, even to another part of the same RAID system in anything less than hours.

The closest thing to a realtime backup is the copy of most (but not all) of the data I've migrated to a SQL Server setup over the past several months. In a worst-case scenario, it'd be possible to export most of the data back to Oracle. That's *very* worst-case. It might even be quicker to finish the new ASP.NET front-end and move the users to it than to move the data back.

seazodiac

ok, got it....hmmm...

BAD capacity planning PLUS bad practice means only worse...

I can tell you one thing though:

In the very beginning , the reason why your system froze up is probably your run out of disk space in ARCHIVELOG file system.
can you check if you have any space left in the drive where archive log file is stored?

seazodiac

Also, you were kind of going out and yelling for help after you have screwed up things...

You and your DBA should discuss thoroghly what you will proceed next when you have first crash.

What you SHOULD NOT HAVE DONE is to go frenzy and try every recovery command you can dig up in your memory.

Yes, you are under pressure, but panic does not help either, worsen things almost always...

I wonder what message does Oracle spit out when you tried to start up oracle database after you found out your database is hung ?

do you and your DBA remember?

Bob_Zumbrunnen

ASKER

@seazodiac

I assumed that was precisely why the system froze. It was doing a lot of deletions. I know how to prevent the log-file fillage in SQL Server, but not in Oracle. I should've foreseen this possibility and made sure that, if it's possible to do so, logging was shut off or limited while doing these deletions.

When we first checked last night in Enterprise Manager -> Controlfile -> Record Section, we saw Log History showing 12025 records total and 12025 used. Now it's zero used. One of the changes I tried to make to the control file last night was to bump this up quite a bit.

However, the drive containing the logs is still quite a ways from being full.

Bob_Zumbrunnen

ASKER

@seazodiac

The DBA wasn't involved until around noon (central) today.

We googled our butts off and most of what we tried initially came from here. I've got the printout on my desk. Q_20774022.html

Unfortunately I'm too sleep-deprived and, yes, we were trying too many things in "panic" mode to get this thing up and running again quickly, so I'm doing good to remember my name, let alone what steps we initially took and what error messages we initially got. Now I know better for next time. And also know to take backups more seriously. A dozen-disk RAID array makes you feel too secure, especially when you've also got most of the same data copied to another 6-disk array (albeit in a different RDBMS) and are migrating away from the original system and relying on it less each day.

Doesn't change the fact that tens of thousands of people *do* rely on it every day, though.

So any chastising (well-earned though it is) has already been done right here.

seazodiac

@Bob:

Ok, try this magic parameter, put this line into your init.ora file (at the end).

_allow_read_only_corruption =TRUE

then start up your database in the following order:

SQL>startup mount pfile=<point to the saved init.ora file containing the above line>;

SQL>alter database open read only;

see if you can open your database.

seazodiac

sorry, the line above :SQL>startup mount pfile=<point to the saved init.ora file containing the above line>;

should be:

SQL>startup pfile=<point to the saved init.ora file containing the above line> mount;

Bob_Zumbrunnen

ASKER

@seazodiac:

I ran across that one last night and didn't try it because of the dire warnings that came with it (See? My panic doesn't know *some* limits -- hehe), but just emailed it to the dba asking if he'd consider trying it while he's in there.

Is there supposed to be an underscore between "read" and "only"? Or maybe I'm thinking of a different but similar entry I saw.

Someone also just forwarded me something about a bug that can corrupt controlfiles after 248 days of uptime on this version. I think it's pretty likely to have been up at least that long, and I know the dba is still working on making a new controlfile.

seazodiac

Yes, your concern is well justified.

----->Someone also just forwarded me something about a bug that can corrupt controlfiles after 248 days of uptime on this version. I think it's pretty likely to have been up at least that long, and I know the dba is still working on making a new controlfile

if this is true, it does look promising...

Bob_Zumbrunnen

ASKER

I know this is cumbersome relaying back and forth between the dba and here, but I don't want to interact with the system at all while he's also doing so.

His reply: "It did not work. It said db needs recovery."

seazodiac

@Bob:

That's what I predicted but I havenot yet come to conclusion until now.

you got yourself a Catch-22 here. let me explain:

1. you sure can create a new control file, but it will lag behind the database because some of your database data file has new SCN. (notably, it's rollback data file)
2. you have a rollback data file has newest SCN but not recorded in the control file.

My suggestion, call up Oracle tech support.

rajnadimpalli

@Bob_Zumbrunnen :

My Two Cents :

You sounds like Your Not a DBA and your DBA is not up to mark.....So Don't try to do any furthur restore and recovery for such a Critical Large-sized Database system .Its TOO DANGEROUS with No or half knowledge for this kind of situation.

This sounds odds ,but it can work out for You ..all you need is trust ...yes TRUST...TRUST some one in this group...

Do you like some one from this group (members) ,can take a look on your system ?. You need to provide remote access ( VPN , dial-up ...any access)...or windows netmeeting....yahoo messages..some means to access your system and/or also can help on phone for communication... ..

If you interest ..Let me know...I am doing some basic Virtual DBA support... "For Free" for couple of non-profit organizations. Its "FREE" for people how need real help :) ...

-Raj

rajnadimpalli

Bob_Zumbrunnen

ASKER

@rajnadimpalli:

Let me sleep on it. Literally. I'm heading home now. And the dba who's been working on it admits he's quite rusty with it as he hasn't seen it in several months.

He pointed out to me that there seems to be a gaping hole in the alert_msg.log, btw. The server's on Eastern time and I know the problems started at 5:30 Eastern, but the earliest entry in the log file is about 8:00. Although it occurs to me that maybe earlier entries scroll out as the file fills?

And you're right. I'm most certainly not a DBA. I'm a programmer. Only the kinds of people in this group can truly appreciate the distinction. One isn't necessarily better than the other (except when it comes to deciding who's going to bang out code and who's going to take care of the db); just different specialties.

I'm aware enough of my own dearth of knowledge in this area, and the system is important enough to me, that I'm inclined pretty heavily to hand over the keys to a real expert.

Thanks, all, for the feedback today. A lot of learning happened because of some excellent teaching, and I'll be back in touch after I've gotten in some sleep.

baonguyen1

Bob_Zumbrunnen, after a sleep it should be better. Let us know if we can help more

Good luck !

Bob_Zumbrunnen

ASKER

@baonguyen1

Actually, it got better before a good sleep. Around 6 yesterday, the dba told me he'd finally gotten the database to open. I got on and was able to do a simple query. Around 9, the site started flickering and the dba called to let me know he'd managed to get most of the problems fixed. Some problems remain that remind me of trashed indexes back in my dBase days and I'm hopefully (certain) there's a mechanism for globally rebuilding all indexes, even if it just means programmatically dropping and recreating them.

This weekend, we're shutting down the site for a while and making a full backup onto another machine. And while we're at it, rebuilding indexes.

Since the dba (based in Seattle) speaks rapidly with a strong Chinese accent, I really don't know what he did that finally revived the system. I'll make sure to find out today and report it here.

As for points, if there are no objections, after I post this message I'd like to go through the replies and split the points among multiple contributors. I'd like to give some points to everyone who replied, especially @raj for offering to roll up his sleeves and do my dirty work, and most of all to @seazodiac; giving him lots of points for the large amount of knowledge he imparted, dinging him a few for sometimes being a bit arrogant/demeaning about it, then giving back the dinged points for his obviously having the hard-won right to be so.

Any objections?

PS. The site involved is Silicon Investor. Not a plug. Just wanted to let you know, in case any of you are familiar with the name, how big a site you were helping with.

PPS. If the site owners are looking on, kudos on a great site. As a message-board afficionado (running two big ones now and ran a huge BBS from 87 to 97), my hat's off to you for identifying and implementing a really sweet approach to how to profitably (I'm sure the assumption is fair) run one. The fact that your Alexa ranking is 1255 and mine for Silicon Investor is 6496 and for Investors Hub is 9198 (wow! That sure fell off recently!!!) attests to the fact that not only do you do well with the Alexa demographic (typically tech-savvy), but that you've done a very good job overall with the site.

I know message boards, and this is an excellent example of one!

seazodiac

@Bob:

thanks for the update. Certainly a relief that you got it back!!!

I am glad that I can help...

rajnadimpalli

@Bob : Great news to hear.

I guess next step for your organization is to FULL review of your back/restore/recovery practices.

I observed one thing you saying "Backing up simply isn't possible right now" your database size is 40GB..which is consider as small in today's database environment.

Other thing is "backing up Via network with speeds 250KB/s" sounds like you have architectural problems too..

Well!. "it's for a very large website with about a million hits per day, so urgency it's quite urgent; I'm sure I'm getting cussed out a million times a day." ..
You can simply attach a Tape drive to database server,take your backup...Check-out..Its just costs less then $1000.00

hp StorageWorks AIT 100 GB tape drive : Speed 86 GB/hr : Costs : $ 650.00" + few AIT 100GB cartridge (each $25) "

With this you can backup the your database in 30 minutes.

Btw : I am not HP Sales Guy :)....