jflynt

Power to server was turned off and will not restart

Have a Sun Fire 280R, no longer supported but running archive data for the company. Power was pulled during a UPS changeout and the system will not boot back up. It comes up and says: unable to repair the / filesystem. Run fsck -F ufs /dev/md/rdsk/d0. Exit the shell when done to continue the boot process.

Type control-d to proceed with normal startup, or give root password for system maintenance. Once control-d is selected, it says the state of /dev/md/dsk/d0 is bad. Any suggestions for bringing this back up? It has not been maintained due to its nature, but it needs to work and be maintained, so it's mine now.
Any help appreciated.
Harry_Truman

Hmm, that doesn't sound good.  We've had similar instances where we've lost power to a few of our databases and we've had to restore a backup to fix the corruption that's occurred -- I'd be willing to guess that's what happened to yours.  You said it wasn't maintained at all, is it safe to assume you have no backups of any kind for your data?
jflynt

ASKER

Nope, no backups. When I bring it up in single-user mode, it says it cannot create the utmpx and wtmpx files. Also, during the fsck it starts stating that files should not be 0 but 2 and then asks if that is OK, to which I type Y for yes. Is that correct? This unit is just used to look at data for open records etc., and for some reason it was never maintained because it is only used in query mode. Oh well.
robocat
Boot again, but this time log in with the root password and give the output of

 /usr/sbin/metastat

If you enter the root password, what happens? You stated what happens when you press ^D, not what happens when you enter the root password.

Single user mode won't help you, unless you boot from external media, such as the Solaris installation CD, and remove the 2nd mirror (leave it out for the time being, to be on the safe side).

Also, I assume you're using Solaris 8. Add the "logging" mount option in your /etc/vfstab to reduce the chance of such a mishap happening again.
jflynt

ASKER

robocat, when you say log in with the root password, I assume you mean when the options come up for control-d or the root password. After I enter the root password it does a few things and then shows that it cannot create the utmpx or wtmpx file. I will add the metastat output tomorrow and let you know. I appreciate the help with this and will get back as soon as I can get to the machine. When I go to that file, how do I get the output you want? cat it? Not very familiar with this system.
jflynt

ASKER

ezaton, I have no idea how to remove the 2nd mirror. Of course they can't find the original disks for this either. This is going to be one of those problems!!
If you get into a shell (after entering the root password) you should, as suggested, run the command:
fsck -y /dev/md/dsk/d0
This will fix the errors and let you boot the system (after the check, you need to press ^D, and then the system will reboot. This is OK).
jflynt, metastat is a command, not a file. It will tell you if one of the two mirrored disks has failed.

If there's a failed disk, it is best to remove it from the mirror before fsck'ing the filesystem as ezaton suggests.

Let us know.
jflynt

ASKER

ezaton, I ran the command and it looked like everything was restored. When I went to reboot, it again said it cannot create utmpx and then went to the # prompt.

robocat, I didn't see your post until I had already run the above command. I will do this tomorrow. If I don't get anywhere, I will remove the machine from the rack and bring it to the IT office so I can communicate better. I really appreciate your help. If I can get this thing up and going, I will maintain it properly. We have to keep the Sheriff happy.
Check that you have some free space. You can do it using the command
df -k
If you don't have any free space, consider cleaning up some of the crash files in
/var/crash/<hostname>
(replace <hostname> with the name of the server)
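
To spot a full filesystem quickly, the df -k output can be filtered on its capacity column. This is only a minimal sketch: it assumes the usual six-column layout (Filesystem, kbytes, used, avail, capacity, Mounted on), and the function name full_filesystems and the 90% default threshold are made up for illustration.

```shell
# Hypothetical helper: reads `df -k` output on stdin and prints the mount point
# and capacity of every filesystem at or above a threshold (default 90%).
# On Solaris, nawk or /usr/xpg4/bin/awk may be needed instead of plain awk.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 && NF >= 6 {
            cap = $5
            sub(/%/, "", cap)                 # "100%" -> "100"
            if (cap + 0 >= limit + 0) print $6, $5
        }
    '
}
```

It would be used as `df -k | full_filesystems 95`; on a keyboard without a working pipe key, df -k can be redirected to a file first and the function fed with `full_filesystems 95 < /tmp/df.txt`.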
jflynt

ASKER

I have no free space. What files can I delete safely? I have a huge dead.letter file.
How huge? 1M 10M?

Look at /var/crash, as I suggested. Look at /var/adm for old large log files.
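
One way to hunt for the large files is to let find report everything above a size limit. A minimal sketch, assuming a find that supports -xdev and counts -size in 512-byte blocks; the function name big_files and the 10 MB default are illustrative only.

```shell
# Hypothetical helper: list regular files under a directory that are larger than
# a given number of 512-byte blocks, without crossing into other filesystems.
big_files() {
    dir=${1:-/var}
    blocks=${2:-20480}      # 20480 blocks of 512 bytes = 10 MB
    find "$dir" -xdev -type f -size +"$blocks" -print
}
```

For example, `big_files /var/adm` lists candidate log files, and `big_files /var/crash 2048` lists anything over 1 MB in the crash directory.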
jflynt

ASKER

I am back working on this today and will follow up tonight. Thanks for the suggestions and help.
jflynt

ASKER

I looked at the /var folder and found one file named vmcore.0 with a size of 1888239616. I tried to delete it, but it is read-only. There were two others in there: unix.0, which was 689120, and bounds, which was 2. Can I delete those?
You can delete all 3, but it sounds like your file system is still being mounted read only.

What's the output from the metastat command ?

If you can delete them, do that. It's OK. If you cannot (the filesystem may still be mounted read-only), let me know, and we'll move on from there.
jflynt

ASKER

It is a large output, but it shows "Needs maintenance" on the drives. I don't have a way to get the output to you. I can't get more to work, or I am doing it wrong, otherwise I would copy it. I was doing "more metastat" to get page breaks. Is that correct? Also, when I try to delete the files, they are all read-only. I am really at a loss as to how to get the output from metastat to you.
jflynt

ASKER

The dead.letter file is 361651571. What is the command to check disk space and status? I don't have my cheat sheet.
Give us the output of

metastat
mount
df -k
jflynt

ASKER

I can't get the metastat output because it will not pipe.
mount does nothing.
df -k does nothing.

I tried "more metastat" and "metastat more" and it won't pipe. How do I make it pause at the end of a page?

metastat | more doesn't work ?

Do you have the /usr filesystem mounted ? Try

ls /usr/bin

and tell us if you see any files

If you don't see any files, try mountall

You could also try

metastat >/tmp/out.txt    

(provided you have a /tmp at this moment)
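
Since redirection with > works even when the pipe key does not, all of the requested output can be gathered into one file and paged afterwards. A rough sketch; the function name collect_output is hypothetical and the report path is whatever you pass in.

```shell
# Hypothetical helper: run each command given as an argument and append its
# output, under a header line, to the report file named by the first argument.
collect_output() {
    report=$1
    shift
    : > "$report"                        # start with an empty report
    for cmd in "$@"; do
        echo "=== $cmd ===" >> "$report"
        sh -c "$cmd" >> "$report" 2>&1   # keep error messages too
        echo >> "$report"
    done
}
```

Called as `collect_output /tmp/out.txt metastat metadb mount "df -k"`, it leaves everything in /tmp/out.txt, which can then be read with `/usr/bin/more /tmp/out.txt`.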
jflynt

ASKER

I cannot get the pipe symbol to work on this computer. I see it but cannot get it to work. This is not a standard Sun server keyboard. Is there a keystroke shortcut for it?
I do see files when I do ls /usr/bin.
mountall returns: /dev/md/rdsk/d4 is stable
How do I get the out.txt file from the server to you all? Sorry to be a pain about this, but I can't figure this part out.

Perhaps you could now do

/usr/bin/more /tmp/out.txt

and check the entire output to see if there is anything like this:

d0: Mirror
    Submirror 0: d10
      State: Needs maintenance             <---- this would indicate a disk failure
    Submirror 1: d20
      State: Okay

or any other indication of an error ?
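
Scanning a long metastat listing by eye is error-prone, so the check can be sketched as a small filter. It assumes the component lines look like the sample above and like standard metastat output (device name, start block, Dbase flag, state); the function name maintenance_components is made up for illustration.

```shell
# Hypothetical helper: reads `metastat` output on stdin and prints
# "<metadevice> <component>" for every slice whose state is "Maintenance".
# On Solaris, nawk or /usr/xpg4/bin/awk may be needed instead of plain awk.
maintenance_components() {
    awk '
        /^d[0-9]+: / { sub(":", "", $1); meta = $1 }   # remember current metadevice
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $4 == "Maintenance" { print meta, $1 }
    '
}
```

With the output saved as suggested earlier, `maintenance_components < /tmp/out.txt` prints one line per failing slice and nothing at all when every component is Okay.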
I don't think the problem is a DiskSuite device needing maintenance; I think the filesystem is in a read-only (ro) state due to lack of space.

I would start with 'metadb' to verify that all DiskSuite state database replicas are OK. I think they are not, and that this is the reason for the incorrect mount. If they turn out to be OK (i.e., there is a quorum of working metadbs), I would remount / as read-write (I don't remember the exact syntax now) and remove the large files. Later on, I would deal with any remaining unsynced filesystems.
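
For the remount step, the form usually used on Solaris for a UFS root is `mount -F ufs -o remount,rw <device> /`. A minimal sketch, assuming / is UFS and lives on /dev/md/dsk/d0 as in this thread; the wrapper name remount_root_rw and the ECHO dry-run variable are illustrative, not part of any tool.

```shell
# Hypothetical wrapper: remount the root filesystem read-write.
# Assumption: / is a UFS filesystem on the device passed as $1
# (default /dev/md/dsk/d0, the root mirror in this thread).
remount_root_rw() {
    dev=${1:-/dev/md/dsk/d0}
    # With ECHO=echo the command is only printed, which allows a dry run.
    ${ECHO:-} mount -F ufs -o remount,rw "$dev" /
}
```

Running `ECHO=echo remount_root_rw` shows the command without executing it; running `remount_root_rw` as root performs the remount, after which the large files can be removed.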
You could indeed try the metadb command in addition to the metastat command.

Check the metadb output to see if there is anything like this:

metadb
       flags         first blk    block count
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3           <------ the "unknown" would indicate a disk or other serious failure
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    a m  p  luo      16           1034         /dev/dsk/c0t2d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t2d0s3
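
The same check can be sketched as a filter over saved metadb output. It assumes each replica line ends with its /dev/dsk path and that a healthy replica has the lowercase "a" (active) flag in the first column, as in the sample above; the function name replica_summary is hypothetical.

```shell
# Hypothetical helper: reads `metadb` output on stdin and counts replicas whose
# first flag is "a" (active) versus everything else (for example "M").
# On Solaris, nawk or /usr/xpg4/bin/awk may be needed instead of plain awk.
replica_summary() {
    awk '
        $NF ~ /^\/dev\// { if ($1 == "a") good++; else bad++ }
        END { printf "good=%d bad=%d\n", good, bad }
    '
}
```

After `metadb >/tmp/metadb.txt`, running `replica_summary < /tmp/metadb.txt` gives a one-line tally; any nonzero bad count is worth a closer look at the individual lines.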

jflynt

ASKER

This time control-d worked and the system booted up. I deleted the crash files and the dead.letter file. I have the output of metastat and metadb. I have placed a memory key in the USB slot. Can I get to it somehow?
I have the desktop running at this point. If there is no way to transfer to the memory key, I will have to write it down here, but I do have some maintenance needs I need help with, or at least looked at.
Check if you can access the network at this time, perhaps you can telnet to the server from a PC and save the output from the commands using copy/paste.
jflynt

ASKER

I can ping other computers on the network. What is the telnet command I need to use? I can ping all of the servers.
Okay, you could use the "telnet <yourserver>" command from a DOS/CMD box.

Or you could try to ftp the files from your server to another location that you can access from your PC ?
jflynt

ASKER

d0: Mirror
    Submirror 0: d10
      State: Needs maintenance  
    Submirror 1: d20
      State: Okay          
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8193204 blocks

d10: Submirror of d0
    State: Needs maintenance  
    Invoke: metareplace d0 c1t0d0s0 <new device>
    Size: 8193204 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s0                   0     No    Maintenance  


d20: Submirror of d0
    State: Okay          
    Size: 8193204 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s0                   0     No    Okay          


d1: Mirror
    Submirror 0: d11
      State: Okay          
    Submirror 1: d21
      State: Okay          
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 4096602 blocks

d11: Submirror of d1
    State: Okay          
    Size: 4096602 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s1                   0     No    Okay          


d21: Submirror of d1
    State: Okay          
    Size: 4096602 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s1                   0     No    Okay          


d4: Mirror
    Submirror 0: d2
      State: Okay          
    Submirror 1: d3
      State: Needs maintenance  
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 58632255 blocks

d2: Submirror of d4
    State: Okay          
    Size: 58632255 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s4                   0     No    Okay          


d3: Submirror of d4
    State: Needs maintenance  
    Invoke: metareplace d4 c1t0d0s4 <new device>
    Size: 58632255 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s4                   0     No    Maintenance

This is the output I have now.
jflynt

ASKER

Output of mount:

/ on /dev/md/dsk/d0 read/write/setuid/intr/largefiles/onerror=panic/dev=1540000 on Wed Nov 14 14:24:49 2007
/proc on /proc read/write/setuid/dev=4140000 on Wed Nov 14 14:24:48 2007
/dev/fd on fd read/write/setuid/dev=4200000 on Wed Nov 14 14:24:49 2007
/etc/mnttab on mnttab read/write/setuid/dev=4300000 on Wed Nov 14 14:24:51 2007
/var/run on swap read/write/setuid/dev=1 on Wed Nov 14 14:24:51 2007
/tmp on swap read/write/setuid/dev=2 on Wed Nov 14 14:24:55 2007
/sds on /dev/md/dsk/d4 read/write/setuid/intr/largefiles/onerror=panic/dev=1540004 on Wed Nov 14 14:24:55 2007
OK, as I expected from the beginning, it seems you lost a disk in the system (c1t0d0), and this is the root cause of your problems. This is the disk in drive bay 0 for an E280R.

You need to obtain a replacement disk and then follow the procedure for replacing the disk:

http://docs.sun.com/app/docs/doc/819-2789/6n528vs3s?l=en&a=view

Meanwhile you can continue running on the other disk that is a mirror for the failed disk.

If you check the metadb command, you will probably see some meta db replicas in unknown state (see above). You need to delete these using a command  similar to:

metadb -d c0t0d0s3   (but with the devices in unknown state)

This way the system should boot correctly  until you can replace the disk.

jflynt

ASKER

metadb        
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c1t0d0s3
     a    p  luo        16              1034            /dev/dsk/c1t1d0s3

How do I get it to boot up properly now? Also, what is the command to reboot when not at the console?
First, you might want to take a backup of your data if you haven't done so yet.

It seems the disk is not 100% dead yet, because the meta db on disk c1t0d0 is still online.

Try adding some replicas first:

metadb -a -c 2 /dev/rdsk/c1t1d0s3

See how that goes. You now should be able to boot the system using the command "boot disk2" at the OK prompt (shutdown first).

If you're not at the console you can reboot with "reboot" or "init 6", but of course if the boot fails you won't see what's happening. The E280R also has a remote management card (RSC) that allows you to connect to the console over the network, but this is probably not configured on your machine.
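
A possible rationale for adding replicas on the surviving disk is the state database quorum rule: as documented for Solstice DiskSuite / Solaris Volume Manager, the system only reboots into multi-user mode when a majority (more than half) of the replicas is available. With one replica per disk, losing a disk leaves exactly half. The arithmetic can be sketched as follows, assuming a POSIX-style shell such as ksh; the function name can_boot_multiuser is made up for illustration.

```shell
# Hypothetical helper: succeeds when the available replicas form a majority.
# $1 = total number of state database replicas, $2 = number still available.
can_boot_multiuser() {
    [ $(( $2 * 2 )) -gt "$1" ]
}
```

For instance, `can_boot_multiuser 2 1` fails (one of two is only half), while `can_boot_multiuser 4 3` succeeds, which matches the situation after two more replicas have been added on the good disk and the failing disk's single replica is lost.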
jflynt

ASKER

It is up and working fine now, but I still have this in metastat. I assume I need to figure out which hard drive it is and replace it. With this machine, is there anything in particular I need to look for in a hard drive? I also want to thank you, and I hope you have a good Thanksgiving.

current metastat

# metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance  
    Submirror 1: d20
      State: Okay          
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8193204 blocks

d10: Submirror of d0
    State: Needs maintenance  
    Invoke: metareplace d0 c1t0d0s0 <new device>
    Size: 8193204 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s0                   0     No    Maintenance  


d20: Submirror of d0
    State: Okay          
    Size: 8193204 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s0                   0     No    Okay          


d1: Mirror
    Submirror 0: d11
      State: Okay          
    Submirror 1: d21
      State: Okay          
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 4096602 blocks

d11: Submirror of d1
    State: Okay          
    Size: 4096602 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s1                   0     No    Okay          


d21: Submirror of d1
    State: Okay          
    Size: 4096602 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s1                   0     No    Okay          


d4: Mirror
    Submirror 0: d2
      State: Okay          
    Submirror 1: d3
      State: Needs maintenance  
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 58632255 blocks

d2: Submirror of d4
    State: Okay          
    Size: 58632255 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s4                   0     No    Okay          


d3: Submirror of d4
    State: Needs maintenance  
    Invoke: metareplace d4 c1t0d0s4 <new device>
    Size: 58632255 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s4                   0     No    Maintenance
ASKER CERTIFIED SOLUTION
robocat
