jsullivan

asked on

Cannot login after SCSI read error

Hello,

I am running SCO Unix 3.2V4.2 and I am receiving the following message over and over:

Notice: Sdsk: Unrecoverable error reading SCSI disk 0
dev 1/40 (ha=0 id=0 lun=0) Block=250
Medium error: unrecovered read error

The message and block number are always the same.  I would normally log in and run scsibadblk to find and relocate the bad block, but I cannot log in.  Since I couldn't log in to take the system down, I had to power it off and on.  After powering back on, the kernel loads OK and then wants to check the file system.  If I let it check the file system, I immediately start getting the same read errors.

So I rebooted again and skipped the file system check.  I get the following errors before it enters single-user mode:

/etc/bcheckrc: cannot make pipe
/etc/tcbck: /tmp/sh170: cannot create
/etc/smmck: restore missing files from backup or distribution.

Then I get the "Init: single user mode" message and am prompted to enter control-d for normal startup or the root password for system administration.  However, it won't accept the root password.  It just says "login incorrect".  If I press ctrl-d for normal startup, it starts to init level 2 but then appears to just lock up (except the read errors keep printing).

Just for kicks I tried to see what would happen if I booted off of the N1 Installation disk and then rooted from the hard disk.  I had the same results.

What can I do?

Thanks,
Jay Sullivan
jhance

This is bad, to say the least.  It appears that you've developed some type of hard drive trouble and it has corrupted either the passwd file or some part of the login system so that you cannot log in as root.  Since I assume you want to get back onto this hard drive to recover stuff, you may have to build a complete bootable unix system on an alternate hard drive and then boot THAT one.  You can then log in as root and mount the old drive.  I'd back up any stuff I could right away and then try running fdisk.  It may be able to recover some stuff but I suspect you're looking at rebuilding the filesystem and possibly even having to re-initialize the format.

ASKER

Thanks for your help.  I was afraid I'd hear an answer such as yours.  I'm a relative novice as a Unix admin so I have a few more questions.  Can I boot from my installation floppies and then mount the file system?  If so, is there any way to find out what file(s) are stored at block 250?  I do have a system backup but it's not that current.  If the file on the bad block is a relatively static file, I can restore it from the backup.  If I can boot from floppy and mount the file system, should I use scsibadblk to mark the block bad?  Thanks.
I'm not sure which boot floppies you have but often these are designed for installation and not recovery.  Hence they will want to init your disk and install unix rather than let you login and recover.  This is why having a spare disk around that can be booted is valuable.  Running fdisk should be able to tell you which files are damaged by the disk errors and will attempt to recover what it can and mark the bad blocks out.  
Did you say that fdisk would be able to tell me what files are damaged?  As far as I can see, fdisk only deals with the partition table.  Did you mean scsibadblk?

I found an emergency boot disk.  I haven't tried it yet, however.  I'm really not sure what to do once I boot it.  Right now I have the system booted from the N1/N2 install disks.  I shelled out of the installation and I think I can try mounting the file system at this point.  Would you suggest scratching this idea and going with the emergency boot disk?

Whichever way I go, how do I mount the file system on the hard drive?  (Sorry, I told you I was a novice).

Thanks.
Sorry jhance, I'm going to open this up to everyone again because my system is down and I need to get it back up as soon as possible.
If you can shell out of the installation or use the boot disk to get to a shell as root, you can mount the hard drive.  First make sure you have a mount point for the disk: a directory on the boot disk that is not otherwise needed.  Often there will be a directory for this purpose called /mnt.  Then mount it like this:

mount /dev/dsk/xxxxx /mnt

/dev/dsk/xxxxx is the drive device.  This varies from system to system and hopefully you know what it is.

Now you should be able to "cd /mnt" and start poking around.
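
For what it's worth, the mount step above could be sketched as a small dry-run script.  This is only a sketch under assumptions: /dev/dsk/xxxxx is still a placeholder for your real disk device, the -r (read-only) flag is assumed to be supported by your mount command, and the names run, rescue_mount and DRY_RUN are made up for illustration.  The shell on a rescue floppy may be more limited than what this assumes.

```shell
#!/bin/sh
# Sketch only: the device name is a placeholder, not a real device.
DEVICE=${DEVICE:-/dev/dsk/xxxxx}   # placeholder; substitute your real disk device
MOUNTPOINT=${MOUNTPOINT:-/mnt}     # unused directory on the boot media
DRY_RUN=${DRY_RUN:-1}              # 1 = only print the commands (assumed safe default)

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rescue_mount() {
    # -r asks for a read-only mount so nothing is written to the failing disk.
    run mount -r "$DEVICE" "$MOUNTPOINT" &&
    run ls "$MOUNTPOINT"
}
```

Calling rescue_mount with DRY_RUN left at 1 just prints the two commands so you can check them first; setting DRY_RUN=0 actually runs them.  Mounting read-only is a cautious choice on a disk that is already throwing read errors, since it avoids any further writes while you copy files off.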
OK, here's the latest info:  I tried to mount the file system but it said that it might be damaged so it didn't mount.  I tried running "fsck /dev/hd0root" but it ran into an unreadable block so it quit.  I finally decided to run scsibadblk and it did find one bad block.  However, it wasn't the block that all the previous error messages had referred to.  The error messages always indicated block 250.  scsibadblk found and moved block 80778.

After scsibadblk was done, I was able to do a fsck on the file system and it found and fixed a number of errors.  After that I tried to boot the system up from the hard drive.  Now it just locks up after it prints the information on all the devices.  Any ideas?
Yes, a file that is important to the operation of unix is missing or corrupted.  Like I said before, build another bootable disk and use it to boot up your system and copy any required and recoverable files off of the bad drive.  One way or another, you're going to have to rebuild this system.
OK, I was able to get past the lockup.  I booted from the emergency floppy, mounted the file system and ran fixperm.  That fixed the /dev/console file.  Now the system boots and will go into multi-user mode successfully.  However, it won't accept any login.  It just says "Login incorrect".  The passwd file looks OK, and I checked /tcb/files/auth/r/root and it looks OK too.

I realize that I may have to rebuild the system but I really want to leave that as a last resort.  I have a system backup.  Is there something that I can restore from it to get the logins to work again?  Or is there something else besides fixperm that I can run to check or fix things?  It seems like I'm so close now.

Thanks.

ASKER CERTIFIED SOLUTION
jhance

This solution is only available to members.
jhance,

Thanks very much for your help.  I finally got it fixed.  I ended up restoring my /etc directory structure from backup.  My guess is that the key problem was that I was missing the /etc/shadow file, which I'm told keeps the encrypted passwords.  After restoring, I was able to get back in.

Thanks again.