Solved

Cannot login after SCSI read error

Posted on 1998-02-26
Last Modified: 2008-02-01
Hello,

I am running SCO Unix 3.2V4.2 and I am receiving the following message over and over:

Notice: Sdsk: Unrecoverable error reading SCSI disk 0
dev 1/40 (ha=0 id=0 lun=0) Block=250
Medium error: unrecovered read error

The message and block number are always the same.  I would normally try to log in and run scsibadblk to try to find and relocate the bad block, but I cannot log in.  Since I couldn't log in to take the system down, I had to power off/on.  After powering back on, the kernel loads up OK and then it wants to check the file system.  If I let it try to check the file system, I immediately start getting the same read errors.

So I rebooted again and skipped the file system check.  I get the following errors before it Inits single user mode:

/etc/bcheckrc: cannot make pipe
/etc/tcbck: /tmp/sh170: cannot create
/etc/smmck: restore missing files from backup or distribution.

Then I get the "Init: single user mode" message and am prompted to enter control-d for normal startup or the root password for system administration.  However, it won't accept the root password.  It just says "login incorrect".  If I press ctrl-d for normal startup, it starts to init level 2 but then appears to just lock up (except the read errors keep printing).

Just for kicks I tried to see what would happen if I booted off of the N1 Installation disk and then rooted from the hard disk.  I had the same results.

What can I do?

Thanks,
Jay Sullivan
Question by:jsullivan
11 Comments
 
LVL 32

Expert Comment

by:jhance
ID: 2008918
This is bad, to say the least.  It appears that you've developed some type of hard drive trouble and it has corrupted either the passwd file or some part of the login system so that you cannot login as root.  Since I assume you want to get back onto this hard drive to recover stuff, you may have to build a complete bootable unix system on an alternate hard drive and then boot THAT one.  You can now login as root and mount the old drive.  I'd backup any stuff I could right away and then try running fdisk.  It may be able to recover some stuff but I suspect you're looking at rebuilding the filesystem and possibly even having to re-initialize the format.
 

Author Comment

by:jsullivan
ID: 2008919
Thanks for your help.  I was afraid I'd hear an answer such as yours.  I'm a relative novice as a Unix admin so I have a few more questions.  Can I boot from my installation floppies and then mount the file system?  If so, is there any way to find out what file(s) are stored at block 250?  I do have a system backup but it's not that current.  If the file on the bad block is a relatively static file, I can restore it from the backup.  If I can boot from floppy and mount the file system, should I use scsibadblk to mark the block bad?  Thanks.
 
LVL 32

Expert Comment

by:jhance
ID: 2008920
I'm not sure which boot floppies you have but often these are designed for installation and not recovery.  Hence they will want to init your disk and install unix rather than let you login and recover.  This is why having a spare disk around that can be booted is valuable.  Running fdisk should be able to tell you which files are damaged by the disk errors and will attempt to recover what it can and mark the bad blocks out.  
 

Author Comment

by:jsullivan
ID: 2008921
Did you say that fdisk would be able to tell me what files are damaged?  As far as I can see, fdisk only deals with the partition table.  Did you mean scsibadblk?

I found an emergency boot disk.  I haven't tried it yet, however.  I'm really not sure what to do once I boot it.  Right now I have the system booted from the N1/N2 install disks.  I shelled out of the installation and I think I can try mounting the file system at this point.  Would you suggest scratching this idea and going with the emergency boot disk?

Whichever way I go, how do I mount the file system on the hard drive?  (Sorry, I told you I was a novice).

Thanks.
 

Author Comment

by:jsullivan
ID: 2008922
Sorry jhance, I'm going to open this up to everyone again because my system is down and I need to get it back up as soon as possible.
 
LVL 32

Expert Comment

by:jhance
ID: 2008923
If you can shell out of the installation or use the boot disk to get to a shell as root you can mount the hard drive.  Do it like this. Make sure you have a "root" to mount the disk on.  It needs to be a directory on the boot disk that is not needed.  Often there will be a directory for this purpose called /mnt.

mount /dev/dsk/xxxxx /mnt

/dev/dsk/xxxxx is the drive device.  This varies from system to system and hopefully you know what it is.

Now you should be able to "cd /mnt" and start poking around.
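
As a rough sketch of the "make sure you have a directory that is not needed" step, the check below confirms that a candidate mount point exists and is empty before anything is mounted on it. The function name check_mount_point is hypothetical, and the script assumes a POSIX-style sh with an ls that supports -A; the shell on an emergency floppy may be more limited.

```shell
#!/bin/sh
# Sketch only: confirm a directory is safe to use as a mount point.
# check_mount_point is a hypothetical helper, not a standard utility.
check_mount_point() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory" >&2
        return 1
    fi
    # Anything already in the directory would be hidden by the mount.
    contents=`ls -A "$dir" 2>/dev/null`
    if [ -n "$contents" ]; then
        echo "$dir: not empty" >&2
        return 1
    fi
    return 0
}

# Example (device name left elided, as in the post above):
#   check_mount_point /mnt && mount /dev/dsk/xxxxx /mnt
```

If the mount tool on the boot disk offers a read-only option, using it on a disk that is reporting medium errors seems prudent, but check the local manual page rather than assuming the flag.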
 

Author Comment

by:jsullivan
ID: 2008924
OK, here's the latest info:  I tried to mount the file system but it said that it might be damaged so it didn't mount.  I tried running "fsck /dev/hd0root" but it ran into an unreadable block so it quit.  I finally decided to run scsibadblk and it did find one bad block.  However, it wasn't the block that all the previous error messages had referred to.  The error messages always indicated block 250.  scsibadblk found and moved block 80778.

After scsibadblk was done, I was able to do a fsck on the file system and it found and fixed a number of errors.  After that I tried to boot the system up from the hard drive.  Now it just locks up after it prints the information on all the devices.  Any ideas?
 
LVL 32

Expert Comment

by:jhance
ID: 2008925
Yes, a file that is important to the operation of unix is missing or corrupted.  Like I said before, build another bootable disk and use it to boot up your system and copy any required and recoverable files off of the bad drive.  One way or another, you're going to have to rebuild this system.
 

Author Comment

by:jsullivan
ID: 2008926
OK, I was able to get past the lockup.  I booted from the emergency floppy, mounted the file system and ran fixperm.  That fixed the /dev/console file.  Now the system boots and will go into multi-user mode successfully.  However, it won't accept any login.  It just says "Login incorrect".  I looked and it looks like the passwd file is OK, and I checked /tcb/files/auth/r/root and it looks OK.

I realize that I may have to rebuild the system but I really want to leave that as a last resort.  I have a system backup, is there something that I can restore from that to get the logins to work again?  Or is there something else besides fixperm that I can run to check/fix things out?  It seems like I'm so close now.

Thanks.

 
LVL 32

Accepted Solution

by:
jhance earned 200 total points
ID: 2008927
The problem is that it could be almost anything.  One of the executables, one of the library files, one of the data files.  You might try restoring /bin, /lib, /usr/bin, /usr/lib, and /etc from your backup and see if that helps.  BTW, if you can boot from a floppy and mount the drive, PLEASE make a backup of what you can NOW.  Whatever caused the initial problem might still be a problem and you could lose everything eventually if you don't get it copied.
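
One common way to copy off what is still readable is a tar pipe, sketched below. This is only an illustration: copy_tree is a hypothetical helper, both paths in the example are placeholders, and it assumes the destination lives on a different, healthy device and that the boot environment has tar and mkdir -p available. Files that sit on unreadable blocks will still produce read errors during the copy.

```shell
#!/bin/sh
# Sketch only: copy a directory tree with a tar pipe.
# copy_tree and the example paths are hypothetical.
copy_tree() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "$src: not a directory" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # The first tar writes an archive of src to stdout;
    # the second tar unpacks that stream inside dest.
    (cd "$src" && tar cf - .) | (cd "$dest" && tar xf -)
}

# Example: copy_tree /mnt/etc /backup/etc
```

Copying the most valuable and most recently changed directories first makes sense when the disk may keep degrading.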
 

Author Comment

by:jsullivan
ID: 2008928
jhance,

Thanks very much for your help.  I finally got it fixed.  I ended up restoring my /etc directory structure from backup.  My guess is that the one key problem was that I was missing the /etc/shadow file, which I'm told keeps the encrypted passwords.  After restoring, I was able to get back in.
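
For anyone hitting the same symptom, a quick check like the following could have pointed at the missing file sooner. It is a sketch, not a complete audit: check_auth_files is a hypothetical helper, and the file list is only an assumption built from the files named in this thread, so other systems will need a different list.

```shell
#!/bin/sh
# Sketch only: report login-related files that are missing or empty
# under a given root (for example, the damaged disk mounted on /mnt).
# check_auth_files and its file list are assumptions, not a standard tool.
check_auth_files() {
    root=$1
    missing=0
    for f in etc/passwd etc/shadow tcb/files/auth/r/root; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$root/$f" ]; then
            echo "missing or empty: $root/$f"
            missing=1
        fi
    done
    return $missing
}

# Example: check_auth_files /mnt
```

A file that exists but is corrupted would pass this check, so it only narrows the search.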

Thanks again.
