Replacing failed HD in AIX 4.3.3
Posted on 2009-04-04
I have an RS/6000 running AIX 4.3.3 with two drives, 4.6G and 18.3G. Some time back it would no longer boot, and when I ran standalone diagnostics from install CD 1, I found that the 18.3G drive had failed (in the system configuration it showed '????' instead of an identifier). I then had two problems to solve: a system that would not boot, and a bad drive. I decided to purchase a replacement 18.3G drive. The following is the output after installing the drive and running diagnostics:
Volume group 000766351f330bc4 contains these disks:
hdisk1 4303 10-80-00-4, 0
Volume group 000766351f330bc4 includes the following logical volumes:
hd5 hd6 hd8 hd4 hd2 hd9var hd3 hd1 lv00
When I choose the option to access rootvg before mounting filesystems, I get the following output:
PV Status: hdisk1 PVACTIVE
varyonvg: Volume group rootvg is varied on.
0516-510 updatevg: Physical volume not found for physical volume identifier 00076635403389c7.
0516-548 synclvodm: Partially successful with updating volume group rootvg.
0516-622 updatelv: Warning, cannot write lv control block data.
0516-782 importvg: Partially successful importing of hdisk1.
Checking the / filesystem.
log redo processing for /dev/rhd4
Syncpt record at 13f028
end of log 13f028
Syncpt record at 13f028
Syncpt address 13f028
Number of log records=1
Number of data blocks=0
Number of nodo blocks=0
/dev/rhd4 (/): ** Unmounted cleanly - Check suppressed
Checking the /usr filesystem
/dev/rhd2 (/usr): ** Unmounted cleanly - Check suppressed
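In case it helps anyone compare against their own console output, here is a minimal Python sketch that pulls the message numbers and the missing physical volume identifier out of a few of the LVM lines above. The function name `parse_lvm_messages` and the regular expressions are my own illustration, not part of any AIX tool, and the message layout is assumed from the lines pasted here.

```python
import re

# A few of the LVM messages from the console output above.
CONSOLE = """\
0516-510 updatevg: Physical volume not found for physical volume identifier 00076635403389c7.
0516-622 updatelv: Warning, cannot write lv control block data.
0516-782 importvg: Partially successful importing of hdisk1.
"""

# Assumed layout: "<message number> <command>: <text>".
MSG_RE = re.compile(r"^(\d{4}-\d{3}) ([^:]+): (.*)$")
# A PVID as printed above is 16 hexadecimal digits.
PVID_RE = re.compile(r"\b[0-9a-fA-F]{16}\b")


def parse_lvm_messages(text):
    """Return (messages, missing_pvids) parsed from console text.

    messages is a list of (number, command, text) tuples;
    missing_pvids lists identifiers reported as not found.
    """
    messages = []
    missing_pvids = []
    for line in text.splitlines():
        match = MSG_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not an LVM message line
        number, command, body = match.groups()
        messages.append((number, command, body))
        if "Physical volume not found" in body:
            missing_pvids.extend(PVID_RE.findall(body))
    return messages, missing_pvids


if __name__ == "__main__":
    messages, missing_pvids = parse_lvm_messages(CONSOLE)
    for number, command, _ in messages:
        print(number, command)
    print("missing PVIDs:", missing_pvids)
```

The identifier it reports is the one the volume group still expects, which I take to be the failed 18.3G drive.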
If I try to access rootvg and mount filesystems, it goes into an infinite loop trying to load some module.
I've tried a number of things from the 4.3-vintage LVM manual and a 5.3 troubleshooting guide.
From the # prompt after accessing rootvg before mounting filesystems:
Any smit command, as well as cfgmgr, rmlvcopy, rmdev and reducevg, fails with '/usr/bin/ksh not found'.
lsdev -Cc disk results in:
hdisk0 Available 10-80-00-00, 0 N/A
hdisk1 Available 10-80-00-04, 0 N/A
extendvg is functional, but I haven't run it yet since it appears that some boot files are missing.
The 5.3 troubleshooting guide recommends doing a system restore from an image backup. The client that uses this machine (and wants it running again) did do data and image backups but the tapes are not labeled clearly. I do have a tape labeled 'Image backup set #1'. I inserted this tape, booted from CD #1, and selected restore from backup tape.
After some time I got the message 'Invalid disk found'. Upon researching this further, I concluded that the original disks were a mirrored set. The posts I found related to this (not on this forum) suggested restoring to two disks. However, I have not been able to find out exactly how to configure the system to restore to two disks: the menu lets me select one or the other, but not both.
When I pull up the option to change disks, I get the following:
hdisk1 - (what looks like a valid identifier)
hdisk0 - 0000000000000000
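To make the distinction concrete, here is a small Python sketch that splits a listing like the one above into disks that carry an identifier and disks that show all zeros. I am assuming that an all-zero identifier means no identifier has been written to the disk yet; that is my reading of the screen, not something I have confirmed. The helper name `split_by_pvid` and the hdisk1 value `0123456789abcdef` are hypothetical placeholders, since I did not copy down the real identifier.

```python
def split_by_pvid(listing):
    """Split 'name - identifier' lines into (initialized, blank) name lists.

    A disk is treated as blank when its identifier is made up only of
    zeros (an assumption about what the restore menu is showing).
    """
    initialized, blank = [], []
    for line in listing.splitlines():
        if " - " not in line:
            continue  # ignore lines that are not 'name - identifier'
        name, pvid = (part.strip() for part in line.split(" - ", 1))
        if set(pvid) == {"0"}:
            blank.append(name)
        else:
            initialized.append(name)
    return initialized, blank


# hdisk1's identifier is a hypothetical placeholder, not the real value.
LISTING = """\
hdisk1 - 0123456789abcdef
hdisk0 - 0000000000000000
"""

if __name__ == "__main__":
    initialized, blank = split_by_pvid(LISTING)
    print("with identifier:", initialized)
    print("all zeros:", blank)
```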
1. Does anyone know how to restore to two disks?
2. Since the new disk is now hdisk0, will that cause problems? The LVM guide suggested creating a dummy hd identifier and letting the system renumber the new drive to be higher than the boot drive.
3. Do I need to do anything else to initialize the new drive?
At this point I'm wondering if I should just do a new install. I appreciate any assistance.