
Replacing failed HD in AIX 4.3.3

Posted on 2009-04-04
Last Modified: 2013-11-17
I have an RS6000 unit with AIX 4.3.3 and two drives, 4.6G and 18.3G.  Some time ago it stopped booting, and when I ran standalone diagnostics from install CD 1, I found that the 18.3G drive had failed (in the system configuration it showed '????' instead of an identifier).  That left me with two problems to solve: a system that would not boot, and a bad drive.  I decided to purchase a replacement 18.3G drive.  The following is the output after installing the drive and running diagnostics:

Volume group 000766351f330bc4 contains these disks:
hdisk1 4303 10-80-00-4, 0

Volume group 000766351f330bc4 includes the following logical volumes:
hd5 hd6 hd8 hd4 hd2 hd9var hd3 hd1 lv00

When I choose the option to access rootvg before mounting filesystems, I get the following output:
PV Status:      hdisk1      PVACTIVE
            NONAME

varyonvg: Volume group rootvg is varied on.
0516-510 updatevg: Physical volume not found for physical volume identifier 00076635403389c7.
0516-548 synclvodm: Partially successful with updating volume group rootvg.
0516-622 updatelv: Warning, cannot write lv control block data.
0516-782 importvg: Partially successful importing of hdisk1.
rootvg
Checking the / filesystem.
log redo processing for /dev/rhd4
Syncpt record at 13f028
end of log 13f028
Syncpt record at 13f028
Syncpt address 13f028
Number of log records=1
Number of data blocks=0
Number of nodo blocks=0
/dev/rhd4 (/): ** Unmounted cleanly - Check suppressed
Checking the /usr filesystem
/dev/rhd2 (usr): ** Unmounted cleanly - Check suppressed

If I try to access rootvg and mount filesystems, it goes into an infinite loop trying to load some module.

I've tried a number of things from the AIX 4.3-era LVM manual and an AIX 5.3 troubleshooting guide.

From the # prompt after accessing rootvg before mounting filesystems:

All smit commands, as well as cfgmgr, rmlvcopy, rmdev and reducevg, fail with '/usr/bin/ksh not found'.

lsdev -Cc disk returns:

hdisk0 Available 10-80-00-00, 0 N/A
hdisk1 Available 10-80-00-04, 0 N/A

extendvg works, but I haven't run it yet because some boot files appear to be missing.

The 5.3 troubleshooting guide recommends doing a system restore from an image backup.  The client who uses this machine (and wants it running again) did make data and image backups, but the tapes are not clearly labeled.  I do have a tape labeled 'Image backup set #1'.  I inserted this tape, booted from CD #1, and selected restore from backup tape.

After some time I got the message 'Invalid disk found'.  After researching this further, I concluded that the original disks were a mirrored set.  The posts I found about this (not on this forum) suggested restoring to two disks.  However, I have not been able to find out exactly how to configure the system to restore to two disks; the menu lets me select one disk or the other, but not both.

When I pull up the option to change disks, I get the following:

hdisk1 - (what looks like a valid identifier)
hdisk0 - 0000000000000000

Questions:

1. Does anyone know how to restore to two disks?
2. Since the new disk is now hdisk0, will that cause problems?  The LVM guide suggested creating a dummy hd identifier and letting the system renumber the new drive to be higher than the boot drive.
3. Do I need to do anything else to initialize the new drive?

At this point I'm wondering if I should just do a new install.  I appreciate any assistance.
Question by: acort

Accepted Solution

by: dolomiti
hi,
for a moment, ignore the new disk.
Looking at the boot sequence in SMS (System Management Services, the menu you reach with F1), which is the boot disk, the 18 GB or the 4 GB one?
I thought it was the 4 GB disk and that the 18 GB one held data (I assumed hdisk0 was the 4 GB disk and hdisk1 the 18 GB one), but your report seems to show something different. Could you explain this?
What does (or did) the system do? What is important to restore, the data or the OS? And on which disk is it located?

When you access the volume group before mounting filesystems, have you tried running:
fsck -y /
fsck -y /usr
fsck -y /var
fsck -y /tmp
fsck -y /dev/lv00

Assuming you are in the disaster situation where you cannot boot from the old disk at all, I know two ways out.

One is to boot from CD 1 and choose the install option that preserves your data (I don't remember its exact name; on AIX 4.3 it should be the Preservation Install). It rebuilds a new base OS (you lose your network configuration, hostname and programs under /usr) but keeps the user filesystems, if they are available and not on the failed disk.
If the data matters more than the system configuration, this may be the way to go. If the system configuration is what matters, a restore from the system backup will usually do.

The other way, which I use in some cases, is to take out both old disks, do a fresh install on the new disk (first setting it to a SCSI ID different from the two old ones), and bring up a minimal working system.
Then, once the new system works (boots and reboots), attach the two old disks and, using smitty vg, import the VG from the old disks, naming it phtmvg.
You will see errors because /, /usr, /var... already exist; it still imports those LVs into the new VG and puts them in the ODM, but it does not update your /etc/filesystems.
From the smitty output (if it scrolls off the screen, look in the /smit.log file), work out the match between names like /dev/fs001 and the old names. For example, the system says it found an LV hd4 but cannot import it under that name because it already exists, so it names it /dev/fs003. Rename that LV to ohd4 and create an entry in /etc/filesystems that mounts it under /restore.
Do the same with /dev/ousr on /restore/usr, and so on for the others.
Be careful to use the second jfslog for these filesystems, the one from the imported VG, not the rootvg one.
You will end up with a phantom system under /restore, with all the old filesystems mounted on:
/restore
/restore/usr
/restore/var
/restore/home
/restore/data
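A minimal shell sketch of the rename-and-mount steps, assuming the old VG has already been imported as phtmvg (with smitty vg, or importvg -y phtmvg hdiskN) and that lsvg -l phtmvg has shown you the new LV names. The names fs003, fs001 and loglv00 are placeholders, not names AIX is guaranteed to pick, and restore_fs/stanza are hypothetical helpers. By default it only prints what it would do; set DRY_RUN=0 to actually run it.

```shell
#!/bin/sh
# Sketch of the phantom-VG recovery, run on the NEW system after the import.
# PLACEHOLDERS: take the real names from the importvg/smit output
# and from "lsvg -l phtmvg".
OLD_ROOT_LV=${OLD_ROOT_LV:-fs003}  # name importvg gave the old hd4
OLD_USR_LV=${OLD_USR_LV:-fs001}    # name importvg gave the old hd2 (hypothetical)
OLD_LOG=${OLD_LOG:-loglv00}        # jfslog of the imported VG, NOT rootvg's hd8

# Print a command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# Print an /etc/filesystems stanza: $1 mount point, $2 LV device, $3 jfslog.
# "mount = false" keeps it from being mounted automatically at boot.
stanza() {
    printf '\n%s:\n\tdev\t\t= %s\n\tvfs\t\t= jfs\n\tlog\t\t= %s\n\tmount\t\t= false\n\tcheck\t\t= false\n' \
        "$1" "$2" "$3"
}

# $1 = LV name assigned by importvg, $2 = new LV name, $3 = mount point
restore_fs() {
    run chlv -n "$2" "$1"          # rename while the LV is still closed
    run mkdir -p "$3"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        stanza "$3" "/dev/$2" "/dev/$OLD_LOG" >> /etc/filesystems
    else
        stanza "$3" "/dev/$2" "/dev/$OLD_LOG"   # dry run: only show it
    fi
    run fsck -y "/dev/$2"
    run mount "$3"
}

# Old / first, so that /restore/usr lands inside the mounted old root.
restore_fs "$OLD_ROOT_LV" ohd4 /restore
restore_fs "$OLD_USR_LV" ousr /restore/usr
```

Run as is, it only prints the chlv/fsck/mount commands and the stanzas, which is a cheap way to check the names before touching anything. /var, /home and the data filesystem would follow with further restore_fs lines in the same way.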

bye
vic

Expert Comment

by: kishored2004
What message do you get when you try to restore the system to hdisk0? Does it let you, or do you just get an error message?

You may need to run fsck on all filesystems, since the shared libraries might not be accessible.

Can you paste the output?
Author Comment

by: acort
hdisk1 is the boot disk (4G); hdisk0 is the new data disk.  I think the replacement disk was assigned hdisk0 when I installed it.  The 4.3 LVM guide said something about creating a dummy hdisk0 and letting the system assign a new ID (which I assume would become hdisk2).

System restore lets me choose one disk or the other, but not both.  A post I read on another forum describing the 'Invalid disk found' error recommended restoring the image to both disks, but I don't see how you can do that.

I think two things happened: judging by the 'cannot write lv control block data' message, I ran out of space on hdisk1 (the boot disk).  Shortly thereafter, the 18G disk died.

I really need the data which is on a backup tape, so installing a new OS would be acceptable.

I will try the commands you suggested and post the output - thanks.
Expert Comment

by: dolomiti
hi, a warning:
mksysb saves only the rootvg volume group to tape,
and I see from your output that rootvg contains just hdisk1.
hdisk1 holds the standard LVs plus lv00, which may contain data, but most of the data
was probably on the other disk, in another VG.

(If I had installed the system, I would have created a second VG named datavg on the big disk,
backed up that data periodically, and used mksysb to copy rootvg so the system could be restarted
after a failure.)

If rootvg contains just the 4 GB disk and for some reason it is corrupted, the 18 GB disk may actually be good,
and it may simply not show its size in SMS because it is not fully compatible with the RISC hardware.
Are you sure the box still has the 4 GB disk as its default boot device?
These 43P machines (I don't know your hardware) sometimes forget the boot sequence.

bye
vic

