External SCSI drive shows available but cannot access

I have an IBM 43P workstation running AIX 4.3.3.

The workstation has two internal SCSI drives of 9GB and 18GB, and an external SCSI tower with one 73GB drive. It is dedicated to running Scitex/Creo/Kodak Brisque software, which takes graphic files and prepares them for output as printing plates on a Trendsetter plate maker.

Last week we started getting messages that we ran out of disk space.  However, the "Jobs" folder where we normally have fifty to one hundred processed files was showing as empty.  We were unable to send any more jobs to the workstation without getting the "not enough disk space" message in the Brisque software.

I ran a customer maintenance program which checks the filesystems, runs fsck, etc., and got no error messages other than a warning to run varyonvg, with no reference to a specific volume group.

Later I ran IBM diagnostics in maintenance mode and got no error codes. When I use smit devices all three hard drives are listed as available.

When I use smit to check logical and physical volumes, however, the only physical volumes that show up are hdisk0 and hdisk1, the two internal drives. When I try to add hdisk2, the drive in the external tower, I get a message that a device already exists at that address.

Listing the volume groups shows three: rootvg, svg1, and svg2. svg2 is not showing as active. When I try to activate it, I get the following message:

0516-013 The volume group cannot be varied on because there are no good copies of the descriptor area.
PV STATUS   004a181a17b81a90 PVNOTFND

Is the drive dead, or is this a SCSI problem?  If I replace the hard drive, how do I re-establish the logical volumes, volume groups, etc?
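
One way to confirm which disk the PVNOTFND line refers to is to compare the PVID in the varyonvg message with the PVIDs that lspv reports. The sketch below assumes lspv prints the disk name in the first column and the PVID in the second; the function name pv_name_for_pvid is made up for illustration.

```shell
#!/bin/sh
# pv_name_for_pvid PVID
# Reads lspv-style lines on stdin (assumed layout: name, PVID, volume group)
# and prints the name of every disk whose PVID column equals PVID.
# Returns non-zero if no line matches.
pv_name_for_pvid() {
    # Appending "" forces a string comparison, so a PVID that happens to
    # look like a number is still compared character by character.
    awk -v want="$1" '($2 "") == (want "") { print $1; found = 1 }
                      END { exit !found }'
}
```

On the workstation this would be used as `lspv | pv_name_for_pvid 004a181a17b81a90`. If nothing is printed, no configured disk currently presents that PVID, which is consistent with the drive not being readable over its current connection.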

Of course the disk COULD be dead, or just degraded to the point where it responds to some commands but doesn't spin up. Since you tried setting unique SCSI IDs for the adapters (I presume you used 7 and 8 or 6 and 7, and didn't set an adapter to one of the IDs used by a disk), an ID conflict looks less likely.

Cabling or terminators are a possibility, but I would first mount the disk on the known working HBA, set the unit ID to 3, and see whether the system can see the disk. There is no reason to attempt to mount it, and certainly don't add it to the ODM database. You don't need to do any of that, and it risks writing to the drive. Just use dd to see if you can read from it:


dd if=/dev/hdisk2 of=/tmp/junk  count=1

If you get a 512-byte file, then so far so good. Then run

dd if=/dev/hdisk2 of=/dev/null

and see if it completes without throwing errors. (Obviously, make sure the device names are correct.)
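
The two dd steps can be wrapped in a small read-only check. This is a minimal sketch, not a full media test: the function name read_check and the 64k block size are arbitrary choices, and it only reads from the device, never writes to it.

```shell
#!/bin/sh
# read_check DEVICE
# Step 1: read one 512-byte block and confirm that 512 bytes came back.
# Step 2: read the whole device to /dev/null and report whether dd failed.
read_check() {
    dev=$1

    # Count the bytes dd manages to read from the first block.
    first=$(dd if="$dev" bs=512 count=1 2>/dev/null | wc -c | tr -d ' ')
    if [ "$first" -ne 512 ]; then
        echo "FAIL: could not read the first 512-byte block of $dev"
        return 1
    fi

    # Full sequential read; dd exits non-zero on an I/O error.
    if dd if="$dev" of=/dev/null bs=64k 2>/dev/null; then
        echo "OK: $dev read to the end without errors"
    else
        echo "FAIL: dd reported an error while reading $dev"
        return 1
    fi
}
```

For the drive discussed here the call would be `read_check /dev/hdisk2`, after confirming that hdisk2 really is the external disk. A full read of a 73GB drive will take a while.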

There is some commercial disk diagnostic software for AIX, but it is probably not worth the money (santools.com smartmon-ux is ported to AIX).

Those diagnostics may not be sufficient. Were they full media tests, just a S.M.A.R.T. test, or something where you have no idea what they did?

Most likely it is the SCSI unit select. If the drive uses the 68-pin connector, the unit select is set with jumpers on each individual disk drive. Depending on your enclosure, there may also be a requirement to run unit select cables from the HDD enclosure to the pin header.

If the disk has an SCA connector (looks like a rounded trapezoid), then the unit select is going to be somewhere in the enclosure.

Also, in the BIOS of the 43P there is a way to select the unit ID. I can't remember which key you press, but I think it tells you during POST.
Don't forget that the controller itself uses an ID, typically 7, but possibly 6 or 8 if there is more than one SCSI controller on the same bus. Every ID on a bus, the controller's included, must be unique.
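
As a quick sanity check, the IDs in use on one bus (adapter plus every drive) can be listed and tested for duplicates. This is only an illustrative sketch; the function name duplicate_scsi_ids is hypothetical, and the IDs have to be collected by hand from the jumpers, the enclosure, and the adapter settings.

```shell
#!/bin/sh
# duplicate_scsi_ids ID...
# Prints each SCSI ID that appears more than once in the argument list.
# Prints nothing when every ID is unique.
duplicate_scsi_ids() {
    printf '%s\n' "$@" | sort | uniq -d
}
```

For example, `duplicate_scsi_ids 7 0 1 7` prints 7, meaning two devices on that bus claim ID 7.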

jarego2 (Author) commented:
The internal SCSI adapter address is 10-80 and the SCSI ID is 7. The external adapter is at 1P-10 and the SCSI ID is also 7. The adapters are both Wide/Fast-20.

The internal drives are located at 10-80-00-0,0 and 10-80-00-1,0 respectively.  The external drive is located at 1P-10-00-0,0.  The external drive is an 80-pin Ultra160 Seagate drive.

I tried to change the SCSI ID on both adapters, and got the same message both times:

Method Error 0514-029
Cannot perform the requested function because a child device of the specified device is not in the correct state.

Do I have to unmount the drives in order to change the ID?
jarego2 (Author) commented:
I tried a different cable and the drive is now working!

Thanks for the suggestions.
Great, sometimes it is something easy :)
Question has a verified solution.
