Fixing dead paths (PowerPath) on AIX 5.3 and AIX 5.2

I was recently involved in a SAN migration: our storage team is replacing its old SAN switches with new ones. New cables were laid from the new switches, and all I had to do was remove the old connection from fcs0 and replace it with the connection from the new switch, then do the same on fcs1. The sequence is:

1.      Physically disconnect the old fibre from fcs0 and connect the new fibre
2.      Fix the dead paths
3.      Physically disconnect the old fibre from fcs1 and connect the new fibre
4.      Fix the dead paths

Since PowerPath is installed on these servers, I should be able to run the commands below to fix the dead paths in steps 2 and 4:
  # powermt check
  # powermt config
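Before and after running these commands it can help to know how many paths PowerPath still reports as dead. The following is only a sketch: the helper name `count_dead_paths` is hypothetical, and it assumes that each path row printed by `powermt display dev=all` carries the bare word `dead` in its state column.

```shell
#!/bin/sh
# Hypothetical helper, not part of PowerPath.
# Reads path listing text on stdin and counts the rows that contain
# "dead" as a whole whitespace-separated field.
# Assumption: dead paths appear that way in `powermt display dev=all`.
count_dead_paths() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "dead") { n++; break }
        }
    }
    END { print n + 0 }'
}
```

On a host with PowerPath installed it would be used as `powermt display dev=all | count_dead_paths`.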

The problem: these commands run and fix the dead paths on AIX 5.2, but on AIX 5.3 "powermt check" hangs. This is not an isolated case; it happens on all the servers I am working on, and the only resolution is to reboot the server to bring the paths back.

Why is this happening on AIX 5.3? Is there any way to fix the dead paths other than rebooting?
Asked by: mnis2008
 
David (President) commented:
I just can't tell you if that will or won't work.   But I will tell you that if any mounted disks are attached to hba0 then your system may crash, so make darned sure that nothing uses that port.

Rebooting IS best practice. If this were my data, I would just wait for a downtime window to reboot and be safe.
 
David (President) commented:
Do you have any processes running with open files or handles that use the other path? If so, MAYBE a kill -9 will let you run powermt successfully.

But bottom line, you'll most likely have to reboot. Reconfiguring FC, and for that matter any peripheral, often requires a reboot. This isn't an AIX thing; I've had to do this with every mainstream O/S you can probably think of (as I develop storage-centric configurators and diagnostics).

Bottom line, you got lucky not having to reboot on 5.2. Consider this just par for the course. Also, double-check the switches first, and make sure they reboot the switches after you reboot the AIX boxes before you sign off that the job has been properly done.
 
mnis2008 (Author) commented:
I don't have any running processes or open files on the application side or the OS side, as all the applications were properly shut down.

Also, according to my understanding, AIX should be resilient to SAN changes if the "dyntrk" attribute is turned on. It is turned on in my environment.

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105839
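To confirm the setting on each adapter, the attribute can be read with `lsattr`. A minimal sketch follows; the `run` wrapper, the `DRY_RUN` variable and the function name `show_dyntrk` are illustrative assumptions, and the device names fscsi0 and fscsi1 are taken to be the protocol devices under fcs0 and fcs1.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 on the AIX host to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the dyntrk attribute of each FC protocol device given as argument.
show_dyntrk() {
    for dev in "$@"; do
        run lsattr -El "$dev" -a dyntrk
    done
}
```

For example, `DRY_RUN=0 show_dyntrk fscsi0 fscsi1` would print one attribute line per device.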
 
David (President) commented:
No, you probably just don't have any user-level apps with files open on any of these devices.

I'm talking about system-level code that might have /dev/rhdisk[n] open. The raw device handles are what kill you. Can you get away with going to single-user mode and not rebooting? That *MAY* do the trick.

But if you have any of these disks as part of your rootvg then you probably will have to reboot.
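Whether any of the affected disks belong to rootvg can be read from `lspv`, which lists the volume group in its third column. A small sketch, with the hypothetical helper name `rootvg_disks`:

```shell
#!/bin/sh
# Hypothetical helper: reads `lspv` output on stdin and prints the
# names of the physical volumes whose volume group column is rootvg.
# Assumes the usual lspv layout: name, PVID, volume group, state.
rootvg_disks() {
    awk '$3 == "rootvg" { print $1 }'
}
```

On the AIX host it would be used as `lspv | rootvg_disks`. If a disk served through the adapter being recabled shows up in that list, the reboot warning above applies.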

A less drastic workaround (I know, rebooting can be very painful at times) is to go to the switch and temporarily rezone the LUNs in question. If AIX can't see them, then typically any process that uses them moves on and releases them. Then you can satisfy your curiosity, look at the various system logs, and see which services complain that the disks went away.

But don't even *THINK* about it if the HDDs in question are part of a mounted volume group. You risk data loss if you don't dismount them first.
 
mnis2008 (Author) commented:
Would this work? I'm just trying to make sure I don't need a reboot :). If I have to, I have to.

Before the cable is disconnected, e.g. for fcs0 (same for fcs1):

    Remove the paths on the failing adapter from PowerPath:
    # sudo powermt remove hba=0

    Unconfigure the port and all attached devices:
    # sudo rmdev -Rdl fcs0

Disconnect the old cable and attach the new cable from the new SAN.

    Run the configuration manager; the fcs and fscsi devices should become Available (the disks will not yet be Available):
    # cfgmgr

    Run the EMC cfgmgr. It should log in to the storage array and configure all disks and redundant paths:
    # sudo emc_cfgmgr

    Reconfigure PowerPath:
    # powermt config
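The steps proposed above can be collected into one function so they run in the same order on every adapter. This is only a sketch of that proposal, not a confirmed way to avoid the reboot. The `run` wrapper, the `DRY_RUN` variable and the function name `swap_fc_cable` are illustrative assumptions, as is running the script as root instead of through sudo.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 on the AIX host to actually execute them as root.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# swap_fc_cable ADAPTER HBA
#   ADAPTER  FC adapter device, e.g. fcs0
#   HBA      PowerPath HBA number for that adapter, e.g. 0
swap_fc_cable() {
    adapter="$1"
    hba="$2"

    # Remove the adapter's paths from PowerPath, then unconfigure the
    # port and all attached devices. Stop at the first failure.
    run powermt remove hba="$hba" || return 1
    run rmdev -Rdl "$adapter" || return 1

    if [ "$DRY_RUN" = "1" ]; then
        echo "+ (swap the cable on $adapter here)"
    else
        printf 'Swap the cable on %s, then press Enter: ' "$adapter"
        read answer
    fi

    # Rediscover the adapter, let the EMC script configure the disks,
    # then update the PowerPath configuration.
    run cfgmgr || return 1
    run emc_cfgmgr || return 1
    run powermt config || return 1
}
```

With the default dry run, `swap_fc_cable fcs0 0` prints the commands it would issue, which makes it possible to review the order before trying it with `DRY_RUN=0`.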
 
Duncan Meyers commented:
Since you have new SAN switches, you'll need to reboot. AIX uses the FCID (Fibre Channel ID) of the storage port in the device descriptor (so does HP-UX), so if you change the switch port that the storage is plugged into, you change the device descriptor. Reboot the server and you'll see the PowerPath pseudo devices come back with new PowerPath IDs. You'll just need to fix up the mount points and you're done.