Solved

AIX: MPIO disks are in state Failed and could not be enabled

Posted on 2010-09-24
Last Modified: 2013-11-17
Hello,
I got the following error on my LPAR:

some mpio disks are in state failed and could not be enabled

Can someone provide some information on it: what it means, what could cause it, and what could be done to troubleshoot it?
Question by:assistunix

Assisted Solution

by:woolmilkporc
woolmilkporc earned 500 total points
ID: 33755717
Hi,

Are those drives "Virtual Disks" coming from one or several VIO servers?

Are all the failing paths under the same parent "vscsix" as seen with "lspath"?
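A minimal sketch for that check, assuming the default three-column lspath output (status, disk, parent adapter) as shown later in this thread. The function name failed_by_parent is hypothetical; it only reads text, so you pipe lspath into it on the LPAR:

```shell
#!/bin/sh
# Hypothetical helper: group Failed paths by their parent adapter so you can
# see at a glance whether they all hang off the same vscsiX.
# Assumes lspath's default columns: status, device, parent.
failed_by_parent() {
    awk '$1 == "Failed" { d[$3] = d[$3] " " $2 }
         END { for (p in d) printf "%s:%s\n", p, d[p] }'
}

# Usage on the LPAR:
#   lspath | failed_by_parent
```

If every Failed path shows up under one parent, the VIOS behind that vscsiX is the first place to look.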

Is the VIO server responsible for this "Virtual SCSI adapter" OK?

To determine this check the errorlog of your VIO server(s) (Issue "errlog" as padmin on the VIO server(s)).
Do you see something like "vhostx Virtual SCSI Host Adapter detected an error"?

If so, and if "vhostx" is indeed the host adapter responsible for the client adapter "vscsix" at the LPAR, you'll have to issue (again as padmin on the concerned VIOS):

rmdev -dev vhostx -ucfg -recursive
cfgdev -dev vhostx

If there are several VIO servers providing the disks for the LPAR, you can leave it running. If there is only one VIOS, shut the partition down beforehand!

-ucfg is very important; without it, the complete vhost configuration will be lost!

Now start the partition if you stopped it before, or issue "cfgmgr" there and enable the failing paths if you left it running.


wmp


Author Comment

by:assistunix
ID: 33758019
Hello again, thank you for the quick reply (as always).

The issue was a temporary one. I got a "path failed" error in the error log on the LPAR:
some mpio disks are in state failed and could not be enabled

Then I ran lspath on the LPAR and everything was enabled.

Then I went to the VIO server.

The issue was with the VIOS: I checked the error log there, and this was the error:
 "vhostx Virtual SCSI Host Adapter detected an error"

I worked with IBM, who recommended increasing the memory, because their analysis suggested the VIOS had too little.

So the issue (a temporary error in the VIOS causing a failed-path alert) is resolved, and hopefully the additional memory will keep it from coming back.

HOWEVER, I am keen to learn.

Suppose this had not been a temporary error: the disks on the LPAR had not switched paths and become enabled again, but were in fact in the Failed state in the output of "lspath", as the error stated. Would I then do the following?
"If so, and if "vhostx" is indeed the host adapter responsible for the client adapter "vscsix" at the LPAR, you'll have to issue (again as padmin on the concerned VIOS):"

You mean the vhostx that appears in the VIOS error log entry "vhostx Virtual SCSI Host Adapter detected an error"?
And you mean the "vscsix" that appears in the lspath output on the LPAR for the disks?
And "vscsix" indicates which VIOS the disk is coming from, right?

How can I relate them and figure out which "vscsix" goes to which "vhostx"?

rmdev -dev vhostx -ucfg -recursive   (this would remove the vhost, "clear the cache" or something?)
cfgdev -dev vhostx                   (this would add the vhost back for the disk to that LPAR?)

If there are several VIO servers providing the disks for the LPAR you can leave it running. If there is only one VIOS, shutdown the partition beforehand!

(So if only one VIOS provides the disks for the LPAR, I shut the LPAR down BEFORE running the rmdev and cfgdev commands on the VIOS?)

-ucfg is very important, else the complete vhost config will be lost! (What do these flags mean?)

Now start the partition if you stopped it before, or issue "cfgmgr" there and enable the failing paths if you left it running. (How do I enable the failing paths? Would cfgmgr enable them by itself?)


THANK YOU!!!

Assisted Solution

by:woolmilkporc
woolmilkporc earned 500 total points
ID: 33758509
>> how can i relate and figure out that which "vscsix" goes to which "vhostx" ??? <<

The easiest way is most probably this:

At the LPAR, issue "uname -L". The first column contains the partition ID.
Convert this number to hex.

--- Excursion: How to convert to hex ----------------------------------------------------------------
Use bc if needed.
For example, for partition ID "22": echo "obase=16; 22" | bc
This will result in "16" which is decimal 22 converted to hex (also written as "0x16")
Or, all in one: echo "obase=16; $(uname -L |cut -f1 -d' ')" | bc
------------------------------------------------------------------------------------------------------------------
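As an alternative to bc, the shell's own printf can do the conversion. This is a sketch; the function names are hypothetical. The second function zero-pads to eight digits with a 0x prefix, matching the "0x00000007" style of the client partition ID column in the lsmap output shown further down this thread (lowercase hex in that column is an assumption, so compare case-insensitively if you grep for letters):

```shell
#!/bin/sh
# Hypothetical helpers for the decimal -> hex step.
part_id_hex() {
    # Plain uppercase hex, same result as: echo "obase=16; $1" | bc
    printf '%X\n' "$1"
}
lsmap_client_id() {
    # 0x-prefixed, zero-padded to 8 hex digits, e.g. 22 -> 0x00000016
    printf '0x%08x\n' "$1"
}

# Usage on the LPAR:
#   lsmap_client_id "$(uname -L | cut -f1 -d' ')"
```

Grepping for the full padded value rather than a short fragment such as "007" avoids matching other IDs that merely contain those digits.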

Note this hexadecimal value, log in to the VIOS and issue (again for LPAR ID 22 = 0x16): "lsmap -all | grep ^vhost | grep 0016"
The first column shows the vhostx for the LPAR with ID 22.
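A slightly stricter variant of that grep compares the whole client partition ID column instead of a substring. This is a sketch with a hypothetical function name, assuming the vhost header lines look like the ones posted later in this thread (vhost name, physical location, client partition ID). The padmin shell may not accept a shell function; one option is to run it from a root shell obtained with oem_setup_env, feeding it the output of /usr/ios/cli/ioscli lsmap -all:

```shell
#!/bin/sh
# Hypothetical helper: print the vhost adapters whose client partition ID
# column equals the given decimal LPAR ID exactly.
# Assumes header lines of the form: vhostN  <physloc>  0x0000000N
vhosts_for_lpar() {
    want=$(printf '0x%08x' "$1")
    awk -v id="$want" '/^vhost/ && tolower($3) == id { print $1 }'
}

# Usage (root shell on the VIOS):
#   /usr/ios/cli/ioscli lsmap -all | vhosts_for_lpar 22
```

Unlike "grep 007", this does not also match an ID such as 0x00000070.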


>> rmdev -dev vhostx -ucfg -recursive <<
>> what do these flags mean <<

This will remove vhostx and all its children (VTDs) from the running ("current") configuration, but keep all their definitions in the ODM database.
During the next "cfgmgr" run (in IOSCLI terms, "cfgdev"), if a device with the same basic characteristics is found in the same location, those definitions are reused.
Without -ucfg, these ODM entries would have been purged.

>> if only one vio for disks on lpar, then shutdown BEFORE using rmdev and cfgdev command in vio??? <<

I should have written: if there is only one VIOS and your rootvg disks come from it ...
I think your LPAR will have died anyway in this case, but if it's still alive, yes, shut it down before rmdev ... etc.!

>> would cfgmgr enable the paths itself <<

"Missing" and "Failing" paths will be reconfigured, "Disabled" paths will remain disabled.

>> how to enable the failing paths <<

Best with a little help from our old friend "smitty"
 
"smitty mpiopath_enable_all" -> "All Paths" -> <Enter>
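If you prefer the command line to smitty, a sketch like the following builds one chpath command per Failed path from the lspath output. It assumes the AIX syntax chpath -l <device> -p <parent> -s enable (check the chpath man page on your AIX level); the function name is hypothetical, and it only prints the commands so you can review them first:

```shell
#!/bin/sh
# Hypothetical helper: turn Failed lines from lspath into chpath commands.
# Assumes lspath's default columns: status, device, parent.
# Nothing is executed here; the commands are only printed.
failed_path_cmds() {
    awk '$1 == "Failed" { printf "chpath -l %s -p %s -s enable\n", $2, $3 }'
}

# Usage on the LPAR:
#   lspath | failed_path_cmds          # review the generated commands
#   lspath | failed_path_cmds | sh     # then run them
```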

That's all.


YOU'RE WELCOME!!!

wmp


Author Comment

by:assistunix
ID: 33905949
Thank you.

Author Comment

by:assistunix
ID: 34306404
Hello wmp.

root # lspath
Enabled hdisk0 vscsi0
Enabled hdisk1 vscsi1
Enabled hdisk2 vscsi6
Enabled hdisk3 vscsi0
Failed  hdisk3 vscsi1


hdisk3 holds only two user-created file systems, and both are accessible for read and write, but one of its paths has failed. Please help me enable the path.

The LPAR is attached to two VIO servers.
I checked the errlog on both VIOS:
On VIO1 there is no current "Virtual SCSI Host Adapter detected an error" entry, although there is one from about a week ago; the failed path appeared today.
On VIO2 there is no "Virtual SCSI Host Adapter detected an error" entry at all.

I was able to find the vhost information for the LPAR on the VIO servers using the client partition ID.

VIO1:

$ lsmap -all |grep ^vhost  | grep 007
vhost0          U9119.FHA.0292674-V5-C75                     0x00000007
vhost5          U9119.FHA.0292674-V5-C71                     0x00000007
vhost6          U9119.FHA.0292674-V5-C72                     0x00000007
vhost7          U9119.FHA.0292674-V5-C73                     0x00000007
$

VIO2:

$ lsmap -all |grep ^vhost  | grep 007
vhost0          U9119.FHA.0292674-V6-C75                     0x00000007
vhost5          U9119.FHA.0292674-V6-C71                     0x00000007
vhost6          U9119.FHA.0292674-V6-C72                     0x00000007
vhost7          U9119.FHA.0292674-V6-C73                     0x00000007
$

However, how do I match those vhosts to the vscsi adapters on the LPAR?

How can I determine which vhost is attached to vscsi1, the path that has failed? And which VIOS has the failed vscsi path?


Author Comment

by:assistunix
ID: 34306610
I believe that in AIX it is generally assumed that vscsi0 comes from VIO1 and vscsi1 from VIO2, although that could vary between environments depending on how the configuration was done.
Can you tell me how I can verify, on my system, which VIOS vscsi1 is coming from?

Author Comment

by:assistunix
ID: 34307839
It turns out hdisk3 was not mapped on VIO2, so I mapped it again with the mkvdev command, then enabled the path with "smitty mpiopath_enable_all", and all was well.

Accepted Solution

by:woolmilkporc
woolmilkporc earned 500 total points
ID: 34308741
>> how can i determine which vhost is attached to the vscsi1 that has failed? <<

This is a really good question, and the solution is either simple (using the HMC GUI) or a bit complicated (using HMC command line).

The easy one:
Open your HMC GUI, open the properties box of e.g. VIO1 (via "Systems Management" -> "Servers" -> (name of managed system) -> (name of VIO1 lpar) -> "Properties").
Click the "Virtual Adapters" tab. In the displayed list, look for the lines of type "Server SCSI" that correspond to your LPAR ("Connecting Partition").
Now note the numbers for "Adapter ID" and the ones for "Connecting Adapter" in the same line.

"Adapter ID" corresponds to (in your lsmap example above) e.g. "C75" (without the "C", of course), and "Connecting Adapter" corresponds to the "Cxx" value displayed with "lscfg | grep vscsi" at your partition.

Let's say you found at the HMC in the VIO1 properties: Adapter ID = 75 and Connecting Adapter = 3 for partition 7.
You found at VIO1:
"vhost0  U9119.FHA.0292674-V5-C75  0x00000007"
Assuming you see with "lscfg | grep vscsi" at partition 7 the following:
"* vscsi0 U9119.FHA.0292674-V7-C3-T1   Virtual SCSI Client Adapter"
you now found out that "vhost0" at VIO1 (C75) is connected to "vscsi0" at LPAR 7 (C3)
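The slot numbers can also be pulled out of the location codes with a small text filter. This is a sketch with hypothetical function names, assuming location codes of the form shown above (...-C<slot> on the VIOS side, ...-C<slot>-T1 on the client side) and lscfg lines where the adapter name is the second field:

```shell
#!/bin/sh
# Hypothetical helper: extract the virtual slot number (the digits after
# the last "-C") from a location code.
slot_of() {
    printf '%s\n' "$1" | sed -n 's/.*-C\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: from "lscfg | grep vscsi" lines, print
# "<adapter> <slot>" for each client adapter.
client_slots() {
    awk '$2 ~ /^vscsi/ {
            loc = $3
            sub(/-T[0-9]+$/, "", loc)      # drop the -T1 port suffix
            n = split(loc, a, "-C")        # last piece is the slot number
            print $2, a[n]
         }'
}

# Usage on the LPAR:
#   lscfg | grep vscsi | client_slots
```

Matching the printed client slot against the HMC's "Connecting Adapter" value, and slot_of on the vhost location against "Adapter ID", gives the vhost-to-vscsi pairing.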

To find those numbers ("75" and "3" in my example) with the HMC command line, log in to the HMC as "hscroot" and issue:

lssyscfg -r prof -m (managed system) --filter "lpar_names=(VIO1)" -F "virtual_scsi_adapters"
The values in parentheses above must be supplied by you (don't type the parentheses).

You will see quite a bunch of output, consisting of comma-separated blocks. One of these blocks could look like:
"75/server/7/mylpar/3/0,..."

This way you've found the same numbers as with the HMC GUI (Adapter ID 75 and Connecting Adapter 3) for partition 7.
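To avoid reading through the whole string by eye, a sketch like this splits it into blocks and keeps the server adapters that point at one client partition. It assumes each block follows the slot/type/remote-partition-ID/remote-partition-name/remote-slot/required layout of the "75/server/7/mylpar/3/0" example; the function name is hypothetical, and since the HMC shell is restricted, the parsing is meant to run on a workstation:

```shell
#!/bin/sh
# Hypothetical helper: print "server slot -> client slot" for one client
# partition ID from lssyscfg's virtual_scsi_adapters output.
vscsi_pairs_for() {
    tr -d '"' | tr ',' '\n' |
        awk -F/ -v lpar="$1" '$2 == "server" && $3 == lpar { print $1 " -> " $5 }'
}

# Usage from a workstation (HMC, SYSTEM and VIO1 are placeholders):
#   ssh hscroot@HMC 'lssyscfg -r prof -m SYSTEM --filter "lpar_names=VIO1" -F virtual_scsi_adapters' \
#     | vscsi_pairs_for 7
```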

Now work with these numbers the same way I described above.

I told you it was a good question ...

And no, vscsi0 does not always come from VIO1 etc. How should your LPAR know which of your VIOS is the "number one"?
vscsi0 is the first detected client adapter.
It could be the first one due to the sequence cfgmgr uses to scan the virtual slots, or due to timing issues, or due to the fact that you defined the adapters one by one and ran cfgmgr in between.

Did I ever tell you that I'm an AIX addict?

wmp






Author Comment

by:assistunix
ID: 34418705
Yes, I can tell by the amount of knowledge you have, and for the sake of me and others like me on this site, we are really grateful to have a genius like you addicted to AIX. :) Thank you.

Expert Comment

by:woolmilkporc
ID: 34701309
Thank you for the compliments!

Do you need further assistance in solving this issue?

wmp

Author Comment

by:assistunix
ID: 34930788
Yes, I am having an issue running lsmap. Let me provide the output.

Author Comment

by:assistunix
ID: 34930791
Correction: I meant lspath, not lsmap.