Solved

Link down on IBM DS3400 double controller disk array

Posted on 2014-10-21
Last Modified: 2014-11-04
Our dual-controller disk array has a link down to the B controller. This is evident from a cfgmgr error (0514-061 Cannot find a child device) on the server, from the fiber cable port LEDs on the B controller, and from the Storage Manager 10 service information files. I have diff'd a 2012 storageArrayProfile against the current file, and I see this on the 2012 "B" controller:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  2/0xE4                          
         Preferred ID:                2/0xE4                          
         NL-Port ID:                  0x0000E4                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Up                              
         Topology:                    Arbitrated Loop - Private      
         World-wide port identifier:  20:25:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
while this on the current "B" controller:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  Not applicable/0xFFFFFFFF      
         Preferred ID:                2/0xE4                          
         NL-Port ID:                  0xFFFFFF                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Down                            
         Topology:                    Not Available                  
         World-wide port identifier:  20:25:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
In both cases the active "A" controller data is identical:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  0/0xEF                          
         Preferred ID:                0/0xEF                          
         NL-Port ID:                  0x0000EF                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Up                              
         Topology:                    Arbitrated Loop - Private      
         World-wide port identifier:  20:24:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
I have replaced the fiber link, the SFP module on the controller, then the whole controller and the corresponding power supply. We originally noticed the cfgmgr error while swapping the CDROM back and forth between our production and test servers, so some event occurred that caused the change. This was long after we upgraded to 5.3 TL10, and there have been no OS software or firmware upgrades since. I think a configuration bit must have flipped, or something similar. I am in the process of getting the unit back on its support contract. The server has been on contract continuously, and IBM support assured me that the Emulex HBA was not defective. It is the only thing I have not replaced. Can you guide me in diagnosing this problem? Thanks, Tom
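For anyone repeating the profile comparison described above, here is a minimal Python sketch of it. It only assumes that each excerpt is a list of `Name: value` lines, as in the pasted text; the helper names `parse_block` and `changed_fields` are made up for this illustration and are not part of Storage Manager.

```python
# Compare the 2012 and current "B" controller host-interface excerpts.
# The two strings are copied from the storageArrayProfile text quoted above.

OLD_B = """\
Host interface:                 Fibre
   Channel:                     1
   Current ID:                  2/0xE4
   Preferred ID:                2/0xE4
   NL-Port ID:                  0x0000E4
   Maximum data rate:           4 Gbps
   Current data rate:           4 Gbps
   Data rate control:           Auto
   Link status:                 Up
   Topology:                    Arbitrated Loop - Private
   World-wide port identifier:  20:25:00:a0:b8:68:85:79
   World-wide node identifier:  20:04:00:a0:b8:68:85:79
   Part type:                   HPFC-5700           revision 5
"""

NEW_B = """\
Host interface:                 Fibre
   Channel:                     1
   Current ID:                  Not applicable/0xFFFFFFFF
   Preferred ID:                2/0xE4
   NL-Port ID:                  0xFFFFFF
   Maximum data rate:           4 Gbps
   Current data rate:           4 Gbps
   Data rate control:           Auto
   Link status:                 Down
   Topology:                    Not Available
   World-wide port identifier:  20:25:00:a0:b8:68:85:79
   World-wide node identifier:  20:04:00:a0:b8:68:85:79
   Part type:                   HPFC-5700           revision 5
"""


def parse_block(text):
    """Turn 'Name: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so WWN values keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def changed_fields(old, new):
    """Return {name: (old_value, new_value)} for fields that differ."""
    names = sorted(old.keys() | new.keys())
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


diff = changed_fields(parse_block(OLD_B), parse_block(NEW_B))
for name, (before, after) in diff.items():
    print(f"{name}: {before!r} -> {after!r}")
```

On these two excerpts only Current ID, NL-Port ID, Link status and Topology differ. The data rate fields are the same in both, so they do not by themselves reveal the down link.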
Question by:mhcbbcadmin
8 Comments
 

Author Comment

by:mhcbbcadmin
ID: 40395878
Adding event log event to above description.  This is making it appear that the "good" A controller is actually the problem.
Date/Time: 10/18/14 10:17:39 PM
Sequence number: 3342
Event type: 1208
Description: Data rate negotiation failed
Event specific codes: 0/0/0
Event category: Error
Component type: Channel
Component location: Host-side: controller in slot , port 1
Logged by: Controller in slot A
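If there are many such entries to sift through, a record like the one above can be split into fields with a few lines of Python. This is only a sketch: it assumes every line is `Name: value`, it assumes the timestamp is month/day/two-digit year with a 12-hour clock, and the function names are hypothetical.

```python
from datetime import datetime

# Event record copied from the log entry quoted above.
EVENT_TEXT = """\
Date/Time: 10/18/14 10:17:39 PM
Sequence number: 3342
Event type: 1208
Description: Data rate negotiation failed
Event specific codes: 0/0/0
Event category: Error
Component type: Channel
Component location: Host-side: controller in slot , port 1
Logged by: Controller in slot A
"""


def parse_event(text):
    """Split 'Name: value' lines into a dict (hypothetical helper)."""
    event = {}
    for line in text.splitlines():
        # Split on the first colon only; the location value contains one too.
        key, sep, value = line.partition(":")
        if sep:
            event[key.strip()] = value.strip()
    return event


def is_host_channel_error(event):
    """True for an Error-category event on a host-side channel."""
    return (
        event.get("Event category") == "Error"
        and event.get("Component type") == "Channel"
        and event.get("Component location", "").startswith("Host-side")
    )


event = parse_event(EVENT_TEXT)
# Assumed timestamp layout: MM/DD/YY HH:MM:SS AM/PM.
when = datetime.strptime(event["Date/Time"], "%m/%d/%y %I:%M:%S %p")

if is_host_channel_error(event):
    print(when.isoformat(), event["Event type"], event["Description"])
    print("location:", event["Component location"])
    print("logged by:", event["Logged by"])
```

Keeping "Component location" and "Logged by" as separate fields matters here, because the slot letter is blank in the location while the record was logged by controller A.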
 
LVL 62

Expert Comment

by:gheist
ID: 40395904
Most likely somebody sneezed at optical fibre...
 
LVL 56

Assisted Solution

by:andyalder
andyalder earned 500 total points
ID: 40397408
You say you have replaced the cable, but the HBA in the server has not been replaced (I don't think there is a switch, since the topology of the working links is FC-AL).

 

Accepted Solution

by:mhcbbcadmin
mhcbbcadmin earned 0 total points
ID: 40407412
It turns out to be okay to disconnect the fiber cables when the link is down. There was one light-emitting SFP port on Controller B and no corresponding HBA ports emitting light. Using the light-emitting port, I was able to show that both cables were good, and therefore most likely the SFPs, Controller B and the power supply as well. Our IBM support contract got renewed, and they agreed that the host bus adapter was the most likely remaining culprit; it will be replaced shortly. Right in the middle of all this a drive went down, so that was convenient. I am going to assume that the new HBA will fix the problem. Performance of the array is decreased under heavy loading conditions, so it will be nice to have it fixed. I'll try to append to this comment later if the result is other than the HBA being out.
 

Author Comment

by:mhcbbcadmin
ID: 40407890
I've requested that this question be closed as follows:

Accepted answer: 0 points for mhcbbcadmin's comment #a40407412

for the following reason:

We ruled out all but one likely candidate for the problem, with a cause that IBM could not detect directly. There could still be a problem with supporting server hardware. An "A" would mean waiting to test the new HBA, but we need to be able to make other posts here, so I want this one closed.
 
LVL 56

Expert Comment

by:andyalder
ID: 40407427
Told you it was the HBA.
 
LVL 62

Expert Comment

by:gheist
ID: 40407891
That is exactly the prime example of an invalid reason for closing a question.
