Solved

Link down on IBM DS3400 double controller disk array

Posted on 2014-10-21
Last Modified: 2014-11-04
Our dual-controller disk array has a link down to the B controller.  This is evident from a cfgmgr error (0514-061 Cannot find a child device) on the server, from the fiber cable port LEDs on the B controller, and from the Storage Manager 10 service information files.  I have diff'd a 2012 storageArrayProfile against the current file, and I see this on the 2012 "B" controller:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  2/0xE4                          
         Preferred ID:                2/0xE4                          
         NL-Port ID:                  0x0000E4                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Up                              
         Topology:                    Arbitrated Loop - Private      
         World-wide port identifier:  20:25:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
while this on the current "B" controller:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  Not applicable/0xFFFFFFFF      
         Preferred ID:                2/0xE4                          
         NL-Port ID:                  0xFFFFFF                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Down                            
         Topology:                    Not Available                  
         World-wide port identifier:  20:25:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
In both cases the active "A" controller data is identical:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  0/0xEF                          
         Preferred ID:                0/0xEF                          
         NL-Port ID:                  0x0000EF                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Up                              
         Topology:                    Arbitrated Loop - Private      
         World-wide port identifier:  20:24:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
I have replaced the fiber link, the SFP module on the controller, then the whole controller and the corresponding power supply.  We first noticed the cfgmgr error while swapping the CD-ROM drive back and forth between our production and test servers, and some event occurred that caused the change.  This was long after we upgraded to AIX 5.3 TL10, so there have been no recent OS software or firmware upgrades.  I think a configuration bit must have flipped, or something similar.  I am in the process of getting the unit back on its support contract.  The server has been steadily on contract, and IBM support assured me that the Emulex HBA was not defective.  It is the only thing I have not replaced.  Can you guide me in diagnosing this problem?  Thanks, Tom
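The profile comparison described above can be reproduced with a short script. This is only a minimal sketch, assuming the two "B" controller host-interface blocks have been copied out of the storageArrayProfile files as plain text; the helper names `parse_block` and `changed_fields` are hypothetical and are not part of Storage Manager.

```python
# Hypothetical helpers for comparing two "Host interface" blocks copied
# from storageArrayProfile files; not part of IBM Storage Manager.

OLD_B = """\
Host interface:                 Fibre
   Channel:                     1
   Current ID:                  2/0xE4
   Preferred ID:                2/0xE4
   NL-Port ID:                  0x0000E4
   Maximum data rate:           4 Gbps
   Current data rate:           4 Gbps
   Data rate control:           Auto
   Link status:                 Up
   Topology:                    Arbitrated Loop - Private
   World-wide port identifier:  20:25:00:a0:b8:68:85:79
   World-wide node identifier:  20:04:00:a0:b8:68:85:79
   Part type:                   HPFC-5700           revision 5
"""

NEW_B = """\
Host interface:                 Fibre
   Channel:                     1
   Current ID:                  Not applicable/0xFFFFFFFF
   Preferred ID:                2/0xE4
   NL-Port ID:                  0xFFFFFF
   Maximum data rate:           4 Gbps
   Current data rate:           4 Gbps
   Data rate control:           Auto
   Link status:                 Down
   Topology:                    Not Available
   World-wide port identifier:  20:25:00:a0:b8:68:85:79
   World-wide node identifier:  20:04:00:a0:b8:68:85:79
   Part type:                   HPFC-5700           revision 5
"""


def parse_block(text):
    """Turn 'Key:   value' lines into a dict, splitting on the first colon
    only so that colon-separated WWN values stay intact."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def changed_fields(old_text, new_text):
    """Return {field: (old, new)} for every field whose value differs."""
    old, new = parse_block(old_text), parse_block(new_text)
    keys = list(old) + [k for k in new if k not in old]
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


changes = changed_fields(OLD_B, NEW_B)
for field, (before, after) in changes.items():
    print(f"{field}: {before} -> {after}")
```

On the two blocks quoted above, only the link-state fields differ (Current ID, NL-Port ID, Link status, Topology); the preferred ID and both world-wide identifiers are unchanged.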
Question by:mhcbbcadmin

Author Comment

by:mhcbbcadmin
ID: 40395878
Adding an event log entry to the description above.  It makes it appear that the "good" A controller is actually the problem.
Date/Time: 10/18/14 10:17:39 PM
Sequence number: 3342
Event type: 1208
Description: Data rate negotiation failed
Event specific codes: 0/0/0
Event category: Error
Component type: Channel
Component location: Host-side: controller in slot , port 1
Logged by: Controller in slot A
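
When there are many such entries to sift through, it can help to turn each one into a record and filter on category and component. This is a rough sketch, assuming entries are pasted as plain text in the layout shown above and that the timestamp is month/day/year with an English AM/PM marker; `parse_event` is a hypothetical helper, not a Storage Manager function. The slot is blank in the component location as logged, and the sketch leaves it that way.

```python
from datetime import datetime

# Event text exactly as quoted in the comment above (slot left blank as logged).
EVENT = """\
Date/Time: 10/18/14 10:17:39 PM
Sequence number: 3342
Event type: 1208
Description: Data rate negotiation failed
Event specific codes: 0/0/0
Event category: Error
Component type: Channel
Component location: Host-side: controller in slot , port 1
Logged by: Controller in slot A
"""


def parse_event(text):
    """Hypothetical helper: split each line on its first colon into a dict."""
    event = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            event[key.strip()] = value.strip()
    return event


event = parse_event(EVENT)

# Assumption: month/day/two-digit-year, 12-hour clock, English AM/PM marker.
logged_at = datetime.strptime(event["Date/Time"], "%m/%d/%y %I:%M:%S %p")

# Flag error-category events raised against a channel component.
is_channel_error = (
    event["Event category"] == "Error" and event["Component type"] == "Channel"
)

print(logged_at.isoformat(), event["Event type"], event["Description"])
print("channel error:", is_channel_error, "| reporter:", event["Logged by"])
```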
 
LVL 62

Expert Comment

by:gheist
ID: 40395904
Most likely somebody sneezed at optical fibre...
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 500 total points
ID: 40397408
They say they have replaced the cable, but there is no mention of replacing the HBA in the server (I don't think there is a switch, since the topology of the working links is FC-AL).

Accepted Solution

by:mhcbbcadmin
mhcbbcadmin earned 0 total points
ID: 40407412
It turns out to be okay to disconnect the fiber cables when the link is down.  There was one light-emitting SFP port on Controller B and no corresponding HBA ports emitting light.  Using the light-emitting port, I was able to show that both cables were good and, most likely, so were the SFPs, Controller B, and the power supply.  Our IBM support contract got renewed, and they agreed that the host bus adapter was the most likely remaining culprit; it will be replaced shortly.  Right in the middle of all this a drive went down, so that was convenient.  I am going to assume that the new HBA will fix the problem.  Performance of the array is decreased under heavy loading conditions, so it will be nice to have it fixed.  I'll try to append to this comment later if the result is anything other than the HBA being at fault.
 

Author Comment

by:mhcbbcadmin
ID: 40407890
I've requested that this question be closed as follows:

Accepted answer: 0 points for mhcbbcadmin's comment #a40407412

for the following reason:

We ruled out all but one likely candidate for the problem, with a cause that IBM could not detect directly.  There could still be a problem with supporting server hardware.  An "A" would require waiting to test the new HBA, but we need to be able to make other posts here, so I want this one closed.
 
LVL 55

Expert Comment

by:andyalder
ID: 40407427
Told you it was the HBA.
 
LVL 62

Expert Comment

by:gheist
ID: 40407891
That is exactly the prime example of an invalid reason for closing a question.
