Solved

Link down on IBM DS3400 double controller disk array

Posted on 2014-10-21
Last Modified: 2014-11-04
Our dual-controller disk array has a link down on the B controller. This is evident from a cfgmgr error (0514-061 Cannot find a child device) on the server, from the fiber cable port LEDs on the B controller, and from the Storage Manager 10 service information files. I diffed a 2012 storageArrayProfile against the current file, and the 2012 profile shows this for the "B" controller:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  2/0xE4                          
         Preferred ID:                2/0xE4                          
         NL-Port ID:                  0x0000E4                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Up                              
         Topology:                    Arbitrated Loop - Private      
         World-wide port identifier:  20:25:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
while the current profile shows this for the "B" controller:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  Not applicable/0xFFFFFFFF      
         Preferred ID:                2/0xE4                          
         NL-Port ID:                  0xFFFFFF                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Down                            
         Topology:                    Not Available                  
         World-wide port identifier:  20:25:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
In both cases the active "A" controller data is identical:
      Host interface:                 Fibre                          
         Channel:                     1                              
         Current ID:                  0/0xEF                          
         Preferred ID:                0/0xEF                          
         NL-Port ID:                  0x0000EF                        
         Maximum data rate:           4 Gbps                          
         Current data rate:           4 Gbps                          
         Data rate control:           Auto                            
         Link status:                 Up                              
         Topology:                    Arbitrated Loop - Private      
         World-wide port identifier:  20:24:00:a0:b8:68:85:79        
         World-wide node identifier:  20:04:00:a0:b8:68:85:79        
         Part type:                   HPFC-5700           revision 5  
I have replaced the fiber link, the SFP module on the controller, then the whole controller and the corresponding power supply. We first noticed the cfgmgr error while swapping the CDROM back and forth between our production and test servers, so some event must have occurred that caused the change. This was long after we upgraded to 5.3 TL10, and there have been no OS software or firmware upgrades since. I think a configuration bit must have flipped, or something similar. I am in the process of getting the unit back on its support contract. The server has been on contract the whole time, and IBM support assured me that the Emulex HBA was not defective. It is the only thing I have not replaced. Can you guide me in diagnosing this problem? Thanks, Tom
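The profile comparison described above can be sketched in a few lines of Python. This is only an illustration: the field names and values are copied from the "B" controller excerpts in the question, the helper names (parse_host_interface, changed_fields) are hypothetical and not part of Storage Manager, and a real storageArrayProfile contains many more sections than this single block.

```python
# Sketch only: compares one "Host interface" block from two profile snapshots.
# Helper names are hypothetical; the sample text is taken from the question.

OLD_B = """\
Host interface:                 Fibre
   Channel:                     1
   Current ID:                  2/0xE4
   Preferred ID:                2/0xE4
   NL-Port ID:                  0x0000E4
   Maximum data rate:           4 Gbps
   Current data rate:           4 Gbps
   Data rate control:           Auto
   Link status:                 Up
   Topology:                    Arbitrated Loop - Private
   World-wide port identifier:  20:25:00:a0:b8:68:85:79
   World-wide node identifier:  20:04:00:a0:b8:68:85:79
   Part type:                   HPFC-5700           revision 5
"""

NEW_B = """\
Host interface:                 Fibre
   Channel:                     1
   Current ID:                  Not applicable/0xFFFFFFFF
   Preferred ID:                2/0xE4
   NL-Port ID:                  0xFFFFFF
   Maximum data rate:           4 Gbps
   Current data rate:           4 Gbps
   Data rate control:           Auto
   Link status:                 Down
   Topology:                    Not Available
   World-wide port identifier:  20:25:00:a0:b8:68:85:79
   World-wide node identifier:  20:04:00:a0:b8:68:85:79
   Part type:                   HPFC-5700           revision 5
"""


def parse_host_interface(block):
    """Turn 'Name:   value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in block.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and value.strip():
            # Collapse runs of padding spaces inside the value.
            fields[name.strip()] = " ".join(value.split())
    return fields


def changed_fields(old, new):
    """Return {field: (old value, new value)} for every field that differs."""
    names = sorted(old.keys() | new.keys())
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


changes = changed_fields(parse_host_interface(OLD_B), parse_host_interface(NEW_B))
for name, (before, after) in changes.items():
    print(f"{name}: {before!r} -> {after!r}")
```

For these two excerpts the only fields that differ are Current ID, NL-Port ID, Link status, and Topology. The WWPN, WWNN, data rate settings, and Preferred ID are unchanged.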
Question by:mhcbbcadmin
8 Comments
 

Author Comment

by:mhcbbcadmin
ID: 40395878
Adding an event log entry to the description above. It makes it appear that the "good" A controller is actually the problem.
Date/Time: 10/18/14 10:17:39 PM
Sequence number: 3342
Event type: 1208
Description: Data rate negotiation failed
Event specific codes: 0/0/0
Event category: Error
Component type: Channel
Component location: Host-side: controller in slot , port 1
Logged by: Controller in slot A
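As a rough sketch, the event record above can be split into fields to check which controller it actually names. The record text is copied from this comment; the helper names (parse_event, slot_of) and the assumption that a slot is written as "slot" followed by a single capital letter are mine, not something Storage Manager documents.

```python
import re

# Sketch only: the event text is copied from the comment above.
EVENT = """\
Date/Time: 10/18/14 10:17:39 PM
Sequence number: 3342
Event type: 1208
Description: Data rate negotiation failed
Event specific codes: 0/0/0
Event category: Error
Component type: Channel
Component location: Host-side: controller in slot , port 1
Logged by: Controller in slot A
"""


def parse_event(text):
    """Split each 'Name: value' line on the first colon."""
    record = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            record[name.strip()] = value.strip()
    return record


def slot_of(value):
    """Return the slot letter in a string like 'slot A', or None if it is blank.

    Assumes the slot is a single capital letter (hypothetical format).
    """
    match = re.search(r"slot\s+([A-Z])\b", value)
    return match.group(1) if match else None


event = parse_event(EVENT)
print("logged by slot:", slot_of(event["Logged by"]))
print("component slot:", slot_of(event["Component location"]))
```

In this record the "Logged by" field names slot A, but the "Component location" field has no slot letter at all, so the entry by itself does not say which controller's port 1 failed to negotiate.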
 
LVL 62

Expert Comment

by:gheist
ID: 40395904
Most likely somebody sneezed at optical fibre...
 
LVL 56

Assisted Solution

by:andyalder
andyalder earned 2000 total points
ID: 40397408
Says they have replaced the cable, but no mention of replacing the HBA in the server (I don't think there is a switch, since the working links show FC-AL as the topology).

 

Accepted Solution

by:mhcbbcadmin
mhcbbcadmin earned 0 total points
ID: 40407412
It turns out to be okay to disconnect the fiber cables when the link is down. There was one light-emitting SFP port on Controller B and no corresponding HBA ports emitting light. Using the light-emitting port, I was able to show that both cables were good, and so, most likely, are the SFPs, Controller B, and the power supply. Our IBM support contract got renewed, and they agreed that the host bus adapter was the most likely remaining culprit; it will be replaced shortly. Right in the middle of all this a drive went down, so that was convenient. I am going to assume that the new HBA will fix the problem. Performance of the array is decreased under heavy load, so it will be nice to have it fixed. I'll try to add to this comment later if the cause turns out to be something other than the HBA.
 

Author Comment

by:mhcbbcadmin
ID: 40407890
I've requested that this question be closed as follows:

Accepted answer: 0 points for mhcbbcadmin's comment #a40407412

for the following reason:

We ruled out all but one likely candidate for the problem, with a cause that IBM could not detect directly. There could still be a problem with the supporting server hardware. An "A" would mean waiting to test the new HBA, but we need to be able to make other posts here, so I want this one closed.
 
LVL 56

Expert Comment

by:andyalder
ID: 40407427
Told you it was the HBA.
 
LVL 62

Expert Comment

by:gheist
ID: 40407891
That is exactly the prime example of an invalid reason for closing a question.