• Status: Solved
  • Priority: Medium

Blade S SAS RAID Module - Unable to see SCM

Hello experts,

I have a Blade S chassis that was working fine for over 2 years, but it was moved to another location last week and has not fully worked since.
It has:
2 SCM SAN
2 SAS controllers
3 Blades (VMware) zoned to boot from SAN - these do not boot as they cannot see any boot device
2 Blades booting from local disk - these work fine but cannot see any disks apart from the local ones
IP of AMM - 192.168.96.232
SAS Raid CTRL MOD I/O 3 – 192.168.96.233
RAID controller subsystem 192.168.96.234
SAS Raid CTRL MOD I/O 4 – 192.168.96.235
RAID controller subsystem 192.168.96.236

All share the same mask of 255.255.255.0 and the same gateway, 192.168.96.251.
Both controllers display the same error:
I/O Module 3 requires user attention
I/O Module 4 requires user attention
Controller I/O 3 can ping its subsystem on 192.168.96.234
Controller I/O 4 cannot ping its subsystem on 192.168.96.236
Zoning is using Predefined Config 10, and displays Normal, then No Cable in the status for Blades 1, 2 and 3 as they continually reboot.
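
As a quick sanity check on the addressing listed above, a short Python sketch along these lines confirms that every management address sits in the same subnet as the gateway. Only the addresses, mask and gateway come from this post; the labels and the function name `addresses_outside_subnet` are made up for illustration. It can rule out an addressing mistake after the move, but it says nothing about SAS connectivity.

```python
import ipaddress

# Addresses, mask and gateway as listed in the post.
# The dictionary labels are descriptive strings only (hypothetical).
MANAGEMENT_ADDRESSES = {
    "AMM": "192.168.96.232",
    "SAS RAID controller module, I/O bay 3": "192.168.96.233",
    "RAID controller subsystem, bay 3": "192.168.96.234",
    "SAS RAID controller module, I/O bay 4": "192.168.96.235",
    "RAID controller subsystem, bay 4": "192.168.96.236",
}
NETMASK = "255.255.255.0"
GATEWAY = "192.168.96.251"


def addresses_outside_subnet(addresses, gateway, netmask):
    """Return the labels whose address is not in the gateway's subnet."""
    # strict=False lets us build the network from a host address plus mask.
    network = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    return [
        label
        for label, addr in addresses.items()
        if ipaddress.ip_address(addr) not in network
    ]


if __name__ == "__main__":
    outside = addresses_outside_subnet(MANAGEMENT_ADDRESSES, GATEWAY, NETMASK)
    print("Outside subnet:", outside or "none")
```

An empty result means all five addresses are consistent with the mask and gateway given.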

I have tried to update the firmware through the AMM, through the web portal, and also using the Python method.
Both AMM methods fail with either "bad image" or "invalid header". Running it from the CLI (C:\Windows\System32>C:\Temp\ibm_fw_bcsw_s0cl-1.2.3.006_windows_noarch.bat -i 192.168.96.233 -n) first displayed the output below, which I took to mean the password was wrong, so I put it back to the default, PASSW0RD:
............................................................
.......................
Image unpacked.


Package name  : rssm.1.2.3.006
Package level : 1.2.3.006
Product       : rssm
Image created : Oct25201107:42:11(GMT)


Raid ctlr uBoot version : H-1.1.4.6
Raid ctlr code version  : H-2.1.2.4
Raid ctlr Linux version : H-2.4.20.12
BMC version : S0BT10A
FPGA version : 01.07
SES version : 0107
BBU version : 58.0
DSM version : 1.08
SAS switch version : R1.07


Initializing firmware update - please wait.
MSG: ./SbInst.py failed in function telnetRemoteHost, rc = 13.
....................................................................

If I try to run it again, it just returns "Unpacking image C:\Temp\ibm_fw_bcsw_s0cl-1.2.3.006_windows_noarch.bat", then goes back to the Windows prompt and does nothing else.
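
Going by its name alone, the failing function telnetRemoteHost suggests the updater could not open a telnet session to the module; nothing in this thread confirms that. A minimal Python sketch for checking, from the same workstation, whether the module accepts a TCP connection at all. The helper `tcp_port_open` is hypothetical and not part of the IBM package, and port 23 (the standard telnet port) is an assumption.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tcp_port_open("192.168.96.233", 23)` would show whether the I/O bay 3 module answers on the assumed telnet port. A `False` result only means the connection could not be made; it does not say why.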

Any ideas would be great.
Asked by: deanwilsons
 
DavidPresident commented:
Hardware has been known to break from time to time ... have you tried any diagnostics?
 
deanwilsons (author) commented:
Not yet, as I didn't expect both controllers to fail at the same time, but it's certainly possible. What's the best way to run these diags?
 
DavidPresident commented:
Hi deanwilsons - I have no idea. I don't have one of these. I'm just falling back on tried-and-true techniques. If you can't easily figure out how to fix what seems to be a software problem, confirm whether or not the hardware is good.
 
andyalder commented:
Why the list of IP addresses? They're only used for management, so they aren't really relevant to SAS connectivity.

You didn't remove the I/O switches to lighten it when unracking, perchance, and accidentally put the SAS switches in bays 1 and 2 rather than 3 and 4, did you? No, pretty sure not, as it says bays 3 and 4 in the text above. What about swapping them around? That would break connectivity, since they would both have the wrong zones on them.
 
deanwilsons (author) commented:
Thought I'd cover all bases and supply as much info as possible, hence the IP addresses.
I will try swapping the controllers around, but during the rebuild everything was labeled.
The controllers are in sync, so would it really matter if they were in the wrong bays?

I did notice the time is over 2 hours out on the controllers, but I am unable to connect to the controllers via Storage Manager to change it.
 
deanwilsons (author) commented:
The controllers were in the correct bays, but I swapped them around and nothing changed, so I put them back in the original bays.

I have checked the event logs in the AMM and the controller module, but nothing points to an issue.

The controller just displays a warning and says that user intervention is required.
 
deanwilsons (author) commented:
Still getting no further with this issue.

Does anyone else have any further suggestions?
 
deanwilsons (author) commented:
IBM Support was called in, and they diagnosed it as corrupt firmware. They supplied a pre-release firmware which resolved the issue.
 
deanwilsons (author) commented:
No other resolution was offered.