brithol
1783 - drive array controller failure
Hi,
I have a DL380 G8 server with 8 disks in RAID 1+0.
Today the server restarted by itself, and when it reaches the RAID controller during boot it shows the message "1783 - drive array controller failure".
I have another server with the same configuration. When I put that server's disks into the server with the error, the message does not appear and Windows boots. But when I put the disks from the faulty server into the other server, it gives the same message, so it looks like a disk-related error.
How can I see which disk is causing this? The LEDs on all the disks are OK (green).
Please help!
Please let us know which RAID card or integrated RAID chip you have.
I would start by checking your firmware, and be prepared to upgrade it. There was a firmware bug on several cards that caused this issue when a drive in the array triggered a SMART error.
http://h20566.www2.hp.com/hpsc/doc/public/display?docId=emr_na-c03350045
ASKER
Hi,
If I put the disks from the faulty server into another server, the error message is the same.
So it means that it is a disk issue.
The disks are in RAID 1+0.
Thanks
Let's get this straight.
You have server "A" and its set of disks "A1" in a RAID 10 setup (I hope that's what you mean by RAID 1+0).
You have server "B" and its set of disks "B1".
Server A is showing an error, and if you put the B1 disks into server A, it boots,
and if you put the A1 disks into server B, it shows the same error message that A does.
Is that correct?
If so, it means the error is travelling with the disks, and that means it's the disks!
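The reasoning behind the swap test can be written down as a small decision function. This is only an illustrative sketch: the function name, its parameters and the returned strings are made up for this example and are not part of any HP tool.

```python
def localize_fault(a_disks_fail_in_b: bool, b_disks_fail_in_a: bool) -> str:
    """Interpret a cross-swap test between faulty server A and healthy server B.

    a_disks_fail_in_b: True if A's disks still produce the error in server B.
    b_disks_fail_in_a: True if B's disks produce the error in server A.
    """
    if a_disks_fail_in_b and not b_disks_fail_in_a:
        # The error travels with the disks.
        return "disk set A"
    if b_disks_fail_in_a and not a_disks_fail_in_b:
        # The error stays with the chassis.
        return "server A (controller, cabling or backplane)"
    if a_disks_fail_in_b and b_disks_fail_in_a:
        return "both disk set A and server A"
    return "inconclusive (error not reproduced)"
```

For the case described in this thread, A's disks fail in B and B's disks boot in A, so `localize_fault(True, False)` points at the disk set.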
Since the problem follows the disks, I would suspect corrupt metadata on at least one of them. You can generate this error by taking a bunch of random disks from other servers and plugging them into a Smart Array controller, which confuses it because the metadata doesn't match.
If you can press F10 to get into Intelligent Provisioning, then go to diagnostics - ACU and generate an ADUreport.zip file, it would help with data recovery, but it might not let you into the ACU if the controller is locked. An ADU report taken without the disks in may still yield some info, since the slot0.txt file has the controller's self-test log.
If you have a backup, the easiest way is to power on without the disks, then add them hot and create a new array.
If you really need to get the data off, then software de-striping with RAIDreconstructor may be needed. For that you'll need a computer with a non-RAID SAS HBA to image each raw disk, a large disk to put the de-striped data on, and the software from runtime.org. Be aware, though, that if there is more than one logical disk on the array you may have to pay for their raidprobe service as well.
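To show what "software de-striping" means in principle, here is a toy Python sketch. It assumes a deliberately simplified RAID 10 model: each mirror pair holds identical data, stripes rotate across the pairs in a known order, the stripe size is known, and the data starts at offset 0 of every image. None of that is confirmed for a real Smart Array layout (working out the true order, stripe size and offsets is what the recovery software is for), and the function name and parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def destripe_raid10(
    mirror_pairs: List[Tuple[Optional[bytes], Optional[bytes]]],
    stripe_size: int,
) -> bytes:
    """Rebuild a logical volume from raw disk images in a simplified RAID 10 model.

    mirror_pairs: one (disk, mirror) tuple per pair, listed in stripe order.
                  An unreadable member is passed as None.
    stripe_size:  bytes written to one pair before moving on to the next.
    """
    # Pick one readable member from each mirror pair.
    members = []
    for index, (first, second) in enumerate(mirror_pairs):
        image = first if first is not None else second
        if image is None:
            raise ValueError(f"both members of mirror pair {index} are unreadable")
        members.append(image)

    # Interleave the stripes: stripe 0 from pair 0, stripe 0 from pair 1, and so on.
    length = min(len(image) for image in members)
    volume = bytearray()
    for offset in range(0, length, stripe_size):
        for image in members:
            volume += image[offset:offset + stripe_size]
    return bytes(volume)
```

With 8 disks in RAID 1+0 there would be 4 mirror pairs, and losing one member of a pair still leaves its twin to read from, which is why a single bad disk does not by itself prevent recovery in this model.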
Putting "dl380 1783" into Google gives lots of hits like the ones below, which report various errors caused by controller issues, mainly SCSI bus problems, although some are down to dead WBC batteries or firmware.
But none of these seem to fit your symptoms, as the problem seems to be related to the disks. Have you made any changes or updates recently to either server, the controllers, the disks, etc.?
http://h20564.www2.hp.com/hpsc/doc/public/display?calledBy=&docId=emr_na-c02190973-1&docLocale=
Solution "the SCSI bus backplane was replaced, and that rectified the issue"
http://h20564.www2.hp.com/hpsc/doc/public/display?docId=c00815027
Solution "The issue was attributed to a missing SCSI terminator on the SCSI backplane."
ASKER
Hi,
No, I didn't. It is the production server and we didn't do any update or upgrade.
I ran a test where I took disk number 1 out of the array, and then it didn't give the error. It only said that a disk was missing.
The problem is that it doesn't say which disk is faulty either.
Thanks
Let's hope you have good backups.
If it boots with disk 1 out, you can get the ADU report either under the OS or by booting the SmartStart CD, and we can see what's wrong.
ASKER
Fortunately I have backups; I restored the OS on another server and everything is up and running.
Now I need to restore this server.
I have ordered some disks.
Where can I get the ADU report?
Thanks
It could be either the cabling or the Smart Array card.