yballan
asked on
HDD becomes read only
Dear Experts,
We have some servers hosted by Peer1, and those servers hold our web-based database program.
Yesterday, one of the servers went down. They were able to bring it back up in about 2 hours, but this disrupted our operation severely.
When asked for the cause, this is the explanation I got.
" The system went into read only. This doesn't appear to be from the RAID array but most likely a system error that protected your file system. It's best that you have the RAID one and this can prevent a drive failing and losing the data on the server.
The following is the output from the system and the drive hours are not brand new but they are acceptable and I am confident in the hardware on your solution.
Power_On_Hours 0x0032 089 089 000 Old_age Always - 8124
Power_On_Hours 0x0032 082 082 000 Old_age Always - 13388
u0 RAID-1 OK - - - 232.82 Ri ON
p0 OK u0 233.81 GB SATA 0 - WDC WD2503ABYX-01WE
p1 OK u0 233.81 GB SATA 1 - WDC WD2503ABYX-01WE "
I don't understand why we even bother to have RAIDs if this could happen.
These servers are supposed to have brand new HDDs.
Is this something that could happen regularly, and how can we prepare for it?
Please advise.
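For reference, the raw Power_On_Hours values in the quoted output can be turned into an approximate drive age. This is only a small arithmetic sketch: the two hour counts come from the output above, and the helper name `power_on_age` is made up for illustration. It shows the drives have been powered on for roughly 11 and 18 months respectively, so they are not brand new.

```python
# Convert SMART Power_On_Hours raw values to days and years of powered-on time.
# The two values below are copied from the output quoted in the question.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # leap years ignored; close enough for a rough age


def power_on_age(hours):
    """Return (days, years) of powered-on time for a raw hour count."""
    days = hours / HOURS_PER_DAY
    years = days / DAYS_PER_YEAR
    return days, years


for hours in (8124, 13388):
    days, years = power_on_age(hours)
    print(f"{hours} h = {days:.1f} days = {years:.2f} years powered on")
```

Note that Power_On_Hours counts time spent powered on, not time since manufacture or purchase.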
Is the drive setup software (host-based driver configuration) or an actual RAID on Chip (hardware accelerated)?
IMNSHO quite frankly a host-based RAID setup plus SATA drives is a recipe for death.
That being said, there should be some events in the server's logs that could indicate what brought about the full-stop. If the RAID is indeed hardware based then there could very well be some entries in the controller's onboard log.
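As a rough way to catch the read-only condition early, here is a minimal sketch that lists file systems currently mounted read-only. It assumes the server runs Linux and exposes `/proc/mounts` (the thread does not confirm the OS), and the function name `read_only_mounts` is hypothetical.

```python
# List mounts whose options include "ro", given text in /proc/mounts format.
# Assumption: a Linux host; /proc/mounts does not exist on other systems.


def read_only_mounts(mounts_text):
    """Return (mount_point, fs_type) for each entry mounted read-only."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; match "ro" exactly so that
        # "errors=remount-ro" on a healthy rw mount is not counted.
        if "ro" in options.split(","):
            found.append((mount_point, fs_type))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = ""  # not a Linux host, or /proc is unavailable
    for mount_point, fs_type in read_only_mounts(text):
        print(f"read-only: {mount_point} ({fs_type})")
```

Some pseudo file systems are mounted read-only on purpose, so a real check would probably filter on the data volumes that matter and be run from whatever monitoring is already in place.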
Make sure your backups (are there any?) are good!
To answer your question more to the point: Our last SATA-based server went out the door about 4 years ago. Our last host-based RAID setup went out the door around 5 or 6 years ago.
There were some very good reasons we stopped using SATA and host-based RAID:
+ Server would go full-stop if a member of the array died
+ Data would be corrupted beyond recoverability (prior to restore from backup)
+ SATA did not, and does not, have the ability to communicate problems
+ Firmware compensation (WD RE, Seagate ES) for RAID was/is flaky
Is the excuse given for the full-stop reasonable? Given my own experience, how else is one going to explain a full-stop that is virtually impossible to explain?
If this setup is so critical then it may be time to look at migrating to something more robust as suggested above.
Philip
ASKER
Dear pgm554,
Thank you for your response. As I am still living in the RAID SATA world, I am not familiar with what you are referring to as a "high availability" solution. If I still wanted a hosting company to host our servers, does this mean that I need to look for a company that offers redundant HW/SW?
Dear MPECSInc,
Thank you for your response. Now I realize that our hosting company is using an older technology. I am not quite sure what you mean by "something more robust as suggested above". Are you referring to redundant HW/SW? Are there any services you recommend?
To Both Experts,
I am clearly not up to speed in this subject, so I would appreciate any recommendation/guidance.
Thank you!!
ASKER
Thank you both for educating me on this matter, I really appreciate it!
There are a lot of ways to achieve high availability (99.999% uptime), such as clusters and VM failovers.
But the solutions are not cheap.
RAID has very little to do with high availability.
High availability comes from redundant hardware/software and how it's configured.
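To put the 99.999% figure in perspective, here is a small arithmetic sketch of how much downtime a given availability target allows per year, and what a single 2-hour outage like the one described in the question works out to. The function names are made up for illustration, and a 365-day year is assumed.

```python
# Relate an availability percentage to allowed downtime per year, and back.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_minutes):
    """Availability over one year given total downtime in minutes."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


print(f"99.999% allows {allowed_downtime_minutes(99.999):.2f} min/year")
print(f"A 120-minute outage gives {availability_percent(120):.3f}% for the year")
```

"Five nines" leaves a budget of only a little over five minutes per year, so one 2-hour outage already puts the year below 99.98%. That is why this level of uptime takes redundant servers and failover rather than RAID alone.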