Bad sectors on Proliant DL 380 G6

Proliant DL 380 G6 with bad sectors on Smart Array P410i

I have an older server that I just rebuilt with Windows Server 2016 running Hyper-V.

I noticed that it started to reboot itself at about the time the backup starts. After investigating, I found the following events logged just before the server crashes:

Event 153, Disk
The IO operation at logical block address 0x40800 for Disk 3 (PDO name: \Device\000001a9) was retried.
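To compare several of these warnings side by side, the fields can be pulled out of the message text. The following is only a minimal sketch: the regular expression is written against the exact wording quoted above, `parse_event_153` is a hypothetical helper name, and the 512-byte sector size used for the byte offset is an assumption that may not match the actual volume.

```python
import re

# Pattern for the Event 153 text quoted above; other Windows builds may word it differently.
EVENT_153 = re.compile(
    r"logical block address (0x[0-9a-fA-F]+) for Disk (\d+) "
    r"\(PDO name: ([^)]+)\)"
)

def parse_event_153(message, sector_bytes=512):
    """Return disk number, PDO name, LBA and byte offset, or None if the text does not match."""
    m = EVENT_153.search(message)
    if m is None:
        return None
    lba = int(m.group(1), 16)
    return {
        "disk": int(m.group(2)),
        "pdo": m.group(3),
        "lba": lba,
        "offset_bytes": lba * sector_bytes,  # sector size is an assumption
    }

msg = (r"The IO operation at logical block address 0x40800 for Disk 3 "
       r"(PDO name: \Device\000001a9) was retried.")
info = parse_event_153(msg)
print(info["disk"], info["lba"], info["offset_bytes"] // 2**20, "MiB")
# prints: 3 264192 129 MiB
```

Under the 512-byte assumption, the retried I/O sits about 129 MiB into whatever device Windows calls Disk 3.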

I do not get any errors in the HPE Smart Storage Administrator.

NO "amber" lights on the drives - no weird sounds - obviously something is not being reported properly.

I have a RAID 5 array built from 7 + 1 spare 1TB SATA HDDs, with 2 logical drives, so I don't really understand the "Disk 3" reference in the error.

I ran the Smart Array Diagnostics Report and I do not see anything wrong - but to be honest, unless it says "failure" I would not know what to look for in this report.



Any ideas on how to troubleshoot which drive is failing would be appreciated.
pyotrekAsked:
 
pyotrekAuthor Commented:
So far so good - no more reboots. I assume that a Windows update created the problem and the next update fixed it.
 
Dr. KlahnPrincipal Software EngineerCommented:
Bring up Disk Management the next time the error appears and see which drive is "Disk 3."  It may not be the RAID array.

Note that the disk numbers correspond only so long as there has been no change in the system drive configuration.  If, for example, USB drives are going in and out, every time one comes in or goes out the disk numbers can change.

Disk numbers in Disk Management
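Because the numbers can shift whenever a drive is attached or removed, one way to work out what "Disk 3" meant at the moment of an error is to keep timestamped copies of the disk list and look up the one in force at that time. This is only a sketch: `disk_at` is a hypothetical helper, and every timestamp and device description in the sample data is made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical snapshots: (time recorded, {disk number: description}).
# In practice these would be copied from Disk Management or a similar listing.
snapshots = [
    (datetime(2018, 1, 10, 8, 0), {0: "logical drive 1", 1: "logical drive 2"}),
    (datetime(2018, 1, 10, 20, 0), {0: "logical drive 1", 1: "logical drive 2",
                                    2: "USB backup drive A"}),
    (datetime(2018, 1, 10, 22, 0), {0: "logical drive 1", 1: "logical drive 2",
                                    2: "USB backup drive A", 3: "device X"}),
]

def disk_at(snapshots, when, number):
    """Describe the device holding `number` in the latest snapshot taken at or before `when`."""
    times = [taken for taken, _ in snapshots]  # must already be in time order
    i = bisect_right(times, when)
    if i == 0:
        return None  # no snapshot is old enough
    return snapshots[i - 1][1].get(number)

print(disk_at(snapshots, datetime(2018, 1, 10, 22, 15), 3))  # prints: device X
print(disk_at(snapshots, datetime(2018, 1, 10, 21, 0), 3))   # prints: None
```

A `None` result means the number was not assigned in the nearest earlier snapshot, which is itself a hint that the device appeared only briefly.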
 
Sam Simon NasserIT Support ProfessionalCommented:
Download HDTune Pro http://www.hdtune.com/files/hdtunepro_570_trial.exe and install it.
Click on Health. Do you see anything in RED or YELLOW? What is the health status (located in the lower corner)?
 
R@f@r P@NC3RVirtualization SpecialistCommented:
Hello,

You can run HP SmartStart to check whether you have problems at the hardware level.

Run the HP diagnostic tools, such as HPS report and the log viewer.

Do you see errors at the disk or controller level in the onboard administrator?

Regards..
 
pyotrekAuthor Commented:
Dr. Klahn - thank you for your response. As I mentioned in my question, I have a RAID 5 array with 2 logical drives, and yes, I have USB backup drives that are swapped daily.
I do not have a Disk 3 - see below.

Disk-Management.JPG
Just as a side note - I have another server, a Proliant DL380 G7 with a similar setup but new SATA SSD drives, and the same thing happened there last night.
HP Smart Storage Administrator does not show any errors - no failed "amber" lights on the drives.
In this case I get a series of:

The IO operation at logical block address 0x40800 for Disk 6 (PDO name: \Device\00006251) was retried.

and the server crashes and restarts itself.

Again, I have only 2 logical drives - I wonder if "Disk 6" refers to a physical drive number?
 
pyotrekAuthor Commented:
R@f@r P@NC3R - thank you for your comment

I am remote to that server, so it is hard to troubleshoot at the pre-OS level.
I guess I will have to go there to see.

So far my approach has been to assume that "Disk 3" in the error refers to physical drive 3 in the cage.
I asked a person at the location to remove drive 3 (the array started rebuilding and the spare drive kicked in) - but I still get the same error.
 
pyotrekAuthor Commented:
Sam Simon Nasser - thank you for your reply.

I tried the software you suggested, but I am not sure that it works with disk arrays.
The Health tab does not show anything - maybe I do not know how to use it?

I tried "Error Scan" and got no errors.
 
pyotrekAuthor Commented:
UPDATE - after researching the "mighty internets", it seems that this error is not caused by failing hardware.
There are a lot of postings from people who installed Server 2012 on Proliant hardware with the P410i Smart Array Controller and got the same error, with the system crashing at the start of the backup.
The consensus is that it is a Windows OS/driver issue. Those postings are from 2014. I wonder if it got fixed by some Windows update and has now come back on Server 2016.

It kind of makes sense, as it was working fine and only recently started to fail. I recall installing some Windows updates in the past few weeks, so maybe one of them is the culprit (the HP Smart Array driver has not changed since 10/28/2013, and there is no newer version).

Since the error comes in series - it floods the system and crashes it.
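To check whether the warnings really arrive in bursts right before each crash, the event timestamps can be grouped by the gap between them. This is a rough sketch only: `find_bursts` is a hypothetical helper, the 10-second gap and 3-event minimum are arbitrary assumptions, and the timestamps below are invented rather than taken from the server's log.

```python
from datetime import datetime, timedelta

def find_bursts(timestamps, max_gap=timedelta(seconds=10), min_events=3):
    """Group events whose neighbours are at most `max_gap` apart.

    Returns (first timestamp, event count) for every group with at least `min_events` events.
    """
    bursts = []
    run = []
    for ts in sorted(timestamps):
        if run and ts - run[-1] > max_gap:
            # Gap too large: close the current run and start a new one.
            if len(run) >= min_events:
                bursts.append((run[0], len(run)))
            run = []
        run.append(ts)
    if len(run) >= min_events:
        bursts.append((run[0], len(run)))
    return bursts

# Invented sample: seconds after a hypothetical 22:00 backup start.
base = datetime(2018, 1, 10, 22, 0, 0)
times = [base + timedelta(seconds=s) for s in (0, 2, 4, 6, 300, 900, 902, 903)]
for start, count in find_bursts(times):
    print(start.time(), count)
# prints:
# 22:00:00 4
# 22:15:00 3
```

If the burst start times line up with the backup schedule and the unexpected reboots, that supports the idea that the backup load triggers the retries.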

For now I changed the backup schedule to run one backup job at a time. So far it seems to be working fine - I guess it puts less stress on the HDDs.

Here is an interesting article - I will attempt to apply the fixes if it starts failing again.

http://www.pwrusr.com/system-administration/solved-warning-the-io-operation-at-logical-block-address-for-disk-was-retried
 
pyotrekAuthor Commented:
Please close - no sure solution found.
Question has a verified solution.
