Bad sectors on Proliant DL 380 G6

Proliant DL 380 G6 with bad sectors on Smart Array P410i

I have an older server that I just rebuilt with Windows Server 2016 running Hyper-V.

I noticed that it started to reboot itself at about the time the backup starts - after investigation I found the following events logged just before the server crashes:

Event 153, Disk
The IO operation at logical block address 0x40800 for Disk 3 (PDO name: \Device\000001a9) was retried.

I do not get any errors in the HPE Smart Storage Administrator.

There are no "amber" lights on the drives and no weird sounds - obviously something is not being reported properly.

I have a RAID 5 array built from 7 + 1 spare 1TB SATA HDDs, with 2 logical drives - so I don't really understand the "Disk 3" reference in the error.

I ran the Smart Array Diagnostics Report and I do not see anything wrong - but to be honest, unless it says "failure" I would not be able to tell anything from this report.



Any ideas on how to troubleshoot which drive is failing would be appreciated.
pyotrek Asked:

Dr. Klahn (Principal Software Engineer) Commented:
Bring up Disk Management the next time the error appears and see which drive is "Disk 3."  It may not be the RAID array.

Note that the disk numbers correspond only so long as there has been no change in the system drive configuration.  If, for example, USB drives are going in and out, every time one comes in or goes out the disk numbers can change.

Disk numbers in Disk Management
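For a remote server, the same lookup can be scripted rather than read off the Disk Management window. The following is only a minimal sketch: it assumes PowerShell's `Get-Disk` cmdlet is available (Windows Server 2012 and later), and the helper names `parse_disks`, `list_windows_disks` and `find_disk` are made up for this illustration. Because the numbers can shift when USB drives are swapped, it would need to be run close to the time the event is logged.

```python
import json
import subprocess

# PowerShell pipeline that lists the disks Windows currently knows about.
# Get-Disk and ConvertTo-Json are standard cmdlets; the selected columns are a choice.
PS_COMMAND = (
    "Get-Disk | Select-Object Number, FriendlyName, BusType, Size | ConvertTo-Json"
)


def parse_disks(json_text):
    """Turn the JSON produced by PS_COMMAND into a list sorted by disk number."""
    data = json.loads(json_text)
    if isinstance(data, dict):  # a single disk is emitted as a bare object
        data = [data]
    return sorted(data, key=lambda d: d["Number"])


def list_windows_disks():
    """Run PS_COMMAND on the local Windows machine and return the parsed disks."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_disks(result.stdout)


def find_disk(disks, number):
    """Return the entry whose Number matches the one in the event, or None."""
    return next((d for d in disks if d["Number"] == number), None)
```

On the server itself, `find_disk(list_windows_disks(), 3)` would return the entry for "Disk 3" if it exists at that moment, or `None` if no disk currently has that number.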
Sam Simon Nasser (IT Support Professional) Commented:
Download HD Tune Pro (http://www.hdtune.com/files/hdtunepro_570_trial.exe) and install it.
Click on Health. Do you see anything in red or yellow? What is the health status (shown in the lower corner)?
[Attached screenshots: 344567.jpg, health_failed.png]
R@f@r P@NC3R (Virtualization Specialist) Commented:
Hello,

You can run HP SmartStart to check whether there are problems at the hardware level.

Run the HP diagnostic tools, such as the HPS report and the log viewer.

Do you see any errors at the disk or controller level in the Onboard Administrator?

Regards..
pyotrek (Author) Commented:
Dr. Klahn - thank you for your response. As I mentioned in my question, I have a RAID 5 array with 2 logical drives, and yes, I do have USB backup drives that are swapped daily.
I do not have a Disk 3 - see the screenshot below.

Disk-Management.JPG
Just as a side note - I have another server, a Proliant DL380 G7 with a similar setup but new SATA SSD drives, and the same thing happened on it last night.
HP Smart Storage Administrator does not show any errors, and there are no failed "amber" lights on the drives.
In this case I get a series of:

The IO operation at logical block address 0x40800 for Disk 6 (PDO name: \Device\00006251) was retried.

and the server crashes and restarts itself.

Again, I have only 2 logical drives - I wonder if "Disk 6" refers to a physical drive number?
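When comparing warnings from several servers, it can help to pull the disk number, block address and PDO name out of the message text. The sketch below parses the two Event 153 messages quoted in this thread; the function name `parse_event_153` is made up for the illustration, and the byte offset assumes 512-byte logical sectors, which may not match the actual volume.

```python
import re

# Matches the text of a Disk Event 153 warning as quoted in this thread.
EVENT_153 = re.compile(
    r"logical block address (?P<lba>0x[0-9a-fA-F]+) "
    r"for Disk (?P<disk>\d+) "
    r"\(PDO name: (?P<pdo>[^)]+)\)"
)

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def parse_event_153(message):
    """Return disk number, LBA, byte offset and PDO name, or None if no match."""
    match = EVENT_153.search(message)
    if match is None:
        return None
    lba = int(match.group("lba"), 16)
    return {
        "disk": int(match.group("disk")),
        "lba": lba,
        "byte_offset": lba * SECTOR_BYTES,
        "pdo": match.group("pdo"),
    }


# The two messages quoted above (DL380 G6 and DL380 G7).
messages = [
    r"The IO operation at logical block address 0x40800 for Disk 3 (PDO name: \Device\000001a9) was retried.",
    r"The IO operation at logical block address 0x40800 for Disk 6 (PDO name: \Device\00006251) was retried.",
]

events = [parse_event_153(m) for m in messages]
for event in events:
    print(event)
```

Parsed this way, both messages carry the same block address (0x40800) even though they come from different servers with different drives.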
pyotrek (Author) Commented:
R@f@r P@NC3R - thank you for your comment

I am remote to that server - so it is hard to do the troubleshooting from the pre-OS level.
I guess I will have to go there to see.

So far my approach has been to assume that "Disk 3" in the error refers to physical drive 3 in the cage.
I asked a person at the location to remove drive 3 (the array started to rebuild and the spare drive kicked in) - but I still get the same error.
pyotrek (Author) Commented:
Sam Simon Nasser - thank you for your reply.

I tried the software you suggested - but I am not sure that it works with disk arrays.
The Health tab does not show anything - maybe I do not know how to use it?

Tried "Error Scan" and got no errors.
pyotrek (Author) Commented:
UPDATE - after researching the "mighty internets", it seems that this error is not hardware related (not failing hardware, that is).
There are a lot of postings from people installing Server 2012 on Proliant hardware with the P410i Smart Array Controller who get the same error, with the system crashing at the start of the backup.
The consensus is that it is a Windows OS/driver issue. Those postings are from 2014. I wonder if it got fixed by some Windows update and has now come back on Server 2016.

It kind of makes sense, as it was working fine and only started to fail recently - I recall installing some Windows updates in the past few weeks, so maybe one of them is the culprit. (The HP Smart Array driver has not changed since 10/28/2013, and there is no newer version.)

Since the error comes in series - it floods the system and crashes it.

For now, I changed the backup schedule to run one backup job at a time - so far it seems to be working fine. I guess it puts less stress on the HDDs.

Here is an interesting article - I will attempt to apply the fixes if it starts failing again.

http://www.pwrusr.com/system-administration/solved-warning-the-io-operation-at-logical-block-address-for-disk-was-retried
pyotrek (Author) Commented:
So far so good - no more reboots. I assume that it was a Windows update that created the problem, and the next update fixed it.
pyotrek (Author) Commented:
Please close - no sure solution found.