We occasionally get a driver controller error (event ID) 11, but drives check out ok. Cause for concern?

Hi Experts, we run three different external hard drives on our main system for keeping copies of important data. These drives are used regularly. They are all fairly new and so is the system. The system itself is a brand new install of Windows 7, and the diagnostics have been run on the system and it seems to be running flawlessly and with the latest drivers.

However, we occasionally get the following error in the system log: "The driver detected a controller error on \Device\Harddisk1\DR5."

It's USUALLY when the drive is first plugged in for the day, however, not necessarily. Sometimes it occurs sporadically once or twice throughout the day. It is NOT when the drive is being used. Full backups are sent to these drives regularly and it never seems to throw these events then. Backups complete fine and verify fine.

The drives are all Seagate (purchased within a week or two of each other) and Seatools has been run on all of them, says they are all good. Also, it seems to happen with all three drives.

Anyone have any input on the error? Thanks!

Log Name:      System
Source:        Disk
Event ID:      11
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PC1
Description:
The driver detected a controller error on \Device\Harddisk1\DR5.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Disk" />
    <EventID Qualifiers="49156">11</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2010-06-30T19:53:17.695922100Z" />
    <EventRecordID>5181</EventRecordID>
    <Channel>System</Channel>
    <Computer>PC1</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Harddisk1\DR5</Data>
    <Binary>0E01680001000000000000000B0004C003010000000000000000000000082D0000000000000000009B62850000000000FFFFFFFF0600000040000000000000000000061208000010000000003C00000000000000C06DFE850000000020C2CF89000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>
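
For anyone who wants to pull the interesting fields out of an exported event like this one, here is a minimal Python sketch using only the standard library. The sample string is a trimmed copy of the event above (the Binary payload is left out), and the helper name `summarize_event` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Event> element in the export above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event shown above (Binary payload omitted).
SAMPLE = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Disk" />
    <EventID Qualifiers="49156">11</EventID>
    <Level>2</Level>
    <TimeCreated SystemTime="2010-06-30T19:53:17.695922100Z" />
    <Computer>PC1</Computer>
  </System>
  <EventData>
    <Data>\Device\Harddisk1\DR5</Data>
  </EventData>
</Event>"""


def summarize_event(xml_text):
    """Return source, event ID, qualifiers, timestamp and device path."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    event_id = system.find("e:EventID", NS)
    return {
        "source": system.find("e:Provider", NS).get("Name"),
        "event_id": int(event_id.text),
        "qualifiers": int(event_id.get("Qualifiers")),
        "time": system.find("e:TimeCreated", NS).get("SystemTime"),
        "device": root.find("e:EventData/e:Data", NS).text,
    }


summary = summarize_event(SAMPLE)
print(summary)
```

Running the same function over each exported event would make it easier to line up the device paths and timestamps against the times the drives are swapped.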

In almost all cases, the event id 11 message is being posted due to hardware problems with either the controller or, more likely, a device that is attached to the controller in question. The hardware problems can be associated with poor cabling, incorrect termination or transfer rate settings, lazy or slow device responses to relinquish the SCSI bus, a faulty device, or, in very rare cases, a poorly written device driver.

Please follow the links below:
http://support.microsoft.com/default.aspx/kb/154690
http://windows.microsoft.com/en-us/windows7/Check-a-drive-for-errors

Regards
RS

ASKER

Thanks. We looked at the Microsoft support article before making the EE post. It didn't get us very far though. The hardware seems to check out fine by all diagnostics.

ASKER

Windows 7 has an advanced power save feature for "USB selective suspend setting" which is enabled by default. Any chance these errors occur when the drive goes into suspend mode?

It may happen. Just disable it and watch for a while.

When Windows tries to access the suspended drive, you can get these event ID 11 errors.

Also look at this Microsoft article: http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.2&EvtID=11&EvtSrc=disk&LCID=1033

BTW, do you have a card reader device in this machine?

Look here to see which device the error is referring to. Start - type regedit - Enter. Follow this path:
HKEY_LOCAL_MACHINE\SYSTEM\MountedDevices
Yours is \Device\Harddisk1\DR5.
Is such a device listed at all?

ASKER

Hi Noxcho. Thanks. That would make sense. However, after reviewing the log for several days, it seems that although it does occasionally occur at some random time during the day (which might match the suspended drive theory), the most common time is once a day when the drive is replaced (i.e., in the morning when Drive A is swapped for Drive B, or the next day when Drive B is swapped for Drive C, etc.).

The Technet article you linked to has some details our error does not mention. Specifically we don't get the "IO_ERR_CONTROLLER_ERROR" part of the message or the 1% at the end of the controller message.

I will check the registry this evening when the machine is not in use. There is no card reader on it.
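
On the missing "IO_ERR_CONTROLLER_ERROR" text: the same information appears to be in the event anyway, just in numeric form. Treating the Qualifiers value (49156) as the high 16 bits and the event ID (11) as the low 16 bits gives 0xC004000B, which, as far as I know, is the value the Windows driver kit header ntiologc.h defines as IO_ERR_CONTROLLER_ERROR. The same four bytes also show up, little-endian, near the start of the Binary data. A quick sketch of the arithmetic follows; the one-entry lookup table is my own and should be checked against the header, and I am not claiming to know what that Binary field is called.

```python
import struct

# Values taken from the event XML posted in the question.
QUALIFIERS = 49156   # <EventID Qualifiers="49156">
EVENT_ID = 11        # <EventID ...>11</EventID>

# Assumption: name/value pair as recalled from ntiologc.h; verify it there.
KNOWN_CODES = {0xC004000B: "IO_ERR_CONTROLLER_ERROR"}

# First 16 bytes of the <Binary> element from the same event.
BINARY_PREFIX = bytes.fromhex("0E01680001000000000000000B0004C0")


def full_status(qualifiers, event_id):
    """Combine 16-bit qualifiers (high) and 16-bit event ID (low)."""
    return ((qualifiers & 0xFFFF) << 16) | (event_id & 0xFFFF)


code = full_status(QUALIFIERS, EVENT_ID)

# Read a little-endian 32-bit value at byte offset 12 of the binary data.
(embedded,) = struct.unpack_from("<I", BINARY_PREFIX, 12)

print(f"combined: 0x{code:08X} ({KNOWN_CODES.get(code, 'unknown')})")
print(f"embedded: 0x{embedded:08X}")
```

So the shorter message on Windows 7 looks like a difference in how the description is rendered rather than a different error.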

Can you check whether VSS snapshots are taken to or from the external drive? In another thread we discovered a problem where VSS attempted to take snapshots even when they were disabled, and sure enough the event log was showing various errors.

The shadow copies theory would explain why the error occurs when you exchange the drive: some access operations are taking place at that moment.

ASKER

Hi Noxcho, VSS snapshots are taken via Norton Ghost for backups which are stored on this drive. However, no backups are running at this time. Also, the drive is always removed via "safely remove drive" and it does release the drive. Also, when doing a quick test remotely yesterday evening, we chose to safely remove the drive (and then rebooted to re-initialize it) and it didn't throw the error. Is it possible something different is occurring on the I/O when it's actually physically removed?

I/O means that some read/write operation was about to take place but could not access the drive. You need to find out which one. Look through the logs and see what starts at the moment the error occurs.
And regarding your last statement: does it mean that since the re-init you no longer get Event 11?

ASKER

No, what I meant was, when doing a safely remove (but not actually unplugging the drive) and then restarting the machine . . . we don't get the event 11 ID. It seems to only occur when the drive is actually physically swapped in the AM (and then occasionally random times after that).

Hm, this is interesting. Does the error occur after drive removal or at the time when swapping is going on?

ASKER

Difficult to tell exactly as we have a volunteer staff and they swap the drives. It's always around 9 or 10 AM and that's when the error is occurring.

Assign the task to someone over there and get exact time they swap the drives. This could be the key to the answer.

ASKER

Okay. In the meantime, perhaps the application log might have a corresponding entry to the system log event 11 if there is indeed a snapshot taking place? We can check that this evening.

ASKER

Going to go now and do the swap on-site so we can get a first-hand answer. Will post back shortly.

ASKER

Okay, took a look. It looks like it does occur when the drive is clicked to safely remove. It occurs immediately in the system log, along with several informational messages about the virtual disk service stopping (or starting).

Nothing in the application log.

Does the drive remove without any problem? I still think that Windows is trying to access the drive and by removing it you interrupt these attempts.
Have you tried connecting the drive to another computer to see whether the new machine logs the same IDs in the Event Log?

ASKER

The drive does remove without any problems. The error doesn't seem to appear when the drive is plugged into a random workstation (just long enough to look at the folders, close it and remove it).

Did you reformat any of these drives since purchase? If not, can you reformat one (delete the partition and create a new one via Windows Disk Management on your server)?

ASKER

Yep, did that on all three drives recently as part of our pre-EE post troubleshooting.

OK. Then I can conclude that this error is just your system's reaction to drive removal. Some other errors are generated in a similar way, ID 51 for example.
If the backup is running properly, simply ignore this error.

ASKER

I've seen paging error 51. That seems to occur on Windows XP, but NOT Windows 7. However, it does not seem to occur on removal, but randomly. This error seems to be happening almost every time the drive is removed.

However, like I said earlier, it does on occasion occur during the day at a random time. That would mean it's not JUST on removal, correct?

ASKER

Hi Noxcho, doesn't look like there is anything obvious that points to that DR# (IE: DR3) portion of the event 11 details. Attached is a screenshot of that registry area. Zoomed to 200% in Word it should be easy to read. FYI, the external drive currently connected is the E: drive.
mounted.doc

ASKER

Okay, please see the above post also. But we dug through the logs more. Although it doesn't happen every time, we found two additional messages that might help. One of the last few times the message was generated, we got the following events also.

Event ID 225:
The application System with process id 4 stopped the removal or ejection for the device USB\VID_0BC2&PID_2101\2GE4ECS0.

and

Event ID 20011:
The application System with process id 4 stopped the removal or ejection for the device USB\VID_0BC2&PID_2101\2GE4ECS0.

ASKER

ID 20011 should have read:
Device action request for device 'USB\VID_0BC2&PID_2101\2GE4ECS0' was vetoed by 'STORAGE\Volume\{2fc723a4-7dc7-11df-af89-a4badb0318b0}#0000000000100000' with veto type 5.
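
The device string in those two events looks like a USB device instance ID of the form USB\VID_xxxx&PID_xxxx\instance. Here is a small sketch to split it up. The helper name `parse_usb_instance_id` is made up, and the vendor table is a one-entry illustration: 0x0BC2 is, as far as I know, Seagate's USB vendor ID, which would fit the drives in question.

```python
import re

# Matches USB\VID_hhhh&PID_hhhh\<instance>, where h is a hex digit.
PATTERN = re.compile(r"^USB\\VID_([0-9A-Fa-f]{4})&PID_([0-9A-Fa-f]{4})\\(.+)$")

# Assumption: illustrative one-entry table, not a complete vendor list.
VENDORS = {0x0BC2: "Seagate"}


def parse_usb_instance_id(instance_id):
    """Split a USB device instance ID into vendor ID, product ID and instance."""
    match = PATTERN.match(instance_id)
    if match is None:
        raise ValueError(f"not a USB instance ID: {instance_id!r}")
    vid = int(match.group(1), 16)
    pid = int(match.group(2), 16)
    return {
        "vid": vid,
        "pid": pid,
        # The last part is often the serial number the device reports.
        "instance": match.group(3),
        "vendor": VENDORS.get(vid, "unknown"),
    }


info = parse_usb_instance_id(r"USB\VID_0BC2&PID_2101\2GE4ECS0")
print(info)
```

That at least confirms the vetoed removal refers to one of the external drives rather than some other USB device.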

So you need to know now which process in your system has id 4.
What I have found: http://social.technet.microsoft.com/Forums/en-US/itprovistasecurity/thread/4d7d110c-c82a-40b7-812a-821e5c0da2c1

ASKER

Tried running Process Explorer already. Nothing shows up under that process ID. I did use Handle from TechNet to see what has the drive open. It shows that the System Volume Information and BESR directories are being viewed.

Found this "The "System" process (PID 4) is actually the NT Kernel. As such, it is outside the usual user-mode process space, but it's called "System" in Task Manager and some other tools, as a convenient name."

I think that System Volume Information is usually used to store restore points for the drive. As for BESR, I have no idea which BESR service could be browsing this drive; better to open a thread with Symantec.

ASKER

Thanks. Checked and this drive is not being imaged by system restore points. BESR was no help either. Tried killing the BESR service; that seemed to work eventually, but in the end we still got the error when removing the drive with the service stopped.

I am out of ideas =))

ASKER

Thanks anyway! If BESR truly has the drive in use, would that cause the error? More frequently on Windows 7, removing the drive in use throws an event 57 in our experience. More importantly, do you anticipate long term harm on the drive this way?

If BESR really uses this drive then it could cause this error for sure. My understanding of this error is that something (some application or service) is using the drive, and as soon as you remove it, it gives you what is, in other words, an I/O error. So this is normal IMHO.
And I don't think this could cause problems for the drive itself.

ASKER

Disabled the BESR service for about 45 minutes, happened again even with the service off. Oddly enough, Windows allows you to click safely remove, and it then says "the device can now be safely removed" but it still throws this error at the exact same time.

ASKER

What's more odd . . . . According to Microsoft Enterprise Support, the Hard disk number and the DR number should match. However, we get random numbers at the end. IE: \Device\HardDisk1\DR5, then the error can be shown with \Device\HardDisk1\DR1, and then \Device\HardDisk1\DR2, etc. We do rotate three drives (and they all throw it), but more interestingly this evening while troubleshooting the same drive was inserted the whole time and after a while it stopped throwing \Device\HardDisk1\DR8 and moved on to throwing \Device\HardDisk1\DR1's
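
To see how those numbers drift over time, one could tally the device paths from the logged events. A sketch, with the sample list put together from the paths mentioned in this thread (it is not a real export of the log):

```python
import re
from collections import Counter

# \Device\HarddiskN\DRM, matched case-insensitively since both
# "Harddisk" and "HardDisk" spellings appear in this thread.
DEVICE_RE = re.compile(r"\\Device\\Harddisk(\d+)\\DR(\d+)", re.IGNORECASE)


def split_device(path):
    """Return (harddisk number, DR number) for a device path."""
    match = DEVICE_RE.fullmatch(path)
    if match is None:
        raise ValueError(f"unexpected device path: {path!r}")
    return int(match.group(1)), int(match.group(2))


# Illustrative sample only, built from paths quoted in the posts above.
paths = [
    r"\Device\Harddisk1\DR5",
    r"\Device\HardDisk1\DR1",
    r"\Device\HardDisk1\DR2",
    r"\Device\HardDisk1\DR8",
    r"\Device\HardDisk1\DR1",
]

tally = Counter(split_device(p) for p in paths)
mismatched = [p for p in paths if split_device(p)[0] != split_device(p)[1]]

for (disk, dr), count in sorted(tally.items()):
    print(f"Harddisk{disk} / DR{dr}: {count} event(s)")
print(f"{len(mismatched)} of {len(paths)} paths have differing numbers")
```

Fed with the real event list, this would show whether the DR number changes with each drive swap or also changes while the same drive stays connected.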

Open a console (Start - cmd), type mountvol, and see which drive this volume refers to: \{2fc723a4-7dc7-11df-af89-a4badb0318b0}#0000000000100000

ocanada_techguy

Random drive numbers, oh boy, sounds like that stupid "paging error" of XP all over again eh jsmply, but of course not, different.
Perhaps it is VSS. Perhaps VSS runs at low priority in the background (as you might expect) and Win7 considers it "safe" to remove regardless of whether it has finished or caught up. Could it be the MS engineers decided to deem it safe to eject the drive regardless? If it's something hidden in the protected space of the kernel, then outside processes likely cannot inquire against it, and they can't very well wait for the kernel to stop, can they? You and noxcho are evidently already considering that.
Here's a thought: I am getting the impression, though I'm not yet certain of it, that the "indexing" data for drives is now put on each drive instead of all under C:\program files as with the old Search 4.0, quite possibly under the System Volume Information branch. So in a similar way, could indexing be "always" going and log a message when the safe-to-remove "closes the door on its foot"?

ASKER

Hey Noxcho and Ocanada. Noxcho we will check that right away and post again. Ocanada any idea where to check what you suggested in Windows 7? Just for the fun of it we checked with Dell support since the machine is rather new. They checked everything we did, drivers, diagnostics, etc. Ended up saying they have no further ideas and were willing to ship a new MOBO if we want. Just wanted to try to exhaust all options before changing the MOBO on a brand new machine. What do you both think?

Chances that a MOBO exchange will fix the problem are really low. The problem is in Windows IMHO. If you have a spare HDD, you can connect it to this machine, install a fresh copy of Windows, test the USB drive connection on it, and then check the Event Log Viewer; that would be the best test. This will let you remove BESR from the suspect list.

ASKER

Well we tried a fresh install of Windows 7 recently, that did not help. Also running a near identical machine with BESR that we put together just for testing, and this issue doesn't occur there. Will have your info from mountvol and will check indexing in 5 minutes and will post again.

ASKER

Noxcho - ran mountvol from the command prompt. That entry is not there (but remember that the event you're referencing was 2 days ago now). We have volumes for C:\ D:\ and E:\ and then one that says *** NO MOUNT POINT ***

Ocanada, the external hard drive(s) are not set as index locations.

ASKER

Don't think it's BESR related. Got the error twice yesterday with the service stopped. This afternoon we connected to test again. As soon as we clicked on the little green icon to safely remove hardware, it threw the error event, before we even told it to remove the drive. So we don't see BESR causing it. Every driver we can find on the system is now up to date.

ASKER

Spoke with Seagate support today. Thought it would be one last spot in the equation. They seem to think it's either related to the Seagate drive sleep setting, or just the way the bridge card on the drives works. They had us change the sleep setting to never. Going to give it another 30 minutes or so and try again. If it's the bridge card, they say there is no "fix" as it might just be the way it's designed. . . . . not very reassuring.

ocanada_techguy

Honestly, I'm "reaching" and guessing too.
Have you tried the free Microsoft-acquired Sysinternals Process Explorer to see the subthreads and open objects of process 4? You could maybe get more information than just the process ID gleaned from the event log description.
Thinking about it, I would not expect indexing to run inside the kernel, although virtual memory management certainly would. Starting to seem awfully similar to that XP paging error stupidity.
Indexing and VSS are services you should be able to stop; if they somehow get restarted, temporarily set them to Disabled if you really want to rule them out. noxcho was involved in the case he mentioned, and I noticed it too, where it seemed like VSS was running on disks where the settings said it was supposed to be turned off.
Wow, the identical machine does not do it eh. Hmmm.
Well, you have to figure things like spin-up and seek are going to vary disk to disk according to circumstances, so maybe some operations take longer or have retries on one yet not the other. Maybe it's a combination of Windows and drivers and hardware sort of timing out that all just happen to coincide. I would doubt mobo replacement too. I'd be suspecting some non-critical issue with the confluence of device drivers and Windows, and possibly some hotfixes playing a role even. If you had an identical machine, heck, you could swap drives and cards yourself. The key word as you put it is "near" identical: how near? Same mobo, disk controller, graphics?
Even some graphics cards "share" memory with the mobo, and if said memory were swapped out by virtual memory management (ever see a graphics card painting the screen really slowly, then speeding back up? that's usually what's happening then), that could affect timing as well.
Son of a bleep eh stupid frigger frack Fred Flintstone bowling ball on the toe bleep.

Think of it this way though: it happens when you ask to safely remove. So what if, to fulfill that request, it's trying to flush out writes, gets a bit of a traffic jam, and logs the error "warning" you notice, BUT suppose it KEEPS retrying anyway, does successfully finish what it needs to, and says OK, safe to remove. Well, you'd be safe and functioning, but you'd also have those log events that you see, right?

To "truly" solve this, you'd likely have to install the debugging version of Windows, put Windows into step-by-step debugging mode, and you might need the debugging code of the drivers that come into play. In other words, the sort of thing Microsoft and Dell and its hardware driver makers keep to themselves and should be doing, but likely won't unless this were a bug with adverse effects and consequences, of which, as best we can tell, there don't seem to be any. Dell is equally puzzled eh. Well, if it were kicked to a tier 3 engineer they might be curious, but honestly, when they're up to their necks in crocodiles, do they have time for "my frog makes croaking noises"? Um, no.

Cue Fred Flintstone: "frigger frickin... grumble"

ASKER

Just to note, that process ID 4 message that you and Noxcho are talking about only happened ONCE in two weeks and corresponded with the event ID 11. The rest of the time, we just get this error by itself. It happens when safely removing (but here is the kicker, not ALWAYS), and it seems to occasionally just occur throughout the day for no reason. It's not a warning like the paging error in XP, this is an actual error.

In regards to the similar hardware, it's not identical. It's the same external drive and operating system. The mobo and graphics cards are different though.

ASKER

Thanks. Just want to make sure that in the long run these errors won't amount to bad data the day we go to retrieve a backup off the drives. Let it run for about an hour and a half after changing the sleep setting and then did a removal without any errors. We shall see . . .

As a last resort you can replace the mobo, in case this error is caused by the USB hub, and see if the problem remains. I would personally give it a try.

ASKER

Well, so far so good. It's been almost 24 hours (19 to be exact) that it sat idle. Tried a safe removal today and it generated no error. There have not been any since changing the sleep setting. Will keep an eye on it for the next few days. Tech support said that if the issue is fixed, we should freshly format the drives to make sure there was no damage left behind. Anyone see a need for that?

ASKER

Just to clarify, we are talking about the sleep mode on the Seagate drive itself via their manager software, not a Windows setting. We shall see if it's the final answer or not. Interestingly, it looks like others have used the sleep mode setting to fix those event ID 57 errors that we spent a while chasing on Windows XP. Noxcho/Ocanada, do you think the repartition/format is necessary if it's fixed? If so, we would probably have to do a separate one each week as these hold backups.

I don't think repartitioning is needed now. I guess SG support is just making doubly sure, via this reformat, that there are no errors in the file systems of these drives.

ASKER

Thanks. Still good so far, no random event 11's throughout the day and did a few more removals with no errors . . . might be finally solving it. Thank God, chasing that "crown for wild goose chases" was getting old Ocanada! We shall see. Will post again after the weekend.

ASKER

Well, it looks like the sleep function really might have been the cause. After changing the setting, there were no errors throughout the weekend. Today, that drive was swapped for the next drive in the rotation and there was NO error during the swap. This new drive (connected today) in the rotation had not been changed yet to the new sleep setting in the Seagate firmware. Sure enough, removing that drive generated the error. Just changed that one as well. One more drive in the rotation to change (tomorrow) and hopefully they will remain problem free.

It looks like this question solved the mystery behind the event ID 57 paging errors in Windows XP also. It looks like the sleep setting was the cause there too.

According to Seagate, changing the sleep setting should not significantly decrease drive life and will keep the error from coming back. I'm going to close this out soon. Anyone have any input on how to fairly divide the points? Everyone was helpful, despite finding the solution through different means.

ASKER

The solution ended up being to disable sleep mode in the Seagate manager software (which changes the setting on the drive).

I hope Noxcho and Ocanada find the point distribution okay. Split between the two of you as you helped the longest, with a higher amount going to Noxcho just because of the number of posts he contributed.

Thanks!

Thanks for points. This will be the first suggestion I provide next time when error id 11 is reported.
Take care and have a nice day
Nox

ocanada_techguy

Wow. Well done and kudos! I'd say Seagate owes you some form of thanks.
Really, what's needed is for Microsoft and Seagate engineers to ensure that the error does in fact result in a retry and is LOSSLESS. That would be key, or else the sleep/wakeup handling would have to be considered flawed, I should think.
Gee thanks. I've seen some give themselves points, if ever it was deserved this would be one.
Again, well done.