virginie8 asked:

CAN'T FIX the Event ID 11 : The driver detected a controller error in \device\scsi\adpu3201

Really, if someone can help me fix this, he's my master!
I have had backup problems since December 2007. I get an Event ID 11 (content below), and a few milliseconds later Arcserve (v11.5 SP3) logs an error 6300.
I first called IBM, and they came here to replace my tape drive (HP Ultrium LTO 1). That same evening, after a few tests, I got Event ID 11 again, so they also replaced the SCSI adapter and cable. When they left, I tested and got Event ID 11 again...
The following week, I double-checked the driver versions and the firmware. Everything looked fine, but the error persists. I uninstalled Arcserve and reinstalled it: not solved. I also tried Arcserve 12 this weekend, without success. As before, the backup starts well and runs for 10 or 20 minutes; then Event ID 11 appears and, a few seconds later, error 6300 shows up in the Arcserve log. The job stops.
I ran a test using NTBackup, which worked for 3 hours without error.
The Arcserve backup job on this server had been working for 3 years...
Please help me fix this... I really did my best, but I'm stuck now.

OS: Windows Server Std 2003 SP2 Eng.

Event Type:      Error
Event Source:      adpu320
Event Category:      None
Event ID:      11
Date:            2008-03-02
Time:            10:33:41
User:            N/A
Computer:      XXXX
Description:
The driver detected a controller error on \Device\Scsi\adpu3201.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 0f 00 10 00 01 00 68 00   ......h.
0008: 00 00 00 00 0b 00 04 c0   .......À
0010: 48 ff 30 c1 00 00 00 00   Hÿ0Á....
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 00 00 00 00 00 00 00 00   ........
0030: 00 00 00 00 06 00 00 00   ........


honmapog

Did they replace the terminator along with the drive? Or is it built-in?

This is some sort of hardware problem. One of the replacement parts may have been defective (refurbished?) as well.

Possibly move your SCSI card to another slot on the motherboard. There could be a problem with one of the slots, or with a riser card if your server uses one of those.

You could possibly get some more info from the data part of the event log entry. I believe the event log entry you pasted shows the data in bytes. If you change this to words, and then look at appendix B in http://download.adaptec.com/pdfs/user_guides/ULTRA320_UG_EN.PDF you may get some information to troubleshoot this.

ASKER

Hi Honmapog, thank you for your reply.

1: EVENT LOG ANALYSIS
----------------------------------
I already read the document at the link that you sent, but the error code that I get is not listed.
"In the Data section of the dialog box, the entry in the second row
and second column (to the right of the 0010: entry) lists the error
message generated by the driver."
The code that I get from the event log in WORDS mode view is: c130ff48
I don't know what it means.
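
As a cross-check on the bytes-to-words conversion, here is a minimal Python sketch that regroups the Data bytes of the event posted earlier into 32-bit words. The helper names (`parse_dump`, `to_dwords`) are made up for illustration, and the sketch assumes that the "Words" view simply shows the same bytes as little-endian 32-bit values, which is consistent with the c130ff48 reading quoted above. It does not look up what the code means.

```python
import struct

# Data section of the Event ID 11 entry, copied from the post (bytes view).
EVENT_DATA_HEX = """
0000: 0f 00 10 00 01 00 68 00
0008: 00 00 00 00 0b 00 04 c0
0010: 48 ff 30 c1 00 00 00 00
0018: 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00
0028: 00 00 00 00 00 00 00 00
0030: 00 00 00 00 06 00 00 00
"""


def parse_dump(text):
    """Return the raw bytes from an 'offset: xx xx ...' hex dump."""
    data = bytearray()
    for line in text.strip().splitlines():
        # Everything after the first colon is the hex byte list.
        _, _, hex_part = line.partition(":")
        data.extend(bytes.fromhex(hex_part))
    return bytes(data)


def to_dwords(data):
    """Interpret the bytes as little-endian 32-bit words, keyed by byte offset."""
    count = len(data) // 4
    values = struct.unpack("<%dI" % count, data[:count * 4])
    return {index * 4: value for index, value in enumerate(values)}


dwords = to_dwords(parse_dump(EVENT_DATA_HEX))
for offset, value in dwords.items():
    print("%04x: %08x" % (offset, value))
```

The word at offset 0010 comes out as c130ff48, the value read from the Words view. The word at offset 000c is c004000b; its low 16 bits are 0x000b, that is 11, the same number as the Event ID.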

2: SERVER HARDWARE PCI SLOT
---------------------------------------------
The server is an IBM xSeries 225 (4 years old).
Yes, the SCSI card is inserted in a slot (PCI-X 100 MHz). There is also a network card (unused, disabled in Device Manager) in the slot next to the SCSI adapter. This gigabit 3Com network adapter has been there for 3 years now, even though it is not used.

Is there a way to test a PCI-X slot with a diagnostic tool?

honmapog

No way to test as far as I know; you could ask IBM.
Can you swap the network adapter and the SCSI card?

ASKER

I have maintenance planned for early Wednesday morning.
I will keep you informed of the results.
Thank you very much.

As planned, I removed the 3Com network card and inserted the SCSI card in its slot. After that, I reinstalled the drivers as requested by the OS, and after a reboot I got Event ID 11 again, but sooner than before: 1 minute after the backup started.

Adaptec SCSI card
-Slot before: PCI 2
-Slot now: PCI 3

3Com network card
-Slot before: PCI 3
-Slot now: "removed"

I read in the IBM documentation that PCI slots 2 and 3 work together as Channel A, and slots 4 and 5 (where the IBM SCSI card is inserted) work together as Channel B.

Might there be a problem with Channel A?

Please help me, I'm almost lost now..

Thank you.

honmapog

It could be a problem with Channel A.
Or you could have received a defective drive or cable replacement.

If all firmware and drivers are up to date, it is 99% certain that this is a hardware fault. You could go into the Adaptec BIOS and try lowering the speed of the SCSI channel to see if that helps. Personally I would get IBM involved again - tell them about your doubts over Channel A or ask them to replace the complete SCSI chain again.

ASKER

Here is the log from tape.log (the Arcserve tape engine log). It contains the details of the error that I get.
At 9:14:36, an Event ID 11 entry appears in the event log.

Later on, I disabled the tape device (in Device Manager) as recommended by Computer Associates (Arcserve), and I deleted all the registry entries below "tape engine" in HKLM. After that, I was able to successfully complete a 15 GB backup job. Then I tried a second one of 30 GB, and the server showed a blue screen saying:

Hardware Malfunction call your hardware vendor for support.
NMI: Parity check / memory Parity error
*** The system has halted ***

This is the second time that I have gotten this error while trying to back up data. Last time, I ran Memtest86 on my 4 GB of RAM and the test was successful. I also ran the Microsoft memory test.

I am starting to feel a bit sick of this problem. I have had this server for 4 years now, and the backups were working well until last December. I didn't change anything.

Do you still think that it is a hardware problem? IBM is walking away from my problem because NTBackup works well.

2008/03/12 09:14:36 [0f88] 03/12 09:14:36 ABSL:3060 SCSICommandRetry: NT Error that can be retried. CMD:[0A], Error 1117
2008/03/12 09:14:40 [0f88] UnExpected SCSI Code ==> JobID[13], Device[5], ABSL[3 0 6 0], SN[HU10723WTR], CMD[34h], [Scsi Bus Reset Occurred], ASC[29], ASCQ[02]
2008/03/12 09:14:40 [0f88] =>ABSL:3060 Command:[READ POS            ] <Drive>
2008/03/12 09:14:40 [0f88]   ABSL:3060 Sense Key status: Unit Attention [06]
2008/03/12 09:14:40 [0f88]   ABSL:3060 Additional sense: Scsi Bus Reset Occurred [29, 02]
2008/03/12 09:14:40 [0f88] ABSL:3060 SCSICommandRetry: Error with ReadBlock Address :CMD:[03] - Let's Retry!
2008/03/12 09:14:40 [0f88] ABSL:3060 SCSICommandRetry: Block Positon at ZERO, space to EOD and get new Block Number.
2008/03/12 09:14:40 [0f88] ABSL:3060 SCSICommandRetry: Block Different by more than 64K. [14499584] [14499456]
2008/03/12 09:14:40 [0f88] JobID[13] SCSIPort Error DevType[5] LDN[5] SN[HU10723WTR] CMD[READ POS] Sent[09:14:40] MS Error[1117]
2008/03/12 09:14:40 [0f88] =>ABSL:3060 CMD<WRITE> NT Error that can be retried. Retry Count:0
2008/03/12 09:14:40 [0f88] =>ABSL:3060 Error MS=1117 [The request could not be performed because of an I/O device error.
]
2008/03/12 09:14:40 [0f88] =>ABSL:3060 Error. MS=1117 [The request could not be performed because of an I/O device error.
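
To pull the key values out of an excerpt like this, here is a minimal Python sketch that extracts the SCSI sense key, the additional sense code pair, and the Windows error number. The regular expressions are assumptions based only on the few lines shown in this post, not on any documented Arcserve log format, and `summarize` is a hypothetical helper name.

```python
import re

# A few lines copied from the tape.log excerpt above.
LOG_LINES = [
    "2008/03/12 09:14:40 [0f88]   ABSL:3060 Sense Key status: Unit Attention [06]",
    "2008/03/12 09:14:40 [0f88]   ABSL:3060 Additional sense: Scsi Bus Reset Occurred [29, 02]",
    "2008/03/12 09:14:40 [0f88] JobID[13] SCSIPort Error DevType[5] LDN[5] SN[HU10723WTR] CMD[READ POS] Sent[09:14:40] MS Error[1117]",
    "2008/03/12 09:14:40 [0f88] =>ABSL:3060 Error MS=1117 [The request could not be performed because of an I/O device error.",
]

# Patterns guessed from the lines above (assumption, not a documented format).
SENSE_KEY = re.compile(r"Sense Key status: (.+?) \[([0-9A-Fa-f]{2})\]")
ADDITIONAL = re.compile(r"Additional sense: (.+?) \[([0-9A-Fa-f]{2}), ([0-9A-Fa-f]{2})\]")
MS_ERROR = re.compile(r"MS(?: Error\[|=)(\d+)")


def summarize(lines):
    """Collect the distinct sense keys, ASC/ASCQ pairs and Windows error numbers."""
    summary = {"sense_keys": set(), "additional": set(), "ms_errors": set()}
    for line in lines:
        match = SENSE_KEY.search(line)
        if match:
            summary["sense_keys"].add((match.group(1), int(match.group(2), 16)))
        match = ADDITIONAL.search(line)
        if match:
            summary["additional"].add(
                (match.group(1), int(match.group(2), 16), int(match.group(3), 16))
            )
        match = MS_ERROR.search(line)
        if match:
            summary["ms_errors"].add(int(match.group(1)))
    return summary


summary = summarize(LOG_LINES)
print(summary)
```

For these lines, the summary holds one sense key (Unit Attention, 0x06), one additional sense pair (0x29, 0x02, reported by the log as a SCSI bus reset) and one Windows error number (1117, the I/O device error quoted in the log).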


honmapog

Make sure the "Removable Storage" service in Windows is stopped and disabled, especially after testing with NTBackup.
The disabling of the tape device was also important. See http://www.tek-tips.com/faqs.cfm?fid=3987

Yes, I do still think this is some sort of hardware problem, especially since you didn't change anything and everything worked fine for 4 years.

Have IBM check into the Parity Error at least. No software should be causing that.
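
For the Removable Storage step, here is a rough Python sketch of how the service could be stopped and disabled from a script instead of the Services console. It assumes that the service's short name is `NtmsSvc` and that the built-in `sc` command-line tool is available; verify both on your server first. The helper names are hypothetical, and the sketch only prints the commands unless `dry_run` is switched off on a Windows machine.

```python
import platform
import subprocess

# Assumption: "NtmsSvc" is the short name of the "Removable Storage" service.
SERVICE_NAME = "NtmsSvc"


def build_commands(service):
    """Return the sc invocations that stop a service and disable its startup."""
    return [
        ["sc", "stop", service],
        # sc expects "start=" and its value as separate tokens.
        ["sc", "config", service, "start=", "disabled"],
    ]


def run(commands, dry_run=True):
    """Print the commands; execute them only on Windows when dry_run is False."""
    executed = []
    for cmd in commands:
        if dry_run or platform.system() != "Windows":
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=False)
            executed.append(cmd)
    return executed


commands = build_commands(SERVICE_NAME)
executed = run(commands)  # dry run by default: nothing is changed
```

Calling `run(commands, dry_run=False)` from an administrator session on the server would actually issue the two commands.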

ASKER

I just called IBM.
They will come here this weekend and replace:
-SCSI cable
-RAM
-Motherboard
I will let you know if that solves my problem.
Thank you, Honmapog, for your help.
ASKER CERTIFIED SOLUTION
virginie8