Can't fix Event ID 11: The driver detected a controller error on \device\scsi\adpu3201

Really, if someone can help me fix this, I'll be forever grateful!
I have had backup problems since December 2007. I get an Event ID 11 (details below) and, a few milliseconds later, Arcserve (v11.5 SP3) logs error 6300.
I first called IBM, and they came to replace my tape drive (HP Ultrium LTO 1). The same evening, after a few tests, I got Event ID 11 again, so they also replaced the SCSI adapter and cable. After they left, I tested again and still got Event ID 11...
The following week, I double-checked the driver versions and the firmware. Everything looked fine, but the error persists. I uninstalled Arcserve and reinstalled it. Not solved. I also tried Arcserve 12 this weekend, without success. As before, the backup starts well and runs for 10 or 20 minutes; then Event ID 11 appears and, a few seconds later, error 6300 shows up in the Arcserve log. The job stops.
I ran a test with NTBackup, which worked for 3 hours without error.
The Arcserve backup job on this server had been working for 3 years...
Please help me fix this... I really did my best, but I'm stuck now.

OS: Windows Server 2003 Standard SP2, English.

Event Type:      Error
Event Source:      adpu320
Event Category:      None
Event ID:      11
Date:            2008-03-02
Time:            10:33:41
User:            N/A
Computer:      XXXX
Description:
The driver detected a controller error on \Device\Scsi\adpu3201.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 0f 00 10 00 01 00 68 00   ......h.
0008: 00 00 00 00 0b 00 04 c0   .......À
0010: 48 ff 30 c1 00 00 00 00   Hÿ0Á....
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 00 00 00 00 00 00 00 00   ........
0030: 00 00 00 00 06 00 00 00   ........


Asked by virginie8

honmapog commented:
Did they replace the terminator along with the drive, or is it built in?

This is some sort of hardware problem. One of the replacement parts may also have been defective (refurbished?).

You could try moving your SCSI card to another slot on the motherboard. There could be a problem with one of the slots, or with a riser card if your server uses one.

You might get more information from the Data section of the event log entry. I believe the entry you pasted shows the data as bytes. If you switch the view to words and then look at Appendix B in http://download.adaptec.com/pdfs/user_guides/ULTRA320_UG_EN.PDF, you may find something that helps you troubleshoot this.
virginie8 (author) commented:
Hi Honmapog, thank you for your reply.

1: EVENT LOG ANALYSIS
----------------------------------
I had already read the document at the link you sent, but the error code I get is not listed there.
"In the Data section of the dialog box, the entry in the second row
and second column (to the right of the 0010: entry) lists the error
message generated by the driver."
In the Words view, the code from the event log is: c130ff48
I don't know what it means.
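To reproduce that Words view from the byte dump pasted in the question, here is a minimal Python sketch that regroups the bytes into little-endian 32-bit words and splits one word into its NTSTATUS fields. The DUMP text is copied from the event above; the helper names are hypothetical. The word at offset 0x10 comes out as c130ff48, and the word at 0x0c as c004000b, whose low 16 bits are 11, the Event ID. Assuming the Data section follows the standard Windows IO_ERROR_LOG_PACKET layout, 0x0c would be the ErrorCode and 0x10 the driver-defined UniqueErrorValue, so c130ff48 is probably only meaningful to Adaptec.

```python
import struct

# Event Viewer "Bytes" dump copied from the Event ID 11 entry above.
DUMP = """\
0000: 0f 00 10 00 01 00 68 00
0008: 00 00 00 00 0b 00 04 c0
0010: 48 ff 30 c1 00 00 00 00
0018: 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00
0028: 00 00 00 00 00 00 00 00
0030: 00 00 00 00 06 00 00 00
"""


def dump_to_bytes(dump: str) -> bytes:
    """Drop the offset label of each line and join the hex byte columns."""
    out = bytearray()
    for line in dump.splitlines():
        _offset, _, hex_part = line.partition(":")
        out += bytes.fromhex(hex_part)  # fromhex skips the separating spaces
    return bytes(out)


def to_words(data: bytes) -> list[int]:
    """Reinterpret the bytes as little-endian 32-bit words (x86 byte order)."""
    count = len(data) // 4
    return list(struct.unpack(f"<{count}I", data[: count * 4]))


def split_ntstatus(value: int) -> tuple[int, int, int]:
    """Split an NTSTATUS value into (severity, facility, code)."""
    severity = value >> 30              # top 2 bits; 3 means error
    facility = (value >> 16) & 0x0FFF   # 12-bit facility field
    code = value & 0xFFFF               # low 16 bits; shown as the Event ID
    return severity, facility, code


words = to_words(dump_to_bytes(DUMP))
# Print four words per row, labelled with the byte offset, like the Words view.
for index in range(0, len(words), 4):
    row = words[index:index + 4]
    print(f"{index * 4:04x}: " + " ".join(f"{w:08x}" for w in row))

sev, fac, code = split_ntstatus(words[3])  # word at byte offset 0x0c
print(f"word at 0x0c = {words[3]:08x}: severity={sev} facility={fac:#x} code={code}")
print(f"word at 0x10 = {words[4]:08x}")
```

Reading the bytes in little-endian order is what turns `48 ff 30 c1` into `c130ff48`, which matches the value reported from the Words view.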

2: SERVER HARDWARE PCI SLOT
---------------------------------------------
The server is an IBM xSeries 225 (4 years old).
Yes, the SCSI card is inserted in a PCI-X 100 MHz slot. There's also a network card (unused, disabled in Device Manager) in the slot next to the SCSI adapter. This 3Com gigabit network adapter has been there for 3 years (even though it isn't used).

Is there a diagnostic tool that can test a PCI-X slot?
honmapog commented:
As far as I know there's no way to test it; you could ask IBM.
Can you swap the network adapter and the SCSI card?

virginie8 (author) commented:
I have maintenance planned for early Wednesday morning.
I will keep you informed of the results.
Thank you very much.
virginie8 (author) commented:
As planned, I removed the 3Com network card and inserted the SCSI card in its slot. After that, I reinstalled the drivers when the OS prompted me and, after a reboot, got... Event ID 11 again, but sooner than before: 1 minute after the backup started.

Adaptec SCSI card
-Slot before: PCI 2
-Slot now: PCI 3

3Com network card
-Slot before: PCI 3
-Slot now: "removed"

I read in the IBM documentation that PCI slots 2 and 3 work together as Channel A, and slots 4 and 5 (where the IBM SCSI card is inserted) work together as Channel B.

Could there be a problem with Channel A?

Please help me, I'm almost lost now...

Thank you.
honmapog commented:
It could be a problem with Channel A.
Or you could have received a defective replacement drive or cable.

If all firmware and drivers are up to date, it is 99% certain that this is a hardware fault. You could go into the Adaptec BIOS and try lowering the speed of the SCSI channel to see if that helps. Personally, I would get IBM involved again: tell them about your doubts about Channel A, or ask them to replace the complete SCSI chain again.
virginie8 (author) commented:
Here is the log from tape.log (the Arcserve tape engine log). It contains the details of the error I get.
At 9:14:36, an Event ID 11 entry appears in the event log.

Later on, I disabled the tape device (in Device Manager) as recommended by Computer Associates (Arcserve) and deleted all the registry entries below "tape engine" in HKLM. After that, I was able to complete a 15 GB backup job successfully. Then I tried a second one of 30 GB, and the server showed a blue screen saying:

Hardware Malfunction. Call your hardware vendor for support.
NMI: Parity check / memory Parity error
*** The system has halted ***

This is the second time I have gotten this error while backing up data. Last time, I ran memtest86 on my 4 GB of RAM and the test passed. I also ran the Microsoft memory test.

I'm starting to get sick of this problem. I have had this server for 4 years, and backups were working well until last December. I didn't change anything.

Do you still think it is a hardware problem? IBM is backing away from my problem because NTBackup works fine.

2008/03/12 09:14:36 [0f88] 03/12 09:14:36 ABSL:3060 SCSICommandRetry: NT Error that can be retried. CMD:[0A], Error 1117
2008/03/12 09:14:40 [0f88] UnExpected SCSI Code ==> JobID[13], Device[5], ABSL[3 0 6 0], SN[HU10723WTR], CMD[34h], [Scsi Bus Reset Occurred], ASC[29], ASCQ[02]
2008/03/12 09:14:40 [0f88] =>ABSL:3060 Command:[READ POS            ] <Drive>
2008/03/12 09:14:40 [0f88]   ABSL:3060 Sense Key status: Unit Attention [06]
2008/03/12 09:14:40 [0f88]   ABSL:3060 Additional sense: Scsi Bus Reset Occurred [29, 02]
2008/03/12 09:14:40 [0f88] ABSL:3060 SCSICommandRetry: Error with ReadBlock Address :CMD:[03] - Let's Retry!
2008/03/12 09:14:40 [0f88] ABSL:3060 SCSICommandRetry: Block Positon at ZERO, space to EOD and get new Block Number.
2008/03/12 09:14:40 [0f88] ABSL:3060 SCSICommandRetry: Block Different by more than 64K. [14499584] [14499456]
2008/03/12 09:14:40 [0f88] JobID[13] SCSIPort Error DevType[5] LDN[5] SN[HU10723WTR] CMD[READ POS] Sent[09:14:40] MS Error[1117]
2008/03/12 09:14:40 [0f88] =>ABSL:3060 CMD<WRITE> NT Error that can be retried. Retry Count:0
2008/03/12 09:14:40 [0f88] =>ABSL:3060 Error MS=1117 [The request could not be performed because of an I/O device error.
]
2008/03/12 09:14:40 [0f88] =>ABSL:3060 Error. MS=1117 [The request could not be performed because of an I/O device error.
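The sense data in this excerpt can be lined up against the Event ID 11 time. Below is a minimal Python sketch that keeps tape.log lines within a few seconds of 09:14:36 and translates the sense key and ASC/ASCQ pair using a small subset of the standard SCSI Primary Commands (SPC) tables. The regular expressions assume exactly the line format shown above; the 10-second window, the event date of 2008-03-12, and the function name are assumptions for illustration. On these lines it reports sense key 6 (Unit Attention) with ASC/ASCQ 29/02 (SCSI bus reset occurred), meaning the drive saw the bus being reset, which is consistent with the controller-side error in the event log.

```python
import re
from datetime import datetime, timedelta

# Small subset of the SPC sense-key table.
SENSE_KEYS = {
    0x0: "No Sense", 0x1: "Recovered Error", 0x2: "Not Ready", 0x3: "Medium Error",
    0x4: "Hardware Error", 0x5: "Illegal Request", 0x6: "Unit Attention",
}
# Small subset of the SPC ASC/ASCQ table (the 29h "reset occurred" family).
ASC_ASCQ = {
    (0x29, 0x00): "Power on, reset, or bus device reset occurred",
    (0x29, 0x01): "Power on occurred",
    (0x29, 0x02): "SCSI bus reset occurred",
    (0x29, 0x03): "Bus device reset function occurred",
}

TS_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
ASC_RE = re.compile(r"ASC\[([0-9A-Fa-f]{2})\], ASCQ\[([0-9A-Fa-f]{2})\]")
KEY_RE = re.compile(r"Sense Key status: .*\[([0-9A-Fa-f]{2})\]")


def sense_near(lines, event_time, window_s=10):
    """Return (timestamp, description) for sense data logged within window_s seconds of event_time."""
    window = timedelta(seconds=window_s)
    found = []
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        if abs(ts - event_time) > window:
            continue  # outside the window around the Event ID 11 time
        if (hit := ASC_RE.search(line)):
            asc, ascq = int(hit.group(1), 16), int(hit.group(2), 16)
            desc = ASC_ASCQ.get((asc, ascq), "not in this table")
            found.append((ts, f"ASC/ASCQ {asc:02X}/{ascq:02X}: {desc}"))
        elif (hit := KEY_RE.search(line)):
            key = int(hit.group(1), 16)
            found.append((ts, f"Sense key {key:X}: {SENSE_KEYS.get(key, 'not in this table')}"))
    return found


# Lines copied from the tape.log excerpt above.
LOG = [
    "2008/03/12 09:14:36 [0f88] 03/12 09:14:36 ABSL:3060 SCSICommandRetry: NT Error that can be retried. CMD:[0A], Error 1117",
    "2008/03/12 09:14:40 [0f88] UnExpected SCSI Code ==> JobID[13], Device[5], ABSL[3 0 6 0], SN[HU10723WTR], CMD[34h], [Scsi Bus Reset Occurred], ASC[29], ASCQ[02]",
    "2008/03/12 09:14:40 [0f88]   ABSL:3060 Sense Key status: Unit Attention [06]",
    "2008/03/12 09:14:40 [0f88]   ABSL:3060 Additional sense: Scsi Bus Reset Occurred [29, 02]",
]

for ts, desc in sense_near(LOG, datetime(2008, 3, 12, 9, 14, 36)):
    print(ts.time(), desc)
```

The tables are deliberately tiny; anything not covered is reported as "not in this table" rather than guessed.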

honmapog commented:
Make sure the "Removable Storage" service in Windows is stopped and disabled, especially after testing with NTBackup.
Disabling the tape device was also important. See http://www.tek-tips.com/faqs.cfm?fid=3987

Yes, I still think this is some sort of hardware problem, partly because you didn't change anything and everything worked fine for 4 years.

Have IBM check into the parity error at least. No software should be causing that.
virginie8 (author) commented:
I just called IBM.
They will come this weekend and replace:
-SCSI cable
-RAM
-Motherboard
I will let you know if that solves my problem.
Thank you, Honmapog, for your help.
virginie8 (author) commented:
OK, there were 2 problems:

1- Event ID 11. The problem was an incompatibility between the running IBM ServeRAID agent and Computer Associates Arcserve. After I set the IBM ServeRAID agent to Disabled, Event ID 11 no longer appears.

2- Memory problem: even though the Kingston memory was supposed to be "compatible", it wasn't. I had to put the IBM memory back, and the blue screen never came back.

Thank you, Honmapog, for your help.