Linux SMP - kernel: scsi : aborting command due to timeout

Hi,

I'm running Linux kernel 2.4.20-20.9smp (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)).

What is wrong, and how do I fix it?

These error messages appear in /var/log/messages.* every day:

Oct 26 04:23:04 gs kernel: scsi : aborting command due to timeout : pid 28374, scsi1, channel 0, id 0, lun 0 Read (10) 00 00 47 4b 88 00 00 08 00
Oct 26 04:23:04 gs kernel: sym53c8xx_abort: pid=28374 serial_number=28375 serial_number_at_timeout=28375
Oct 26 04:23:06 gs kernel: SCSI host 1 abort (pid 28374) timed out - resetting
Oct 26 04:23:06 gs kernel: SCSI bus is being reset for host 1 channel 0.
Oct 26 04:23:06 gs kernel: sym53c8xx_reset: pid=28374 reset_flags=2 serial_number=28375 serial_number_at_timeout=28375
Oct 26 04:23:06 gs kernel: sym53c1010-33-1: restart (scsi reset).
Oct 26 04:23:06 gs kernel: sym53c1010-33-1: handling phase mismatch from SCRIPTS.
Oct 26 04:23:06 gs kernel: sym53c1010-33-1: Downloading SCSI SCRIPTS.
Oct 26 04:23:06 gs kernel: sym53c1010-33-1-<0,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)



Excerpt from /var/log/dmesg, for reference:

SCSI subsystem driver Revision: 1.00
sym53c8xx: at PCI bus 4, device 2, function 1
sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
sym53c8xx: 53c1010-33 detected with Symbios NVRAM
sym53c8xx: at PCI bus 4, device 2, function 0
sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
sym53c8xx: 53c1010-33 detected with Symbios NVRAM
sym53c1010-33-0: rev 0x1 on pci bus 4 device 2 function 1 irq 53
sym53c1010-33-0: Symbios format NVRAM, ID 7, Fast-80, Parity Checking
sym53c1010-33-0: on-chip RAM at 0xfe9fc000
sym53c1010-33-0: restart (scsi reset).
sym53c1010-33-0: handling phase mismatch from SCRIPTS.
sym53c1010-33-0: Downloading SCSI SCRIPTS.
sym53c1010-33-1: rev 0x1 on pci bus 4 device 2 function 0 irq 52
sym53c1010-33-1: Symbios format NVRAM, ID 7, Fast-80, Parity Checking
sym53c1010-33-1: on-chip RAM at 0xfe9fa000
sym53c1010-33-1: restart (scsi reset).
sym53c1010-33-1: handling phase mismatch from SCRIPTS.
sym53c1010-33-1: Downloading SCSI SCRIPTS.
scsi0 : sym53c8xx-1.7.3c-20010512
scsi1 : sym53c8xx-1.7.3c-20010512
blk: queue c35dea18, I/O limit 1048575Mb (mask 0xffffffffff)
  Vendor: SEAGATE   Model: ST336607LW        Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 03
blk: queue c35de818, I/O limit 1048575Mb (mask 0xffffffffff)
sym53c1010-33-1-<0,0>: tagged command queue depth set to 8
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
sym53c1010-33-1-<0,*>: FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
Asked by: qazakax

jlevie commented:
I've got a couple of systems that have that same SCSI controller and Seagate disk and they don't exhibit the problem you are observing. One runs RH 8.0 and the other RH 9 with the same kernel you have. So I believe it is safe to say that the problem is not generic to that combination. Accordingly, it stands to reason that this problem is some sort of hardware fault with your particular devices. I'd look first at the cable & terminator and then at the disk as the cause.

qazakax (author) commented:
Hi,

"then at the disk as the cause"?
Do you mean the SCSI hard disk has an error, such as a bad sector? Do I need to replace the whole SCSI hard disk?


Rdgs,
-Qaz
jlevie commented:
Well, it actually sounds more like a problem with the disk's interface electronics than a bad sector. I'd expect a different error from a bad spot on the drive.
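
One rough way to test the bad-sector idea against the log is to decode the Read (10) bytes in each timeout message and see whether the same block address keeps coming up. The sketch below is only a heuristic, not a diagnosis. It assumes the message format shown in the question, with each message on a single line, and it assumes the nine printed bytes are the rest of the CDB after the opcode (flags, 4-byte block address, group, 2-byte length, control). The function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the timeout message as it appears in the question (assumed format).
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout : pid (\d+), scsi(\d+), channel (\d+), "
    r"id (\d+), lun (\d+) Read \(10\) ((?:[0-9a-f]{2} ?){9})"
)


def decode_read10(byte_text):
    """Return (lba, blocks) from the nine hex bytes printed after 'Read (10)'."""
    b = [int(x, 16) for x in byte_text.split()]
    # b[0] = flags, b[1:5] = block address (big-endian),
    # b[5] = group, b[6:8] = transfer length (big-endian), b[8] = control
    lba = int.from_bytes(bytes(b[1:5]), "big")
    blocks = int.from_bytes(bytes(b[6:8]), "big")
    return lba, blocks


def tally_timeouts(lines):
    """Count timeout messages per (host, target id, block address)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            host, target = int(m.group(2)), int(m.group(4))
            lba, _blocks = decode_read10(m.group(6))
            counts[(host, target, lba)] += 1
    return counts
```

For the message in the question, the bytes 00 00 47 4b 88 00 00 08 00 decode to block 4672392 with a length of 8 blocks. Feeding the lines of /var/log/messages.* to tally_timeouts and looking at the most common entries would show whether the timeouts cluster on one address or are scattered across the disk; scattered addresses would fit the interface or cabling theory better than a single bad spot.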
jlevie commented:
Another possibility, now that I've noticed you are running an SMP kernel, is that there is a problem with the motherboard on this system and interrupts aren't being handled correctly. That could result in this sort of error being reported. You could boot the system in uni-processor mode for a while and see if the errors persist. I'd also suggest checking whether a later system BIOS is available for your motherboard.
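
To get a feel for whether interrupts from the controller are arriving, you could compare two snapshots of /proc/interrupts taken a little while apart during disk activity. This is a minimal sketch, assuming the usual layout of that file (a header row of CPU names, then one row per IRQ with a count per CPU). The IRQ numbers 52 and 53 come from the dmesg output in the question; the function names are hypothetical.

```python
def parse_interrupts(text):
    """Parse the text of /proc/interrupts into {irq: {"counts": ..., "source": ...}}."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # e.g. ["CPU0", "CPU1"]
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:len(cpus)]
        # Skip rows that do not carry one numeric count per CPU (e.g. ERR:).
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        table[label.strip()] = {
            "counts": dict(zip(cpus, map(int, counts))),
            "source": " ".join(fields[len(cpus):]),
        }
    return table


def irq_delta(before, after, irq):
    """Per-CPU increase in the interrupt count for one IRQ between two snapshots."""
    return {
        cpu: after[irq]["counts"][cpu] - before[irq]["counts"][cpu]
        for cpu in after[irq]["counts"]
    }
```

Read the file twice, for example with open("/proc/interrupts").read(), parse both snapshots, and call irq_delta(before, after, "52") for the scsi1 controller. A count that does not move while reads are outstanding would be consistent with lost interrupts, though it would not prove it. Running the same comparison after booting in uni-processor mode gives a baseline to compare against.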