paul williams
asked on
Buffer I/O error on device sdb, sdc, sdd, sde, sdf, sdg, sdi - Oracle VM 3.2.8
I'm running Oracle VM (OVM) 3.2.8 and get these errors when rebooting the VM Server. The reboot takes about 20 minutes.
Excerpt from /var/log/messages:
Jan 7 10:45:04 svr440 kernel: Buffer I/O error on device sdi, logical block 31457251
Jan 7 10:45:04 svr440 kernel: Buffer I/O error on device sdi, logical block 31457252
Jan 7 10:45:04 svr440 kernel: sd 9:0:0:1: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 7 10:45:04 svr440 kernel: sd 9:0:0:1: [sdi] Sense Key : Illegal Request [current]
Jan 7 10:45:04 svr440 kernel: sd 9:0:0:1: [sdi] <<vendor>> ASC=0x94 ASCQ=0x1ASC=0x94 ASCQ=0x1
Jan 7 10:45:04 svr440 kernel: sd 9:0:0:1: [sdi] CDB: Read(10): 28 00 0e ff ff 18 00 00 08 00
Jan 7 10:45:04 svr440 kernel: end_request: I/O error, dev sdi, sector 251658008
Any ideas?
Have you run check disk on your hard disks to confirm they are not failing or have bad sectors?
Do you have a PERC RAID controller? If so, the error points to a multipathing configuration problem: the failing command is a read of 8 blocks (4 KB, assuming 512-byte sectors) at the indicated offset. Did something get physically moved, and/or are you running multiple paths?
You need to post more of the log. What you have posted does NOT indicate a HDD failure. If you had a HDD read error, the sense key would not be "Illegal Request" and the ASC would not be 0x94. It is as if you're trying to read from a location that doesn't exist, either because the disk is not addressed where the controller expects it to be, or because the offset is greater than the capacity of the drive.
P.S. the LAST thing you want to do is a check disk (fsck) as another author suggested. It will likely result in 100% data loss.
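For anyone who wants to sanity-check the CDB line in the log, here is a rough sketch that decodes it. It assumes the standard READ(10) layout (opcode 0x28, 4-byte LBA in bytes 2-5, 2-byte transfer length in bytes 7-8). The function name decode_read10 is made up for illustration.

```shell
#!/bin/sh
# Decode a SCSI READ(10) CDB given as ten space-separated hex bytes.
# decode_read10 is a hypothetical helper name, not an existing tool.
decode_read10() {
    # Split the single string argument into one positional parameter per byte.
    set -- $1
    [ "$#" -eq 10 ] || { echo "expected 10 CDB bytes" >&2; return 1; }
    [ "$1" = "28" ] || { echo "not a READ(10) CDB" >&2; return 1; }
    # Bytes 2-5 (parameters 3-6): big-endian logical block address.
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    # Bytes 7-8 (parameters 8-9): big-endian transfer length in blocks.
    blocks=$(( (0x$8 << 8) | 0x$9 ))
    echo "lba=$lba blocks=$blocks"
}

# CDB copied from the log excerpt in the question.
decode_read10 "28 00 0e ff ff 18 00 00 08 00"
# prints: lba=251658008 blocks=8
```

The decoded LBA matches the sector in the end_request line (251658008), and dividing it by 8 gives 31457251, the first logical block in the Buffer I/O error lines, which is consistent with 512-byte sectors and 4 KB buffer blocks. To compare that offset against the size the kernel sees, `blockdev --getsz /dev/sdi` prints the device size in 512-byte sectors.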
ASKER
Yes, I was a little dubious about running an fsck for now.
Good news that it's not a HDD failure.
The VM server is running on a Sun X3-2 (x86) and is connected to an Oracle Storage 2540-M2 storage array.
I'm wondering whether something has changed in the setup. It has apparently not been working for a while, but I've only just picked this up.
ASKER
Hmm. Got a reply from Oracle on this:
I can see that the "Buffer I/O error" messages are generated by active ghost devices.
===========
$ grep "Buffer I/O error" var/log/messages | sort -k11| awk '{print $11}'|uniq
sdb,
sdd,
sdf,
sdi,
sdk,
sdm,
$ grep ghost sos_commands/devicemapper/multipath_-v4_-ll | sort -k2
`- 7:0:0:0 sdb 8:16 active ghost running
`- 7:0:0:2 sdd 8:48 active ghost running
`- 7:0:0:4 sdf 8:80 active ghost running
`- 9:0:0:1 sdi 8:128 active ghost running
`- 9:0:0:3 sdk 8:160 active ghost running
`- 9:0:0:5 sdm 8:192 active ghost running
7:0:0:0 sdb 8:16 -1 undef ghost SUN,LCSM100_F running
7:0:0:2 sdd 8:48 -1 undef ghost SUN,LCSM100_F running
7:0:0:4 sdf 8:80 -1 undef ghost SUN,LCSM100_F running
9:0:0:1 sdi 8:128 -1 undef ghost SUN,LCSM100_F running
9:0:0:3 sdk 8:160 -1 undef ghost SUN,LCSM100_F running
9:0:0:5 sdm 8:192 -1 undef ghost SUN,LCSM100_F running
===========
These are not harmful, so they can be ignored. Refer to Doc ID 1464587.1 for more information.
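To repeat that cross-check on another host, something along these lines might do. It is only a sketch: it assumes the log lines look like the excerpt above and that the second file holds saved `multipath -ll` output in which passive paths carry the word "ghost". The function name non_ghost_errors is hypothetical.

```shell
#!/bin/sh
# List devices that logged "Buffer I/O error" but are NOT ghost paths.
# Arguments: $1 = messages file, $2 = saved `multipath -ll` output.
# non_ghost_errors is a hypothetical helper name, not an existing tool.
non_ghost_errors() {
    messages=$1
    mpath=$2
    # Pull the device name out of each "Buffer I/O error on device sdX," line.
    sed -n 's/.*Buffer I\/O error on device \([a-z0-9-]*\),.*/\1/p' "$messages" |
    sort -u |
    while read -r dev; do
        # Keep the device only if no multipath line names it as a ghost path.
        if ! grep -w "$dev" "$mpath" | grep -qw ghost; then
            echo "$dev"
        fi
    done
}

# Example call (paths are placeholders):
#   non_ghost_errors /var/log/messages /tmp/multipath-ll.txt
```

Empty output means every device that logged the error is a ghost path. Any device name that does get printed is worth a closer look.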
ASKER
I've requested that this question be closed as follows:
Accepted answer: 0 points for paul williams's comment #a40539683
for the following reason:
Advice from Oracle support.
Those messages were not included in the snippet.
So how could we have checked for that?
Still, the advice to review the multipath setup is valid.
In this case there were more messages indicating that those were just warnings.
So it would be fair, IMHO, to award some points.