ORA-01115: IO error reading block from file; ORA-27091: skgfqio: unable to queue I/O

Posted on 2006-05-25
Last Modified: 2011-09-20
We started experiencing this error in the last few weeks and are trying to identify the cause. We opened a ticket with Oracle and one with HP (SAN). I checked the logs from the web servers and found that the web application logged this ORA-01115 error 6 times yesterday between 2:49 and 3:10 PM. We're not sure whether it's both nodes or just one. Before that, it was last logged on 12/22/05. Four of those six times, it was the same file our batch process had a problem with: PARTIES_DATA01.dbf. On the other two occasions, it was PARTIES_DATA02.dbf and LOCATION_DATA01.dbf.

HP says the SAN is healthy, Windows hasn't logged any issues, and Oracle says it isn't their problem if there is no block corruption. We have tested the datafiles for corruption with RMAN and haven't found anything. The error is hopping around to different datafiles, redo log files and blocks. HP says its monitoring doesn't detect anything, says Oracle ought to wait longer, and suggests raising the SCSI timeout value from 60 to 90. If we had experienced a SCSI timeout, it would be in the event log, and there are no errors in the event logs. If anyone has a suggestion for diagnosing this, I'm all ears.

Other servers connected to the SAN aren't experiencing any problems, but none of them are under the same kind of load.  This is our prod transactional db.

I'm not sure how to check the maximum wait time for access to the datafile that shows up most often. Statspack doesn't show anything unusual over the period when the errors occurred, and I'm aware that you can't identify or troubleshoot a discrete error with aggregate data. The disks are in a RAID 10 configuration (SAME: stripe and mirror everything).

The only node we can positively tie the error to is Node 2, because our batch process connects only to that node. The other connections are load balanced, so we're not sure whether it's occurring on one node or both.

Where do I go from here? All the FC switch log analysis is done by HP. I don't think we have access to any logs on the SAN appliance, or to a reader to decode them. I have one hand tied behind my back with HP. Is there some sort of diagnostic program I could run? Do I need to enable some sort of logging?

Here are examples:

ORA-01115: IO error reading block from file 18 (block # 182340)
ORA-01110: data file 18: 'O:\ORADATA\NEDSSPC\PARTIES_DATA01.DBF'
ORA-27091: skgfqio: unable to queue I/O
Fri May 12 23:48:59 2006
Errors in file c:\oracle\admin\nedsspc\bdump\nedsspc2_arc0_3836.trc:
ORA-00333: redo log read error block 135169 count 2048
ORA-00312: online log 6 thread 2: 'O:\ORADATA\NEDSSPC\REDO06.LOG'
ORA-27091: skgfqio: unable to queue I/O
ARC0: Completed archiving log 6 thread 2 sequence 18565
Fri May 12 23:49:04 2006

Sat May 13 23:49:10 2006
Errors in file c:\oracle\admin\nedsspc\bdump\nedsspc2_arc0_3836.trc:
ORA-00333: redo log read error block 129025 count 2048
ORA-00312: online log 7 thread 2: 'O:\ORADATA\NEDSSPC\REDO07.LOG'
ORA-27091: skgfqio: unable to queue I/O
ARC0: Completed archiving log 7 thread 2 sequence 18675
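Because the error hops between datafiles, redo logs and blocks, one low-tech way to look for a pattern is to pull every ORA-27091 occurrence out of the alert log and line up timestamp, instance, file and block. Below is a minimal Python sketch, assuming the layout shown in the excerpt above: a bare timestamp line opens each alert-log entry, and trace files are named <instance>_<process>_<id>.trc (so "nedsspc2" points at instance 2). The names parse_alert_log and IoErrorEvent are hypothetical, not part of any Oracle tool, and all other alert-log lines are simply ignored.

```python
import ntpath
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# The excerpt from the examples above, kept verbatim (raw string for the backslashes).
SAMPLE_LOG = r"""ORA-01115: IO error reading block from file 18 (block # 182340)
ORA-01110: data file 18: 'O:\ORADATA\NEDSSPC\PARTIES_DATA01.DBF'
ORA-27091: skgfqio: unable to queue I/O
Fri May 12 23:48:59 2006
Errors in file c:\oracle\admin\nedsspc\bdump\nedsspc2_arc0_3836.trc:
ORA-00333: redo log read error block 135169 count 2048
ORA-00312: online log 6 thread 2: 'O:\ORADATA\NEDSSPC\REDO06.LOG'
ORA-27091: skgfqio: unable to queue I/O
ARC0: Completed archiving log 6 thread 2 sequence 18565
Fri May 12 23:49:04 2006

Sat May 13 23:49:10 2006
Errors in file c:\oracle\admin\nedsspc\bdump\nedsspc2_arc0_3836.trc:
ORA-00333: redo log read error block 129025 count 2048
ORA-00312: online log 7 thread 2: 'O:\ORADATA\NEDSSPC\REDO07.LOG'
ORA-27091: skgfqio: unable to queue I/O
ARC0: Completed archiving log 7 thread 2 sequence 18675
"""

TRACE_RE = re.compile(r"^Errors in file (.+\.trc):")
BLOCK_RE = re.compile(r"^ORA-0(?:1115|0333):.*?block(?: #)? (\d+)")  # datafile or redo read
FILE_RE = re.compile(r"^ORA-0(?:1110|0312):.*'([^']+)'")              # datafile or online log


@dataclass
class IoErrorEvent:
    timestamp: Optional[datetime]  # None if no timestamp line preceded the error
    instance: Optional[str]        # taken from the trace file name, if one was logged
    file: Optional[str]
    block: Optional[int]


def parse_alert_log(text: str) -> List[IoErrorEvent]:
    events: List[IoErrorEvent] = []
    timestamp, trace, path, block = None, None, None, None
    for raw in text.splitlines():
        line = raw.strip()
        try:
            # A bare timestamp line starts a new entry, so forget the old trace file.
            timestamp = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            trace = None
            continue
        except ValueError:
            pass
        if m := TRACE_RE.match(line):
            trace = m.group(1)
        elif m := BLOCK_RE.match(line):
            block = int(m.group(1))
        elif m := FILE_RE.match(line):
            path = m.group(1)
        elif line.startswith("ORA-27091"):
            # ORA-27091 closes one error stack: record it and reset per-error fields.
            instance = ntpath.basename(trace).split("_")[0] if trace else None
            events.append(IoErrorEvent(timestamp, instance, path, block))
            path, block = None, None
    return events


if __name__ == "__main__":
    for e in parse_alert_log(SAMPLE_LOG):
        when = e.timestamp.isoformat(sep=" ") if e.timestamp else "(no timestamp)"
        print(f"{when}  instance={e.instance}  block={e.block}  file={e.file}")
```

Run against the full alert logs from both nodes, a listing like this would show whether the failures cluster in time (pointing at load) or on one instance (pointing at one node's HBA path), rather than at particular files.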

Drive information
redologs at: O:\ORADATA\NEDSSPC\

Drive O:
Description Local Fixed Disk
Compressed No
File System OraCFS
Size 99.99 GB (107,364,544,512 bytes)
Free Space 16.62 GB (17,850,949,632 bytes)

Question by: DonFreeman

    Expert Comment


    We faced the same problems when we moved to RAC on Windows about a year and a half ago. We were on 9iR2 at that time. We ran into a number of problems before we managed to convince management to move to a Unix box, and we have been on Unix since then.

    As for your problem, we were seeing ORA-27091: skgfqio: unable to queue I/O mainly on redo log files. Oracle Support was also on site to help. We finally moved the redo logs onto a completely separate set of disks, and that resolved the problem.



    Author Comment

    Hmmmm..... It seemed obvious to us that, as a last resort, we could start moving files around to see what would happen. We have datafiles involved as well, and I'm not sure exactly how we would make sure we moved everything needed to fix the problem. And this type of thing is exactly what they promised we wouldn't have to do. Storage management was supposed to become completely seamless.....

    Accepted Solution

    Closed, 500 points refunded.
    Site Admin

    Expert Comment

    Hi Guys,
    Where is the solution for this issue...?
    This does not seem professional.....
    Very, very bad...


    Author Comment

    Wow, it's been a long time. We finally decided we were overloading our SAN, despite what the vendor and everybody else said. We removed non-production storage from the SAN and the problem went away. I'm not sure about the configuration of the LUNs. All our storage was striped and mirrored, so I thought the probability that any individual disk was being accessed by more than one instance at a time was pretty high. The error is simply saying that the queue is full and the disk is not available for the read or write request. The only thing that could cause that is something else using the disk.

    I hope my memory and reasoning are correct and that this is helpful to you.
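The reasoning above (a shared, full queue refusing new requests) can be made concrete with a toy model: one device queue with a fixed depth, drained at a fixed rate, fed by one or two workloads. This is a minimal Python sketch of that hypothesis only. It does not model the real Windows, OCFS or HBA I/O path, and the queue depth (32), service rate (10 requests per tick) and arrival rates are arbitrary illustrative values, not measurements from this SAN. BoundedIoQueue and simulate are hypothetical names.

```python
from collections import Counter, deque
from typing import Dict


class BoundedIoQueue:
    """Toy device queue with a fixed maximum depth (illustrative, not a real driver)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.pending: deque = deque()
        self.rejected: Counter = Counter()

    def submit(self, source: str) -> bool:
        # A full queue refuses the request outright: the toy analogue of
        # "unable to queue I/O" in the explanation above.
        if len(self.pending) >= self.depth:
            self.rejected[source] += 1
            return False
        self.pending.append(source)
        return True

    def service(self, n: int) -> None:
        # The device completes up to n queued requests per tick.
        for _ in range(min(n, len(self.pending))):
            self.pending.popleft()


def simulate(workloads: Dict[str, int], depth: int = 32,
             service_rate: int = 10, ticks: int = 20) -> Counter:
    """Return rejected-request counts per workload (all numbers hypothetical)."""
    q = BoundedIoQueue(depth)
    for _ in range(ticks):
        # Interleave arrivals round-robin so no workload always goes first.
        for i in range(max(workloads.values())):
            for name, per_tick in workloads.items():
                if i < per_tick:
                    q.submit(name)
        q.service(service_rate)
    return q.rejected


if __name__ == "__main__":
    print("prod only:      ", dict(simulate({"prod": 8})))
    print("prod + non-prod:", dict(simulate({"prod": 8, "non-prod": 6})))
```

With these numbers, production alone (8 requests per tick against a service rate of 10) never fills the queue. Adding a 6-request non-production load pushes arrivals above the service rate, the backlog grows until the queue is full, and from then on production requests are refused as well, even though nothing is wrong with any individual disk.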
