Solved

ORA-01115: IO error reading block from file; ORA-27091: skgfqio: unable to queue I/O

Posted on 2006-05-25
Medium Priority
Last Modified: 2011-09-20
We have started experiencing this error in the last few weeks and are trying to identify the cause. We opened a ticket with Oracle and one with HP (SAN). I checked the logs from the web servers and found that this ORA-01115 error was logged by the web application 6 times yesterday between 2:49 and 3:10 PM. We're not sure if it's both nodes or just one. Before that, it had last been logged on 12/22/05. Four of those times, it was the same file that our batch process had a problem with: PARTIES_DATA01.dbf. On the other 2 occasions, it was PARTIES_DATA02.dbf and LOCATION_DATA01.dbf.

HP says the SAN is healthy, Windows hasn't logged any issues, and Oracle says it isn't their problem if there is no block corruption. If anyone has a suggestion for diagnosing this, I'm all ears. We have tested the datafiles for corruption with RMAN and haven't found anything. The error is hopping around to different datafiles, redo log files and blocks. HP says its monitoring doesn't detect anything, says Oracle ought to wait longer, and suggests raising the SCSI timeout value from 60 to 90. If we had experienced a SCSI timeout, it would be in the event log, and there are no errors in the event logs.

Other servers connected to the SAN aren't experiencing any problems, but none of them are under the same kind of load.  This is our prod transactional db.

I'm not sure how to check the maximum wait time for access to the datafile that shows up most often in the errors. Statspack doesn't show anything unusual over the period the errors occurred, and I'm aware that you can't identify or troubleshoot a discrete error with aggregate data. The disks are in a RAID 10 configuration (SAME).

The only node we can positively tie the error to is Node 2, because our batch process only connects to that node. The other connections are load balanced, so we're not sure whether it's occurring on one or both nodes.

Where do I go from here?  All the FC switch log analysis is done by HP.  I don't think we have access to any logs on the SAN Appliance or a reader to decode them.  I have a hand tied behind my back with HP.  Is there some sort of diagnostic program I could run?   Do I need to enable some sort of logging?

Here are examples:

ORA-01115: IO error reading block from file 18 (block # 182340)
ORA-01110: data file 18: 'O:\ORADATA\NEDSSPC\PARTIES_DATA01.DBF'
ORA-27091: skgfqio: unable to queue I/O
Fri May 12 23:48:59 2006
Errors in file c:\oracle\admin\nedsspc\bdump\nedsspc2_arc0_3836.trc:
ORA-00333: redo log read error block 135169 count 2048
ORA-00312: online log 6 thread 2: 'O:\ORADATA\NEDSSPC\REDO06.LOG'
ORA-27091: skgfqio: unable to queue I/O
ARC0: Completed archiving log 6 thread 2 sequence 18565
Fri May 12 23:49:04 2006

Sat May 13 23:49:10 2006
Errors in file c:\oracle\admin\nedsspc\bdump\nedsspc2_arc0_3836.trc:
ORA-00333: redo log read error block 129025 count 2048
ORA-00312: online log 7 thread 2: 'O:\ORADATA\NEDSSPC\REDO07.LOG'
ORA-27091: skgfqio: unable to queue I/O
ARC0: Completed archiving log 7 thread 2 sequence 18675
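
One way to see how the error hops between files is to tally the ORA-27091 lines in the alert log by file and by hour of day. The following is a minimal Python sketch, assuming the alert log uses the layout shown in the excerpts above (a timestamp line followed by the ORA- lines it covers). The function name `tally_io_errors` is made up for illustration, and the timestamp parsing assumes an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Alert-log timestamp lines look like: "Fri May 12 23:48:59 2006"
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# ORA-01110 (data file) and ORA-00312 (online log) lines carry the file name in quotes
FILE_RE = re.compile(r"^ORA-(?:01110|00312): .*'([^']+)'")


def tally_io_errors(lines):
    """Count ORA-27091 occurrences per file and per hour of day.

    Returns two Counters: (per_file, per_hour). Errors that appear before
    any timestamp line are counted under the hour key None.
    """
    per_file, per_hour = Counter(), Counter()
    stamp = None  # most recent timestamp line seen, if any
    path = None   # most recent file named since that timestamp
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            stamp = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            path = None
            continue
        m = FILE_RE.match(line)
        if m:
            # Windows paths are case-insensitive, so normalise before counting
            path = m.group(1).upper()
        elif line.startswith("ORA-27091"):
            per_file[path or "(unknown file)"] += 1
            per_hour[stamp.hour if stamp else None] += 1
            path = None  # do not attribute a later error to the same file
    return per_file, per_hour
```

Called as `tally_io_errors(open(path))` with `path` pointing at one instance's alert log, the per-file counts show whether one file dominates and the per-hour counts show whether the errors cluster around a recurring time of day. Running it against each node's alert log separately would also show whether both nodes are affected.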

Drive information
~~~~~~~~~~~~~~~~~
redologs at: O:\ORADATA\NEDSSPC\

Drive O:
Description Local Fixed Disk
Compressed No
File System OraCFS
Size 99.99 GB (107,364,544,512 bytes)
Free Space 16.62 GB (17,850,949,632 bytes)

Question by:DonFreeman
5 Comments
 
LVL 7

Expert Comment

by:vishal68
ID: 16766613
Hi

We faced the same problems when we moved to RAC on Windows about a year and a half ago. We were on 9iR2 at that time. We ran into a number of problems before we managed to convince management to move to a Unix box, and we have since moved to Unix.

As for your problem, we were seeing ORA-27091: skgfqio: unable to queue I/O mainly on redo log files. Oracle Support was also onsite to help. We finally moved the redo logs onto a completely separate set of disks, and that resolved the problem.

HTH
Vishal

 
LVL 1

Author Comment

by:DonFreeman
ID: 16789562
Hmmmm.....It seemed obvious to us that as a last resort we could start moving files around to see what would happen.  We have datafiles involved as well.  I'm not too sure exactly how we're going to ensure that we get everything moved around to fix the problem.  And, this type of thing is exactly what they promised we wouldn't have to do.  Storage management was supposed to become completely seamless.....
 
LVL 5

Accepted Solution

by:
Netminder earned 0 total points
ID: 17118335
Closed, 500 points refunded.
Netminder
Site Admin
 

Expert Comment

by:fran_aro
ID: 25579620
Hi guys,
Where is the solution for this issue?
This does not seem professional.
Very, very bad.

Thanks
Francis
 
LVL 1

Author Comment

by:DonFreeman
ID: 25579939
Wow, that was a long time ago. We finally decided we were overloading our SAN, despite what the vendor and everybody else said. We removed non-production storage from the SAN and the problem went away. I'm not sure about the configuration of the LUNs. All our storage was striped and mirrored, so I thought the probability that any individual disk was being used by more than one instance at a time was pretty high. The error is simply saying the queue is full and the disk is not available for I/O. The only thing that could make that happen is something else using it.

I hope my memory and reasoning are correct and that this is helpful to you.
