• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1842

Nawk outputting the error: "nawk: FILENAME makes too many open files"

Hi All,

For some background, this question has the following PAQs leading up to this point.
This question isn't so much about how to refine the awk script as it is about determining how best to work with large files.

The file I am testing should get broken into 30+ files, but breaks at the 20 mark. If you need the input details in order to answer the question, please review the following PAQs.

http://www.experts-exchange.com/Programming/Programming_Platforms/Unix_Programming/Q_21763311.html
http://www.experts-exchange.com/Programming/Programming_Platforms/Unix_Programming/Q_21764891.html
And most recently:
http://www.experts-exchange.com/Programming/Programming_Platforms/Unix_Programming/Q_21854851.html

----------------------------------------------------------------------

SCRIPT EXPLANATION:
===================

The current nawk is formatted as follows (shell quoting omitted in this paste):

=================================================
nawk -F| -v Footer= BEGIN{f="/dev/null"; oldfile = f}
                /^\|\|/{
                        c++
                        f=sprintf("%s%s'$DateTime'%03d.DHR",$3,$4,c)
                        print Footer >> oldfile
                        oldfile = f
                       }

                       { print >> f }
                END { print Footer >> oldfile }
                FILENAME
=================================================

Where the files being created look like:

AU402679060529.1643001.DHR
AU414381060529.1643002.DHR
AU420252060529.1643003.DHR
AU423247060529.1643004.DHR
<etc....>

Or general format:
XX123456YYMMDD.HHMM???.DHR
Where:
XX => Comes from field 3 of the delimiter line
123456 => Comes from field 4 of the delimiter line
YYMMDD.HHMM => Comes from the shell variable $DateTime

In addition:
* The variable $Footer contains some text to be inserted at the end of each file.
* The delimiter is ||, e.g. ||XX|123456|blah|blah|blah|blah
* /dev/null is used to ensure all text in the input file before the first "||" is ignored
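To make the naming concrete, here is a minimal sketch (the field values are invented) showing how the sprintf assembles a filename from one delimiter line:

```shell
# A hypothetical delimiter line; with -F'|' it splits as
#   $1="" $2="" $3="AU" $4="402679" ...
# dt stands in for the $DateTime shell variable, and the final
# printf argument stands in for the running counter c.
echo '||AU|402679|blah|blah' |
awk -F'|' -v dt='060529.1643' '{ printf "%s%s%s%03d.DHR\n", $3, $4, dt, 1 }'
```

which prints AU402679060529.1643001.DHR, matching the first example filename above.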

ERROR MESSAGE AS IT APPEARS:
============================

This currently appears upon execution against a large input file (captured from ksh -x trace output):

=================================================
+ nawk -F| -v Footer= BEGIN{f="/dev/null"; oldfile = f}
                /^\|\|/{
                        c++
                        f=sprintf("%s%s060529.1639%03d.DHR",$3,$4,c)
                        print Footer >> oldfile
                        oldfile = f
                       }

                       {print >> f }
                END { print Footer >> oldfile}

                /directory/filename
nawk: AU451922060529.1639020.DHR makes too many open files
 input record number 7676, file /directory/filename
 source line number 9
=================================================

QUESTIONS:

Q: How can this error be avoided?
Q: Is there a more efficient way to deal with large files?


Thanks in advance,

Glenn
 
glennstewartAuthor Commented:
Btw - if it isn't already obvious.... ??? is the c++ counter.
It was made to be 3 digits to ensure at least up to 999 files.
Not even getting to 21 files was a surprise.
 
ahoffmannCommented:
> Q: How can this error be avoided?
Use close(oldfile); in your END{} block.

BTW, I'm wondering about your posted script; I don't see proper quoting in it.
 
brettmjohnsonCommented:
If you are using the files sequentially [i.e. you don't need all the files open at once],
then close each oldfile before you start a new one:
http://www.gnu.org/software/gawk/manual/html_node/Close-Files-And-Pipes.html

If you do need all the files open simultaneously, then you need to raise the limit
on number of open files per process.  Check out the 'limit' and 'ulimit' commands.
The following command returns the limits of various resources:
% limit
cputime         unlimited
filesize        unlimited
datasize        6144 kbytes
stacksize       8192 kbytes
coredumpsize    0 kbytes
memoryuse       unlimited
descriptors     256
memorylocked    unlimited
maxproc         100

The following command changes the limit on file descriptors: [note that this may
be 'files' on your flavor of unix]
% limit descriptors 1024

Query it back to see the raised limit on file descriptors:
% limit descriptors
descriptors     1024
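One note on the examples above: `limit` is the csh/tcsh builtin. Since the script in this question runs under ksh, the equivalent there is `ulimit`; exact flags vary by platform, so treat this as a sketch:

```shell
# Query the current soft limit on open file descriptors (ksh/sh/bash)
ulimit -n

# Attempt to raise it for this shell and its children; going above the
# hard limit (often shown with `ulimit -Hn`) requires root privileges
ulimit -n 1024 2>/dev/null || echo "could not raise descriptor limit"
```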
 
ahoffmannCommented:
brettmjohnson, it's a nawk error message, not one from the shell or the OS.
I guess if you set the limit to 10, you'd get the shell's message instead of the nawk message.
 
brettmjohnsonCommented:
Excerpts from the documentation link I gave you earlier:

"Similarly, when a file or pipe is opened for output, the file name or command associated with it is remembered by awk and subsequent writes to the same file or command are appended to the previous writes. The file or pipe stays open until awk exits."

"[Sometimes it is necessary] to write numerous files, successively, in the same awk program. If you don't close the files, eventually you may exceed a system limit on the number of open files in one process. So close each one when you are finished writing it."


awk is generating the message because it failed to get another file handle from the system.  So you must either close the files if you no longer need them or raise the number of file handles available to the process before running the awk program.
 
brettmjohnsonCommented:
> brettmjohnson, it's an nawk error message, not one of the shell or OS

Oh, and the limit is set by the OS and imposed upon awk; it has nothing to do with the shell.


 
glennstewartAuthor Commented:
>BTW, I'm wondering about your posted script, where I miss a proper quoting

This is a cut-and-paste of the exact (and working) script:

typeset Footer=$(egrep ^ED $ParamFile | tail -1 | cut -c4-)
typeset DateTime=$(date +%y%m%d.%H%M)

    nawk -F'|' -v Footer=$Footer 'BEGIN{f="/dev/null"; oldfile = f}
                /^\|\|/{
                        c++
                        f=sprintf("%s%s'$DateTime'%03d.DHR",$3,$4,c)
                        print Footer >> oldfile
                        oldfile = f
                       }

                       { print >> f }
                       END { print Footer >> oldfile }

               ' $CAL_REPTDIR/$SplitFile.OLD

You'd be missing that last =>'<= quote (the ksh -x trace strips the quotes, which is why they don't show above).

---

At present there is only one input file and many output files.

Q: Where would be the most suitable place for a close?
My assumption would be that I would like the oldfile closed immediately after a "Footer" is directed into it.
 
glennstewartAuthor Commented:
Closing as per my assumption doesn't help. I'm probably closing incorrectly:

+ nawk -F| -v Footer= BEGIN{f="/dev/null"; oldfile = f}
                /^\|\|/{
                        c++
                        f=sprintf("%s%s060530.1026%03d.DHR",$3,$4,c)
                        print Footer >> oldfile
                        { close("oldfile") }
                        oldfile = f
                       }

                       { print >> f }
                END { print Footer >> oldfile }{ close("oldfile") }

                /calypso/reports/VOUCHERS.SYD.OLD
nawk: AU451922060530.1026020.DHR makes too many open files
 input record number 7676, file /calypso/reports/VOUCHERS.SYD.OLD
 source line number 10
 
glennstewartAuthor Commented:
Changed a little bit....

    nawk -F'|' -v Footer=$Footer 'BEGIN{f="/dev/null"; oldfile = f}
                /^\|\|/{
                        c++
                        f=sprintf("%s%s'$DateTime'%03d.DHR",$3,$4,c)
                        print Footer >> oldfile
                        { close(oldfile) }
                        oldfile = f
                       }

                       { print >> f }
                END { print Footer >> oldfile }{ close(oldfile) }

               ' $CAL_REPTDIR/$SplitFile.OLD


This worked at least to a tested 110 files :)

Case closed :)
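For anyone landing on this thread later, the fix can be reduced to a small self-contained demonstration. The input data, footer text, and record count below are invented for the demo, and plain `awk` stands in for `nawk`:

```shell
# Work in a scratch directory
cd "$(mktemp -d)"

# Fabricate an input file with 50 "||" delimiter records
awk 'BEGIN { for (i = 1; i <= 50; i++) printf "||AU|%06d|data\n", i }' > input.txt

awk -F'|' -v Footer='END-OF-FILE' '
    BEGIN { f = "/dev/null"; oldfile = f }
    /^\|\|/ {
        c++
        f = sprintf("%s%s%03d.DHR", $3, $4, c)
        print Footer >> oldfile
        close(oldfile)   # release the descriptor before moving to the next file
        oldfile = f
    }
    { print >> f }
    END { print Footer >> oldfile; close(oldfile) }
' input.txt

# One output file per delimiter record, each ending with the footer
ls *.DHR | wc -l
```

Without the close(oldfile) call, a run like this hits the per-process descriptor limit exactly as in the question once the record count passes it.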