C efficiency:  large file read, character substitution, file write.

Posted on 2003-02-25
Medium Priority
Last Modified: 2010-04-15
Goal: correct certain character combinations in a large ASCII file. Where a backslash is followed by a TAB, EOF or EOL character, we need to insert an additional space between the \ and the following character.

Current Plan:
Open file for read and file for write
Read char by char - using fgetc()
Look for character combination
Correct it
Write to output file - using putc()
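
For reference, the plan above could look roughly like the following in standard C. This is only a sketch: the function name fix_backslashes is made up for illustration, and it takes the rule literally (a space is inserted after every backslash that is directly followed by TAB, end of line, or end of file).

```c
#include <stdio.h>

/* Copy `in` to `out`, inserting a space after any backslash that is
   immediately followed by TAB, end of line, or end of file.
   Returns 0 on success, -1 on a read or write error.
   (fix_backslashes is a hypothetical name, not from the thread.) */
int fix_backslashes(FILE *in, FILE *out)
{
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        if (c == '\\') {
            int next = fgetc(in);      /* look one character ahead */
            if (next == '\t' || next == '\n' || next == EOF) {
                if (putc(' ', out) == EOF)
                    return -1;
            }
            if (next == EOF)
                break;
            ungetc(next, in);          /* let the main loop copy it */
        }
    }
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

The caller opens both files, calls fix_backslashes(fIn, fOut), and closes them afterwards.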

The file writing is the slowest part. Does anyone know the most efficient way?

Need to write a C program to run on VAX/VMS. Alas, sed or awk not available.

Many thanks in advance!
Question by: PaulStevens

Expert Comment

ID: 8017405
Try writing to a temporary file instead of re-writing
your existing file. Once you're done, have the program unlink the old file and rename the temporary file.

This would remove all the overhead of re-writing the file
while it is open (but would take twice the disk space for
a short time).
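
A minimal sketch of that final step, using the standard C remove() and rename() calls in place of unlink(). The function name replace_with_temp is hypothetical, and both files are assumed to be closed already:

```c
#include <stdio.h>

/* Replace `original` with the finished temporary file `temp`.
   Both files must already be closed.
   Returns 0 on success, -1 on failure.
   (replace_with_temp is a hypothetical name used for illustration.) */
int replace_with_temp(const char *original, const char *temp)
{
    if (remove(original) != 0)          /* standard C counterpart of unlink() */
        return -1;
    if (rename(temp, original) != 0)    /* temp file takes over the old name */
        return -1;
    return 0;
}
```

If the rename fails after the delete has succeeded, the data still exists under the temporary name, so the caller should report that name rather than clean it up.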

Accepted Solution

gj62 earned 150 total points
ID: 8017487
I've had to do a large amount of file processing on ASCII files in excess of 1GB.  Here are some tips/techniques I've found useful:

1) Open the file in binary mode (which is the only mode on VAX, I believe) and read it into memory in chunks of 16K (or multiples of 16K).  This means you have to create an output buffer large enough to hold your manipulations (the additional characters).  Then write the entire buffer back out.  Use fread() and fwrite().  This will prevent as much disk thrashing as possible.  We usually work with buffers of 64K or so.  This is 90% of the issue...

2) As djacobsen said, write to a temp file (which it sounds like you are already doing).  Once processing is complete, close both files, delete the original file, and rename the temp file...

3) Play with the multiples of 16K - it is OS and system config specific to get the best speed.

Now, I know people will tell me that system caching should be taking care of most of this, but we've empirically found that it is not nearly as efficient as the above approach.

Expert Comment

ID: 8017550
A few notes (pseudocode) on the above:

open files (one for reading- fIn, one for writing - fOut)

n = fread(bufIn, sizeof(char), sizeof(bufIn), fIn)

for (x = 0, y = 0; x < n; ++x)
  if x == escape sequence
    bufOut[y++] = bufIn[x++];
    bufOut[y++] = ' ';
    bufOut[y++] = bufIn[x];
  else
    bufOut[y++] = bufIn[x];

fwrite(bufOut, sizeof(char), y, fOut);

That should get you most of the way there... Let me know if you have questions...


Expert Comment

ID: 8017567
Whoops - in the if statement, it should read

if bufIn[x] == escape sequence...

and not

if x == escape sequence

Obviously, you may need to test 2 characters to see if it really is an escape sequence you want to trap.  Sounds like you already know how to do that...
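
Putting the pseudocode and the correction together, a runnable sketch might look like this. Assumptions not taken from the thread: the function name fix_backslashes_chunked, the CHUNK constant, and the `pending` flag, which remembers a backslash that fell on the last byte of a chunk so that the two-character test still works across chunk boundaries.

```c
#include <stdio.h>

#define CHUNK 16384   /* assumed starting size; tune in multiples of 16K */

/* Chunked filter: read CHUNK bytes at a time, build the corrected text
   in bufOut, and write it with a single fwrite() per chunk.
   Returns 0 on success, -1 on an I/O error.
   Static buffers keep 48K off the stack but make this non-reentrant. */
int fix_backslashes_chunked(FILE *fIn, FILE *fOut)
{
    static char bufIn[CHUNK];
    static char bufOut[2 * CHUNK];  /* worst case: a space before every byte */
    size_t n, x, y;
    int pending = 0;                /* previous byte was a backslash */

    while ((n = fread(bufIn, 1, sizeof bufIn, fIn)) > 0) {
        y = 0;
        for (x = 0; x < n; ++x) {
            char c = bufIn[x];
            if (pending && (c == '\t' || c == '\n'))
                bufOut[y++] = ' ';  /* separate the backslash from TAB/EOL */
            bufOut[y++] = c;
            pending = (c == '\\');
        }
        if (fwrite(bufOut, 1, y, fOut) != y)
            return -1;
    }
    if (ferror(fIn))
        return -1;
    if (pending && fputc(' ', fOut) == EOF)  /* backslash at end of file */
        return -1;
    return 0;
}
```

Because the space is emitted when the following character is seen, no input byte ever produces more than two output bytes, which is why an output buffer of twice the chunk size is enough.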

Author Comment

ID: 8017670
Not used this site before; will regrade when I've tested the result. Thanks!

Author Comment

ID: 8110918

Although this method works, gj62, the output files are no longer readable by EDT or TPU, nor do VMS commands like $diff work on them. The errors are of this form:

4007392 byte record too large for user's buffer

This is despite newline characters being read and written to the file. (Tests with smaller files and buffer sizes confirm this.)

I guess this is a limitation of C running on VMS.
Will use my original method if there is no other workaround.
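
If the original fgetc()/putc() method is kept, one standard C option for reducing the number of small writes is to give each stream a large stdio buffer with setvbuf(). The sketch below is only an illustration: the helper name use_big_buffer is hypothetical, and nothing in this thread establishes whether it avoids the record-size errors on VMS. It leaves the stream in whatever mode fopen() gave it and only changes the buffer size.

```c
#include <stdio.h>

/* Give a freshly opened stream a large, fully buffered stdio buffer.
   Must be called before any other operation on the stream.
   Passing NULL lets the C library allocate the buffer itself.
   Returns 0 on success, nonzero on failure.
   (use_big_buffer is a hypothetical helper name.) */
int use_big_buffer(FILE *fp, size_t size)
{
    return setvbuf(fp, NULL, _IOFBF, size);
}
```

It would be called once on each stream right after fopen(), for example use_big_buffer(fOut, 65536), before the first read or write.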
