Increase performance when writing to a file

Posted on 2006-06-07
Last Modified: 2008-02-01

I'm creating a component (C++) intended to simulate non-volatile RAM (32k) by writing information to a file on disk (data is flushed to disk after each write operation).
I notice that each write to this NVRAM takes up to about 94ms. However, the application that uses this component writes to the NVRAM frequently (e.g. after scanning an article it writes 10 settings to NVRAM by calling the component's write method 10 times). So for each action by the application, I notice an extra delay of approx. 1s (or more), which is intolerable for the application.
Could you give me some advice on increasing performance when writing to a file?

Current way of working in write method of component :
* read content of file
* write new content to file
* flush data to disk (takes most of the time)

Is multithreading an option and why?
Overlapped I/O?

BTW. Using other hardware (harddisk) is no option.
BTW. The application may not be changed.

Question by:jlsjls
    LVL 14

    Expert Comment

    Couple of ideas:

    - Keep the file in memory the whole time, and write it as needed. That way you save the cost of the read every time through. Load the entire file into a buffer in memory, then write the entire buffer out at once; this way you are doing only one write operation.

    - What file I/O functions are you using? If you are using iostreams from the standard library, switch to C library functions such as fopen, fwrite, etc.; they are much faster. You can fiddle with buffer sizes using setvbuf() to maximize your performance. Or, even better, switch to native Win32 API calls if you are programming on Windows.

    - Switch to using a memory-mapped file; this will probably get you the best performance of all. This lets you open the file and treat it as a memory array: to write to the file you simply change that memory location.
    LVL 3

    Expert Comment

    Do you really need to flush the data to the disk every time? If you are concerned that there might be a power or system failure at any moment and need to have the most up-to-date data, then you will have to do this, which will cause a delay whilst the physical storage medium writes the data. This will be the case regardless of whether you have a conventional file or a memory-mapped file.

    The comments by wayside are correct and will improve performance, but the only real way to dramatically improve matters is to not call the flush function so often.
    LVL 3

    Author Comment

    The purpose of the component is to simulate non-volatile RAM. So it must be able to cope with power/system failures at any moment (most important goal).
    So I agree that flushing to disk (accessing slow media) on each write request is the only solution.
    For a complete write cycle (max. 94ms) I measure:
    1/3 time -> reading + writing
    2/3 time -> flushing

    Maybe by using memory-mapped files (my file is only 32k in size) I can improve the read/write operation a bit.

    MSDN states that asynchronous I/O should be avoided for relatively fast I/O:
    "In situations where an I/O request is expected to take a large amount of time, such as a refresh or backup of a large database, asynchronous I/O is generally a good way to optimize processing efficiency.
    However, for relatively fast I/O operations, the overhead of processing kernel I/O requests and kernel signals may make asynchronous I/O less beneficial, particularly if many fast I/O operations need to be made. In this case, synchronous I/O would be better."

    LVL 17

    Expert Comment

    It sounds like you don't want buffered I/O, because you want to commit changes all the time. Use open (UN*X) / CreateFile (Windows) rather than fopen. Load the image into RAM. Update, seek and write modified parts of the image only.
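A minimal sketch of the "load the image into RAM, then seek and write only the modified part" idea. It is not the poster's component: the class name `NvramImage` and its methods are hypothetical, and it uses unbuffered C stdio so that it stays portable, where the comment above suggests open / CreateFile instead. Note that `fflush` here only hands the data to the operating system; a durable commit would still need a call such as FlushFileBuffers (Windows) or fsync (UN*X), which is the expensive step discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: keeps the whole NVRAM image in RAM and writes
// only the modified byte range back to a fixed-size backing file.
class NvramImage {
public:
    NvramImage(const std::string& path, std::size_t size) : image_(size, 0) {
        assert(size > 0);
        // Open an existing file for update, or create it if it is missing.
        file_ = std::fopen(path.c_str(), "r+b");
        if (!file_) file_ = std::fopen(path.c_str(), "w+b");
        if (!file_) return;
        std::setvbuf(file_, nullptr, _IONBF, 0);  // no stdio-level buffering
        // Load whatever is stored already; a short or new file leaves zeros.
        std::size_t got = std::fread(image_.data(), 1, size, file_);
        if (got < size) {
            // Pad the file to its fixed size once, so later writes never grow it.
            std::fseek(file_, 0, SEEK_SET);
            std::fwrite(image_.data(), 1, size, file_);
            std::fflush(file_);
        }
    }
    ~NvramImage() { if (file_) std::fclose(file_); }
    NvramImage(const NvramImage&) = delete;
    NvramImage& operator=(const NvramImage&) = delete;

    bool ok() const { return file_ != nullptr; }

    // Reads are served from memory; no disk access is needed.
    const std::vector<unsigned char>& data() const { return image_; }

    // Update the cached image, then write just the changed bytes to the file.
    bool write(std::size_t offset, const void* src, std::size_t len) {
        if (!file_ || offset > image_.size() || len > image_.size() - offset)
            return false;
        std::memcpy(image_.data() + offset, src, len);
        if (std::fseek(file_, static_cast<long>(offset), SEEK_SET) != 0) return false;
        if (std::fwrite(image_.data() + offset, 1, len, file_) != len) return false;
        return std::fflush(file_) == 0;  // reaches the OS, not necessarily the disk
    }

private:
    std::FILE* file_ = nullptr;
    std::vector<unsigned char> image_;
};
```

With this shape, the read step of the original write cycle disappears entirely, and each write touches only the bytes that changed.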
    LVL 3

    Author Comment

    I'm using the Windows API to create (with FILE_FLAG_NO_BUFFERING), read and write the file.
    The file contains plain text data.
    LVL 22

    Expert Comment

    What I'd do is copy the data to a local array and set a timer for, say, 100 msec.

    If you get called before the timer fires, just copy the new data to the array.

    If the timer fires, then you can write to the file and close it.  You've saved a bunch of writes and flushes.

    No need to call flush(); most systems will do so in a second or so.

    In other words, cache the data in memory until the flurry of updates subsides, THEN write the whole mess to disk.
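The caching idea above could be sketched roughly as follows. Everything here is illustrative: `WriteCoalescer`, `update`, `poll` and the `Sink` callback are made-up names, the caller passes the current time in explicitly instead of using a real timer, and the deadline is fixed at the first update of a burst rather than restarted on each one. Any data still sitting in the cache when power fails is lost, so this trades durability for speed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical write coalescer: collects updates in RAM and hands the
// whole image to a sink once the delay since the first pending update
// has elapsed.
class WriteCoalescer {
public:
    using Clock = std::chrono::steady_clock;
    using Sink = std::function<void(const std::vector<unsigned char>&)>;

    WriteCoalescer(std::size_t size, std::chrono::milliseconds delay, Sink sink)
        : image_(size, 0), delay_(delay), sink_(std::move(sink)) {
        assert(sink_);
    }

    // Copy new data into the cached image; start the timer if none is pending.
    bool update(Clock::time_point now, std::size_t offset,
                const void* src, std::size_t len) {
        if (offset > image_.size() || len > image_.size() - offset) return false;
        std::memcpy(image_.data() + offset, src, len);
        if (!dirty_) {
            dirty_ = true;
            deadline_ = now + delay_;
        }
        return true;
    }

    // Call periodically; returns true if the image was written out.
    bool poll(Clock::time_point now) {
        if (!dirty_ || now < deadline_) return false;
        sink_(image_);  // e.g. write the file here
        dirty_ = false;
        return true;
    }

private:
    std::vector<unsigned char> image_;
    std::chrono::milliseconds delay_;
    Sink sink_;
    bool dirty_ = false;
    Clock::time_point deadline_{};
};
```

In the scenario from the question, the 10 settings written after scanning an article would land in the cache and result in a single file write about 100 msec after the first one.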

    A few gotchas though:

    (1)  There's no way to fully simulate NVRAM.  If the power fails during the disk write, the disk block might get half-written, which means next time you go to read it it will be  unreadable.  Much better idea:  write to a different file each time, say NV1 thru NV5.  That way if one file goes bad you can go to the previous one.

    (2)  The power might fail while writing the directory.  That's REALLY bad news.

    (3)  Calling Flush() isn't a secure way to ensure anything.  Modern file systems have so many layers of buffering (in the app, in the OS, in the disk cache, in the disk controller, in the disk drive) that calling flush() from the app is like the president shouting "Private Jones, go to bed!" and expecting the order to be carried out.
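One way the rotating-file idea from gotcha (1) might look in code. The comment only says to write to a different file each time (NV1 thru NV5); the sequence number, the checksum, the on-disk layout and the name `SlotStore` are all assumptions added here so that a reader can tell which file is newest and whether it was fully written.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout per slot file: [sequence][checksum][payload].
class SlotStore {
public:
    SlotStore(std::string base, unsigned slots)
        : base_(std::move(base)), slots_(slots) {
        assert(slots_ > 0);
    }

    // Write the image to the next slot in rotation (base1 .. baseN).
    bool save(const std::vector<unsigned char>& data) {
        std::ofstream out(path((next_seq_ - 1) % slots_ + 1),
                          std::ios::binary | std::ios::trunc);
        const std::uint32_t sum = checksum(data);
        out.write(reinterpret_cast<const char*>(&next_seq_), sizeof next_seq_);
        out.write(reinterpret_cast<const char*>(&sum), sizeof sum);
        out.write(reinterpret_cast<const char*>(data.data()),
                  static_cast<std::streamsize>(data.size()));
        out.flush();
        if (!out) return false;
        ++next_seq_;
        return true;
    }

    // Load the newest slot whose checksum verifies; false if none does.
    bool load(std::vector<unsigned char>& data) {
        std::uint32_t best = 0;
        for (unsigned i = 1; i <= slots_; ++i) {
            std::ifstream in(path(i), std::ios::binary);
            std::uint32_t seq = 0, sum = 0;
            if (!in.read(reinterpret_cast<char*>(&seq), sizeof seq)) continue;
            if (!in.read(reinterpret_cast<char*>(&sum), sizeof sum)) continue;
            std::vector<unsigned char> d((std::istreambuf_iterator<char>(in)),
                                         std::istreambuf_iterator<char>());
            if (checksum(d) != sum || seq <= best) continue;  // damaged or older
            best = seq;
            data = std::move(d);
        }
        if (best == 0) return false;
        next_seq_ = best + 1;
        return true;
    }

private:
    // FNV-1a hash, used only as an illustrative integrity check.
    static std::uint32_t checksum(const std::vector<unsigned char>& d) {
        std::uint32_t h = 2166136261u;
        for (unsigned char c : d) { h ^= c; h *= 16777619u; }
        return h;
    }
    std::string path(unsigned slot) const { return base_ + std::to_string(slot); }

    std::string base_;
    unsigned slots_;
    std::uint32_t next_seq_ = 1;
};
```

If the newest file was only half-written when the power failed, its header or checksum will not verify and `load` falls back to the previous generation.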

    LVL 17

    Expert Comment

    > The file contains plain text data.

    If that means your entire 32K is liable to be altered with each write, I guess you are stuck with having to write all 32K with each update. If, however, you are able to get away with writing no more than a few disk sectors each time, you could SetFilePointer to the relevant sector offset and write only the changed sectors with FILE_FLAG_NO_BUFFERING. For this approach the file should keep a fixed size after its initial creation, which means there should be no worries about trashing the directory entry on a power failure. But take grg99's advice on this point; I'm not sure of my ground.
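With FILE_FLAG_NO_BUFFERING, file offsets and transfer sizes have to be multiples of the volume's sector size, so a changed byte range must first be widened to whole sectors. The arithmetic can be sketched like this; `SectorRange` and `sectorsCovering` are hypothetical names, and the 512-byte sector size in the usage note is an assumption (the real value should be queried, for example with GetDiskFreeSpace).

```cpp
#include <cassert>
#include <cstddef>

// Byte offset and length of a run of whole sectors.
struct SectorRange {
    std::size_t offset;
    std::size_t length;
};

// Widen the byte range [offset, offset + len) to whole sectors.
// Assumes the file has a fixed size that is a multiple of sectorSize.
inline SectorRange sectorsCovering(std::size_t offset, std::size_t len,
                                   std::size_t sectorSize, std::size_t fileSize) {
    assert(sectorSize > 0 && fileSize % sectorSize == 0);
    assert(len > 0 && offset < fileSize && len <= fileSize - offset);
    // Round the start down and the end up to sector boundaries.
    const std::size_t first = offset / sectorSize * sectorSize;
    const std::size_t end = (offset + len + sectorSize - 1) / sectorSize * sectorSize;
    return {first, end - first};
}
```

For example, with 512-byte sectors and a 32768-byte file, a 10-byte change at offset 700 maps to the single sector at offset 512, while a 4-byte change at offset 510 straddles a boundary and needs the two sectors starting at offset 0.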
    LVL 22

    Expert Comment

    >be no worries about trashing the directory entry on a power failure,

    On most OS's, every time you open and close the file the last access time gets updated in the directory.

    And I should have mentioned, there are some really clever file systems, specially designed so that a bad directory write does no major harm.  You can lose the last file changes, but at least the previous file contents are readable.  This isn't true for the FAT file systems.  Probably not true for NTFS either, but I'm not 100% sure.  You need one of those file systems with "logging" in the name.

    LVL 3

    Author Comment

    After carefully reading the MSDN documentation, I've decided to use CreateFile with the FILE_FLAG_NO_BUFFERING attribute and to stop calling 'FlushFileBuffers', which causes the delays. This way, it's possible that the file's metadata isn't flushed to disk (per MSDN) on a power failure, but that's the least of my concerns.

    Accepted Solution

    PAQed with points refunded (125)

    Community Support Moderator
