We are pulling data from a database and writing out static files on a quad Xeon 2.8 GHz machine with 1 GB RAM and a single 200 GB SATA RAID 5 array. We have 4 separate processes running, and it currently takes 26 hours to complete for ~500,000 small (< 20 KB) files.
The bottleneck is I/O: each process currently spends 10-15% of its time in I/O wait (WA), with 80% of RAM free. We would like to speed up this process.
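To put those numbers in terms of throughput, here is a quick back-of-the-envelope calculation (Python used only as a calculator). It assumes the files are split evenly across the 4 processes and that every file is at the 20 KB upper bound; neither is something we have measured.

```python
# Figures taken from the description above.
total_hours = 26
total_files = 500_000
processes = 4          # assumption: work is split evenly between them
max_file_kb = 20       # upper bound; real files are smaller

total_seconds = total_hours * 3600                      # 93,600 s
files_per_second = total_files / total_seconds          # ~5.34 files/s overall
files_per_process = total_files / processes             # 125,000 files each
seconds_per_file_per_process = total_seconds / files_per_process  # ~0.75 s
max_kb_per_second = total_files * max_file_kb / total_seconds     # ~107 KB/s at most

print(f"{files_per_second:.2f} files/s overall")
print(f"{seconds_per_file_per_process:.4f} s per file per process")
print(f"at most {max_kb_per_second:.1f} KB/s written")
```

So each process averages roughly three quarters of a second per file, and the total write rate is at most about 107 KB/s.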
a) Breaking up the RAID and using a separate hard drive for each process would be faster, but we can't break up the RAID drive.
b) The script currently prints the contents as they are created: it writes to the file line by line as each line is generated for a single file, then moves on to the next file.
c) 99% of the filenames (not contents) are reused.
1) Will generating all the content in memory and then writing the file at once speed it up? What is the best way to do that (besides putting the content in a single variable and printing that variable at the end)?
2) Should we make use of truncating files to speed it up? And does "> file" already "truncate" or does it unlink and recreate?
3) Would turning off buffering help or hurt?
4) Would any low-level routines/modules help?
While we could benchmark and test each of these, at 26 hours per run we would rather hear some ideas first!
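To make questions 1) and 2) concrete, here is a minimal sketch of the "build in memory, write once" approach we have in mind. It is written in Python purely for illustration and is not our actual script; `generate_lines`, `write_whole_file`, and `demo` are hypothetical names. The comment about truncation reflects our understanding of how a write-mode open behaves on POSIX systems, which is exactly what question 2) asks about for "> file".

```python
import os
import tempfile


def generate_lines(name):
    """Hypothetical stand-in for the code that produces one file's content."""
    for i in range(3):
        yield f"{name}: line {i}\n"


def write_whole_file(path, lines):
    """Join all lines in memory, then write them with a single call."""
    content = "".join(lines)  # files are < 20 KB, so this easily fits in RAM
    # Mode "w" opens with O_CREAT | O_TRUNC on POSIX: an existing file is
    # truncated in place (same inode), not unlinked and recreated.
    with open(path, "w") as fh:
        fh.write(content)
    return len(content)


def demo():
    """Write the same filename twice and report whether the inode changed."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "page.html")
    write_whole_file(path, generate_lines("first"))
    inode_before = os.stat(path).st_ino
    write_whole_file(path, generate_lines("second"))
    inode_after = os.stat(path).st_ino
    return inode_before == inode_after


if __name__ == "__main__":
    print("inode reused on rewrite:", demo())
```

Since 99% of our filenames are reused, the inode check in `demo` is the part most relevant to question 2).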