Hi All,
I'm not sure whether this is a bug, but it is intriguing.
I have a tool (my code) which searches for duplicate records within a file and between one file and an archive of up to 10 files.
Each file is between 80,000 and 150,000 records long and between 25 MB and 50 MB in size.
Duplicate checking is only required for the first 90 bytes of the record.
To avoid sorting the files I generate an index file for each data file containing the absolute offset into the data file for each record in sorted order.
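For illustration, here is a minimal sketch of that kind of offset index. It is not the actual tool: it assumes fixed-length records already loaded into memory, and the names `build_index`, `count_duplicates` and `KEY_LEN` are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 90  /* only the first 90 bytes of a record are compared */

/* Standard qsort has no user-data argument, so the comparator reaches
   the record data through this file-scope pointer. */
static const unsigned char *g_data;

static int cmp_offsets(const void *a, const void *b)
{
    long oa = *(const long *)a;
    long ob = *(const long *)b;
    return memcmp(g_data + oa, g_data + ob, KEY_LEN);
}

/* Fill offsets[0..n-1] with the start offset of every record, ordered by
   the record key. Assumption: fixed-length records of rec_len bytes. */
static void build_index(const unsigned char *data, size_t n, size_t rec_len,
                        long *offsets)
{
    assert(rec_len >= KEY_LEN);
    for (size_t i = 0; i < n; i++)
        offsets[i] = (long)(i * rec_len);
    g_data = data;
    qsort(offsets, n, sizeof offsets[0], cmp_offsets);
}

/* With the index sorted, records with equal keys are adjacent, so one
   linear pass counts the duplicates. */
static size_t count_duplicates(const unsigned char *data, const long *offsets,
                               size_t n)
{
    size_t dups = 0;
    for (size_t i = 1; i < n; i++)
        if (memcmp(data + offsets[i - 1], data + offsets[i], KEY_LEN) == 0)
            dups++;
    return dups;
}
```

The data itself is never moved; only the array of offsets is sorted, and that array is what would be written out as the index file.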
The internal check (looking for duplicates within the file) consistently takes just a few seconds. The external check (against archived files) takes around the same time when there are fewer than 5 archives to process, but when 6 or more files are to be processed, performance drops to a crawl!
Careful use of ftime has narrowed the time waster down to fread.
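A minimal sketch of this kind of measurement, for anyone who wants to reproduce it. This is not the actual instrumentation: it swaps `ftime` for the C11 `timespec_get` call, whose resolution depends on the platform, and `timed_fread`, `read_stats` and `slow_ms` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Running totals for the timed reads. */
struct read_stats {
    unsigned long calls;       /* number of fread calls made */
    unsigned long slow_calls;  /* calls that took slow_ms or longer */
    double total_ms;           /* summed wall-clock time */
    double max_ms;             /* slowest single call */
};

/* Wall-clock time in milliseconds (C11 timespec_get). */
static double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Drop-in wrapper around fread that records how long each call took. */
static size_t timed_fread(void *buf, size_t size, size_t count, FILE *fp,
                          struct read_stats *st, double slow_ms)
{
    assert(st != NULL);
    double t0 = now_ms();
    size_t got = fread(buf, size, count, fp);
    double dt = now_ms() - t0;

    st->calls++;
    st->total_ms += dt;
    if (dt > st->max_ms)
        st->max_ms = dt;
    if (dt >= slow_ms)
        st->slow_calls++;
    return got;
}
```

Comparing `slow_calls` against `calls` for the fast and slow runs gives the kind of figures quoted below.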
When running at full speed, fread takes an undetectable amount of time to run, with the occasional 15ms every 2000-4000 calls, probably due to caching by the OS.
When running like a dog, fread can consistently take 15ms every other invocation and sometimes take 30ms or 45ms.
What could make this happen? How can I avoid it?
Process Monitor reports CPU usage at around 90% when running at full speed and around 2% when running slow. No other processes show a significant amount of CPU usage, but the PC runs very slowly when the checker is running slow.
I get the same kind of results when running on a quad-core server with a 2 TB RAID as on a dual-core desktop.
Any ideas?