Best (fastest) way to find duplicate entries in a text file.
Posted on 2008-10-21
I'm parsing a text file that could potentially have tens of thousands of lines (perhaps more!), and I want to check for duplicate lines; when a duplicate is found, I replace it with a pointer to the original line. I've tried two approaches: (a) nested for loops iterating over all the lines after storing them in a CStringArray, and (b) storing the entire file in one huge CString and using a combination of Find and Replace while breaking it into a CStringArray. Method (b) seems to run faster, but I'm wondering whether there's an even faster way to accomplish this task. Any thoughts and suggestions?
Thanks in advance!