Comparing text files

Posted on 2004-11-18
Last Modified: 2010-04-05
Hi all,
I've made an application that compares two similar text files line by line and then writes out both lines if they are different:

aaaaaaaaaaaaaa       <->        aaaaaaaaaaaaaa
bbbbbbbbbbbbbb       <->        bbbbbbbbbbbbbb
cccccccccccccccc       <->        cccccccccccccccx   -> ccccccccccccccccc
zzzzzzzzzzzzzzzz        <->        zzzzzzzzzzzzzzzz

The problem is that this won't work if a line has been added somewhere in one of the files. In that case I need only the added line to be written, and then the comparison resynchronized on the next matching line. There can be several added lines in a row, in many places in the file, but for the most part the two files will contain large chunks where the lines are equal. How do I solve this problem?


Question by:Aqueath
    LVL 17

    Expert Comment

    by:Wim ten Brink
    This is not as simple as you think. Besides, there's already a tool that does this, called WinDiff. There are probably some sources on the Internet somewhere that show you where to start.

    And how would you handle this:

    File 1                  File 2
    AAAAAA                  AAAAAA
    BBBBBB                  CCCCCC
    CCCCCC                  BBBBBB
    DDDDDD                  DDDDDD

    Now, you have two lines that are exchanged... :-)
    LVL 13

    Expert Comment

    It's quite a challenge.

    One approach you could take is to make one of your files the "master".

    Then you check each line of file 1 (the master) against file 2. If there is a mismatch, you keep searching further down file 2 for the current line of file 1; if you don't find it, you advance one line in file 1, and so on...

    so in the example that Workshop_Alex showed, it would go like this:

    AAAA AAAA == same - both "pointers" advance
    BBBB CCCC == diff - stay in same line in file 1, advance in file 2
    BBBB BBBB == same - both pointers advance
    CCCC DDDD == diff - stay in same line in file 1, advance in file 2

    ...that's just an idea, I haven't tried it... there are many things that could be done... maybe even using multiple threads
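A minimal sketch of the master-file scan described above, assuming both files are already loaded into TStrings. The names `FindFrom` and `MasterScan` and the report wording are made up for illustration; they are not from any library.

```pascal
program MasterScanDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: index of Line in Lines at or after StartAt, or -1.
function FindFrom(Lines: TStrings; const Line: string; StartAt: Integer): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := StartAt to Lines.Count - 1 do
    if Lines[I] = Line then
    begin
      Result := I;
      Exit;
    end;
end;

// Walks Master line by line and searches for each line further down in Other.
procedure MasterScan(Master, Other, Report: TStrings);
var
  M, O, Found, I: Integer;
begin
  O := 0; // first line of Other that has not been consumed yet
  for M := 0 to Master.Count - 1 do
  begin
    Found := FindFrom(Other, Master[M], O);
    if Found < 0 then
      Report.Add('only in master: ' + Master[M])
    else
    begin
      // Everything skipped in Other has no counterpart in Master.
      for I := O to Found - 1 do
        Report.Add('only in other:  ' + Other[I]);
      Report.Add('same:           ' + Master[M]);
      O := Found + 1;
    end;
  end;
  // Lines left over at the end of Other.
  for I := O to Other.Count - 1 do
    Report.Add('only in other:  ' + Other[I]);
end;

var
  File1, File2, Report: TStringList;
begin
  File1 := TStringList.Create;
  File2 := TStringList.Create;
  Report := TStringList.Create;
  try
    // The swapped-lines example from earlier in the thread.
    File1.CommaText := 'AAAAAA,BBBBBB,CCCCCC,DDDDDD';
    File2.CommaText := 'AAAAAA,CCCCCC,BBBBBB,DDDDDD';
    MasterScan(File1, File2, Report);
    Write(Report.Text);
  finally
    Report.Free;
    File2.Free;
    File1.Free;
  end;
end.
```

With the swapped-lines example, CCCCCC is reported once as "only in other" and once as "only in master". The search is greedy: a line that repeats often (a blank line, for instance) can make it jump far ahead in the other file, so treat this as a starting point only.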
    LVL 13

    Expert Comment

    Ideally, when you find a difference, you would start looking in both files at the same time. Whichever search finds a match in the "other" file first stops, that is considered the match, and both pointers keep advancing from there.

    So it's the same concept as my previous answer, except that it is applied to both files. When there is a difference:
    - you have a pointer static in file 1 looking at the next lines of file 2
    - you also have a static pointer in file 2, looking at the next lines of file 1
    LVL 2

    Author Comment

    Hi again,
    Thanks for the comments so far. It seems that this was a bigger problem than I first expected, so I've raised the reward ;-)

    Aren't there components that could do this? I've downloaded something called TDiff, which compares two files and lists the differences in two separate windows. The thing is, it merges the two files and deletes some lines, and I'm too stupid to figure out a way to integrate it into my program. What I'm looking for is a component that can, for instance, compare two StringLists or ListBoxes and expand one or both the same way the graphical tools do. I.e. if an extra line has been inserted into StringList one, there would be a blank line in StringList two, etc. Then I could compare each line in the StringLists (now containing the same number of lines) with each other. Something like:


    Maybe this can be done with WinDiff or the component I downloaded:

    I started out down the same path you described above prior to posting my initial message, but it turned out to be quite messy, and I probably did not find a good solution as it turned out to be pretty slow.

    So I figured that someone has probably done this thing before, and made a component for this sort of thing that makes this easy. I mean, why reinvent the wheel? ;-)

    Does anyone know a component that could do this, and how I could integrate it?
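The padding asked for above (a blank line opposite every inserted line, so that both lists end up with the same number of lines) can be sketched with a longest-common-subsequence table. This is a sketch under assumptions, not how TDiff or WinDiff work internally: `AlignLists` is a made-up name, and the table holds (lines in A + 1) times (lines in B + 1) integers, so it only suits small to medium files.

```pascal
program AlignDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical helper: copies A and B into OutA and OutB, inserting blank
// lines so that both outputs end up with the same number of lines.
procedure AlignLists(A, B, OutA, OutB: TStrings);
var
  L: array of array of Integer; // L[I, J] = LCS length of A[I..] and B[J..]
  I, J: Integer;
begin
  SetLength(L, A.Count + 1, B.Count + 1);
  for I := 0 to A.Count do L[I, B.Count] := 0; // nothing left in B
  for J := 0 to B.Count do L[A.Count, J] := 0; // nothing left in A
  for I := A.Count - 1 downto 0 do
    for J := B.Count - 1 downto 0 do
      if A[I] = B[J] then
        L[I, J] := L[I + 1, J + 1] + 1
      else if L[I + 1, J] >= L[I, J + 1] then
        L[I, J] := L[I + 1, J]
      else
        L[I, J] := L[I, J + 1];

  I := 0;
  J := 0;
  while (I < A.Count) or (J < B.Count) do
    if (I < A.Count) and (J < B.Count) and (A[I] = B[J]) then
    begin
      OutA.Add(A[I]); OutB.Add(B[J]); // same line on both sides
      Inc(I); Inc(J);
    end
    else if (J >= B.Count) or ((I < A.Count) and (L[I + 1, J] >= L[I, J + 1])) then
    begin
      OutA.Add(A[I]); OutB.Add('');   // line exists only in A
      Inc(I);
    end
    else
    begin
      OutA.Add(''); OutB.Add(B[J]);   // line exists only in B
      Inc(J);
    end;
end;

var
  ListA, ListB, OutA, OutB: TStringList;
  K: Integer;
begin
  ListA := TStringList.Create;
  ListB := TStringList.Create;
  OutA := TStringList.Create;
  OutB := TStringList.Create;
  try
    ListA.CommaText := 'aaa,bbb,ccc,zzz';
    ListB.CommaText := 'aaa,bbb,NEW,ccc,zzz';
    AlignLists(ListA, ListB, OutA, OutB);
    for K := 0 to OutA.Count - 1 do
      Writeln(Format('%-6s | %s', [OutA[K], OutB[K]]));
  finally
    OutB.Free;
    OutA.Free;
    ListB.Free;
    ListA.Free;
  end;
end.
```

After the call, row K of `OutA` lines up with row K of `OutB`, so a plain line-by-line comparison works again. A blank padding line cannot be told apart from a genuinely empty line in the input; a real tool would keep a separate flag per row.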

    LVL 13

    Expert Comment

    I started writing it, it's almost done... but if you want a component, here you go
    LVL 17

    Expert Comment

    by:Wim ten Brink
    There are probably many components that will compare files. It's just that the algorithm isn't easy. You might also want to strip whitespace and ignore empty lines, or have some other special functionality.
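Stripping whitespace and ignoring empty lines can be done as a separate pass before any comparison. A minimal sketch, assuming the file is already in a TStrings; `NormalizeLines` is a hypothetical name.

```pascal
program NormalizeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical helper: copies Source into Dest with leading and trailing
// whitespace removed from every line and empty lines dropped.
procedure NormalizeLines(Source, Dest: TStrings);
var
  I: Integer;
  S: string;
begin
  for I := 0 to Source.Count - 1 do
  begin
    S := Trim(Source[I]);
    if S <> '' then
      Dest.Add(S);
  end;
end;

var
  Raw, Clean: TStringList;
begin
  Raw := TStringList.Create;
  Clean := TStringList.Create;
  try
    Raw.Add('  first line  ');
    Raw.Add('');
    Raw.Add('   ');
    Raw.Add('second line');
    NormalizeLines(Raw, Clean);
    Write(Clean.Text);
  finally
    Clean.Free;
    Raw.Free;
  end;
end.
```

Line numbers in the cleaned list no longer match the original file, so keep the original index alongside each line if the report has to point back to it.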

    There is an interesting trick that uses a hash table. You calculate a hash over each line in the first file and store it in a table. Then you do the same with every line in the second file and check whether it's in the hash table or not. The lines in the second file that can be found in the hash table exist in both files (compare the actual text as well, since two different lines can share a hash). If you also do it in the opposite direction, creating a hash table for the second file and checking every line of the first file against it, you will know exactly which lines exist in both files. This would allow you to show all lines that don't exist in both files.
    Now, about those lines that do exist in both files, all you would have to do next is check if they exist in the same order.
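A minimal sketch of the lookup step in both directions. Note the substitution: a sorted TStringList with binary search (`Find`) stands in for the hash table here, because it ships with the Classes unit. The idea of checking membership in a lookup table is the same, and a real hash table could replace it. `LinesNotIn` is a hypothetical name.

```pascal
program LookupDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: adds to Missing every line of Source that does not
// occur anywhere in Other.
procedure LinesNotIn(Source, Other, Missing: TStrings);
var
  Lookup: TStringList;
  I, Dummy: Integer;
begin
  Lookup := TStringList.Create;
  try
    Lookup.CaseSensitive := True;
    Lookup.Duplicates := dupIgnore; // keep one copy of repeated lines
    Lookup.Sorted := True;          // Find uses a binary search
    Lookup.AddStrings(Other);
    for I := 0 to Source.Count - 1 do
      if not Lookup.Find(Source[I], Dummy) then
        Missing.Add(Source[I]);
  finally
    Lookup.Free;
  end;
end;

var
  File1, File2, OnlyIn1, OnlyIn2: TStringList;
begin
  File1 := TStringList.Create;
  File2 := TStringList.Create;
  OnlyIn1 := TStringList.Create;
  OnlyIn2 := TStringList.Create;
  try
    File1.CommaText := 'aaa,bbb,ccc,zzz';
    File2.CommaText := 'aaa,bbb,ccx,zzz';
    LinesNotIn(File1, File2, OnlyIn1); // lines of file 1 missing from file 2
    LinesNotIn(File2, File1, OnlyIn2); // and the opposite direction
    Writeln('Only in file 1:');
    Write(OnlyIn1.Text);
    Writeln('Only in file 2:');
    Write(OnlyIn2.Text);
  finally
    OnlyIn2.Free;
    OnlyIn1.Free;
    File2.Free;
    File1.Free;
  end;
end.
```

This only answers "does the line exist somewhere in the other file"; it says nothing about order, which is the remaining step mentioned above.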

    But it's one of the more complex algorithms, and while I'm working on a hash table right now, I just don't have the time to write a solution for this. I am curious whether BlackTigerX will provide a good solution, though. :-)
    LVL 6

    Expert Comment

    WinMerge is an open source tool that is IMO better than WinDiff. Full source, written in C++.

    Simpler, but more powerful, is the GNU diff source.

    If you want to understand the algorithms, try here --

    If you want a commercial delphi component, try

    Ad-hoc solutions are not likely to be very fast or robust -- check the papers referenced in the 3rd link if you are serious about learning how to roll your own diff.

    LVL 17

    Expert Comment

    Another interesting solution: with source.
    LVL 13

    Accepted Solution

    Well... here's my shot at it... I stopped writing yesterday since this guy wants a component, but finished this morning... just because...

    It hasn't been tested thoroughly, but it's a good start... it uses the concept that I explained, which is also kind of the same concept Workshop_Alex has, only without using hash tables...

    I wrote a procedure that takes 2 sources (F1 and F2) and outputs the lines that are equal and the ones that are different (missing from either file)

    procedure FindDifferences(F1, F2:TStrings; Diff:TStrings);
    var
      P1, P2:Integer; //"pointers", or current row for each file
      tmpP1, tmpP2:Integer; //temp "pointers" to search the other file
      Diffs1, Diffs2:TStringList;
      Match1, Match2:Boolean;
    begin
      Diffs1:=TStringList.Create;
      Diffs2:=TStringList.Create;
      try
        P1:=0; P2:=0;
        repeat
          if (P1>=F1.Count) or (P2>=F2.Count) then
            Break; //we're done, one of the files reached the end
          if (F1[P1]=F2[P2]) then //there's a match, increment both pointers and go to next line
            Diff.Add(F1[P1]+ ' == '+F2[P2])
          else //the fun begins here, there's a mismatch
          begin
            tmpP1:=P1+1; //go to the next line in file 1
            tmpP2:=P2+1; //go to the next line in file 2
            Diffs1.Clear; Diffs2.Clear;
            Diffs1.Add('>>'+F1[P1]); //the current lines are the first candidates
            Diffs2.Add('<<'+F2[P2]);
            Match1:=False; Match2:=False;
            repeat
              if (tmpP1>=F1.Count) and (tmpP2>=F2.Count) then
                Break; //both files reached the end, exit
              Match1:=(tmpP2<F2.Count) and (F1[P1]=F2[tmpP2]);
              Match2:=(not Match1) and (tmpP1<F1.Count) and (F2[P2]=F1[tmpP1]);
              if (Match1) or (Match2) then
                Break; //we found a match, of the current line of one file, in another line of the other file
              if (tmpP1<F1.Count) then
                Diffs1.Add('>>'+F1[tmpP1]); //missing from file 2
              if (tmpP2<F2.Count) then
                Diffs2.Add('<<'+F2[tmpP2]); //missing from file 1
              Inc(tmpP1); Inc(tmpP2);
            until (False);
            if (Match1) then //*** only advance in the other file
            begin
              Diff.AddStrings(Diffs2); //add the lines that are missing from the other file
              P2:=tmpP2;
            end
            else if (Match2) then
            begin
              Diff.AddStrings(Diffs1);
              P1:=tmpP1;
            end;
            if (Match1) or (Match2) then
              Diff.Add(F1[P1]+ ' == '+F2[P2])
            else
              Diff.Add(F1[P1]+ ' <-> '+F2[P2]);
          end;
          Inc(P1); Inc(P2); //increment both pointers
        until (False);
        //whatever is left at the end of one file is missing from the other
        for tmpP1:=P1 to F1.Count-1 do Diff.Add('>>'+F1[tmpP1]);
        for tmpP2:=P2 to F2.Count-1 do Diff.Add('<<'+F2[tmpP2]);
      finally
        Diffs1.Free; Diffs2.Free;
      end;
    end;

    procedure TForm1.Button1Click(Sender: TObject);
    begin
      FindDifferences(Memo1.Lines, Memo2.Lines, Memo3.Lines);
    end;

    I used 3 memos to test this: load the 2 text files into the first 2, and the output goes to the third

    best regards
    LVL 44

    Expert Comment

    take a look at

    it is very fast and powerful and has several variants that can do comparisons of directory trees and comparisons of files in non-line chunks (words, bytes, etc.)
    LVL 2

    Author Comment

    Hi All,
    The points go to BlackTigerX for a solution that delivers just what I needed. A big thanks to you, and to all who participated.

