[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 450
  • Last Modified:

Comparing textfiles.

Hi all,
I've made an application that compares two similar textfiles line for line, then write an output of both the lines if they are different;


aaaaaaaaaaaaaa       <->        aaaaaaaaaaaaaa
bbbbbbbbbbbbbb       <->        bbbbbbbbbbbbbb
cccccccccccccccc       <->        cccccccccccccccx   -> ccccccccccccccccc
                                                                            ccccccccccccccccx
.....
.....
zzzzzzzzzzzzzzzz        <->        zzzzzzzzzzzzzzzz


The thing is that this won't work if somewhere in one of the files a line has been added, in which case I need only this line to be written, and then the process resynchronized on the next similar line. There can be several instances of added lines after eachother, in numerous instances of the file, but mainly the two files will contain large chunks where the lines are equal. How do I solve this problem?

A.




0
Aqueath
Asked:
Aqueath
  • 4
  • 2
  • 2
  • +3
1 Solution
 
Wim ten BrinkCommented:
This is not as simple as you think. Besides, there's already a tool that's doing this which is called WinDiff. There are probably some sources on the Internet somewhere that show you where to start.

And how would you handle this:

File 1                  File 2
AAAAAA                  AAAAAA
BBBBBB                  CCCCCC
CCCCCC                  BBBBBB
DDDDDD                  DDDDDD

Now, you have two lines that are exchanged... :-)
0
 
BlackTigerXCommented:
it's quite a chalenge

one approach you could take is make one of your files the "master"

then you would go checking the lines of file 1 (master) in file2, if there is a mismatch, you keep searching in file2 for the same line in file1, if you don't find it, you advance a line in file1 and so on...

so in the example that Workshop_Alex showed, it would go like this:

AAAA AAAA == same - both "pointers" advance
BBBB CCCC == diff - stay in same line in file 1, advance in file 2
BBBB BBBB == same - both pinters advance
CCCC DDDD == diff - stay in same line in file 1, advance in file 2

...that's just an idea, haven't tried it... there's many things that could be done... maybe even using multiple threads
0
 
BlackTigerXCommented:
the ideal would probably be that if you find a difference, you start looking in the "other" file at the same time, whichever finds a match in the "other" file stops first and consider that a match (and so both pointers keep advancing from there)

so, the same concept of my previous answer, except that is applied to both files, when there is a difference:
- you have a pointer static in file 1 looking at the next lines of file 2
- you also have a static pointer in file 2, looking at the next lines of file 1
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
AqueathAuthor Commented:
Hi again,
Thanks for the comments so far, and it seems that this was a bigger problem than I first expected. So I've raised the reward ;-)

-Workshop_Alex:
Aren't there components that could do this? I've downloaded something called TDiff, which compares two files, and lists the difference in two separate windows. The thing is it merges the two files, deletes some lines, and I'm too stupid to figure out a way to integrate it into my program. What I would be looking for is a component that is for instance able to compare two StringLists or Listboxes, and expand one or both in the same way that graphical tools do this. Ie if an extra line has been inserted into StringList one there would be a blank one in Stringlist two etc. Then I could compare each line in the StringLists(now containing the same number of lines) with eachother. Something like ;

StrLstDiff(StringList1,StringList2);

Maybe this can be done with WinDiff or the component I downloaded:
TDiff
http://www.users.on.net/johnson/delphi/


-BlackTigerX
I started out down the same path you described above prior to posting my initial message, but it turned out to be quite messy, and I probably did not find a good solution as it turned out to be pretty slow.



So I figured that someone has probably done this thing before, and made a component for this sort of thing that makes this easy. I mean, why reinvent the wheel? ;-)


Anyone know a component that could do this, and know how I could implement it?

A.
0
 
BlackTigerXCommented:
I started writing it, is almost done... but if you want a component, here you go

http://www.users.on.net/johnson/delphi/
0
 
Wim ten BrinkCommented:
There are probably many components that will compare files. It's just that the algorithm isn't easy. You might also want to strip whitespaces and ignore empty lines, or have some other special fuunctionality.

There could be an interesting trick by using a hash table. You calculate a hash over each line in your file and store it in an array. Then you do the same with every line in the second file and check if it's in the hash table or not. Now, the lines in the second file that can be found in the hash table are duplicates. If you do it too in opposite direction too, creating a hash table for the secoond file and check every line of the first file with this hash table, you will know exactly which lines exist in both files. This would allow you to show all lines that don't exist in both files.
Now, about those lines that do exist in both files, all you would have to do next is check if they exist in the same order.

But it's one of the more complex algorithms and while I'm working on a hashtable right now, I just don't have the time to write a solution for this. Am curious of BlackTigerX will provide a good solution, though. :-)
0
 
gwalkeriqCommented:
winmerge http://winmerge.sourceforge.net/ is an open source version that is IMO better than Windiff. Full soure, written in C.

Simpler, but more powerful diff source is, http://ftp.gnu.org/pub/gnu/diffutils/ is a place where you find the GNU diff source

If you want to understand the algorithms, try here -- http://c2.com/cgi/wiki?DiffAlgorithm

If you want a commercial delphi component, try http://www.brothersoft.com/Software_Developer_Delphi_Pass_Diff_Pro_24080.html

Ad-hoc solutions are not likely to be very fast or robust -- check the papers referenced in the 3rd link if you are serious about learning how to roll you own diff.


0
 
geobulCommented:
Another interesting solution: http://www.ascotti.org/programming/lcs/lcs.htm with source.
0
 
BlackTigerXCommented:
well... here's my shot at it... I stopped writing yesterday since this guy wants a component, but finished this morning... just because...

hasn't been tested thoughtfully, but is a good start... it uses the concept that I explained, and which is also kinda the same concept WorkShop_Alex has, only without using hash tables...

I wrote a function that takes 2 sources (F1 and F2), it outputs the lines that are equal and the ones that are different (missing from either file)

procedure FindDifferences(F1, F2:TStrings; Diff:TStrings);
var
  P1, P2:Integer; //"pointers", or current row for each file
  tmpP1, tmpP2:Integer; //temp "pointers" to search the other file
  Diffs1, Diffs2:TStringList;
  Match1, Match2:Boolean;
begin
  Diff.Clear;
  Diffs1:=TStringList.Create;
  Diffs2:=TStringList.Create;
  try
    if (F1.Count=0) or (F2.Count=0) then
      Exit; //nothing to do, one of the files is empty
    P1:=0;
    P2:=0;
    repeat
      if (P1>=F1.Count) or (P2>=F2.Count) then
        Break; //we're done
      if (F1[P1]=F2[P2]) then //there's a match, increment both pointers and go to next line
      begin
        Diff.Add(F1[P1]+ ' == '+F2[P2]);
        Inc(P1);
        Inc(P2);
        Continue;
      end
      else //the fun begins here, there's a mismatch
      begin
        //Diff.Add(F1[P1]+' <-> '+F2[P2]); //"print" the difference
        tmpP1:=P1+1; //go to the next line in file 1
        tmpP2:=P2+1; //go to the next line in file 2
        Diffs1.Clear;
        Diffs2.Clear;
        Diffs1.Add('>>'+F1[P1]);
        Diffs2.Add('<<'+F2[P2]);
        Match1:=False;
        Match2:=False;
        repeat
          if (tmpP1>=F1.Count) or (tmpP2>=F2.Count) then
            Break; //one of the files reached the end, exit
          if (F1[P1]=F2[tmpP2]) or (F2[P2]=F1[tmpP1]) then
          begin
            if (F1[P1]=F2[tmpP2]) then
            begin
              Match1:=True;
              Diff.AddStrings(Diffs2)  //add the lines that are missing from the other file
            end
            else
            begin
              Match2:=True;
              Diff.AddStrings(Diffs1);
            end;
            Break; //we found a match, of the same line in file 1, in another line of file 2
          end;
          Diffs1.Add('>>'+F1[tmpP1]); //missing from file 2
          Diffs2.Add('<<'+F2[tmpP2]); //missing from file 1
          Inc(tmpP1);
          Inc(tmpP2)
        until (False);
        //*** only advance in the other file
        if (Match1) then
          P2:=tmpP2
        else if (Match2) then
          P1:=tmpP1;
        if (Match1) or (Match2) then
          Diff.Add(F1[P1]+ ' == '+F2[P2])
        else
          Diff.Add(F1[P1]+ ' <-> '+F2[P2]);
        //we found a match, increment both pointers
        Inc(P1);
        Inc(P2)
      end
    until (False)
  finally
    Diffs1.Free;
    Diffs2.Free
  end
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  Memo1.Lines.LoadFromFile('test.txt');
  Memo2.Lines.LoadFromFile('test2.txt');
  FindDifferences(Memo1.Lines, Memo2.Lines, Memo3.Lines);
end;

I used 2 memos to test this, load 2 text files in the first 2, and output to the third

best regards
0
 
aikimarkCommented:
take a look at http://DLSuperC.com

it is very fast and powerful and has several variants that can do comparisons of directory trees and comparisons of files in non-line chunks (words, bytes, etc.)
0
 
AqueathAuthor Commented:
Hi All,
The points go to BlackTigerX for a solution that deliveres just what I needed. A big thanks to you, and all that participated.

A.
0

Featured Post

Vote for the Most Valuable Expert

It’s time to recognize experts that go above and beyond with helpful solutions and engagement on site. Choose from the top experts in the Hall of Fame or on the right rail of your favorite topic page. Look for the blue “Nominate” button on their profile to vote.

  • 4
  • 2
  • 2
  • +3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now