Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 461
  • Last Modified:

Algorithm for three-file merger to create fourth merged file.

Looking for an algorithm that merges two different version of text-file from its base file and create the fourth file.
Suppose we have a base file "Base" and suppose this file is modified by two person creating two different version "Version1" and "Version2". Need to create fourth file "Final" that merges the two version of same base. The algorithm should also show the conflicts in the two versions. (This will be similar to way Clear Case merges the file)

Ex:

Base file:

line 1
line 2
line 3

Version1 file:

line 1
line 2 modified Version1
line 3
new line from Version1

Version2 file:

line 1
line 2 modified Version2
new line from Version2
line 3


Final file:

line 1
Report conflict between two line "line 2 modified Version1" and  "line 2 modified Version2"
(No conflict) new line from Version2
line 3
new line from Version1

0
shuklasunil
Asked:
shuklasunil
  • 6
  • 5
  • 3
  • +2
1 Solution
 
sunnycoderCommented:
Hi Sunil,

What you are asking for is a not so simple algorithm ...

In its simplest form, it is possible to read in each file line by line and compare those lines char by char... whenever we see a difference at any position, we reposrt that position and add each of the lines separately to the new file... If all lines appear to be the same, we write it to the new file only once ....

We repeat the process in a loop until atleast one of the file is exhausted.... If others are not, we report difference and continue merging the remaining two...

A simple enhancement to this would be to write code to ignore white spaces...

However, the tricky part would be to judge where did the files become same again ... e.g.

file1 had 5th line
this is a line
file 2 had 5th line
this is not a line
file3 had 5th line
there is a line

Now as is clear here 3 differs from 1 and 2 on location 3 ... however, it is back in sync after 1 and 2 are at location 6 and it is at location 7!!!! If we were using simple char by char comparison, it is impossible to know how and when the things got back to the same !!!

one such utility already exists ... it is called diff command...
you can find the manual for diff here
http://www.gnu.org/manual/diffutils-2.7/html_mono/diff.html

you can download diff from
http://www.gnu.org/software/     --- it has been ported to several platforms

It also a part of almost all *nix distributions

Cheers!
Sunny:o)
0
 
guynumber5764Commented:
GNU's "diff" or "merge" is probably what you want.  If you are stuck on the Dark Side, download the source and port it.

NAME
       merge − three‐way file merge

SYNOPSIS
       merge [ options ] file1 file2 file3

DESCRIPTION
       merge  incorporates  all  changes  that  lead  from file2 to file3 into
       file1.  The result ordinarily goes into file1.   merge  is  useful  for
       combining separate changes to an original.  Suppose file2 is the origi‐
       nal, and both file1 and file3 are modifications of file2.   Then  merge
       combines both changes.
etc.




0
 
mlmccCommented:
Are the existing lines numbered or in some way are they identifiable?
If not then the problem of resyncing the files could be enormous.

File1
Now is the time that tries ....
To be or not
Tis better to have loved

File2
How Now brown cow (actually modified line1)
He hath the most ...
...
...
...
...
Tis better ...

File3
Now some inane statement
...
...
To be or to be
...
...
Now is the time
...
...

How do these compare?
In file 3 are there 6 added lines at the beginning? or was line 1 changed then 2 lines added then line 2 changed then several lines added?

mlmcc
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
bhagyeshtCommented:
use dos command fc and capture the output.
0
 
shuklasunilAuthor Commented:
I am looking for an algorithm that I use in my project.
I am not looking for a tool.
0
 
sunnycoderCommented:
0
 
shuklasunilAuthor Commented:
I have yet not gone through the whole document. But its looks huge and complex, are there simpler version of the algorithm?
0
 
sunnycoderCommented:
what you have given in your problem description fits this algorithm ... Simpler will be inaccurate ... see my first post for simpler algorithms ...
you can get the readymade source code though (under GPL) http://www.gnu.org/software/ 
0
 
guynumber5764Commented:
Unless your problem has signficant constraints that you haven't mentioned,  this is quite a complex problem.  In the real (programming) world, even with the algorithms sunny's mentioned above, human intervention is required to make sure the merger doesn't do something stupid.
0
 
bhagyeshtCommented:
guynumber5764 : you said it. I messed up a couple of days work due to VSS's merge
0
 
shuklasunilAuthor Commented:
I am looking into the logic, will take some time to understand this and will get back ASAP.
0
 
shuklasunilAuthor Commented:
Since the logic presented here is bit complex. Though it is interesting but I will take time to understand this.

Below is the logic that I am planning to implement. I would like have inputs on this and is there an optimum way of doing this?

Background:

I have three text files Base, Version1 and Version2. These files hold definition of tables and forms for an application.

The structure is similar to the following,

Table1
     Field1 char(4)….
….
Table2
     Field1 char(6)
….
….
Form1
    Field1 size(4), position x,y …
….


I have a class definition, which reads this and puts it in a memory structure. The logic I am implementing is the following

DataModel Base; //DataModel is the class that loads the file in the memory
DataModel Version1;
DataModel Version2;
DataModel Final;

/*Loading the respective file in the variables*/

/*Code for using DataModel iterator, traversing to each table*/
while(/*all the table in Version2*/)
{
      /*mark this element, used in next loop*/
      Version1.Find(Version2.Table)
      if (/*exists in version1*/)
            {
            if(/*exists in base*/)
            {
                  /*compare the fields*/
                  /*if conflict, report it and ask user to opt which element to be merged*/
                                   /*compare the attributes*/
                                  /*if conflict, report it and ask user to opt which element to be merged*/
            }
            else
            {
                  /*compare the fields*/
                                      /*if conflict, report it and ask user to opt which element to be merged*/
                                     /*compare the attributes*/
                                     /*if conflict, report it and ask user to opt which element to be merged*/
                          }
                          /*Add it to final*/
      }
      else
                         /*Add it to final*/
}

/*now traverse version1 and compare it to version2*/
while((/*all the table in Version1*/)
{
             Version2.Find(Version1.Table)
             if (Version2.Marked) /*this element is already traversed*/
                  continue;
             if (/*exists in version2*/)
             {
             /*similar to above logic*/
      }
      else
             /*Add it to final*/
}

Please note I have access to source code of class DataModel, if required I can add the functionality to optimize the above code or modify the “Find” function to suit our needs.

Thanks
0
 
sunnycoderCommented:
your algorithm still misses out the major point
* how do you recognize the difference and how do you recognize that the files are now matching after a difference given the fact that the different fields were of different lengths ?

better idea will be to implement a three way merge

>These files hold definition of tables and forms for an application
are the tables expected to change ... i.e. new table added / delation/ renamed etc. etc.
If yes, then you need something of the complexity like the diff algorithm .. If no, the problem gets simplified to fields ...
All you need to to do is compare field names and if they are different, keep reading until at somepoint you have a field name which you have seen in all of them since the point of difference began ...
Next ask the user which ones should be kept ...

this will also make it easier for the user as he wont have to go through the same process twice
0
 
shuklasunilAuthor Commented:
I regret to say that I have  still not gone through this I need some more time. Let me know if I have some other option.
0
 
guynumber5764Commented:
I tuned out for a while there.  It looks like there is considerably more structure to your data than you have revealed to date.  The more structure there is, the easier it is to write a match & merge.  For example, if each data element has a unique key the alorigthm you want becomes trivial.  The opposite case would be free text which is almost impossible to characterize accurately:

To give an example, the following three files do not violate any of your given constraints.  How should they merge and why?

Base:
1000
1000
1000 1
1000 2
1000 3
1001

File1:

1000 1
1000 2
1001
1000
1000
1000 3

File2:

1001
1000
1000
1000 3
1000 2
1000 1
0
 
shuklasunilAuthor Commented:
I am OK with the Administrative comments.
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 6
  • 5
  • 3
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now