Solved

# Algorithm for three-file merger to create fourth merged file.

Posted on 2003-11-15
445 Views
Looking for an algorithm that merges two different version of text-file from its base file and create the fourth file.
Suppose we have a base file "Base" and suppose this file is modified by two person creating two different version "Version1" and "Version2". Need to create fourth file "Final" that merges the two version of same base. The algorithm should also show the conflicts in the two versions. (This will be similar to way Clear Case merges the file)

Ex:

Base file:

line 1
line 2
line 3

Version1 file:

line 1
line 2 modified Version1
line 3
new line from Version1

Version2 file:

line 1
line 2 modified Version2
new line from Version2
line 3

Final file:

line 1
Report conflict between two line "line 2 modified Version1" and  "line 2 modified Version2"
(No conflict) new line from Version2
line 3
new line from Version1

0
Question by:shuklasunil
[X]
###### Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points
• 6
• 5
• 3
• +2

LVL 45

Expert Comment

ID: 9757850
Hi Sunil,

What you are asking for is a not so simple algorithm ...

In its simplest form, it is possible to read in each file line by line and compare those lines char by char... whenever we see a difference at any position, we reposrt that position and add each of the lines separately to the new file... If all lines appear to be the same, we write it to the new file only once ....

We repeat the process in a loop until atleast one of the file is exhausted.... If others are not, we report difference and continue merging the remaining two...

A simple enhancement to this would be to write code to ignore white spaces...

However, the tricky part would be to judge where did the files become same again ... e.g.

this is a line
this is not a line
there is a line

Now as is clear here 3 differs from 1 and 2 on location 3 ... however, it is back in sync after 1 and 2 are at location 6 and it is at location 7!!!! If we were using simple char by char comparison, it is impossible to know how and when the things got back to the same !!!

one such utility already exists ... it is called diff command...
you can find the manual for diff here
http://www.gnu.org/manual/diffutils-2.7/html_mono/diff.html

http://www.gnu.org/software/     --- it has been ported to several platforms

It also a part of almost all *nix distributions

Cheers!
Sunny:o)
0

LVL 3

Expert Comment

ID: 9757859
GNU's "diff" or "merge" is probably what you want.  If you are stuck on the Dark Side, download the source and port it.

NAME
merge &#8722; three&#8208;way file merge

SYNOPSIS
merge [ options ] file1 file2 file3

DESCRIPTION
merge  incorporates  all  changes  that  lead  from file2 to file3 into
file1.  The result ordinarily goes into file1.   merge  is  useful  for
combining separate changes to an original.  Suppose file2 is the origi&#8208;
nal, and both file1 and file3 are modifications of file2.   Then  merge
combines both changes.
etc.

0

LVL 101

Expert Comment

ID: 9758881
Are the existing lines numbered or in some way are they identifiable?
If not then the problem of resyncing the files could be enormous.

File1
Now is the time that tries ....
To be or not
Tis better to have loved

File2
How Now brown cow (actually modified line1)
He hath the most ...
...
...
...
...
Tis better ...

File3
Now some inane statement
...
...
To be or to be
...
...
Now is the time
...
...

How do these compare?
In file 3 are there 6 added lines at the beginning? or was line 1 changed then 2 lines added then line 2 changed then several lines added?

mlmcc
0

LVL 9

Expert Comment

ID: 9761711
use dos command fc and capture the output.
0

Author Comment

ID: 9762533
I am looking for an algorithm that I use in my project.
I am not looking for a tool.
0

LVL 45

Accepted Solution

sunnycoder earned 500 total points
ID: 9762631
0

LVL 45

Expert Comment

ID: 9762651
0

Author Comment

ID: 9763206
I have yet not gone through the whole document. But its looks huge and complex, are there simpler version of the algorithm?
0

LVL 45

Expert Comment

ID: 9763227
what you have given in your problem description fits this algorithm ... Simpler will be inaccurate ... see my first post for simpler algorithms ...
0

LVL 3

Expert Comment

ID: 9766757
Unless your problem has signficant constraints that you haven't mentioned,  this is quite a complex problem.  In the real (programming) world, even with the algorithms sunny's mentioned above, human intervention is required to make sure the merger doesn't do something stupid.
0

LVL 9

Expert Comment

ID: 9768972
guynumber5764 : you said it. I messed up a couple of days work due to VSS's merge
0

Author Comment

ID: 9802892
I am looking into the logic, will take some time to understand this and will get back ASAP.
0

Author Comment

ID: 9804991
Since the logic presented here is bit complex. Though it is interesting but I will take time to understand this.

Below is the logic that I am planning to implement. I would like have inputs on this and is there an optimum way of doing this?

Background:

I have three text files Base, Version1 and Version2. These files hold definition of tables and forms for an application.

The structure is similar to the following,

Table1
Field1 char(4)&#8230;.
&#8230;.
Table2
Field1 char(6)
&#8230;.
&#8230;.
Form1
Field1 size(4), position x,y &#8230;
&#8230;.

I have a class definition, which reads this and puts it in a memory structure. The logic I am implementing is the following

DataModel Base; //DataModel is the class that loads the file in the memory
DataModel Version1;
DataModel Version2;
DataModel Final;

/*Code for using DataModel iterator, traversing to each table*/
while(/*all the table in Version2*/)
{
/*mark this element, used in next loop*/
Version1.Find(Version2.Table)
if (/*exists in version1*/)
{
if(/*exists in base*/)
{
/*compare the fields*/
/*if conflict, report it and ask user to opt which element to be merged*/
/*compare the attributes*/
/*if conflict, report it and ask user to opt which element to be merged*/
}
else
{
/*compare the fields*/
/*if conflict, report it and ask user to opt which element to be merged*/
/*compare the attributes*/
/*if conflict, report it and ask user to opt which element to be merged*/
}
}
else
}

/*now traverse version1 and compare it to version2*/
while((/*all the table in Version1*/)
{
Version2.Find(Version1.Table)
if (Version2.Marked) /*this element is already traversed*/
continue;
if (/*exists in version2*/)
{
/*similar to above logic*/
}
else
}

Please note I have access to source code of class DataModel, if required I can add the functionality to optimize the above code or modify the &#8220;Find&#8221; function to suit our needs.

Thanks
0

LVL 45

Expert Comment

ID: 9816219
your algorithm still misses out the major point
* how do you recognize the difference and how do you recognize that the files are now matching after a difference given the fact that the different fields were of different lengths ?

better idea will be to implement a three way merge

>These files hold definition of tables and forms for an application
are the tables expected to change ... i.e. new table added / delation/ renamed etc. etc.
If yes, then you need something of the complexity like the diff algorithm .. If no, the problem gets simplified to fields ...
All you need to to do is compare field names and if they are different, keep reading until at somepoint you have a field name which you have seen in all of them since the point of difference began ...
Next ask the user which ones should be kept ...

this will also make it easier for the user as he wont have to go through the same process twice
0

Author Comment

ID: 10410084
I regret to say that I have  still not gone through this I need some more time. Let me know if I have some other option.
0

LVL 3

Expert Comment

ID: 10634570
I tuned out for a while there.  It looks like there is considerably more structure to your data than you have revealed to date.  The more structure there is, the easier it is to write a match & merge.  For example, if each data element has a unique key the alorigthm you want becomes trivial.  The opposite case would be free text which is almost impossible to characterize accurately:

To give an example, the following three files do not violate any of your given constraints.  How should they merge and why?

Base:
1000
1000
1000 1
1000 2
1000 3
1001

File1:

1000 1
1000 2
1001
1000
1000
1000 3

File2:

1001
1000
1000
1000 3
1000 2
1000 1
0

Author Comment

ID: 10984395
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user aâ€¦
In this post we will learn different types of Android Layout and some basics of an Android App.
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable â€¦
With the power of JIRA, there's an unlimited number of ways you can customize it, use it and benefit from it. With that in mind, there's bound to be things that I wasn't able to cover in this course. With this summary we'll look at some places to goâ€¦
###### Suggested Courses
Course of the Month6 days, 22 hours left to enroll