Solved

Algorithm for three-file merger to create fourth merged file.

Posted on 2003-11-15
21
413 Views
Last Modified: 2010-04-17
Looking for an algorithm that merges two different version of text-file from its base file and create the fourth file.
Suppose we have a base file "Base" and suppose this file is modified by two person creating two different version "Version1" and "Version2". Need to create fourth file "Final" that merges the two version of same base. The algorithm should also show the conflicts in the two versions. (This will be similar to way Clear Case merges the file)

Ex:

Base file:

line 1
line 2
line 3

Version1 file:

line 1
line 2 modified Version1
line 3
new line from Version1

Version2 file:

line 1
line 2 modified Version2
new line from Version2
line 3


Final file:

line 1
Report conflict between two line "line 2 modified Version1" and  "line 2 modified Version2"
(No conflict) new line from Version2
line 3
new line from Version1

0
Comment
Question by:shuklasunil
  • 6
  • 5
  • 3
  • +2
21 Comments
 
LVL 45

Expert Comment

by:sunnycoder
ID: 9757850
Hi Sunil,

What you are asking for is a not so simple algorithm ...

In its simplest form, it is possible to read in each file line by line and compare those lines char by char... whenever we see a difference at any position, we reposrt that position and add each of the lines separately to the new file... If all lines appear to be the same, we write it to the new file only once ....

We repeat the process in a loop until atleast one of the file is exhausted.... If others are not, we report difference and continue merging the remaining two...

A simple enhancement to this would be to write code to ignore white spaces...

However, the tricky part would be to judge where did the files become same again ... e.g.

file1 had 5th line
this is a line
file 2 had 5th line
this is not a line
file3 had 5th line
there is a line

Now as is clear here 3 differs from 1 and 2 on location 3 ... however, it is back in sync after 1 and 2 are at location 6 and it is at location 7!!!! If we were using simple char by char comparison, it is impossible to know how and when the things got back to the same !!!

one such utility already exists ... it is called diff command...
you can find the manual for diff here
http://www.gnu.org/manual/diffutils-2.7/html_mono/diff.html

you can download diff from
http://www.gnu.org/software/     --- it has been ported to several platforms

It also a part of almost all *nix distributions

Cheers!
Sunny:o)
0
 
LVL 3

Expert Comment

by:guynumber5764
ID: 9757859
GNU's "diff" or "merge" is probably what you want.  If you are stuck on the Dark Side, download the source and port it.

NAME
       merge − three‐way file merge

SYNOPSIS
       merge [ options ] file1 file2 file3

DESCRIPTION
       merge  incorporates  all  changes  that  lead  from file2 to file3 into
       file1.  The result ordinarily goes into file1.   merge  is  useful  for
       combining separate changes to an original.  Suppose file2 is the origi‐
       nal, and both file1 and file3 are modifications of file2.   Then  merge
       combines both changes.
etc.




0
 
LVL 100

Expert Comment

by:mlmcc
ID: 9758881
Are the existing lines numbered or in some way are they identifiable?
If not then the problem of resyncing the files could be enormous.

File1
Now is the time that tries ....
To be or not
Tis better to have loved

File2
How Now brown cow (actually modified line1)
He hath the most ...
...
...
...
...
Tis better ...

File3
Now some inane statement
...
...
To be or to be
...
...
Now is the time
...
...

How do these compare?
In file 3 are there 6 added lines at the beginning? or was line 1 changed then 2 lines added then line 2 changed then several lines added?

mlmcc
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 9761711
use dos command fc and capture the output.
0
 

Author Comment

by:shuklasunil
ID: 9762533
I am looking for an algorithm that I use in my project.
I am not looking for a tool.
0
 
LVL 45

Accepted Solution

by:
sunnycoder earned 500 total points
ID: 9762631
0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 9762651
0
 

Author Comment

by:shuklasunil
ID: 9763206
I have yet not gone through the whole document. But its looks huge and complex, are there simpler version of the algorithm?
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 45

Expert Comment

by:sunnycoder
ID: 9763227
what you have given in your problem description fits this algorithm ... Simpler will be inaccurate ... see my first post for simpler algorithms ...
you can get the readymade source code though (under GPL) http://www.gnu.org/software/
0
 
LVL 3

Expert Comment

by:guynumber5764
ID: 9766757
Unless your problem has signficant constraints that you haven't mentioned,  this is quite a complex problem.  In the real (programming) world, even with the algorithms sunny's mentioned above, human intervention is required to make sure the merger doesn't do something stupid.
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 9768972
guynumber5764 : you said it. I messed up a couple of days work due to VSS's merge
0
 

Author Comment

by:shuklasunil
ID: 9802892
I am looking into the logic, will take some time to understand this and will get back ASAP.
0
 

Author Comment

by:shuklasunil
ID: 9804991
Since the logic presented here is bit complex. Though it is interesting but I will take time to understand this.

Below is the logic that I am planning to implement. I would like have inputs on this and is there an optimum way of doing this?

Background:

I have three text files Base, Version1 and Version2. These files hold definition of tables and forms for an application.

The structure is similar to the following,

Table1
     Field1 char(4)….
….
Table2
     Field1 char(6)
….
….
Form1
    Field1 size(4), position x,y …
….


I have a class definition, which reads this and puts it in a memory structure. The logic I am implementing is the following

DataModel Base; //DataModel is the class that loads the file in the memory
DataModel Version1;
DataModel Version2;
DataModel Final;

/*Loading the respective file in the variables*/

/*Code for using DataModel iterator, traversing to each table*/
while(/*all the table in Version2*/)
{
      /*mark this element, used in next loop*/
      Version1.Find(Version2.Table)
      if (/*exists in version1*/)
            {
            if(/*exists in base*/)
            {
                  /*compare the fields*/
                  /*if conflict, report it and ask user to opt which element to be merged*/
                                   /*compare the attributes*/
                                  /*if conflict, report it and ask user to opt which element to be merged*/
            }
            else
            {
                  /*compare the fields*/
                                      /*if conflict, report it and ask user to opt which element to be merged*/
                                     /*compare the attributes*/
                                     /*if conflict, report it and ask user to opt which element to be merged*/
                          }
                          /*Add it to final*/
      }
      else
                         /*Add it to final*/
}

/*now traverse version1 and compare it to version2*/
while((/*all the table in Version1*/)
{
             Version2.Find(Version1.Table)
             if (Version2.Marked) /*this element is already traversed*/
                  continue;
             if (/*exists in version2*/)
             {
             /*similar to above logic*/
      }
      else
             /*Add it to final*/
}

Please note I have access to source code of class DataModel, if required I can add the functionality to optimize the above code or modify the “Find” function to suit our needs.

Thanks
0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 9816219
your algorithm still misses out the major point
* how do you recognize the difference and how do you recognize that the files are now matching after a difference given the fact that the different fields were of different lengths ?

better idea will be to implement a three way merge

>These files hold definition of tables and forms for an application
are the tables expected to change ... i.e. new table added / delation/ renamed etc. etc.
If yes, then you need something of the complexity like the diff algorithm .. If no, the problem gets simplified to fields ...
All you need to to do is compare field names and if they are different, keep reading until at somepoint you have a field name which you have seen in all of them since the point of difference began ...
Next ask the user which ones should be kept ...

this will also make it easier for the user as he wont have to go through the same process twice
0
 

Author Comment

by:shuklasunil
ID: 10410084
I regret to say that I have  still not gone through this I need some more time. Let me know if I have some other option.
0
 
LVL 3

Expert Comment

by:guynumber5764
ID: 10634570
I tuned out for a while there.  It looks like there is considerably more structure to your data than you have revealed to date.  The more structure there is, the easier it is to write a match & merge.  For example, if each data element has a unique key the alorigthm you want becomes trivial.  The opposite case would be free text which is almost impossible to characterize accurately:

To give an example, the following three files do not violate any of your given constraints.  How should they merge and why?

Base:
1000
1000
1000 1
1000 2
1000 3
1001

File1:

1000 1
1000 2
1001
1000
1000
1000 3

File2:

1001
1000
1000
1000 3
1000 2
1000 1
0
 

Author Comment

by:shuklasunil
ID: 10984395
I am OK with the Administrative comments.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Batch file output 20 78
Specific format 21 145
mapBully challenge 6 93
Hide vba in gp 7 49
This article will show, step by step, how to integrate R code into a R Sweave document
Whether you've completed a degree in computer sciences or you're a self-taught programmer, writing your first lines of code in the real world is always a challenge. Here are some of the most common pitfalls for new programmers.
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now