Link to home
Start Free TrialLog in
Avatar of bovlk
bovlk

asked on

Line diff algorithm detecting changes, insertion, deletion and transposition

Hello,

suppose I have two source codes like the ones below and I want to compare them. The line numbers just define the sequence and will not be considered parts of the lines. The basic block to compare is line; I'm not interested in differences at the level of single chars, just consider every line a single value.

I want an algorithm that will be able to detect insertion, deletions and updates. So in case 1, it will detect that line 10 was removed, line 35 was inserted and line 40 changed from 'SECOND' to '2nd'. I made such an algorithm using forward lookup. It compares the lines one-by-one and when it finds a difference, it performs forward lookup to detect if any lines were inserted or deleted. If it finds that this line does not exist in the other code, but the next lines are equal, it considers the line updated. The lookup block is one line.

This works fine with the exception of swapping two lines as in case 2. In this case, it detects line 10/60 as not changed and lines 20-60 as deleted in the old code and inserted in the new code, which looks pretty weird. I would need to detect that the codes are similar with the exception of a single change in sequence of lines. The transposition can be detected as insertion+deletion, but I need to retain the ability to detect updates of a line as  updates, not insertion+deletion.

Does anyone know an algorithm that can do this? I will do this in C# in .NET 3.5. Thanks in advance
Case 1:
10 REM My ingenious program to add two numbers
20 PRINT 'ENTER THE FIRST NUMBER'
30 INPUT A
40 PRINT 'ENTER THE SECOND NUMBER'
50 INPUT B
60 PRINT A+B
===================
20 PRINT 'ENTER THE FIRST NUMBER'
30 INPUT A
35 REM Let's read the second number
40 PRINT 'ENTER THE 2nd NUMBER'
50 INPUT B
60 PRINT A+B
===================
===================
===================
Case 2:
 
10 REM My ingenious program to add two numbers
20 PRINT 'ENTER THE FIRST NUMBER'
30 INPUT A
40 PRINT 'ENTER THE SECOND NUMBER'
50 INPUT B
60 PRINT A+B
===================
20 PRINT 'ENTER THE FIRST NUMBER'
30 INPUT A
40 PRINT 'ENTER THE SECOND NUMBER'
50 INPUT B
60 PRINT A+B
70 REM My ingenious program to add two numbers

Open in new window

SOLUTION
Avatar of BartoszJarema
BartoszJarema
Flag of Poland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bovlk
bovlk

ASKER

PS: The ability to detect updates must be added to the CodeProject library but it's quite simple once you understand that snakes have a head made of changes and a tail made of matches and Snake.XMid, YMid are the first points of the tail. The you can see that a particular snake has a X insertions and Y deletions and can think of it as min(X,Y) updates + the rest of insertion/deletions.