bovlk
asked on
Line diff algorithm detecting changes, insertion, deletion and transposition
Hello,
suppose I have two source codes like the ones below and I want to compare them. The line numbers just define the sequence and will not be considered parts of the lines. The basic block to compare is line; I'm not interested in differences at the level of single chars, just consider every line a single value.
I want an algorithm that will be able to detect insertion, deletions and updates. So in case 1, it will detect that line 10 was removed, line 35 was inserted and line 40 changed from 'SECOND' to '2nd'. I made such an algorithm using forward lookup. It compares the lines one-by-one and when it finds a difference, it performs forward lookup to detect if any lines were inserted or deleted. If it finds that this line does not exist in the other code, but the next lines are equal, it considers the line updated. The lookup block is one line.
This works fine with the exception of swapping two lines as in case 2. In this case, it detects line 10/60 as not changed and lines 20-60 as deleted in the old code and inserted in the new code, which looks pretty weird. I would need to detect that the codes are similar with the exception of a single change in sequence of lines. The transposition can be detected as insertion+deletion, but I need to retain the ability to detect updates of a line as updates, not insertion+deletion.
Does anyone know an algorithm that can do this? I will do this in C# in .NET 3.5. Thanks in advance
suppose I have two source codes like the ones below and I want to compare them. The line numbers just define the sequence and will not be considered parts of the lines. The basic block to compare is line; I'm not interested in differences at the level of single chars, just consider every line a single value.
I want an algorithm that will be able to detect insertion, deletions and updates. So in case 1, it will detect that line 10 was removed, line 35 was inserted and line 40 changed from 'SECOND' to '2nd'. I made such an algorithm using forward lookup. It compares the lines one-by-one and when it finds a difference, it performs forward lookup to detect if any lines were inserted or deleted. If it finds that this line does not exist in the other code, but the next lines are equal, it considers the line updated. The lookup block is one line.
This works fine with the exception of swapping two lines as in case 2. In this case, it detects line 10/60 as not changed and lines 20-60 as deleted in the old code and inserted in the new code, which looks pretty weird. I would need to detect that the codes are similar with the exception of a single change in sequence of lines. The transposition can be detected as insertion+deletion, but I need to retain the ability to detect updates of a line as updates, not insertion+deletion.
Does anyone know an algorithm that can do this? I will do this in C# in .NET 3.5. Thanks in advance
Case 1:
10 REM My ingenious program to add two numbers
20 PRINT 'ENTER THE FIRST NUMBER'
30 INPUT A
40 PRINT 'ENTER THE SECOND NUMBER'
50 INPUT B
60 PRINT A+B
===================
20 PRINT 'ENTER THE FIRST NUMBER'
30 INPUT A
35 REM Let's read the second number
40 PRINT 'ENTER THE 2nd NUMBER'
50 INPUT B
60 PRINT A+B
===================
===================
===================
Case 2:
10 REM My ingenious program to add two numbers
20 PRINT 'ENTER THE FIRST NUMBER'
30 INPUT A
40 PRINT 'ENTER THE SECOND NUMBER'
50 INPUT B
60 PRINT A+B
===================
20 PRINT 'ENTER THE FIRST NUMBER'
30 INPUT A
40 PRINT 'ENTER THE SECOND NUMBER'
50 INPUT B
60 PRINT A+B
70 REM My ingenious program to add two numbers
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER