
  • Status: Solved
  • Priority: Medium

WITH PURE VB CODE: Compare two similar strings (aStr & bStr); make a string that holds the differences (cStr); Use cStr to modify aStr and reproduce bStr

I need the following functionality in pure VB code:

Imagine with me if you will that you have two strings:

         String One (First Draft):
         "I am a silly big green monster"

         String Two (Second Draft):
         "I am a big blue monster"

Okay, now I need two functions like this:

          Private Function GetChrDiff(OldString As String, NewString As String) As String
                    'Returns a string that is only the difference between
                    'OldString and NewString
          End Function
          Private Function ApplyChrDiff(OldString As String, DiffString As String) As String
                    'Takes DiffString and applies it to OldString and
                    'the result should be the same as the NewString
                    'that I gave to GetChrDiff
          End Function
Don't worry about doing a CheckSum, it doesn't matter
if there is an error, I will deal with that later.


Let me put it into a real-world usage example:

Let's say I'm writing a novel, and it is time for me to
hand my publisher the first draft. So I burn the first
draft to a CD, it is 35 MB.

Then, I start on my second draft. I make several changes
to the original, but I decide that I'm still not happy
with the novel. Suddenly I realize that I only have one
floppy disk and no more CD's.

The only way I will have room on the floppy disk to save
my second draft is if I only save the differences between
the two files.

So, using my handy dandy VB program I wrote, I make a text
file that only contains the difference between the first
and second drafts.

Later, when I am on my death bed, I decide that I liked
my second draft better -- but all I have is the original
on CD and the floppy disk with the Difference File on it.

So I open up my handy dandy VB program I wrote and apply
the Difference File on the floppy to the 35 MB file on
the CD and BAM! There is my second draft!


Stupid story, I know. But hopefully it describes what I
need to do.

OH! And remember -- there will be some parts I take out
of the first draft, some parts I replace, and some parts
that I add. So theoretically, the second draft could be
shorter or longer than the original draft.

You can keep the code simple. You don't have to open a file or
save a file in your example. Just deal with the string variables.
2 Solutions
ScribbleMeat (Author) commented:

I have another similar question that is still open. If you answer this question,
you get the 500 points from the other one as well.

Yay! 1,000 points for you!

-- ScribbleMeat
Is this a data synchronisation question, where you want to send changes of data from one place to another without having to send the whole file?
ScribbleMeat (Author) commented:
Yes, that could be one implementation.

For instance, there is a 50MB file on a server. You make a few small
changes to the file, why send the whole 50MB again?

But there could be other applications too. For instance, you could use it for
incremental backups to a file. Or you could make one base file and have
many versions of that base read-only file so that you can have unlimited

See what I mean?

In order to generate the differences you would need to keep a copy of the data that was last sent. Each time you generate a new disk you save the copy.

When you save the differences you would need to save an address for each difference.  This could be done using paragraphs.

So I would start by defining the format of the difference list.

For example:

change type, para number, from position, to position, [new data]

Where the change type could be C=change or D = delete.
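
To make the proposed layout concrete, here is a minimal sketch of one entry in such a difference list. The type name ChangeRec, its field names, and the comma-separated text form are assumptions made for illustration; the comment above only names the five fields. New data that itself contains commas would need some form of escaping, which this sketch does not attempt.

```vb
Option Explicit

'Hypothetical record for one entry in the difference list described above.
Private Type ChangeRec
    ChangeType As String    '"C" = change, "D" = delete
    ParaNumber As Long
    FromPos As Long
    ToPos As Long
    NewData As String       'left empty for a delete
End Type

'Builds one line in the order: type, para number, from, to, [new data]
Private Function FormatChange(c As ChangeRec) As String
    FormatChange = c.ChangeType & "," & c.ParaNumber & "," & c.FromPos & "," & c.ToPos
    'Only a change carries replacement text
    If c.ChangeType = "C" Then FormatChange = FormatChange & "," & c.NewData
End Function

Private Sub DemoFormat()
    Dim c As ChangeRec
    c.ChangeType = "C"
    c.ParaNumber = 3
    c.FromPos = 10
    c.ToPos = 14
    c.NewData = "blue"
    Debug.Print FormatChange(c)     'prints C,3,10,14,blue
End Sub
```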

ScribbleMeat (Author) commented:
What About Add? There could be more OR less data in the new file.
Need to know a bit more about the format of the file.  Is the data continuous text, binary, or is it really like a novel with paragraphs etc?
ScribbleMeat (Author) commented:
Well, let's assume the following:

1) StringA is an Ascii string.

2) StringB is an Ascii string as well.

3) StringC -- the difference string -- can be binary or Ascii, whatever is most efficient.

4) We aren't working with files, just strings held in variables.

Does that help?
Some questions:

1) Are there any known separators within the string, for example CR/LF?

2) What is the likely length of each string?

What do you REALLY mean by the differences?  Character by character?  Word by word?

For instance, what I am getting at is this:

Consider these two sentences:

         "I am a silly big green monster"

         "I am a big blue monster"

What would you want to see as the 'character by character' difference?  How is an 's' (from 'silly' in sentence 1) different from a 'b' (from 'big' in sentence 2)?  How about the fact that sentence 2 has FEWER characters than sentence 1?

Similarly, how is the WORD 'big' different from the WORD 'silly'?  What would you want to have in the 'differences' file, so that somehow you could transform the original text into the new version?  That will be a non-trivial undertaking.

How would you want to indicate that sentence 2 has REMOVED the word 'silly' and replaced the word 'green' with the word 'blue', but only in that one sentence?  What would be the case if ALL occurrences of 'green' (anywhere in the original text) should be replaced with the word 'blue'?

Shiju Sasidharan (Assoc Project Manager) commented:
hi ScribbleMeat

'Add reference
'     Microsoft VBScript Regular Expressions
'Place three text boxes and one command button on your form
Private Sub Form_Load()
  Text1.Text = "This is Old Text"
  Text2.Text = "This is New Modified Text"
  Text3.Locked = True
  Command1.Caption = "Get New Text"
End Sub
Private Sub Command1_Click()
    MsgBox ApplyChrDiff(Text1.Text, Text3.Text)
End Sub
Private Sub Text1_Change()
    Text3.Text = GetChrDiff(Text1.Text, Text2.Text)
End Sub

Private Sub Text2_Change()
    Text3.Text = GetChrDiff(Text1.Text, Text2.Text)
End Sub
Private Function GetChrDiff(OldString As String, NewString As String) As String
Dim lDiffLen As Long
Dim lCount As Long
Dim iAscDiff As Integer
Dim sTail As String
    'Nothing to compare against: the whole new string becomes the tail
    If OldString = "" Then
        GetChrDiff = ">><<" & NewString & "@@@0"
        Exit Function
    End If
    'If the new string is longer, keep the extra characters as a tail
    If Len(NewString) > Len(OldString) Then
        sTail = Right(NewString, Len(NewString) - Len(OldString))
    End If
    'Compare position by position over the shorter of the two lengths
    lDiffLen = IIf(Len(OldString) < Len(NewString), Len(OldString), Len(NewString))
    For lCount = 1 To lDiffLen
       iAscDiff = Asc(Mid(NewString, lCount, 1)) - Asc(Mid(OldString, lCount, 1))
       'Record only the positions that differ:
       'offset character, sign flag (1 = negative), position, "|"
       If iAscDiff <> 0 Then
           GetChrDiff = GetChrDiff & Chr(Abs(iAscDiff)) & IIf(iAscDiff < 0, 1, 0) & lCount & "|"
       End If
    Next lCount
    'Trailer: tail text plus the number of positions compared
    GetChrDiff = GetChrDiff & ">><<" & sTail & "@@@" & lDiffLen
End Function

Private Function ApplyChrDiff(OldString As String, DiffString As String) As String
Dim objRegExp As New RegExp
Dim objMatchCol As MatchCollection
Dim sNewString As String
Dim sTail As String
Dim sReplace As String
Dim objMatch As Match
Dim lCount As Long
Dim lLength As Long
Dim lIndex As Long
    'Whole diff: zero or more "offset char, sign digit, position|" entries,
    'followed by ">><<", the tail, "@@@" and the compared length
    With objRegExp
        .IgnoreCase = True
        .Global = True
        .Pattern = "(([\x00-\xFF])(\d)(\d+)\|)*>><<([\x00-\xFF]*)@@@(\d+)"
    End With
    Set objMatchCol = objRegExp.Execute(DiffString)
    If objMatchCol.Count <> 1 Then GoTo InValidData
    If objMatchCol.Item(0) <> DiffString Then GoTo InValidData
    sNewString = objMatchCol.Item(0)
    sTail = objRegExp.Replace(sNewString, "$5")
    lLength = objRegExp.Replace(sNewString, "$6")
    'Now walk the individual entries
    objRegExp.Pattern = "(([\x00-\xFF])(\d)(\d+)\|)"
    Set objMatchCol = objRegExp.Execute(sNewString)
    sNewString = Left(OldString, lLength)
    For lCount = 0 To objMatchCol.Count - 1
        Set objMatch = objMatchCol.Item(lCount)
        lIndex = objRegExp.Replace(objMatch.Value, "$4")
        'Sign digit 1 means subtract the offset, 0 means add it
        sReplace = Chr(Asc(Mid(OldString, lIndex, 1)) + (-1) ^ objRegExp.Replace(objMatch.Value, "$3") * Asc(objRegExp.Replace(objMatch.Value, "$2")))
        sNewString = Left(sNewString, lIndex - 1) & Replace(sNewString, Mid(OldString, lIndex, 1), sReplace, lIndex, 1)
    Next lCount
    ApplyChrDiff = sNewString & sTail
    Exit Function
InValidData:
    ApplyChrDiff = "Error: Data Invalid"
End Function
hope this will help u
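
One thing worth checking with any position-by-position comparison like the one above is how it behaves when text is inserted or removed rather than overwritten, because every character after the edit then sits at a different index. The sketch below is a separate, hypothetical helper (CountPositionalDiffs is not part of the posted code) that simply counts how many positions differ, so the effect can be observed in the Immediate window.

```vb
Option Explicit

'Counts the positions at which two strings differ when compared index by
'index, plus any difference in length. Hypothetical helper for illustration.
Private Function CountPositionalDiffs(OldString As String, NewString As String) As Long
    Dim lCount As Long
    Dim lMin As Long
    Dim lDiffs As Long

    lMin = IIf(Len(OldString) < Len(NewString), Len(OldString), Len(NewString))
    For lCount = 1 To lMin
        If Mid$(OldString, lCount, 1) <> Mid$(NewString, lCount, 1) Then lDiffs = lDiffs + 1
    Next lCount
    'Characters beyond the end of the shorter string count as differences too
    CountPositionalDiffs = lDiffs + Abs(Len(OldString) - Len(NewString))
End Function

Private Sub DemoPositional()
    'Same-length substitution: only the changed characters differ
    Debug.Print CountPositionalDiffs("a green monster", "a brown monster")   'prints 3
    'Removing a word shifts everything after it, so many more positions differ
    Debug.Print CountPositionalDiffs("I am a silly big green monster", "I am a big blue monster")
End Sub
```

If the second number comes out close to the length of the string, the diff for that kind of edit will be about as large as the new text itself.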

ScribbleMeat (Author) commented:

Cool. Let me check that out. It will be a few hours before I can get to it, but it certainly looks promising.

-- ScribbleMeat
ScribbleMeat (Author) commented:

There would be no known or predictable separators.


ScribbleMeat (Author) commented:

If it were as easy as it sounds on the surface, I wouldn't be waiting 3
months for a concept -- so I agree it isn't trivial. This is the third post
I have made on this subject matter and until the last few minutes I have
received no real concepts or even theoretical guesses as to how this
might be done.

Shiju's code looks promising, I happen to know he is very handy with
the Regular Expressions -- so I have high hopes for his code.

But with deference to Shiju (I haven't tried his code yet), I will tell you
my concept on how it could be done...


Okay, first let's make a custom type to work with:

                  Private Type DiffType
                              Method As String
                              Start As Long
                              Length As Long
                              InsertStr As String
                  End Type

                  Private DiffArr() As DiffType

So now look at these two strings:

            aStr = "1234567890abcdefg"
            bStr = "12345abcdefg"

Okay, we know that bStr will be shorter than aStr. So we can do this:

            bStr = "12345#####abcdefg"

where the pound signs represent padded characters. So now all we have
to do is this:

                  DiffArr(0).Method = "Delete"
                  DiffArr(0).Start = 6
                  DiffArr(0).Length = 5


Now, what if we had this?

            aStr = "1234567890abcdefg"
            bStr = "12345sillyabcdefg"

Well, now what we need is this:

                  DiffArr(0).Method = "Insert"
                  DiffArr(0).Start = 6
                  DiffArr(0).Length = 5
                  DiffArr(0).InsertStr = "silly"


And for this (where string 2 is shorter)...

            aStr = "1234567890abcdefg"
            bStr = "12345Xabcdefg"

Well, now what we need is this:

                  DiffArr(0).Method = "Insert"
                  DiffArr(0).Start = 6
                  DiffArr(0).Length = 5
                  DiffArr(0).InsertStr = "X"


And for this (where string 2 is longer)...

            aStr = "1234567890abcdefg"
            bStr = "1234567890--I'm a long string --abcdefg"

Well, now what we need is this:

                  DiffArr(0).Method = "Insert"
                  DiffArr(0).Start = 11
                  DiffArr(0).Length = 0
                  DiffArr(0).InsertStr = "--I'm a long string --"
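
The records above can be applied with a short loop. This is only a sketch under a few assumptions that the post does not spell out: Start is the 1-based position of the first character affected in the original string, Length is how many original characters are dropped there (0 for a pure insert), the records are sorted by Start, and they do not overlap. Under those assumptions the Method field is not needed, since a delete is just an empty InsertStr, so it is left out here. The name ApplyDiffArr is hypothetical.

```vb
Option Explicit

'Simplified version of the DiffType idea (assumption: no Method field).
Private Type DiffType
    Start As Long           '1-based position in the ORIGINAL string
    Length As Long          'how many original characters to drop
    InsertStr As String     'text that takes their place (may be empty)
End Type

'Applies DiffCount records to OldString and returns the new string.
'Records must be sorted by Start and must not overlap.
Private Function ApplyDiffArr(OldString As String, Diffs() As DiffType, DiffCount As Long) As String
    Dim i As Long
    Dim lPos As Long        'next unread position in OldString
    Dim sOut As String

    lPos = 1
    For i = 0 To DiffCount - 1
        'Copy the unchanged run in front of this edit
        sOut = sOut & Mid$(OldString, lPos, Diffs(i).Start - lPos)
        'Emit the replacement text, then skip the dropped characters
        sOut = sOut & Diffs(i).InsertStr
        lPos = Diffs(i).Start + Diffs(i).Length
    Next i
    'Copy whatever is left after the last edit
    ApplyDiffArr = sOut & Mid$(OldString, lPos)
End Function

Private Sub DemoApply()
    Dim d(0 To 0) As DiffType
    d(0).Start = 6
    d(0).Length = 5
    d(0).InsertStr = "X"
    Debug.Print ApplyDiffArr("1234567890abcdefg", d, 1)     'prints 12345Xabcdefg
End Sub
```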



Obviously there is still a lot of logic that would need to go into that code. And obviously
I don't have that figured out as of yet. But I'm working on it.

I will look at Shiju's code as soon as possible -- he may have the perfect solution in
RegExp. He is the man when it comes to that.

-- ScribbleMeat
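
As a starting point for the missing logic, one simple way to produce a record like those above is to trim the common prefix and the common suffix of the two strings and treat everything in between as a single changed region. This is a sketch, not a full diff: it finds only ONE region, so two edits far apart get merged into one large record, and a proper multi-region diff would need something along the lines of a longest-common-subsequence comparison, which is not shown. The name GetSingleDiff and the simplified DiffType (without Method) are assumptions.

```vb
Option Explicit

'Simplified version of the DiffType idea (assumption: no Method field).
Private Type DiffType
    Start As Long           '1-based position in the ORIGINAL string
    Length As Long          'how many original characters to drop
    InsertStr As String     'text that takes their place (may be empty)
End Type

'Finds one changed region by trimming the common prefix and suffix.
Private Function GetSingleDiff(OldString As String, NewString As String) As DiffType
    Dim d As DiffType
    Dim lPre As Long        'length of the common prefix
    Dim lSuf As Long        'length of the common suffix
    Dim lMax As Long

    lMax = IIf(Len(OldString) < Len(NewString), Len(OldString), Len(NewString))
    Do While lPre < lMax
        If Mid$(OldString, lPre + 1, 1) <> Mid$(NewString, lPre + 1, 1) Then Exit Do
        lPre = lPre + 1
    Loop

    'The suffix may not overlap the prefix in either string
    Do While lSuf < lMax - lPre
        If Mid$(OldString, Len(OldString) - lSuf, 1) <> Mid$(NewString, Len(NewString) - lSuf, 1) Then Exit Do
        lSuf = lSuf + 1
    Loop

    d.Start = lPre + 1
    d.Length = Len(OldString) - lPre - lSuf
    d.InsertStr = Mid$(NewString, lPre + 1, Len(NewString) - lPre - lSuf)
    GetSingleDiff = d
End Function

Private Sub DemoDiff()
    Dim d As DiffType
    d = GetSingleDiff("I am a silly big green monster", "I am a big blue monster")
    Debug.Print d.Start         'Start is 8
    Debug.Print d.Length        'Length is 15 ("silly big green")
    Debug.Print d.InsertStr     'InsertStr is "big blue"
End Sub
```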

You might want to consider a more generic approach. This is exactly what that old "Patch.exe" program does (and I think the new InstallShield does also), which many large software firms use to distribute application updates via the web. If you think of the data as simply a non-stop stream of binary bits, you can not only compute the diff of the changes, but you also have the added benefit of being able to compress the diff file using any one of many efficient compression algorithms (LZH, Zip, RAR, and so on). Code libraries exist to do the compression, and they all work on either files or "data streams", which is what you would want. This sounds like overkill for your project, but I don't think it is. You would be treating your variables as a long stream of bits instead of an array of characters.

The big problem is not what to do when the original string is larger than the new string, it is what to do when the original string is smaller than the new one... when data has been added, and now the second string is larger. This causes complications... also, VB isn't the best at doing binary operations, since its String type is built for text rather than raw bytes.
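
If you do want to treat the strings as raw data, one common route in VB is to move them into Byte arrays with StrConv and work on those. A minimal sketch, assuming plain ASCII text as stated earlier in the thread (so each character becomes exactly one byte); the procedure name DemoBytes is made up for illustration:

```vb
Option Explicit

Private Sub DemoBytes()
    Dim b() As Byte
    Dim s As String

    'Convert the VB String to a byte array, one byte per ASCII character
    b = StrConv("12345abcdefg", vbFromUnicode)
    Debug.Print UBound(b) - LBound(b) + 1       'prints 12

    'A diff or compression routine could now work on b() byte by byte

    'Convert the bytes back to an ordinary VB String
    s = StrConv(b, vbUnicode)
    Debug.Print s                               'prints 12345abcdefg
End Sub
```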

I haven't written any code for you but the concept is sound. If you don't need a full-blown general solution to this problem, then I'd certainly go with the code above and be done with it!
