ScribbleMeat

asked on

WITH PURE VB CODE: Compare two similar strings (aStr & bStr); make a string that holds the differences (cStr); Use cStr to modify aStr and reproduce bStr

I need the following functionality in pure VB code:
===================================

Imagine with me if you will that you have two strings:

         String One (First Draft):
         "I am a silly big green monster"

         String Two (Second Draft):
         "I am a big blue monster"

Okay, now I need two functions like this:


          Private Function GetChrDiff(OldString As String, NewString As String) As String
               
                    'Returns a string that is only the difference between
                    'OldString and NewString
         
          End Function
         
         
          Private Function ApplyChrDiff(OldString As String, DiffString As String) As String
         
                    'Takes DiffString and applies it to OldString and
                    'the result should be the same as the NewString
                    'that I gave to GetChrDiff
         
          End Function
         
         
Don't worry about doing a CheckSum, it doesn't matter
if there is an error, I will deal with that later.

========================================================

Let me put it into a real-world usage example:

Let's say I'm writing a novel, and it is time for me to
hand my publisher the first draft. So I burn the first
draft to a CD, it is 35 MB.

Then, I start on my second draft. I make several changes
to the original, but I decide that I'm still not happy
with the novel. Suddenly I realize that I only have one
floppy disk and no more CDs.

The only way I will have room on the floppy disk to save
my second draft is if I only save the differences between
the two files.

So, using my handy dandy VB program I wrote, I make a text
file that only contains the difference between the first
and second drafts.

Later, when I am on my death bed, I decide that I liked
my second draft better -- but all I have is the original
on CD and the floppy disk with the Difference File on it.

So I open up my handy dandy VB program I wrote and apply
the Difference File on the floppy to the 35 MB file on
the CD and BAM! There is my second draft!

========================================================

Stupid story, I know. But hopefully it describes what I
need to do.

OH! And remember -- there will be some parts I take out
of the first draft, some parts I replace, and some parts
that I add. So theoretically, the second draft could be
shorter or longer than the original draft.

You can keep the code simple. You don't have to open a file or
save a file in your example. Just deal with the string variables.
ScribbleMeat

ASKER

BONUS:

I have another similar question that is still open. If you answer this question,
you get the 500 points from the other one as well.

Yay! 1,000 points for you!

-- ScribbleMeat
inthedark
Is this a data synchronisation question, where you want to send changes of data from one place to another without having to send the whole file?
Yes, that could be one implementation.

For instance, there is a 50MB file on a server. You make a few small
changes to the file, why send the whole 50MB again?

But there could be other applications too. For instance, you could use it for
incremental backups to a file. Or you could make one base file and have
many versions of that base read-only file so that you can have unlimited
undos.

See what I mean?
In order to generate the differences you would need to keep a copy of the data that was last sent. Each time you generate a new disk you save the copy.

When you save the differences you would need to save an address for each difference.  This could be done using paragraphs.

So I would start by defining the format of the difference list.

For example:

change type, para number, from position, to position, [new data]

Where the change type could be C=change or D = delete.
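
As a rough illustration of that record layout, the fields could be held in a VB user-defined type and written out as one comma-separated line per change. This is only a sketch: the names ChangeRecord and FormatChange are made up for the example, and it assumes the new data never contains a comma.

```vb
Option Explicit

' Hypothetical record for one entry of the difference list.
Private Type ChangeRecord
    ChangeType As String   ' "C" = change, "D" = delete
    ParaNumber As Long     ' paragraph the change applies to
    FromPos As Long        ' first affected position in the paragraph
    ToPos As Long          ' last affected position in the paragraph
    NewData As String      ' replacement text; unused for "D"
End Type

' Builds "type,para,from,to[,new data]" for one record.
Private Function FormatChange(rec As ChangeRecord) As String
    Dim s As String
    s = rec.ChangeType & "," & rec.ParaNumber & "," & _
        rec.FromPos & "," & rec.ToPos
    ' Only a change carries new data; a delete stops after the positions.
    If rec.ChangeType = "C" Then s = s & "," & rec.NewData
    FormatChange = s
End Function
```

For the monster example, a record with ChangeType "C", ParaNumber 1, FromPos 8, ToPos 22 and NewData "big blue" would be written as C,1,8,22,big blue.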

 
What About Add? There could be more OR less data in the new file.
Need to know a bit more about the format of the file.  Is the data continuous text, binary, or is it really like a novel with paragraphs etc?
Well, let's assume the following:

1) StringA is an Ascii string.

2) StringB is an Ascii string as well.

3) StringC -- the difference string -- can be binary or Ascii, whatever is most efficient.

4) We aren't working with files, just strings held in variables.


Does that help?
Some questions:

1) Are there any known separators within the string, for example CR/LF?

2) What is the likely length of each string?
What do you REALLY mean by the differences?  Character by character?  Word by word?

For instance, what I am getting at is this:

Consider these two sentences:

     
         "I am a silly big green monster"

         
         "I am a big blue monster"

What would you want to see as the 'character by character' difference?  How is an 's' (from silly in Sentence 1) different from a 'b' (from big in Sentence 2)?  How about the fact that Sentence 2 has FEWER characters than Sentence 1?

Similarly, how is the WORD 'big' different from the WORD 'silly'?  What would you want to have in the 'differences' file, so that somehow you could transform the original text into the new version?  That will be a non-trivial undertaking.

How would you want to indicate that Sentence 2 has REMOVED the word 'silly' and replaced the word 'green' with the word 'blue', but only in that one sentence?  What would be the case if ALL occurrences of 'green' (anywhere in the original text) should be replaced with the word 'blue'?

AW
ASKER CERTIFIED SOLUTION
Shiju S
Shiju,

Cool. Let me check that out. It will be a few hours before I can get to it, but it certainly looks promising.

-- ScribbleMeat
inthedark,

There would be no known or predictable separators.


=============================

Arther,

If it were as easy as it sounds on the surface, I wouldn't be waiting 3
months for a concept -- so I agree it isn't trivial. This is the third post
I have made on this subject matter and until the last few minutes I have
received no real concepts or even theoretical guesses as to how this
might be done.

Shiju's code looks promising, I happen to know he is very handy with
the Regular Expressions -- so I have high hopes for his code.

But with deference to Shiju (I haven't tried his code yet), I will tell you
my concept on how it could be done...

================================================

Okay, first let's make a custom type to work with:

                  Private Type DiffType
                              Method As String
                              Start As Long
                              Length As Long
                              InsertStr As String
                  End Type

                  Private DiffArr() As DiffType

So now look at these two strings:

            aStr = "1234567890abcdefg"
            bStr = "12345abcdefg"

Okay, we know that bStr will be shorter than aStr. So we can do this:

            bStr = "12345#####abcdefg"

where the pound signs represent padded characters. So now all we have
to do is this:

                  DiffArr(0).Method = "Delete"
                  DiffArr(0).Start = 6
                  DiffArr(0).Length = 5

            ====================

Now, what if we had this?

            aStr = "1234567890abcdefg"
            bStr = "12345sillyabcdefg"

Well, now what we need is this:

                  DiffArr(0).Method = "Insert"
                  DiffArr(0).Start = 6
                  DiffArr(0).Length = 5
                  DiffArr(0).InsertStr = "silly"

            ====================

And for this (where string 2 is shorter)...

            aStr = "1234567890abcdefg"
            bStr = "12345Xabcdefg"

Well, now what we need is this:

                  DiffArr(0).Method = "Insert"
                  DiffArr(0).Start = 6
                  DiffArr(0).Length = 5
                  DiffArr(0).InsertStr = "X"

            ====================

And for this (where string 2 is longer)...

            aStr = "1234567890abcdefg"
            bStr = "1234567890--I'm a long string --abcdefg"

Well, now what we need is this:

                  DiffArr(0).Method = "Insert"
                  DiffArr(0).Start = 11
                  DiffArr(0).Length = 0
                  DiffArr(0).InsertStr = "--I'm a long string --"

            ====================
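
To make the idea concrete, here is a minimal sketch of how entries like these might be applied to aStr. It is not a finished routine: ApplyDiffArr is a made-up name, and the code assumes that Start is the 1-based position of the first affected character in the original string, that Length is the number of original characters to drop, and that the entries are sorted by Start and do not overlap. Under that reading a pure insertion uses Length = 0, with Start set to the position the new text should occupy (11 to land between "0" and "a").

```vb
Option Explicit

' Same fields as the DiffType described above; the text field is
' named InsertStr to match the examples.
Private Type DiffType
    Method As String      ' "Delete" or "Insert"
    Start As Long         ' 1-based position in the ORIGINAL string
    Length As Long        ' number of original characters to drop
    InsertStr As String   ' new text (ignored for "Delete")
End Type

' Hypothetical helper: rebuilds the new string from OldString.
' Assumes the entries are sorted by Start and do not overlap.
Private Function ApplyDiffArr(OldString As String, _
                              Diffs() As DiffType) As String
    Dim i As Long
    Dim result As String
    Dim nextPos As Long   ' next unread character of OldString

    nextPos = 1
    For i = LBound(Diffs) To UBound(Diffs)
        ' Copy the unchanged text in front of this entry.
        result = result & Mid$(OldString, nextPos, Diffs(i).Start - nextPos)
        ' Add the new text, if this entry has any.
        If Diffs(i).Method = "Insert" Then result = result & Diffs(i).InsertStr
        ' Skip the original characters this entry removes or replaces.
        nextPos = Diffs(i).Start + Diffs(i).Length
    Next i

    ' Copy whatever is left after the last entry.
    ApplyDiffArr = result & Mid$(OldString, nextPos)
End Function

Private Sub DemoApplyDiffArr()
    Dim d(0) As DiffType
    d(0).Method = "Insert"
    d(0).Start = 6
    d(0).Length = 5
    d(0).InsertStr = "X"
    Debug.Print ApplyDiffArr("1234567890abcdefg", d)   ' 12345Xabcdefg
End Sub
```

Keeping every Start relative to the original string means the entries never have to be adjusted as earlier ones shift the text.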

WELL, HERE'S THE THING:

Obviously there is still a lot of logic that would need to go into that code. And obviously
I don't have that figured out as of yet. But I'm working on it.
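
One simple starting point for that logic, offered only as a sketch: trim the longest common prefix and the longest common suffix of the two strings and treat everything in between as a single replaced span. This is not a true minimal diff (several scattered edits collapse into one large span), and the "start|dropCount|newText" format of the difference string is an assumption made for the example. It relies on Split, so it needs VB6 rather than VB5, and it compares case-sensitively unless the module sets Option Compare Text.

```vb
Option Explicit

' Returns "start|dropCount|newText": the single span of OldString that
' must be replaced to obtain NewString.
Private Function GetChrDiff(OldString As String, NewString As String) As String
    Dim oldLen As Long, newLen As Long, maxLen As Long
    Dim prefixLen As Long, suffixLen As Long

    oldLen = Len(OldString)
    newLen = Len(NewString)
    maxLen = oldLen
    If newLen < maxLen Then maxLen = newLen

    ' Count matching characters from the front.
    Do While prefixLen < maxLen
        If Mid$(OldString, prefixLen + 1, 1) <> Mid$(NewString, prefixLen + 1, 1) Then Exit Do
        prefixLen = prefixLen + 1
    Loop

    ' Count matching characters from the back, never re-using the prefix.
    Do While suffixLen < maxLen - prefixLen
        If Mid$(OldString, oldLen - suffixLen, 1) <> Mid$(NewString, newLen - suffixLen, 1) Then Exit Do
        suffixLen = suffixLen + 1
    Loop

    GetChrDiff = (prefixLen + 1) & "|" & _
                 (oldLen - prefixLen - suffixLen) & "|" & _
                 Mid$(NewString, prefixLen + 1, newLen - prefixLen - suffixLen)
End Function

' Rebuilds NewString from OldString and the string made by GetChrDiff.
Private Function ApplyChrDiff(OldString As String, DiffString As String) As String
    Dim parts() As String
    Dim startPos As Long, dropCount As Long

    ' A limit of 3 keeps any "|" inside the new text intact.
    parts = Split(DiffString, "|", 3)
    startPos = CLng(parts(0))
    dropCount = CLng(parts(1))

    ApplyChrDiff = Left$(OldString, startPos - 1) & parts(2) & _
                   Mid$(OldString, startPos + dropCount)
End Function
```

For "I am a silly big green monster" and "I am a big blue monster", GetChrDiff returns 8|15|big blue, and passing that to ApplyChrDiff with the first string gives back the second.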

I will look at Shiju's code as soon as possible -- he may have the perfect solution in
RegExp. He is the man when it comes to that.

-- ScribbleMeat
