Solved

Comparing Strings Between Two Columns and Highlight difference in RED

Posted on 2011-02-22
20
1,044 Views
Last Modified: 2012-05-11
Hi  Excel Experts,
I would like to compare column A of Sheet 1 with the column A of Sheet2 and highlight the word difference in RED.
The final results should look like in the attached file. Any assistance would be appreciated….


Sheet1
Column A

StringID
Angelina27-11 Florida123Order has been processed.
Michael345Order has been processed
23ScottGeorgia


Sheet2
Column A
Michael345Order has been processed
Tanisha78notProcessed234
23ScottGeorgia
Angelina27-11 Florida123Order has been processed.


FinalResultExample.bmp
0
Comment
Question by:smc1234
  • 12
  • 7
20 Comments
 
LVL 50

Expert Comment

by:Dave Brett
ID: 34957576
Is your data layout identical, ie in the case above the first String is 4 lines on sheet 1, 5 lines on sheet. Does the remaining data "line up" or given the 'missing' line in sheet 1?

An actual Excel sample would help

If your data layout is indentical then if you have a one off need then you can use SpreadSheet Advantage to produce the non-match output, you can get a 30 day free trial from http://www.spreadsheetadvantage.com/

Cheers

Dave
0
 
LVL 41

Expert Comment

by:dlmille
ID: 34957658
This link gets you very close:  http://excel.bigresource.com/Track/excel-SzmrTZeT/

Dave
0
 
LVL 41

Expert Comment

by:dlmille
ID: 34957827
I ran the code supplied from the tip (above) with the data in two successive columns as an experiment.

Here are the results.  If this is "sufficient", I can modify to work with two workbooks.
  sampleDave  
0
 
LVL 41

Expert Comment

by:dlmille
ID: 34957835
Let me try that again...

Example below:

sampleDave
0
 
LVL 41

Expert Comment

by:dlmille
ID: 34958912
Ok.  I adapted the code from the link I suggested.  It is about as good as I can get without spending significant time back and forth with you to understand what is or isn't a match (you might have gleaned that quandry from the support thread on that link) :)

I have attached two examples.  the r1 example compares two columns side-by-side so you can quickly see how it works.  the r2 example compares column A in Sheet1 with column A in Sheet2

Methodology:  The app compares from left to right, WORD compare to WORD (continuous text separated by a space), then character by character.  In a WORD comparison, if "shipped" is matched with "has", the "h" in the "shipped" word is deemed a match.  Once done, if you've selected the option, the algorithm will process by WORD from right to left.  In this way, if there is a gap or garbage in the middle of a text string, the ending words have a chance to match with the other dataset.

For example "rain in asdfdjfd Spain" compared with "rain in Spain" would only match "rain in" on the left to right, but would match also "Spain" processing right to left.  This may be the preferred option, though the default is Left To Right, just select NO at the prompt to process both ways.

I hope this gets you close enough for what you're trying to do (as your purpose was not stated) - if its to highlight where edits need to be, then close may be "good enough"?

Finally, as the other Dave mentioned, this isn't an easy exercise, unless the data is aligned correctly.  It would take a completely additional algorithm which may not be trivial to first somehow MATCH a row in Sheet1 with a row in Sheet2 to do the compare/highlight.  I'm assuming you can manage this.

So, here are both attachments.  Again r1 is just a side-by-side comparison for you to quickly test the veracity of the app.   r2 is the final product with data in Sheet1 and Sheet2

Dave
CompareAndHighlight-r1.xlsm
CompareAndHighlight-r2.xlsm
0
 

Author Comment

by:smc1234
ID: 34967482
Thank you so much for your suggestions. I will try it tomorrow.
0
 

Author Comment

by:smc1234
ID: 34985856
0
 
LVL 41

Expert Comment

by:dlmille
ID: 34985869
@smc - you have a comment to go with that file?

:)

Dave
0
 

Author Comment

by:smc1234
ID: 34986070
Thank you Dave so much for your help.
 r2.xlsm is very close to what I am looking for except that is comparing row 1 sheet 1 with row 2 sheet 2 etc.   I need to compare StringID sheet 1 with the closest StringID on sheet2 and highlight the difference.  The attached excelSample3 contains how the final results should look like.  Can you please let me know if that would be possible to accomplish via vb code?

excelSample3.xls
0
 
LVL 41

Expert Comment

by:dlmille
ID: 34986232
Do you mind if I reorder the list, if done programmatically?
0
6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

 
LVL 41

Expert Comment

by:dlmille
ID: 34986869
This was quite a challenge and generally more than you might expect at E-E, except as experts we like to take on a challenge from time to time and this is one of those challenges.  So here goes!

Ok - I found a fuzzy search algorithm for string comparison here: http://excellerando.blogspot.com/2010/03/vlookup-with-fuzzy-matching-to-get.html
Some neat fuzzyVlookup, fuzzyHlookup, and fuzzymatchscore routines that helped me match as best as possible rows in Sheet 1 with rows in Sheet 2.  Once I had a match, I used the HIGHLIGHT routine by WORD to highlight the differences.

I believe I found an error in this public domain code, in the subroutine SumOfCommonStrings() as it didn't carry through the iScore tally on recursive call - potentially an errant mistake that was somehow not caught.  When I corrected for it, my test data and yours performed MUCH better - actually, the row matches were exactly correct.  I also posted back to my source to ensure the original blogger concurs, or at least has a chance to concur at this point.

For the sample data you provide, the solution works quite well.  Why don't you run it on a more intensive dataset and advise if it serves?

PS - if you can sort/align the two datasets, then this fuzzy search routine is not really necessary, but your prompts lead me to believe that that's not necessarily going to happen...  So, I sought a method for matching that I believe could work a high % of the time...

Let me know how it works out on a larger dataset!

There's significant amount of code here (re: fuzzy match routines, and highlight routines), so I hesitate to just post it in the code below.  In both cases, I believe I improved on the public sources I found that gave me a good start.  Hopefully, anyone interested enough will open the attached spreadsheet and see for his/her self!

Dave
CompareAndHighlight-r3.xlsm
0
 

Author Comment

by:smc1234
ID: 35014208
Hi expert,

That was exactly what I was looking for.
You did a great job! :)
However, it does not work on a large sample of data. In my case, I have more than 1500 rows of data. When I run the code it freezes everything. Do you know how to solve that?  Thank you…
0
 
LVL 41

Expert Comment

by:dlmille
ID: 35014683
provide me some/all of your data, if confidential then fudge it up, and I can take a look
0
 
LVL 41

Accepted Solution

by:
dlmille earned 500 total points
ID: 35014707
Also, I created some messages in this version. While it runs, look at the statusbar at the bottom.  Make note on what its doing when it crashes - it will tell us which steps its on, etc...


Dave
CompareAndHighlight-r4.xlsm
0
 

Author Comment

by:smc1234
ID: 35051694
Thank you so much for all your help and for the status bar message. It is very helpful. :)
I really appreciate it.
0
 
LVL 41

Expert Comment

by:dlmille
ID: 35052392
So it wasn't freezing?  and the statusbar helped you understand that?

Let me know - and that's great.

Dave
0
 

Author Comment

by:smc1234
ID: 35055387
It only starts freezing if Sheets have more than 500 rows. I have reduced my data into subcategories, and now macro works great. :) Thank you.
0
 
LVL 41

Expert Comment

by:dlmille
ID: 35058745
If you can send me a sample of > 500 rows I can test to see where the error is - I like to ensure my tools work in all cases, if possible.  I want to see if its me or the fuzzymatch function that needs adaptation.

Appreciated,

Dave
0
 

Author Comment

by:smc1234
ID: 35091908
Hi,
Sorry for not being able to answer to you sooner. While running your code for two-three days, I realized that screen does not always freeze after 500 rows. The screen will freeze as soon as I touch the computer mouse, or if I want to work in a different application.  To unfreeze the screen, I press ESC button and then Continue on the message” The code has been interrupted, do you want to continue?” It would be nice to be able to work in any other application, while code is running. Do you know how to update code to allow multitasking? Thanks again for all your help. I really appreciate it.
0
 
LVL 41

Expert Comment

by:dlmille
ID: 35097538
I believe if we disable interrupt key while the macro is running then Excel should run unperturbed while you do other things...
Of course disabling interrupt means you can't stop the algorithm till it ends :)

Try this.
Private Sub CommandButton1_Click()
    Application.EnableCancelKey = xlDisabled
    'Application.CalculationInterruptKey = xlEscKey
    Application.Interactive = False

    Call CheckAgainstColumnA
    
    Application.Interactive = True
    Application.EnableCancelKey = xlInterrupt

End Sub

Open in new window

CompareAndHighlight-r5.xlsm
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

Suggested Solutions

Improved? Move/Copy Add-in Replacement - How to avoid the annoying, “A formula or sheet you want to move or copy contains the name XXX, which already exists on the destination worksheet.” David Miller (dlmille)  It was one of those days… I wa…
This article will guide you to convert a grid from a picture into Excel format using Microsoft OneNote and no other 3rd party application.
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now