Link to home
Start Free TrialLog in
Avatar of MitchSavage
MitchSavage

asked on

Help with search and replace.....

Hello All,

I have a main spreadsheet that contains approximately 10,000 names and addresses.....  There are about 20 columns.....  I am using the code below to replace abbreviations with whole words......  The abbreviations are in several of the columns in Sheet1, which is the main list.....
Sheet2 is the substitution list.  It is in the following form....

abbr         abbreviation
lst             list
acct          account
Thos         Thomas
amer         America

etc.....         etc.....

There are about 400 items on Sheet2.  (The substitution list)

This code does a very good job of replacing the abbreviation with the whole word.....  Problem is, it replaces the abbreviation if it is part of another word!  Example: "South American Airlines" is changed to "South Americaican Airlines".  It finds "Amer" embedded inside the word American.  I want it to be able to find the abbreviation   anywhere in the cell, but not inside another word.  If it is at the first of the cell, then it would not be preceeded with a space, but could have a trailing space.  If at the end of the cell, then it could be preceeded with a space, but would not have a trailing space.  These 2 situations could indicate that the abbreviation is imbedded as the first part or last part of another word.........     Also if the abbrevation is the ONLY word in the cell then it should be substituted......   However if it is embedded in another word, then it should be ignored......  Of course we should search to see if the abbreviation is the 1st, 2nd, 3rd etc...  part of the cell.....

I am using the inputbox to select which columns the sub is applied to............

Any help on this will be much Appreciated.................
Best Regards,
MitchSavage

Sub ReplaceAbbreviations()
 
Dim rColumn As Range, vAbbr As Variant
Dim i As Long
 
On Error Resume Next
Set rColumn = Application.InputBox(Prompt:="Please select a range with your Mouse to fix Abbreviations.", Title:="SPECIFY RANGE", Type:=8)
On Error GoTo 0
 
If Not rColumn Is Nothing Then
  Set rColumn = Application.Intersect(ActiveSheet.UsedRange, rColumn.EntireColumn)
  With Sheets("Sheet2")
    vAbbr = .Range("A2:B" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
  End With
  For i = 1 To UBound(vAbbr, 1)
    rColumn.Replace vAbbr(i, 1), vAbbr(i, 2), Lookat:=xlPart, MatchCase:=False
  Next i
Else
  MsgBox "No range selected! Exiting Sub...": Exit Sub
End If
End Sub

Open in new window

Avatar of RichardSchollar
RichardSchollar
Flag of United Kingdom of Great Britain and Northern Ireland image

Hi Mitch

That looks familiar!

Try this:

It uses a reasonably interesting method to concatenate leading and trailing spaces to the range (it removes this following termination of the replace).  This is not a foolproof method however, as punctuation can prevent this from working.  If this doesn't meet your requirements I thing the macro will need to be re-written to iterate thru the range, rather than doing the replace in one step.

Richard
Sub ReplaceAbbreviations()
 
Dim rColumn As Range, vAbbr As Variant
Dim i As Long
 
On Error Resume Next
Set rColumn = Application.InputBox(Prompt:="Please select a range with your Mouse to fix Abbreviations.", Title:="SPECIFY RANGE", Type:=8)
On Error GoTo 0
 
If Not rColumn Is Nothing Then
  Set rColumn = Application.Intersect(ActiveSheet.UsedRange, rColumn.EntireColumn)
  With Sheets("Sheet2")
    vAbbr = .Range("A2:B" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
  End With
  For i = 1 To UBound(vAbbr, 1)
    rColumn.Value = Evaluate("="" "" & " & rColumn.address & "& "" """
    rColumn.Replace " " & vAbbr(i, 1) & " ", " " & vAbbr(i, 2) & " ", Lookat:=xlPart, MatchCase:=False
    rColumn.Value = Evaluate("=IF(ROW(),TRIM(" & rColumn.Address & "))")
  Next i
Else
  MsgBox "No range selected! Exiting Sub...": Exit Sub
End If
End Sub

Open in new window

Something got lost in the translation there - this is the code that should work:


Sub ReplaceAbbreviations()
 
Dim rColumn As Range, vAbbr As Variant
Dim i As Long
 
On Error Resume Next
Set rColumn = Application.InputBox(Prompt:="Please select a range with your Mouse to fix Abbreviations.", Title:="SPECIFY RANGE", Type:=8)
On Error GoTo 0
 
If Not rColumn Is Nothing Then
  Set rColumn = Application.Intersect(ActiveSheet.UsedRange, rColumn.EntireColumn)
  With Sheets("Sheet2")
    vAbbr = .Range("A2:B" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
  End With
  For i = 1 To UBound(vAbbr, 1)
    rColumn.Value = Evaluate("="" "" & " & rColumn.address & "& "" """)
    rColumn.Replace " " & vAbbr(i, 1) & " ", " " & vAbbr(i, 2) & " ", Lookat:=xlPart, MatchCase:=False
    rColumn.Value = Evaluate("=IF(ROW(),TRIM(" & rColumn.Address & "))")
  Next i
Else
  MsgBox "No range selected! Exiting Sub...": Exit Sub
End If
End Sub

Open in new window

Avatar of Dave
Richard,
Will your method capture the word at the start of a string, ie no leading space?
I think this one is probably best suited to a regex to indentify any found matches as genuine or false.
Cheers
Dave
Dave, the code actually concatenates a space onto the start of the range and onto the end of the range to ensure there's always leading and trailing spaces.  This naturally has some limitations:

1. It is altering the original values (space concatenation) and then Trims the result after the abbreviation substitution.  This is not ideal if the original values correctly include multiple spaces (as these iwll be lost).

2. It assumes that a word which needs to be replaced does not have puntuation next to it eg Amer,  won't get replaced as it is not surrounded with a space.

I wasn't sure exactly what Mitch's data looks like (I created the original code in his query for him).  If it is more complicated than just a few words then you're absolutely right - a more detailed method will be required.

Richard
Dave

I have just been experimenting with a regex solution but it takes quite a long time to do cell by cell.  I look forward to seeing a post by you on how to use it for a range.

Chris
Sub ReplaceAbbreviations()
Dim regExp As Object
Dim cel As Range
Dim rColumn As Range, vAbbr As Variant
Dim i As Long
 
On Error Resume Next
Set rColumn = Application.InputBox(Prompt:="Please select a range with your Mouse to fix Abbreviations.", Title:="SPECIFY RANGE", Type:=8)
On Error GoTo 0
 
If Not rColumn Is Nothing Then
    Set regExp = CreateObject("vbscript.regexp")
    regExp.IgnoreCase = True
    regExp.Global = True
    Set rColumn = Application.Intersect(ActiveSheet.UsedRange, rColumn.EntireColumn)
    With Sheets("Sheet2")
        vAbbr = .Range("A2:B" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
    End With
    For i = 1 To UBound(vAbbr, 1)
        regExp.Pattern = "(.*)(\b" & vAbbr(i, 1) & "\b)(.*)"
        For Each cel In rColumn
            cel = regExp.Replace(cel, "$1" & vAbbr(i, 2) & "$3")
        Next
    Next i
    Else
        MsgBox "No range selected! Exiting Sub...": Exit Sub
    End If
End Sub

Open in new window

Richard,
> Dave, the code actually concatenates a space onto the start of the range and onto the end of the range to ensure there's always leading and trailing spaces
Works for me, I did a quick eyeball not a test :)
Chris,
I would suggest using a Find Method rather than Repalce to identify the possible string matches, then run a Regex Test to determine which matches should be converted
If you were going to work with a range, then dumping this to a variant array would hasten the process
Cheers
Dave
 
Avatar of MitchSavage
MitchSavage

ASKER

Hello everyone !!

Yes, Richard, it should look very familiar indeed !!!!  We are so close to having this solved it is just great....  I am evaluating the codes, and some other things on the macros, so it may be a few hours before I can give you feedback.....  Thanks for all the great work....  It really is appreciated....

Best Regards,
MitchSavage

Richard,
The latest iteration of your code does work great.....  It does find the abbreviation at the first of the cell.  Finds it somewhere in the middle of the cell between words.  It does ignore the abbreviation if it is part of another word.  Seems to find all combinations of the abbreviation scattered throughout the cell.  That is just almost perfect.......  As stated herein, it will not find the abbreviation if there are any characters either immediately before, or immediately after the abbreviation.....  Example:  /acct  or acct. or acct:  Is there a simple fix that will ignore a list of punctuation characters? Everything else will meet my needs perfectly well.....  Things such as  |     acct          and   acct   and               acct  |  are changed to:   |acct and acct and acct|  ..........  I like the way you are manipulating the spaces because I need excess spaces removed anyway.  I have another Sub that removes excess spaces........  It took 2 (Two) seconds for the code to operate on a relatively small dataset.....

Chris,

I ran some extensive tests on the code you posted......  Here is what I found.......  It does find the abbreviation at the first of the cell.....  It does find the abbreviation somewhere in the middle between words.  It does ignore the abbreviation if it is contained within another word.......  It WILL NOT find multiple instances of the abbreviation in a single cell.  It finds the last instance only and replaces it.  Things such as  |     acct          and   acct   and               acct  |  are changed to:    |     acct        and   acct and           Account  |  Interestingly, It will find the abbreviation if there are certain punctuation characters immediately before the abbreviation or immediately after.....  It could happen in the real world that there could be punctuation immediately after.......  People tend to put a  .  after abbreviations....   so maybe that is handy.  It took 73 (Seventy Three) seconds for the code to run on 6 columns of a relatively small dataset.....  
 
Speed will become important as I finish this up.......  I have about 85 large recordsets......  they range in size from about 300,000 line items to about 800,000 line items each......  they have about 18 columns.....

Soon, really soon, I will run this across real data and see how we do.......  All this help I have received is surely appreciated.....

Best Regards,
MitchSavage

Richard,
I made an error in my post above......  Middle of the night here.....  Sorry......
It Should read:

Things such as  |     acct          and   acct   and               acct  |  are changed to:   |Account and Account and Account|  ...  The |'s in all cases indicate the cell border......  So, It removed the excess spaces and did a proper substitution.........

Best Regards,
MitchSavage
SPent a fair bit of time figuring out the fix for the partial replacements, (stoopid mistake) and a way to speed it up i.e. using ranges.

The resultant is as attached.  I have tested a number of error cases, (hence the time taken) but it looks reasonably robust to me.

See what you think

Chris
Sub ReplaceAbbreviations2()
Dim regExp As Object
Dim startCell As String
Dim replaceRange As Range
Dim lastCell As Range
Dim cel As Range
Dim vAbbr As Variant
Dim abbrevCount As Long
Dim newCell As Boolean
 
On Error Resume Next
Set replaceRange = Application.InputBox(Prompt:="Please select a range with your Mouse to fix Abbreviations.", Title:="SPECIFY RANGE", Type:=8)
 
On Error GoTo 0
 
    If Not replaceRange Is Nothing Then
        Set regExp = CreateObject("vbscript.regexp")
        regExp.IgnoreCase = True
        regExp.Global = True
        With Sheets("Sheet2")
            vAbbr = .Range("A2:B" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
        End With
        For abbrevCount = 1 To UBound(vAbbr, 1)
            regExp.Pattern = "\b" & vAbbr(abbrevCount, 1) & "\b"
            On Error Resume Next
            Set cel = replaceRange.Parent.Range("a1").Find(What:=CStr(vAbbr(abbrevCount, 1)), SearchDirection:=xlNext, LookAt:=xlPart)
            If cel Is Nothing Then Set cel = replaceRange.Find(What:=vAbbr(abbrevCount, 1), After:=replaceRange.Parent.Range("a1"))
            On Error GoTo 0
            If Not cel Is Nothing Then
                newCell = True
                Do While newCell
                    Debug.Print cel.Address
                    Set lastCell = cel
                    cel = regExp.Replace(cel, vAbbr(abbrevCount, 2))
                    Set cel = replaceRange.Find(What:=vAbbr(abbrevCount, 1), After:=cel)
                    newCell = (cel.Column > lastCell.Column And cel.Row = lastCell.Row) Or cel.Row > lastCell.Row
                Loop
            End If
        Next
    End If
 
End Sub

Open in new window

Chris,
There must be an error somewhere.  I could not get the code to work at all......  It does not give an error, just does nothing.......   Help......
Looks like you put some effort into this......  I do appreciate it very much........

I will continue looking at it and see what I can find......  Wish me luck.....  You guys are way past my level of understanding.......

Thanks,
MitchSavage.....
You are running or renaming the correct sub note the "2" affix to seperate it ftom the previous?

Chris
Chris,

I did determine that I am running the correct macro.  I placed all the macros about abbreviations onto a sample spreadsheet.......  they all perform pretty much as expected......  I just cannot get your code to operate....  It looks intriguing, so I really want to see what it does......  I cannot yet see what I could be doing wrong......  Are there other Excel settings outside the macro that can affect how it operates???

Thanks Chris

I have included my test file so as to help see what I have set up wrong ... since it works for me in this file.

Chris
replace-column.xls
Chris,
Downloaded your xls file...  Please tell me how to use it....  should I apply your complete sub from above??  This code was in the xls file:

Sub Macro2()
'
' Macro2 Macro
'

'
    Columns("A:G").Select
    Range("A368").Activate
    Selection.Find(What:="del", After:=ActiveSheet.Cells(ActiveSheet.Rows.Count, ActiveSheet.Columns.Count).Address, LookIn:=xlFormulas, _
        LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
        MatchCase:=False, SearchFormat:=False).Activate
End Sub

Remember,  I  inotcht very good with vba....  :>)

Thanks,
MitchSavage
tools | macros | run select ReplaceAbbreviations2 now it ought to continue as per your own.

Next of course would be to udnderstand why it doesn't work in your file ... can you provide it or an equivalent?

Chris
For your information ... in order to simplfy recalling the syntax I will often record an action then pull the essential components from teh recorded sub ... and that is what is in the module 2/3 as therein ... essentially rubbish.

Chris

Richard,

I am pleased with the abbreviations code you supplied......  Thanks.....  Is it possible for the code to find  the abbreviation if it has a punctuation character before or after the abbreviation......  I  have quite a few instances of punctuation characters......  Example:  /acct    or acct/      or  acct.       or acct:      or acct;     and quite a few others......   Can there be an exceptions list of characters???   Almost Done with this one !!!!!  I need to do this last thing, and I am off to the next challenge......  Thanks for all the help, I do Appreciate it..................

Besr Regards,
MitchSavage
PArdon my own confusion but you can run Richards code whereas you don't know how to run mine.  I included a test file and you can't use it.  Both require you to run a similarly named macro.

Perhaps then you have a mechanism for calling the macro and it is the name that needs to be the same?, I am pasting my earlier post and calling it via the original name below in case that is somehow the problem.

PAste this in the workbook and try to call it as normally.

Chris


Sub ReplaceAbbreviations()
Dim regExp As Object
Dim startCell As String
Dim replaceRange As Range
Dim lastCell As Range
Dim cel As Range
Dim vAbbr As Variant
Dim abbrevCount As Long
Dim newCell As Boolean
 
On Error Resume Next
Set replaceRange = Application.InputBox(Prompt:="Please select a range with your Mouse to fix Abbreviations.", Title:="SPECIFY RANGE", Type:=8)
 
On Error GoTo 0
 
    If Not replaceRange Is Nothing Then
        Set regExp = CreateObject("vbscript.regexp")
        regExp.IgnoreCase = True
        regExp.Global = True
        With Sheets("Sheet2")
            vAbbr = .Range("A2:B" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
        End With
        For abbrevCount = 1 To UBound(vAbbr, 1)
            regExp.Pattern = "\b" & vAbbr(abbrevCount, 1) & "\b"
            On Error Resume Next
            Set cel = replaceRange.Parent.Range("a1").Find(What:=CStr(vAbbr(abbrevCount, 1)), SearchDirection:=xlNext, LookAt:=xlPart)
            If cel Is Nothing Then Set cel = replaceRange.Find(What:=vAbbr(abbrevCount, 1), After:=replaceRange.Parent.Range("a1"))
            On Error GoTo 0
            If Not cel Is Nothing Then
                newCell = True
                Do While newCell
                    Debug.Print cel.Address
                    Set lastCell = cel
                    cel = regExp.Replace(cel, vAbbr(abbrevCount, 2))
                    Set cel = replaceRange.Find(What:=vAbbr(abbrevCount, 1), After:=cel)
                    newCell = (cel.Column > lastCell.Column And cel.Row = lastCell.Row) Or cel.Row > lastCell.Row
                Loop
            End If
        Next
    End If
 
End Sub

Open in new window

Chris,

Thanks for being persistent, and not giving up on me......  This website will not allow me to upload an xlsm file with a macro imbedded.  I am sending you an xls file with some live data.....  I cut this this from a name and address spreadsheet.  The original has about 300,000......  This one is about 10,000......
A sample Abbreviations list is on Sheet2......  

Take a look at this and you will see what I am trying to do......  This one has everything Uppercase.  Some have everything Lowercase........  Some are a really screwed up mix of UC/LC......  What I am trying to do is "Normalize" these various lists.......  In total, right now, I have about 85 of these lists, with a total line items totalling millions........  A bit much to do by hand, that's for sure ..............   I am setting up this macro to do "Proper Case" , "Remove Extra Spaces" , "Remove extra Hyphens" , "Change Abbreviations to "their Whole Word", "Change 2 and 3 letter company names from this:  "Aaa Designs" to this: "AAA designs""...........    So you see , Chris, there is a lot to do to get this finished..............

Anyway, Hope this helps.........  Perhaps you can show me the next step to make this work........

Thanks so much for the help................
Best Regards,
MitchSavage
Chris.....
Looks like the xls may not have attached correctly......

Trying again...........

Mitch
I presume you are trying to save as 97-2003 .xls which should work ok for uploading.

Chris
Chris,

Trying to upload .........  Same data, just cut it down to about 20 rows........

Mitch......

Doesn't help anyone else initially but try emailing it to me and i'll see if I can upload it for you.

My addy can be derived from my profile page.

Chris
Chris,

Not so sure why I cannot attach a file.....  I check "Attach File", Then the little window opens with a button "Add File"...........   Click the "Add File" button and navigate locally to my file, click "Open", get a progress bar from the website "Uploading"  .........  When that finishes, I click "Submit"  ......  Beats me, so I will email you direct.....  Where do I send it to......   I will prepare the xlsm file with all the test macros so you can see what I am doing......

Thanks and,
Best Regards,
MitchSavage
As I said earlier see the profile which is displayed when clicking my name on a post.  The email is detailed there.

Chris
Chris,

Sent it to you at your experts address.

Mitch
Mitch

Haven't looked yet but this is hopefully an upload of your file as 97-2003 .xls

Chris
This.xls
ASKER CERTIFIED SOLUTION
Avatar of Chris Bottomley
Chris Bottomley
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Chris,
code you posted I cannot get to work..........
Now I am distraught.......   The code you posted I cannot get to work.......  This is insane......  When I execute your code, it just blinks, and nothing changes......  No error message, No anything........  Something is dreadfully wrong here......  I haven't a clue right now.....  I am going brain dead working on this.....  I'm going to crash for a few hours and re-approach this tomorrow......  At least you have it working on my data, so we are on the right track..............  Maybe you can get it to run in my spreadsheet and just email it back to me.......  If we come up with a good solution, we can always post it here......
Thank you so much,
MitchSavage
I'll post what I did here, I cannot say it 'worked' but it did make a few changes so I was hoping with your greater knowledge of the expected that you would be better placed to spot any errors.

Select your sheet to be 'corrected' and then select alt + F8 then ReplaceAbbreviations_CRB_1.

Chris
This.xls
The same file but with the changed cells highlighted

Chris
This.xls
Chris,

I have tried without success to get your code to run......  I am going to award points and close this question soon....  wish I was better at vba, so i could contribute more to my own question... :>)

Thanks for all the help......  Any More suggestions???

Best Regards,
MitchSavage
Can't get it to run, can you give any clues, since I have run the supplied files and as long as macros run it should therefore work for you.

Chris
Chris,

Sure beats me.......  I'll try again tonight................

Thanks,
MitchSavage


ALso in the processed file I uploaded does it seem to replace what it should?

i.e. two actions as I see it:

1. Does it do the right job.
2. How to get it to work for you.

Chris
Mitch

I appreciate you had the frustration of never getting my code to work but can you indicate a status to this Q?

Chris
Thanks, Chris....   I did get it working.....  Surely do appreciate the help.

Best Regards,
Mitch Savage