Remove non-english language characters using vba

I recieve a spreadsheet from France, Russia, and Saudi Arabia separately.  The description field always contains the English and the Non-English verbage separated by numerous spaces, return characters and or special characters. Sometimes the Non-English verbage is first sometimes the English verbage is first.  The length of each is also variable.

I need to remove the Non-English verbage while leaving the English verbage. I have looked all over and can not find how to do this.  I would share the spreadsheet but that data is sensitive.

Please assist if up to the challenge
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Patrick MatthewsCommented:
Try adding the code below into a regular VBA module.  It is fully described in my article here:

The formula would be:

=RegExprReplace(A2,"[^a-zA-Z0-9,\.;: -]")

Without some sample data, I can only cross my fingers that it works :)
Function RegExpReplace(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
    Optional ReplaceAll As Boolean = True, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    ' For more info, please see:
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    ' This function uses Regular Expressions to parse a string, and replace parts of the string
    ' matching the specified pattern with another string.  The optional argument ReplaceAll
    ' controls whether all instances of the matched string are replaced (True) or just the first
    ' instance (False)
    ' If you need to replace the Nth match, or a range of matches, then use RegExpReplaceRange
    ' instead
    ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase or
    ' set it to True
    ' If you use this function from Excel, you may substitute range references for all the arguments
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    Static RegX As Object
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = ReplaceAll
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
    RegExpReplace = RegX.Replace(LookIn, ReplaceWith)
End Function

Open in new window


Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
rbpartviAuthor Commented:
Thank you Patrick.  I have placed the code into Module 1 and ran but the function is asking for a macro to run.  Any insight?
Patrick MatthewsCommented:
You do not run it as a macro.  Use it in a formula in your worksheet.
The Ultimate Tool Kit for Technolgy Solution Provi

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy for valuable how-to assets including sample agreements, checklists, flowcharts, and more!

rbpartviAuthor Commented:
I must be missing something.  I press 'ALT F11" placed the code into my first sheet,  but I keep seeing the "choose Macro box" with Run at the top right when I try t run the code.
TracyVBA DeveloperCommented:
Once you copy the code into the VBE then close the VBE.  Go into Excel, and add the formula that patrick suggested: =RegExprReplace(A2,"[^a-zA-Z0-9,\.;: -]")
Patrick MatthewsCommented:


Put the code in a regular module (NOT a sheet or ThisWorkbook module), and then use a formula in your worksheet.

For example, if your data is in A2, use a formula like this:

=RegExprReplace(A2,"[^a-zA-Z0-9,\.;: -]")

That formula will try to remove anything that is not a letter, digit, period, comma, semicolon, colon, space, or hyphen.  To expand to other punctuation marks:

=RegExprReplace(A2,"[^a-zA-Z0-9,\.;: -!\?'""]")

(that adds excalamtion point, question mark, singlequote, and doublequote).
rbpartviAuthor Commented:
The data (the language text) I am needing to remove is in 500 lines of one column.  Would I make a range like =RegExprReplace(H6:H506,"[^a-zA-Z0-9,\.;: -!\?'""]")?
Patrick MatthewsCommented:
No.  If the first cell is H6, then in, say, I6, you would put:

=RegExprReplace(H6,"[^a-zA-Z0-9,\.;: -!\?'""]")

You would then copy that down Column I through I506, or however far you have to go.
rbpartviAuthor Commented:
Thank you very much for the assistance.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.