Remove non-english language characters using vba

I recieve a spreadsheet from France, Russia, and Saudi Arabia separately.  The description field always contains the English and the Non-English verbage separated by numerous spaces, return characters and or special characters. Sometimes the Non-English verbage is first sometimes the English verbage is first.  The length of each is also variable.

I need to remove the Non-English verbage while leaving the English verbage. I have looked all over and can not find how to do this.  I would share the spreadsheet but that data is sensitive.

Please assist if up to the challenge
Who is Participating?
Patrick MatthewsConnect With a Mentor Commented:
Try adding the code below into a regular VBA module.  It is fully described in my article here:

The formula would be:

=RegExprReplace(A2,"[^a-zA-Z0-9,\.;: -]")

Without some sample data, I can only cross my fingers that it works :)
Function RegExpReplace(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
    Optional ReplaceAll As Boolean = True, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    ' For more info, please see:
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    ' This function uses Regular Expressions to parse a string, and replace parts of the string
    ' matching the specified pattern with another string.  The optional argument ReplaceAll
    ' controls whether all instances of the matched string are replaced (True) or just the first
    ' instance (False)
    ' If you need to replace the Nth match, or a range of matches, then use RegExpReplaceRange
    ' instead
    ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase or
    ' set it to True
    ' If you use this function from Excel, you may substitute range references for all the arguments
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    Static RegX As Object
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = ReplaceAll
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
    RegExpReplace = RegX.Replace(LookIn, ReplaceWith)
End Function

Open in new window

rbpartviAuthor Commented:
Thank you Patrick.  I have placed the code into Module 1 and ran but the function is asking for a macro to run.  Any insight?
Patrick MatthewsCommented:
You do not run it as a macro.  Use it in a formula in your worksheet.
Never miss a deadline with

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

rbpartviAuthor Commented:
I must be missing something.  I press 'ALT F11" placed the code into my first sheet,  but I keep seeing the "choose Macro box" with Run at the top right when I try t run the code.
TracyVBA DeveloperCommented:
Once you copy the code into the VBE then close the VBE.  Go into Excel, and add the formula that patrick suggested: =RegExprReplace(A2,"[^a-zA-Z0-9,\.;: -]")
Patrick MatthewsCommented:


Put the code in a regular module (NOT a sheet or ThisWorkbook module), and then use a formula in your worksheet.

For example, if your data is in A2, use a formula like this:

=RegExprReplace(A2,"[^a-zA-Z0-9,\.;: -]")

That formula will try to remove anything that is not a letter, digit, period, comma, semicolon, colon, space, or hyphen.  To expand to other punctuation marks:

=RegExprReplace(A2,"[^a-zA-Z0-9,\.;: -!\?'""]")

(that adds excalamtion point, question mark, singlequote, and doublequote).
rbpartviAuthor Commented:
The data (the language text) I am needing to remove is in 500 lines of one column.  Would I make a range like =RegExprReplace(H6:H506,"[^a-zA-Z0-9,\.;: -!\?'""]")?
Patrick MatthewsCommented:
No.  If the first cell is H6, then in, say, I6, you would put:

=RegExprReplace(H6,"[^a-zA-Z0-9,\.;: -!\?'""]")

You would then copy that down Column I through I506, or however far you have to go.
rbpartviAuthor Commented:
Thank you very much for the assistance.
All Courses

From novice to tech pro — start learning today.