• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1552
  • Last Modified:

Remove Chinese characters in SQL

I am working with data from Hong Kong and getting alot of Chinese characters mixed in. The software I am importing into does not read these. Is there a simple way in SQL to clear out the Chinese characters from the fields?

  • 3
  • 2
1 Solution
Presumably that the acceptable characters for the software you are using are the standard alpanumerics, as opposed to languages using pictorial script such as Chinese/Japanese/Korean etc.

The 'acceptable' characters will have character codes, just as the unacceptable characters will have character codes...  so filter your data to remove the characters that are not in the acceptable 'range'.  It means a lot of processing as you will be checking every character, but it should work.  

Alternatively you you could some kind of regular expression/pattern matching.  As to how to implement that, I found a useful article on StackOverflow Strip unwanted characters from string

I modified it slightly for your needs, so give this a go:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)

    Declare @KeepValues as varchar(100) = '%[^a-zA-Z0-9]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp

Open in new window

charkerrAuthor Commented:
So far, a manual search and replace is sounding much simpler.  Typically the data is just a name in alphanumeric followed by the name in Chinese characters, like:

Kobe Huang ¿¿¿

I'd be willing to do the work in Excel if that makes things simpler.

charkerrAuthor Commented:
Haha... those symbols were Chinese characters until I hit submit... :)
OK, so you want an Excel VBA solution, not a SQL one?

How about the following:

Function RemoveNonAlphaNumerics(strOriginal As String) As String 
    Dim intLoop        As Integer
    Dim strCurrentChar As String 
    Dim strProcessed   As String   

    For intLoop = 1 To Len(strOriginal) 
        strCurrentChar = Mid(strOriginal, intLoop, 1) 
        If strCurrentChar Like "[A-Z,a-z,0-9]" Then 
            strProcessed = strProcessed & strCurrentChar
        End If 
    Next intLoop

    RemoveNonAlphaNumerics = strProcessed
End Function

Open in new window

I've attached an example spreadsheet using the above function.  Click the button to call the function that will process all of the cells in Column A from cell A1 until it finds a blank cell.
charkerrAuthor Commented:
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now