Solved

How to find (and replace) characters like ö, ü, ä, etc

Posted on 2008-10-13
Last Modified: 2012-05-05
Hi,
Sometimes some of our fields in a SQL Server database contain characters like ö, Ç, ä, Ö etc.
One of our applications that has to work with this data really doesn't like these.
I am looking for a way to find these characters. Let's say I want to see all records in the name column of the customer table that contain non-ASCII characters like these. How can I do that?
And if I'd like to replace these characters with 'normal' characters (like Ö -> O, ä -> a, etc), does anyone know of an elegant way to do that?

Thanks
Question by:dready
7 Comments
 
LVL 60

Expert Comment

by:chapmandew
ID: 22703322
First, you have to find the ASCII codes for the characters...then you can do this:

select ascii('ü')

select replace('aosidmfüasdfom', 'ü', 'X')

with a table it would be:

select replace(fieldvalue, 'ü', '')
from tablename
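
To handle several characters in one statement, the same replace can be nested. This is only a sketch, assuming the customer table and name column from the question. The collate clause is an addition that is not in the example above: replace compares using the collation of its input, so under a case-insensitive collation 'ö' would also match 'Ö'. A binary collation keeps each replacement exact.

```sql
-- Nested replace: one call per character to substitute.
-- Table and column names are taken from the question and are assumptions here.
select name,
       replace(replace(replace(replace(
           name collate Latin1_General_BIN,
           'ö', 'o'),
           'Ö', 'O'),
           'ä', 'a'),
           'ü', 'u') as cleaned_name
from customer
```

This stays readable for a handful of characters. For the whole extended range, a lookup table or function scales better.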

 
LVL 60

Expert Comment

by:chapmandew
ID: 22703340
actually....my example doesn't take the ascii code into account..

 
LVL 60

Expert Comment

by:chapmandew
ID: 22703347
so, just omit my first line from my example.
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 150 total points
ID: 22703984
At the lowest level, you will probably need a function to replace the individual characters.  For example:


create function dbo.replace_extended_char (
    @char char(1)
)
returns char(1)
as
begin
return (
    select case when ascii(@char) < 127 then @char else
           case @char
                when 'Ç' then 'C'
                when 'ö' then 'o'
                when 'ä' then 'a'
                when 'Ö' then 'O'
                else '?' end end
    )
end
go


Then you need "driver code" that sends each extended character (only) to the function.  More on that later if needed :-) .
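
For reference, the "driver code" could look something like the sketch below. It is not the poster's own code: the function name dbo.replace_extended_string and the varchar(8000) limit are assumptions made for illustration. It walks the string and passes only characters above 126 to dbo.replace_extended_char.

```sql
-- Hypothetical driver function (name and size limit are assumptions).
create function dbo.replace_extended_string (
    @input varchar(8000)
)
returns varchar(8000)
as
begin
    declare @i int, @len int, @ch char(1)
    set @i = 1
    -- datalength counts bytes, which equals the character count
    -- for single-byte varchar data, and it keeps trailing spaces
    set @len = datalength(@input)

    while @i <= @len
    begin
        set @ch = substring(@input, @i, 1)
        -- send only extended characters to the per-character function
        if ascii(@ch) > 126
            set @input = stuff(@input, @i, 1, dbo.replace_extended_char(@ch))
        set @i = @i + 1
    end

    return @input
end
go
```

It could then be tried with something like select name, dbo.replace_extended_string(name) from customer before any update is run.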
 
LVL 51

Accepted Solution

by:
Mark Wills earned 350 total points
ID: 22712601
This can be a problem to replace... There have been a couple of these on EE, one of which had a reasonable function...

The normal "printable" characters range from 32 to 126 inclusive - before that you have carriage returns, tabs, line feeds etc... It goes up to 255 in the ANSI character set, so there are potentially 255 - 126 = 129 characters to be checked. Ouch.

The challenge will be what codeset, language and binaries are being used, or are you assuming that just ASCII characters and the English language are being used?

In which case, using physical character representations as acperkins does above will work OK...

In which case, the first step is to create a character map... Normally you would create a table for that:


create table uCharMap (AsciiNumber int primary key, AsciiCharacter char(1), Printable char(1))
GO
declare @int int
set @int= 127
while @int < 256
begin
  insert uCharMap (AsciiNumber,AsciiCharacter) values (@int, char(@int))
  set @int = @int + 1
end
GO


then open the table and manually decide the most appropriate characters to substitute (csv is included for one prepared earlier) ...
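
If you would rather script the substitutions than edit the table by hand, a sketch along these lines could work. The particular mappings and the '?' fallback are assumptions for illustration, and the codes only line up if the database uses a Windows-1252 style code page (where ascii('Ö') returns 214). Filling every Printable value matters because replace returns NULL when any argument is NULL.

```sql
-- Example mappings only; extend the list to suit your data.
-- Assumes a code page where these characters are single-byte (e.g. 1252).
update uCharMap set Printable = 'C' where AsciiNumber = ascii('Ç')
update uCharMap set Printable = 'O' where AsciiNumber = ascii('Ö')
update uCharMap set Printable = 'o' where AsciiNumber = ascii('ö')
update uCharMap set Printable = 'a' where AsciiNumber = ascii('ä')
update uCharMap set Printable = 'u' where AsciiNumber = ascii('ü')

-- Fallback (an assumption): anything still unmapped becomes '?'
update uCharMap set Printable = '?' where Printable is null

-- Review the result before using it
select AsciiNumber, AsciiCharacter, Printable from uCharMap order by AsciiNumber
```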

Then you can call the function (created below) as part of a select, an update or whatever, e.g.

select dbo.ufix_characters('ABCDefg hij 1233128¬E134 +140RÈÉÊËÌÍÏÐÑÒÓÔÕÖ×ØÙÚÛÝÞ')



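create function uFix_Characters(@incoming varchar(max))
returns varchar(max)
as
begin
declare @AsciiNumber int
declare @c char(1)
declare @p char(1)
declare @i int
set @i = 0
 
-- Quick exit when the string holds only plain characters. The binary collation
-- stops ranges such as a-z from also matching accented letters.
if patindex('%[^0-9 ,.";:-=~!@#$%*?()+}{a-zA-Z]%',@incoming collate Latin1_General_BIN) = 0
return @incoming
 
-- Walk the string one character at a time
while @i < len(rtrim(@incoming)) 
begin
 
  set @i = @i + 1
  set @AsciiNumber = ascii(substring(@incoming,@i,1)) 
  if @AsciiNumber > 126
  begin
     -- Look up the substitute; reset first so a missing row cannot reuse old values
     select @c = null, @p = null
     select @c = asciicharacter, @p = printable from ucharmap where asciinumber = @AsciiNumber
     -- replace() returns NULL if any argument is NULL, so skip unmapped characters.
     -- The binary collation keeps the match exact (e.g. 'Ö' does not also hit 'ö').
     if @p is not null
        set @incoming = replace (@incoming collate Latin1_General_BIN,@c,@p)
  end
 
end
 
return @incoming
end
go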


ucharmap.csv.txt
 

Expert Comment

by:susanys
ID: 26041472
What code should I use exactly? (I've created the function and the table with extended characters and appropriate replacements.)

I want to run it on a table called products, on a field called name for all possible replacements.

Thanks!
 
LVL 51

Expert Comment

by:Mark Wills
ID: 26041882
Well...

First you test it with

select name as oldname, dbo.ufix_characters(name) as newname from products

then if you are happy...

update products set name = dbo.ufix_characters(name)

You could probably add in a "where" clause - something like a pattern match (patindex) for characters not in the range of 0-9 and A-Z, but if it is a one-off job, just choose a quiet time...

But test first !!
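
As a rough sketch of such a "where" clause (an assumption, not tested against your data): the pattern below matches any character outside the range space (32) to tilde (126), and the binary collation makes the range follow character codes rather than dictionary order. Note that it also flags tabs, carriage returns and line feeds.

```sql
-- Preview only the rows that contain something outside space..tilde
select name as oldname, dbo.ufix_characters(name) as newname
from products
where name collate Latin1_General_BIN like '%[^ -~]%'

-- Same filter on the update, so untouched rows are not rewritten
update products
set name = dbo.ufix_characters(name)
where name collate Latin1_General_BIN like '%[^ -~]%'
```

The same predicate on its own should also answer the original question of finding the affected rows, e.g. against the name column of the customer table.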