
Foreign language conversion

I have a database containing some letters that don't display reliably on websites, so I need to convert them to the following:
&oslash; // ø
&aring; // å
&aelig; // æ
and their capitalized versions...

I have a clue, but I don't know how to do it exactly.
My thought was to pass the string as a parameter to a function, which would search it for one or more instances of these letters, replace them, and return the fixed string.

Appreciate every comment,

Gaute Rønningen
Bernard S. (CTO) commented:
... there are other options, such as displaying web pages with the correct character encoding, or directly in UTF-8.

Making this decision depends heavily on your problem.
- What is YOUR language? (From your profile, I would assume Norwegian.)
- Which languages are expected to be used on the site, i.e. which languages do visitors expect or appreciate finding?

Now some technical questions:
- Which version of MySQL are you using?
- Is the "mbstring" (multibyte) library available? (Check with phpinfo().)
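As a quicker alternative to reading through the full phpinfo() output, a short script can report the same two facts. This is only a sketch; the variable name is made up for illustration. The MySQL version can be read separately with the query SELECT VERSION();

```php
<?php
// Report whether the mbstring extension is loaded, plus the PHP version.
$hasMbstring = extension_loaded('mbstring');

echo 'PHP version: ' . PHP_VERSION . "\n";
echo 'mbstring available: ' . ($hasMbstring ? 'yes' : 'no') . "\n";
```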

If Norwegian or other "non-pure English" languages are used significantly, I would recommend that you switch to UTF-8 asap. It might be a pain... but it will be a lot more painful if you wait for the site to grow.
I have built a site at http://www.mae.u-paris10.fr/limc-france/ which uses MySQL 4.0 and PHP and handles this character problem in UTF-8, even though I'm not using mbstring.
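To give an idea of what the UTF-8 route can look like, here is a minimal sketch. It is not the code of the site mentioned above. It assumes the database currently returns ISO-8859-1 bytes and that the iconv extension is available; the helper name latin1_to_utf8() is hypothetical.

```php
<?php
// Hypothetical helper: convert an ISO-8859-1 string to UTF-8 with iconv.
function latin1_to_utf8(string $text): string
{
    $converted = iconv('ISO-8859-1', 'UTF-8', $text);
    // iconv() returns false on failure; fall back to the input in that case.
    return $converted === false ? $text : $converted;
}

// Declare the page encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Latin-1 bytes for "Søme strånge text.", standing in for a database value.
$fromDb = "S\xF8me str\xE5nge text.";

// Escape HTML special characters, then print the UTF-8 text as is.
echo htmlspecialchars(latin1_to_utf8($fromDb), ENT_QUOTES, 'UTF-8');
```

Once the data itself is stored and served as UTF-8, the conversion step goes away and only the header and the escaping remain.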

I would suggest you take this route rather than using the str_replace function, which is the one you seem to be looking for.
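For reference, the str_replace approach from the question could be sketched as follows. The function name replace_nordic_letters() is made up for illustration, and the sketch assumes the script file is saved in the same encoding as the text it processes, since str_replace compares raw bytes.

```php
<?php
// Hypothetical helper: swap each letter for its named HTML entity.
function replace_nordic_letters(string $text): string
{
    $map = [
        'ø' => '&oslash;', 'Ø' => '&Oslash;',
        'å' => '&aring;',  'Å' => '&Aring;',
        'æ' => '&aelig;',  'Æ' => '&AElig;',
    ];
    // str_replace accepts parallel arrays of search and replacement strings.
    return str_replace(array_keys($map), array_values($map), $text);
}

echo replace_nordic_letters('Søme strånge text.');
```

PHP's built-in htmlentities() covers the same letters and many more, but it also converts characters such as < and &, so it is not a drop-in replacement if the text already contains markup.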
This will work perfectly for you:

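$text = 'Søme strånge text.'; // the script file must be saved as ISO-8859-1

// Convert a single-byte (ISO-8859-1) character to a numeric HTML entity.
function htmlchar($char)
{
      return '&#' . ord($char) . ';';
}

// Note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.
$text = preg_replace('/[^\x09\x0A\x0D\x20-\x7F]/e', 'htmlchar("$0")', $text);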

To explain what this does:
First of all, you can reference 'ø' not only as '&oslash;' but also as '&#248;', because 248 is the character code of 'ø' in ISO-8859-1 (strictly speaking it is not ASCII, which stops at 127). The built-in function ord() returns this value for a single-byte character. So the function we have declared, htmlchar(), changes a character into its numeric HTML entity. However, this function does not care whether you pass it a character like 'ø' or a basic character like A through Z. The preg_replace() call on the next line does, though: the regex matches any single byte that is not a tab, line feed, carriage return, or a character in the range \x20-\x7F (space, A through Z, 0 through 9, !, @, #, etc.), and calls our function to replace it with &#(num);. Because it works byte by byte, it assumes the text is in a single-byte encoding such as ISO-8859-1, not UTF-8.
So, in the example I just gave, it would change 'Søme strånge text.' into 'S&#248;me str&#229;nge text.', which is a lot more browser-friendly.
In fact, rather than creating an additional function, you could translate the whole thing in one fell swoop like this:

$text = preg_replace('/[^\x09\x0A\x0D\x20-\x7F]/e', '"&#" . ord("$0") . ";"', $text);
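The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7, so on current PHP versions the same idea has to go through preg_replace_callback(). The following is a sketch under the same assumption as the original (single-byte ISO-8859-1 input); the function name to_numeric_entities() is hypothetical.

```php
<?php
// Hypothetical helper: replace every byte outside tab, LF, CR and
// \x20-\x7F with a numeric HTML entity. Assumes ISO-8859-1 input.
function to_numeric_entities(string $text): string
{
    return preg_replace_callback(
        '/[^\x09\x0A\x0D\x20-\x7F]/',
        function (array $m): string {
            // $m[0] is the single matched byte.
            return '&#' . ord($m[0]) . ';';
        },
        $text
    );
}

// Latin-1 bytes for "Søme strånge text."
$text = "S\xF8me str\xE5nge text.";
echo to_numeric_entities($text);
```

With UTF-8 input, ord() would see the individual bytes of each multibyte character, so this sketch would produce wrong entities and would need a different approach.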