• Status: Solved

Special Characters in UTF8

Hi there,

I'm having some issues converting HTML-style named character entities to XML-style numeric ones. For instance, I want to convert &aacute; to &#225; because at present I'm getting the error 'Entity 'aacute' not defined in Entity' from the DOMDocument loadXML function.

When I do a simple str_replace("&aacute;", "&#225;", $xml) it works, no errors. So, I found a list of special characters and their codes (here) and built them into a MySQL table.
 Screenshot of MySQL table
From there, I built two arrays and populated them like so:
	$srch=array();
	$fnd=array();
	$qry=$db->Execute("SELECT * FROM cfg_utf8");
	while($utf=$qry->FetchRow()) {
		$srch[]=htmlspecialchars($utf['html'], ENT_QUOTES);
		$fnd[]=htmlspecialchars($utf['xml'], ENT_QUOTES);
	}

From there, according to the examples given on php.net, I should be able to do the same str_replace with the $srch and $fnd arrays to achieve the same result I get when I do it as a one-off. However, it doesn't work. I get the same message as before, which makes me think that the str_replace isn't working (as it still mentions 'aacute', which should have been translated).
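
For reference, the array form of str_replace described above can be sketched as follows. This is a minimal illustration, not the poster's actual code: the two-entry lists and the sample XML fragment are made up, and it assumes the DOM extension is available.

```php
<?php
// Hypothetical two-entry subset of the search/replace lists.
$srch = ['&aacute;', '&eacute;'];
$fnd  = ['&#225;',  '&#233;'];

// Hypothetical XML fragment containing HTML-only named entities.
$xml = '<name>Jos&eacute; Guzm&aacute;n</name>';

// str_replace() pairs $srch[0] with $fnd[0], $srch[1] with $fnd[1], and so on.
$fixed = str_replace($srch, $fnd, $xml);

// Numeric character references need no DTD, so the XML parser accepts them.
$doc = new DOMDocument();
$ok  = $doc->loadXML($fixed);

echo $fixed, PHP_EOL;
echo $doc->documentElement->textContent, PHP_EOL;
```

If the search strings match the text exactly, the one-off call and the array call should behave the same, so a failure most likely means the strings in $srch differ from what is in the XML.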

Can anyone spot where I'm going wrong?

Thanks,
John
worldofwires
Asked:
 
worldofwires (Author) commented:
I've got a workaround which will do me for now. I created the two columns of data from the link in the post in an array without using the database table. That works fine, so it's something to do with the way it retrieves the data from the database (so probably the htmlspecialchars function).

I'll leave it open to see if anyone can spot the issue with the DB method.
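
A possible explanation for the suspicion about htmlspecialchars, sketched under the assumption that the table stores the entities literally (for example '&aacute;' in the html column): htmlspecialchars() escapes the leading ampersand, so the search string no longer matches anything in the XML. The $rows array below is a hypothetical stand-in for the cfg_utf8 result set.

```php
<?php
// Hypothetical rows standing in for the cfg_utf8 table (html and xml columns).
$rows = [
    ['html' => '&aacute;', 'xml' => '&#225;'],
    ['html' => '&eacute;', 'xml' => '&#233;'],
];

// htmlspecialchars() escapes the "&" that starts each entity.
$escaped = htmlspecialchars($rows[0]['html'], ENT_QUOTES);
echo $escaped, PHP_EOL;   // &amp;aacute;

// Using the column values as stored keeps the search strings intact.
$srch = array_column($rows, 'html');
$fnd  = array_column($rows, 'xml');

$xml = '<name>Guzm&aacute;n</name>';

// The escaped search string is not present in the XML, so nothing changes.
$unchanged = str_replace([$escaped], ['&#225;'], $xml);

// The unescaped lists match and the entity is rewritten.
$replaced = str_replace($srch, $fnd, $xml);

echo $unchanged, PHP_EOL;
echo $replaced, PHP_EOL;
```

If that assumption holds, dropping the htmlspecialchars() calls when filling $srch and $fnd should make the database version behave like the hard-coded arrays.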
 
Lukasz Chmielewski commented:
There's a missing ";" in the XML entity in the first row. I'm working on the rest.
 
Lukasz Chmielewski commented:
How do you use your str_replace?
This seems to be working:

                // your code
	while($utf=$qry->FetchRow()) {
		$srch[]=htmlspecialchars($utf['html'], ENT_QUOTES);
		$fnd[]=htmlspecialchars($utf['xml'], ENT_QUOTES);
	}

	$test = str_replace($srch,$fnd,$srch);
	print_r($test);

 
Ray Paseur commented:
You might want to read up on this issue here. The article is old, but the problem seems to be an enduring one!
http://www.joelonsoftware.com/articles/Unicode.html

I have used this to let "westernized" characters survive in the UTF-8 environment.  Maybe it will help you with your thinking about how to solve the problem.  You could change the $normal array to use numeric entities instead of my pidgin-language character set.

HTH, ~Ray
<?php // RAY_westernize_letters.php
error_reporting(E_ALL);


// DEMONSTRATE HOW TO TRANSLATE SOME WESTERN CHARACTERS INTO ENGLISH-PRINTABLE


// TEST CASES
$arr
= array
( 'Françoise'
, 'ßeta or Beta?'
, 'ENCYCLOPÆDIA'
, 'ça va! mon élève mi niña?'
, 'A stealthy ƒart'
, 'Jean "Ðango" Reinhardt of Pont-à-Celles'
)
;

// DISPLAY EACH TEST CASE
foreach ($arr as $str)
{
    echo PHP_EOL
    . '<br/>'
    . $str
    . ' = '
    . '<strong>'
    . mungstring($str)
    . '</strong>'
    ;
}

// EXAMPLE SHOWING HOW TO TURN A PORTUGUESE NAME INTO PART OF A URL STRING
$str = 'Armação de Pêra';
$new = mungString($str);
$new = strtolower($new);
$new = str_replace(' ', '-', $new);

// SHOW THE URL STRING
echo PHP_EOL
. '<br/>'
. '<strong>'
. '<a target="blank" href="http://lmgtfy.com?q='
. htmlentities(mungstring($new))
. '">'
. $str
. '</a>'
. '</strong>'
;

// A FUNCTION TO RETURN THE WESTERNIZED STRING
function mungString($str, $return='TEXT')
{
    // OUR REPLACEMENT ARRAY (MAY WANT SOME CHANGES HERE)
    static
    $normal
    = array
    ( 'ƒ' => 'f'  // http://en.wikipedia.org/wiki/%C6%91 florin
    , 'Š' => 'S'  // http://en.wikipedia.org/wiki/%C5%A0 S-caron (voiceless postalveolar fricative)
    , 'š' => 's'  // http://en.wikipedia.org/wiki/%C5%A0 s-caron
    , 'Ð' => 'Dj' // http://en.wikipedia.org/wiki/Eth (voiced dental fricative)
    , 'Ž' => 'Z'  // http://en.wikipedia.org/wiki/%C5%BD Z-caron (voiced postalveolar fricative)
    , 'ž' => 'z'  // http://en.wikipedia.org/wiki/%C5%BD z-caron
    , 'À' => 'A'
    , 'Á' => 'A'
    , 'Â' => 'A'
    , 'Ã' => 'A'
    , 'Ä' => 'A'
    , 'Å' => 'A'
    , 'Æ' => 'E'
    , 'Ç' => 'C'
    , 'È' => 'E'
    , 'É' => 'E'
    , 'Ê' => 'E'
    , 'Ë' => 'E'
    , 'Ì' => 'I'
    , 'Í' => 'I'
    , 'Î' => 'I'
    , 'Ï' => 'I'
    , 'Ñ' => 'N'
    , 'Ò' => 'O'
    , 'Ó' => 'O'
    , 'Ô' => 'O'
    , 'Õ' => 'O'
    , 'Ö' => 'O'
    , 'Ø' => 'O'
    , 'Ù' => 'U'
    , 'Ú' => 'U'
    , 'Û' => 'U'
    , 'Ü' => 'U'
    , 'Ý' => 'Y'
    , 'Þ' => 'B'
    , 'ß' => 'Ss'
    , 'à' => 'a'
    , 'á' => 'a'
    , 'â' => 'a'
    , 'ã' => 'a'
    , 'ä' => 'a'
    , 'å' => 'a'
    , 'æ' => 'e'
    , 'ç' => 'c'
    , 'è' => 'e'
    , 'é' => 'e'
    , 'ê' => 'e'
    , 'ë' => 'e'
    , 'ì' => 'i'
    , 'í' => 'i'
    , 'î' => 'i'
    , 'ï' => 'i'
    , 'ð' => 'o'
    , 'ñ' => 'n'
    , 'ò' => 'o'
    , 'ó' => 'o'
    , 'ô' => 'o'
    , 'õ' => 'o'
    , 'ö' => 'o'
    , 'ø' => 'o'
    , 'ù' => 'u'
    , 'ú' => 'u'
    , 'û' => 'u'
    , 'ü' => 'u'
    , 'ý' => 'y'
    , 'þ' => 'b'
    , 'ÿ' => 'y'
    )
    ;
    // RETURN THE "TRANSLATED" TEXT
    if ($return == 'TEXT') return strtr($str, $normal);

    // MIGHT BE USEFUL TO GET THE LIST OF ORIGINAL LETTERS
    return array_keys($normal);
}

 
worldofwires (Author) commented:
Thank you both for your responses, and apologies for my tardy reply. Roads, I've got it working with the arrays when the arrays are declared in the PHP script. It's when I pull the values from the SQL table that things go awry. In case anyone wants to use the arrays, I've included them below:
$arr=array("html"=>array(), "xml"=>array());

	$arr['html'][]="&quot;";
	$arr['html'][]="&amp;";
	$arr['html'][]="&lt;";
	$arr['html'][]="&gt;";
	$arr['html'][]="&nbsp;";
	$arr['html'][]="&iexcl;";
	$arr['html'][]="&cent;";
	$arr['html'][]="&pound;";
	$arr['html'][]="&curren;";
	$arr['html'][]="&yen;";
	$arr['html'][]="&brvbar;";
	$arr['html'][]="&sect;";
	$arr['html'][]="&uml;";
	$arr['html'][]="&copy;";
	$arr['html'][]="&ordf;";
	$arr['html'][]="&laquo;";
	$arr['html'][]="&not;";
	$arr['html'][]="&shy;";
	$arr['html'][]="&reg;";
	$arr['html'][]="&macr;";
	$arr['html'][]="&deg;";
	$arr['html'][]="&plusmn;";
	$arr['html'][]="&sup2;";
	$arr['html'][]="&sup3;";
	$arr['html'][]="&acute;";
	$arr['html'][]="&micro;";
	$arr['html'][]="&para;";
	$arr['html'][]="&middot;";
	$arr['html'][]="&cedil;";
	$arr['html'][]="&sup1;";
	$arr['html'][]="&ordm;";
	$arr['html'][]="&raquo;";
	$arr['html'][]="&frac14;";
	$arr['html'][]="&frac12;";
	$arr['html'][]="&frac34;";
	$arr['html'][]="&iquest;";
	$arr['html'][]="&Agrave;";
	$arr['html'][]="&Aacute;";
	$arr['html'][]="&Acirc;";
	$arr['html'][]="&Atilde;";
	$arr['html'][]="&Auml;";
	$arr['html'][]="&Aring;";
	$arr['html'][]="&AElig;";
	$arr['html'][]="&Ccedil;";
	$arr['html'][]="&Egrave;";
	$arr['html'][]="&Eacute;";
	$arr['html'][]="&Ecirc;";
	$arr['html'][]="&Euml;";
	$arr['html'][]="&Igrave;";
	$arr['html'][]="&Iacute;";
	$arr['html'][]="&Icirc;";
	$arr['html'][]="&Iuml;";
	$arr['html'][]="&ETH;";
	$arr['html'][]="&Ntilde;";
	$arr['html'][]="&Ograve;";
	$arr['html'][]="&Oacute;";
	$arr['html'][]="&Ocirc;";
	$arr['html'][]="&Otilde;";
	$arr['html'][]="&Ouml;";
	$arr['html'][]="&times;";
	$arr['html'][]="&Oslash;";
	$arr['html'][]="&Ugrave;";
	$arr['html'][]="&Uacute;";
	$arr['html'][]="&Ucirc;";
	$arr['html'][]="&Uuml;";
	$arr['html'][]="&Yacute;";
	$arr['html'][]="&THORN;";
	$arr['html'][]="&szlig;";
	$arr['html'][]="&agrave;";
	$arr['html'][]="&aacute;";
	$arr['html'][]="&acirc;";
	$arr['html'][]="&atilde;";
	$arr['html'][]="&auml;";
	$arr['html'][]="&aring;";
	$arr['html'][]="&aelig;";
	$arr['html'][]="&ccedil;";
	$arr['html'][]="&egrave;";
	$arr['html'][]="&eacute;";
	$arr['html'][]="&ecirc;";
	$arr['html'][]="&euml;";
	$arr['html'][]="&igrave;";
	$arr['html'][]="&iacute;";
	$arr['html'][]="&icirc;";
	$arr['html'][]="&iuml;";
	$arr['html'][]="&eth;";
	$arr['html'][]="&ntilde;";
	$arr['html'][]="&ograve;";
	$arr['html'][]="&oacute;";
	$arr['html'][]="&ocirc;";
	$arr['html'][]="&otilde;";
	$arr['html'][]="&ouml;";
	$arr['html'][]="&divide;";
	$arr['html'][]="&oslash;";
	$arr['html'][]="&ugrave;";
	$arr['html'][]="&uacute;";
	$arr['html'][]="&ucirc;";
	$arr['html'][]="&uuml;";
	$arr['html'][]="&yacute;";
	$arr['html'][]="&thorn;";
	$arr['html'][]="&yuml;";
	$arr['html'][]="&euro;";

	$arr['xml'][]="&#34;";
	$arr['xml'][]="&#38;";
	$arr['xml'][]="&#60;";
	$arr['xml'][]="&#62;";
	$arr['xml'][]="&#160;";
	$arr['xml'][]="&#161;";
	$arr['xml'][]="&#162;";
	$arr['xml'][]="&#163;";
	$arr['xml'][]="&#164;";
	$arr['xml'][]="&#165;";
	$arr['xml'][]="&#166;";
	$arr['xml'][]="&#167;";
	$arr['xml'][]="&#168;";
	$arr['xml'][]="&#169;";
	$arr['xml'][]="&#170;";
	$arr['xml'][]="&#171;";
	$arr['xml'][]="&#172;";
	$arr['xml'][]="&#173;";
	$arr['xml'][]="&#174;";
	$arr['xml'][]="&#175;";
	$arr['xml'][]="&#176;";
	$arr['xml'][]="&#177;";
	$arr['xml'][]="&#178;";
	$arr['xml'][]="&#179;";
	$arr['xml'][]="&#180;";
	$arr['xml'][]="&#181;";
	$arr['xml'][]="&#182;";
	$arr['xml'][]="&#183;";
	$arr['xml'][]="&#184;";
	$arr['xml'][]="&#185;";
	$arr['xml'][]="&#186;";
	$arr['xml'][]="&#187;";
	$arr['xml'][]="&#188;";
	$arr['xml'][]="&#189;";
	$arr['xml'][]="&#190;";
	$arr['xml'][]="&#191;";
	$arr['xml'][]="&#192;";
	$arr['xml'][]="&#193;";
	$arr['xml'][]="&#194;";
	$arr['xml'][]="&#195;";
	$arr['xml'][]="&#196;";
	$arr['xml'][]="&#197;";
	$arr['xml'][]="&#198;";
	$arr['xml'][]="&#199;";
	$arr['xml'][]="&#200;";
	$arr['xml'][]="&#201;";
	$arr['xml'][]="&#202;";
	$arr['xml'][]="&#203;";
	$arr['xml'][]="&#204;";
	$arr['xml'][]="&#205;";
	$arr['xml'][]="&#206;";
	$arr['xml'][]="&#207;";
	$arr['xml'][]="&#208;";
	$arr['xml'][]="&#209;";
	$arr['xml'][]="&#210;";
	$arr['xml'][]="&#211;";
	$arr['xml'][]="&#212;";
	$arr['xml'][]="&#213;";
	$arr['xml'][]="&#214;";
	$arr['xml'][]="&#215;";
	$arr['xml'][]="&#216;";
	$arr['xml'][]="&#217;";
	$arr['xml'][]="&#218;";
	$arr['xml'][]="&#219;";
	$arr['xml'][]="&#220;";
	$arr['xml'][]="&#221;";
	$arr['xml'][]="&#222;";
	$arr['xml'][]="&#223;";
	$arr['xml'][]="&#224;";
	$arr['xml'][]="&#225;";
	$arr['xml'][]="&#226;";
	$arr['xml'][]="&#227;";
	$arr['xml'][]="&#228;";
	$arr['xml'][]="&#229;";
	$arr['xml'][]="&#230;";
	$arr['xml'][]="&#231;";
	$arr['xml'][]="&#232;";
	$arr['xml'][]="&#233;";
	$arr['xml'][]="&#234;";
	$arr['xml'][]="&#235;";
	$arr['xml'][]="&#236;";
	$arr['xml'][]="&#237;";
	$arr['xml'][]="&#238;";
	$arr['xml'][]="&#239;";
	$arr['xml'][]="&#240;";
	$arr['xml'][]="&#241;";
	$arr['xml'][]="&#242;";
	$arr['xml'][]="&#243;";
	$arr['xml'][]="&#244;";
	$arr['xml'][]="&#245;";
	$arr['xml'][]="&#246;";
	$arr['xml'][]="&#247;";
	$arr['xml'][]="&#248;";
	$arr['xml'][]="&#249;";
	$arr['xml'][]="&#250;";
	$arr['xml'][]="&#251;";
	$arr['xml'][]="&#252;";
	$arr['xml'][]="&#253;";
	$arr['xml'][]="&#254;";
	$arr['xml'][]="&#255;";
	$arr['xml'][]="&#8364;";

You're right about the missing semi-colon, thanks for that. I've corrected it but it wasn't causing an issue.

Ray, I'm not really wanting to westernise the text, I just want the special formatting to survive into the XML. That array that you pasted will be very useful in testing the str_replace which uses the above arrays. Thanks for your post.
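
One way to let named entities survive into XML without maintaining a lookup table is sketched below. It is an alternative to the arrays above, not something posted in this thread: the function name named_to_numeric is made up for illustration, and it assumes the mbstring extension is loaded. It relies on html_entity_decode() to resolve each name and mb_encode_numericentity() to write the numeric reference.

```php
<?php
// Sketch: rewrite HTML named entities as numeric references, leaving the
// five entities that XML itself predefines untouched.
function named_to_numeric(string $markup): string
{
    $xmlPredefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    $result = preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0];
            }
            // Decode the named entity to its UTF-8 character(s).
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                return $m[0];   // not a known entity name; leave as-is
            }
            // Re-encode every decoded code point as &#NNN;.
            $convmap = [0x0, 0x10FFFF, 0, 0x1FFFFF];
            return mb_encode_numericentity($decoded, $convmap, 'UTF-8');
        },
        $markup
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $markup;
}

echo named_to_numeric('<p>caf&eacute; &amp; cr&egrave;me &euro;5</p>'), PHP_EOL;
```

Names that html_entity_decode() does not recognise are left alone, so they would still trigger the 'Entity not defined' error and need separate handling.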
 
worldofwires (Author) commented:
Excellent link too Ray, thanks for that.
 
Ray Paseur commented:
Check this post. I will try to come up with a code snippet that would be helpful to you.
http://us3.php.net/manual/en/function.ord.php#103277
 
Ray Paseur commented:
Install this and run it, then do "view source" (or just look at the source from my site here).
http://www.laprbass.com/RAY_entitize_western_letters.php

You might find a more efficient way to handle this issue.  For example, you might just test each character to see if its ord() was 128 or above, and entitize those with numerical values.  Or you might find that UTF-8 is not the encoding you want to use.
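
The "entitize everything at 128 or above" idea can be sketched with mb_encode_numericentity(), which works on code points rather than bytes. This is only an illustration, assuming the input is valid UTF-8 and the mbstring extension is loaded; the function name entitize_non_ascii is hypothetical.

```php
<?php
// Sketch: replace every character above ASCII with a numeric character reference.
function entitize_non_ascii(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

// "Armação de Pêra", spelled with escapes so the example does not depend
// on the encoding the script file is saved in.
$name = "Arma\u{E7}\u{E3}o de P\u{EA}ra";

echo entitize_non_ascii($name), PHP_EOL;   // Arma&#231;&#227;o de P&#234;ra
```

Plain ASCII, including markup characters such as < and &, passes through unchanged, so the result can be applied to a whole XML string.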

Best regards, ~Ray
<?php // RAY_entitize_western_letters.php
error_reporting(E_ALL);


// DEMONSTRATE HOW TO TRANSLATE SOME WESTERN CHARACTERS INTO ENGLISH-PRINTABLE OR ENTITIES


// TEST CASES
$arr
= array
( 'Françoise'
, 'ßeta or Beta?'
, 'ENCYCLOPÆDIA'
, 'ça va! mon élève mi niña?'
, 'A stealthy ƒart'
, 'Jean "Ðango" Reinhardt of Pont-à-Celles'
)
;

// DISPLAY EACH TEST CASE
foreach ($arr as $str)
{
    echo PHP_EOL
    . '<br/>'
    . $str
    . ' = '
    . '<strong>'
    . mungstring($str)
    . '</strong>'
    ;
}

// EXAMPLE SHOWING HOW TO TURN A PORTUGUESE NAME INTO PART OF A URL STRING
$str = 'Armação de Pêra';
$new = mungString($str);
$new = strtolower($new);
$new = str_replace(' ', '-', $new);

// SHOW THE URL STRING
echo PHP_EOL
. '<br/>'
. '<strong>'
. '<a target="blank" href="http://lmgtfy.com?q='
. htmlentities(mungstring($new))
. '">'
. $str
. '</a>'
. '</strong>'
;


$str = 'Armação de Pêra';
$new = mungString($str, 'FOO');
echo "<pre>";
foreach ($new as $chr)
{
    echo PHP_EOL . $chr . '=' . '&#' . ord($chr) . ';' ;
}

// A FUNCTION TO RETURN THE WESTERNIZED STRING
function mungString($str, $return='TEXT')
{
    // OUR REPLACEMENT ARRAY (MAY WANT SOME CHANGES HERE)
    static
    $normal
    = array
    ( 'ƒ' => 'f'  // http://en.wikipedia.org/wiki/%C6%91 florin
    , 'Š' => 'S'  // http://en.wikipedia.org/wiki/%C5%A0 S-caron (voiceless postalveolar fricative)
    , 'š' => 's'  // http://en.wikipedia.org/wiki/%C5%A0 s-caron
    , 'Ð' => 'Dj' // http://en.wikipedia.org/wiki/Eth (voiced dental fricative)
    , 'Ž' => 'Z'  // http://en.wikipedia.org/wiki/%C5%BD Z-caron (voiced postalveolar fricative)
    , 'ž' => 'z'  // http://en.wikipedia.org/wiki/%C5%BD z-caron
    , 'À' => 'A'
    , 'Á' => 'A'
    , 'Â' => 'A'
    , 'Ã' => 'A'
    , 'Ä' => 'A'
    , 'Å' => 'A'
    , 'Æ' => 'E'
    , 'Ç' => 'C'
    , 'È' => 'E'
    , 'É' => 'E'
    , 'Ê' => 'E'
    , 'Ë' => 'E'
    , 'Ì' => 'I'
    , 'Í' => 'I'
    , 'Î' => 'I'
    , 'Ï' => 'I'
    , 'Ñ' => 'N'
    , 'Ò' => 'O'
    , 'Ó' => 'O'
    , 'Ô' => 'O'
    , 'Õ' => 'O'
    , 'Ö' => 'O'
    , 'Ø' => 'O'
    , 'Ù' => 'U'
    , 'Ú' => 'U'
    , 'Û' => 'U'
    , 'Ü' => 'U'
    , 'Ý' => 'Y'
    , 'Þ' => 'B'
    , 'ß' => 'Ss'
    , 'à' => 'a'
    , 'á' => 'a'
    , 'â' => 'a'
    , 'ã' => 'a'
    , 'ä' => 'a'
    , 'å' => 'a'
    , 'æ' => 'e'
    , 'ç' => 'c'
    , 'è' => 'e'
    , 'é' => 'e'
    , 'ê' => 'e'
    , 'ë' => 'e'
    , 'ì' => 'i'
    , 'í' => 'i'
    , 'î' => 'i'
    , 'ï' => 'i'
    , 'ð' => 'o'
    , 'ñ' => 'n'
    , 'ò' => 'o'
    , 'ó' => 'o'
    , 'ô' => 'o'
    , 'õ' => 'o'
    , 'ö' => 'o'
    , 'ø' => 'o'
    , 'ù' => 'u'
    , 'ú' => 'u'
    , 'û' => 'u'
    , 'ü' => 'u'
    , 'ý' => 'y'
    , 'þ' => 'b'
    , 'ÿ' => 'y'
    )
    ;
    // RETURN THE "TRANSLATED" TEXT
    if ($return == 'TEXT') return strtr($str, $normal);

    // MIGHT BE USEFUL TO GET THE LIST OF ORIGINAL LETTERS
    return array_keys($normal);
}

 
Ray Paseur commented:
This seems to do the trick. See if it makes sense for your needs, ~Ray
<?php // RAY_entitize_western_letters.php
error_reporting(E_ALL);


// DEMONSTRATE HOW TO TRANSLATE SOME WESTERN CHARACTERS INTO ENGLISH-PRINTABLE OR ENTITIES


// TEST CASES
$arr
= array
( 'Françoise'
, 'ßeta or Beta?'
, 'ENCYCLOPÆDIA'
, 'ça va! mon élève mi niña?'
, 'A stealthy ƒart'
, 'Jean "Ðango" Reinhardt of Pont-à-Celles'
)
;

// DISPLAY EACH TEST CASE
foreach ($arr as $str)
{
    echo PHP_EOL
    . '<br/>'
    . $str
    . ' = '
    . '<strong>'
    . mungstring($str)
    . '</strong>'
    ;
}


// EXAMPLE SHOWING HOW TO TURN A PORTUGUESE NAME INTO PART OF A URL STRING
$str = 'Armação de Pêra';
$new = mungString($str);
$new = strtolower($new);
$new = str_replace(' ', '-', $new);

// SHOW THE URL STRING
echo PHP_EOL
. '<br/>'
. '<strong>'
. '<a target="blank" href="http://lmgtfy.com?q='
. htmlentities(mungstring($new))
. '">'
. $str
. '</a>'
. '</strong>'
;


// EXAMPLE SHOWING HOW TO TURN A STRING INTO A NUMERICALLY ENTITIZED STRING
$str = 'Armação de Pêra';
$new = mungString($str, 'ENTITIES');
echo "<pre>";
echo PHP_EOL
. $new
. ' = '
. '<strong>'
. htmlentities($new)
. '</strong>'
;


// A FUNCTION TO RETURN THE WESTERNIZED/ENTITIZED STRING
function mungString($str, $return='TEXT')
{
    // OUR REPLACEMENT ARRAY OF ENTITIES
    static
    $entity
    = array();

    // OUR REPLACEMENT ARRAY OF CHARACTERS (YOU MAY WANT SOME CHANGES HERE)
    static
    $normal
    = array
    ( 'ƒ' => 'f'  // http://en.wikipedia.org/wiki/%C6%91 florin
    , 'Š' => 'S'  // http://en.wikipedia.org/wiki/%C5%A0 S-caron (voiceless postalveolar fricative)
    , 'š' => 's'  // http://en.wikipedia.org/wiki/%C5%A0 s-caron
    , 'Ð' => 'Dj' // http://en.wikipedia.org/wiki/Eth (voiced dental fricative)
    , 'Ž' => 'Z'  // http://en.wikipedia.org/wiki/%C5%BD Z-caron (voiced postalveolar fricative)
    , 'ž' => 'z'  // http://en.wikipedia.org/wiki/%C5%BD z-caron
    , 'À' => 'A'
    , 'Á' => 'A'
    , 'Â' => 'A'
    , 'Ã' => 'A'
    , 'Ä' => 'A'
    , 'Å' => 'A'
    , 'Æ' => 'E'
    , 'Ç' => 'C'
    , 'È' => 'E'
    , 'É' => 'E'
    , 'Ê' => 'E'
    , 'Ë' => 'E'
    , 'Ì' => 'I'
    , 'Í' => 'I'
    , 'Î' => 'I'
    , 'Ï' => 'I'
    , 'Ñ' => 'N'
    , 'Ò' => 'O'
    , 'Ó' => 'O'
    , 'Ô' => 'O'
    , 'Õ' => 'O'
    , 'Ö' => 'O'
    , 'Ø' => 'O'
    , 'Ù' => 'U'
    , 'Ú' => 'U'
    , 'Û' => 'U'
    , 'Ü' => 'U'
    , 'Ý' => 'Y'
    , 'Þ' => 'B'
    , 'ß' => 'Ss'
    , 'à' => 'a'
    , 'á' => 'a'
    , 'â' => 'a'
    , 'ã' => 'a'
    , 'ä' => 'a'
    , 'å' => 'a'
    , 'æ' => 'e'
    , 'ç' => 'c'
    , 'è' => 'e'
    , 'é' => 'e'
    , 'ê' => 'e'
    , 'ë' => 'e'
    , 'ì' => 'i'
    , 'í' => 'i'
    , 'î' => 'i'
    , 'ï' => 'i'
    , 'ð' => 'o'
    , 'ñ' => 'n'
    , 'ò' => 'o'
    , 'ó' => 'o'
    , 'ô' => 'o'
    , 'õ' => 'o'
    , 'ö' => 'o'
    , 'ø' => 'o'
    , 'ù' => 'u'
    , 'ú' => 'u'
    , 'û' => 'u'
    , 'ü' => 'u'
    , 'ý' => 'y'
    , 'þ' => 'b'
    , 'ÿ' => 'y'
    )
    ;
    // RETURN THE "TRANSLATED" TEXT
    if (substr(strtoupper($return),0,1) == 'T') return strtr($str, $normal);

    // RETURN THE "ENTITIZED" TEXT
    if (substr(strtoupper($return),0,1) == 'E')
    {
        if (empty($entity))
        {
            foreach ($normal as $key => $nothing)
            {
                $entity[$key] = '&#' . ord($key) . ';';
            }
        }
        return strtr($str, $entity);
    }

    // MIGHT BE USEFUL TO GET THE LIST OF ORIGINAL LETTERS
    return array_keys($normal);
}
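
One caveat worth checking before reusing the script: ord() looks only at the first byte of a string, so it yields the expected code only when each key is a single byte (for example when the script file is saved as ISO-8859-1). If the file is saved as UTF-8, mb_ord() (available from PHP 7.2 with mbstring) returns the code point instead. A small sketch of the difference:

```php
<?php
// "á" written as an escape, which PHP stores as the UTF-8 bytes 0xC3 0xA1.
$char = "\u{E1}";

$firstByte = ord($char);              // 195: the first byte only
$codePoint = mb_ord($char, 'UTF-8');  // 225: the Unicode code point

echo $firstByte, PHP_EOL;
echo $codePoint, PHP_EOL;

// The numeric reference an XML parser would expect for this character.
$entity = '&#' . $codePoint . ';';
echo $entity, PHP_EOL;                // &#225;
```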

 
worldofwires (Author) commented:
Hi Ray,

Yes, using Entities through the mungstring function provides the right output. Now I'll see if I can implement it into my project. Thanks for your help on this one, I've learnt a lot about unicode!

John
 
Ray Paseur commented:
Thanks for the points - it's a great question, ~Ray