Character encoding problem

I am trying to write some foreign characters into an XML file, but some of the characters are not being encoded correctly. I am declaring UTF-8 in the XML header.
The following characters work fine:

ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜàáâãäåæç

but the following characters œ c a e l n s z z
get converted into the following HTML entities when they are written into the XML:
&#263 &#261 &#281 &#322 &#324 &#347 &#378 &#380

How can I write them in the exact format?

Thanks.
Asked by: Herci

Ray Paseur commented:
You may find some ideas in this article.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

For us to offer any specific help, we would need to see the test data set and see how it interacts with the program code that creates the XML document.  The numeric character entities would seem to be good "visually" when the XML is rendered by a browser, but there is nothing that inherently changes the UTF-8 characters into numeric entities without a specific programmatic step.
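As an illustration of what such a "specific programmatic step" can look like, here is a minimal sketch. It assumes PHP 7+ with the mbstring extension, and the input string is a hypothetical sample (the eight Polish letters whose code points match the entities in the question), not the asker's actual data. It is only meant to show that decimal entities appear when some code explicitly converts the characters; it does not claim this is what the asker's program does.

```php
<?php
// Hypothetical sample input: the letters U+0107, U+0105, U+0119, U+0142,
// U+0144, U+015B, U+017A, U+017C, written as UTF-8 via escape sequences.
$text = "\u{0107}\u{0105}\u{0119}\u{0142}\u{0144}\u{015B}\u{017A}\u{017C}";

// Conversion map: [first code point, last code point, offset, mask].
// Everything above ASCII is turned into a decimal numeric entity.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// This call is the explicit step that produces the entities.
$entities = mb_encode_numericentity($text, $convmap, 'UTF-8');

echo $entities, PHP_EOL;
// &#263;&#261;&#281;&#322;&#324;&#347;&#378;&#380;
```

If nothing like this runs in your code, the entities are probably already present in the data before the XML is written.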
Slick812 commented:
Greetings Herci. Unfortunately, this problem with your "foreign characters" may not be something you can solve without some awkward changes to how you use PHP. In my opinion, just having the UTF-8 header in a document almost never solves problems with "foreign characters" if they are multi-byte characters. PHP's standard string functions treat strings as sequences of single bytes, although there are the PHP multi-byte strings and functions; you can see some of the functions in the manual here:
      http://php.net/manual/en/ref.mbstring.php
At the top of that page it says, "Multibyte character encoding schemes and their related issues are fairly complicated," so these issues are often difficult to deal with.
When I view this question in my browser, the last two entities are
&#378 &#380

and yet my browser displays the corresponding characters as two plain English "z" letters, so I see this:
       but the following characters  œ c a e l n s z z
The numbers in &#378 &#380 show me that these are multi-byte characters, because a single byte cannot go above 255 (&#255).
I would think that these
    &#263 &#261 &#281 &#322 &#324 &#347 &#378 &#380
were sent from a form post, and that the post translated the multi-byte characters into their decimal HTML entity equivalents.
Either way, a character such as &#347 cannot be stored in a single-byte character set such as ISO-8859-1 (the kind typically used for English or French).
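If the entities really do arrive in the incoming data (for example from a form post), one possible approach is to decode them back to UTF-8 before building the XML. The sketch below assumes PHP with the dom extension; the variable names and the `letters` element are hypothetical. It also assumes the entities are terminated with semicolons, which the question does not show; as far as I know `html_entity_decode()` leaves unterminated entities untouched.

```php
<?php
// Hypothetical posted value: decimal entities, each ending in a semicolon.
$posted = '&#263;&#261;&#281;&#322;&#324;&#347;&#378;&#380;';

// Turn the numeric entities back into real UTF-8 characters.
$text = html_entity_decode($posted, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Declare UTF-8 on the document so saveXML() writes the characters
// as raw UTF-8 bytes instead of escaping them.
$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('letters'));
$root->appendChild($doc->createTextNode($text));

$xml = $doc->saveXML();
echo $xml;
```

The resulting file should then contain the literal characters, and any UTF-8 aware viewer or parser will read them without entities.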
Ray Paseur commented:
To try to shed a little more light on it, here are links to two scripts.  The scripts are identical, except that one of the scripts is stored in ANSI and the other is stored in UTF-8.  As you can see, they produce different output.  A lone byte with a value above 127 is not valid UTF-8, and all of these "special" characters sit above that code point in ANSI, where each one is a single byte.  So in UTF-8 each of them has to be represented by a multi-byte sequence.

http://www.laprbass.com/RAY_temp_herci_ansi.php
http://www.laprbass.com/RAY_temp_herci_utf8.php

You might try copying the UTF-8 version of the script and adding a meta tag to tell the browser that you've got UTF-8 output.

<?php // RAY_temp_herci_ansi.php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);
echo '<pre>';


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28354854.html
// THIS VERSION OF THE SCRIPT IS CREATED IN UTF8 AND STORED IN UTF8


$str = 'ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜàáâãäåæç';

hexdump($str);

// SHOW A SHORT HEX STRING BYTE-BY-BYTE
function hexdump($str, $br=PHP_EOL)
{
    if (empty($str)) return FALSE;

    // GET THE HEX BYTE VALUES IN A STRING
    $hex = str_split(implode('', unpack('H*', $str)));

    // ALLOCATE BYTES INTO HI AND LO NIBBLES
    $hi  = NULL;
    $lo  = NULL;
    $mod = 0;
    foreach ($hex as $nib)
    {
        $mod++;
        $mod = $mod % 2;
        if ($mod)
        {
            $hi .= $nib;
        }
        else
        {
            $lo .= $nib;
        }
    }

    // SHOW THE SCALE, THE STRING AND THE HEX
    $num = substr('1...5...10...15...20...25...30...35...40...45...50...55...60...65...70...75...80...85...90...95..100..105..110..115..120..125..130', 0, strlen($str));
    echo $br . $num;
    echo $br . $str;
    echo $br . $hi;
    echo $br . $lo;
    echo $br;
}

<?php // RAY_temp_herci_utf8.php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);
echo '<pre>';


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28354854.html
// THIS VERSION OF THE SCRIPT IS CREATED IN ANSI AND STORED IN ANSI


$str = 'ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜàáâãäåæç';

hexdump($str);

// SHOW A SHORT HEX STRING BYTE-BY-BYTE
function hexdump($str, $br=PHP_EOL)
{
    if (empty($str)) return FALSE;

    // GET THE HEX BYTE VALUES IN A STRING
    $hex = str_split(implode('', unpack('H*', $str)));

    // ALLOCATE BYTES INTO HI AND LO NIBBLES
    $hi  = NULL;
    $lo  = NULL;
    $mod = 0;
    foreach ($hex as $nib)
    {
        $mod++;
        $mod = $mod % 2;
        if ($mod)
        {
            $hi .= $nib;
        }
        else
        {
            $lo .= $nib;
        }
    }

    // SHOW THE SCALE, THE STRING AND THE HEX
    $num = substr('1...5...10...15...20...25...30...35...40...45...50...55...60...65...70...75...80...85...90...95..100..105..110..115..120..125..130', 0, strlen($str));
    echo $br . $num;
    echo $br . $str;
    echo $br . $hi;
    echo $br . $lo;
    echo $br;
}
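
A more compact way to see the same ANSI versus UTF-8 difference, without saving two copies of a file, is sketched below. It assumes PHP 7+ with the mbstring extension and uses ISO-8859-1 as a stand-in for "ANSI"; the single test character (À, U+00C0) is taken from the working set in the question.

```php
<?php
// One character from the question's working set, as UTF-8.
$utf8 = "\u{00C0}"; // À

// The same character re-encoded as a single-byte (ISO-8859-1) string.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), PHP_EOL;   // c380 (two bytes in UTF-8)
echo bin2hex($latin1), PHP_EOL; // c0   (one byte in ISO-8859-1)

// The lone byte 0xC0 is not valid UTF-8 on its own.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// strlen() counts bytes; mb_strlen() counts characters.
echo strlen($utf8), ' ', mb_strlen($utf8, 'UTF-8'), PHP_EOL; // 2 1
```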

Herci (author) commented:
I still haven't figured out a solution for this, which is why it took me so long to give an update. I've decided to close this question, but I will keep your answers in mind and carry on researching this. Thanks a lot.
Ray Paseur commented:
A month and a half?  And you still could not respond, then you gave a bad grade?  What were you expecting?  Please read the grading guidelines then explain why you gave the bad grade without any response or explanation!  Nobody does this at EE.  What was wrong?
http://support.experts-exchange.com/customer/portal/articles/481419