Solved

Character encoding problem

Posted on 2014-02-03
Last Modified: 2014-04-01
I am trying to write some foreign characters into an XML file, but some of them are not being encoded correctly. The XML header declares UTF-8.
The following characters work fine:

ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜàáâãäåæç

but the following characters: œ c a e l n s z z
get converted into the following HTML entities when they are written into the XML:
&#263 &#261 &#281 &#322 &#324 &#347 &#378 &#380

How can I write them into the XML as the original characters?

Thanks.
Question by:Herci
LVL 111

Expert Comment

by:Ray Paseur
ID: 39829190
You may find some ideas in this article.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

For us to offer any specific help, we would need to see the test data set and see how it interacts with the program code that creates the XML document.  The numeric character entities would seem to be good "visually" when the XML is rendered by a browser, but there is nothing that inherently changes the UTF-8 characters into numeric entities without a specific programmatic step.
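As an illustration of what such a programmatic step could look like, here is a minimal sketch. It assumes the XML is built with PHP's DOMDocument, which the thread never confirms; the helper name build_xml and the element name are made up for the example. When no encoding is given to the DOMDocument constructor, the serializer writes non-ASCII characters as numeric character references; declaring UTF-8 keeps them as literal characters.

```php
<?php
// Hypothetical illustration: the original XML-writing code was not posted.
// The characters are written as \u{...} escapes so this file's own encoding does not matter.
$text = "\u{0107}\u{0105}"; // c-acute and a-ogonek

function build_xml(string $text, ?string $encoding): string
{
    // Without an explicit encoding the document has no declared output encoding.
    $doc = $encoding === null
        ? new DOMDocument('1.0')
        : new DOMDocument('1.0', $encoding);
    $doc->appendChild($doc->createElement('name', $text));
    return $doc->saveXML();
}

$withoutEncoding = build_xml($text, null);    // characters come out as numeric references
$withEncoding    = build_xml($text, 'UTF-8'); // characters come out as literal UTF-8

echo $withoutEncoding, $withEncoding;
```

If the real code uses a different writer, the same question applies: find the place where an output encoding can be declared and check what it is set to.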
LVL 34

Accepted Solution

by:
Slick812 earned 750 total points
ID: 39830057
Greetings Herci, unfortunately for you, this problem with your "foreign characters" may not be something you can solve without some awkward changes in how you use PHP. I will express my opinion that just having the "UTF-8 header" in a document almost never solves problems with "foreign characters" if they are multi-byte characters. PHP's ordinary string functions work on single bytes, although there are the PHP multi-byte strings and functions; you can see some of the functions in the manual here -
      http://php.net/manual/en/ref.mbstring.php
At the top of that page it says - "Multibyte character encoding schemes and their related issues are fairly complicated, ", so these issues are often difficult to deal with.
When I see this question in my browser it says that the last two characters are -
&#378 &#380

and yet in my browser I see them as two English language "z", so I see this -
       but the following characters  œ c a e l n s z z
so the NUMBERS in &#378 &#380 show me that these are multi-byte characters, as a single byte can NOT GO ABOVE 255, as in &#255
I would think that these -
    &#263 &#261 &#281 &#322 &#324 &#347 &#378 &#380
were sent up in a post from a form, and that post translated the multi-byte characters to their decimal HTML equivalents,
but either way HTML entities such as &#347 can NOT be stored in single byte character sets (English, French).
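If the entities really do arrive from a form post, one possible workaround is to turn them back into UTF-8 before writing the XML. The following is only a sketch under that assumption: the function name decode_decimal_refs and the sample string are invented for illustration, and it expects the semicolon-terminated form (&#324;) rather than the bare form shown in the question. It decodes decimal references only and leaves markup entities such as &amp; alone.

```php
<?php
// Decode decimal references such as &#324; into UTF-8 bytes,
// leaving everything else (including &amp; and &lt;) untouched.
function decode_decimal_refs(string $text): string
{
    return preg_replace_callback(
        '/&#[0-9]+;/',
        static function (array $m): string {
            // html_entity_decode() turns one numeric reference into its UTF-8 bytes.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $text
    );
}

// Hypothetical posted value containing three decimal references and one &amp;.
$posted  = 'Pozna&#324; &amp; &#321;&#243;d&#378;';
$decoded = decode_decimal_refs($posted);

echo $decoded, PHP_EOL;

// The reference is 6 ASCII bytes; the decoded character is a 2-byte UTF-8 sequence.
echo strlen('&#324;'), ' ', strlen(decode_decimal_refs('&#324;')), PHP_EOL;
```

The decoded string can then go into a UTF-8 XML document as ordinary text.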
 
LVL 111

Assisted Solution

by:Ray Paseur
Ray Paseur earned 750 total points
ID: 39830216
To try to shed a little more light on it, here are links to two scripts.  The scripts are identical, except that one of the scripts is stored in ANSI and the other is stored in UTF-8.  As you can see, they produce different output.  Single-byte characters above code point 127 are not valid UTF-8, and all of these "special" characters are above that code point in ANSI.  So in UTF-8 they have to be represented by a multi-byte character.

http://www.laprbass.com/RAY_temp_herci_ansi.php
http://www.laprbass.com/RAY_temp_herci_utf8.php
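For a quick look at the byte-level difference without loading the two scripts, here is a small sketch. It takes "ANSI" to mean Windows-1252, which is an assumption, and the helper is_valid_utf8 is a name made up for this example. It shows the same character, À, as one ANSI byte and as a two-byte UTF-8 sequence, and checks each against UTF-8 validity.

```php
<?php
$ansi = "\xC0";     // "À" as a single Windows-1252 ("ANSI") byte
$utf8 = "\u{00C0}"; // "À" encoded as UTF-8

function is_valid_utf8(string $bytes): bool
{
    // With the /u modifier PCRE refuses subjects that are not valid UTF-8,
    // so preg_match() returns 1 only for well-formed input.
    return preg_match('//u', $bytes) === 1;
}

printf("ANSI : %s  valid UTF-8: %s\n", bin2hex($ansi), var_export(is_valid_utf8($ansi), true));
printf("UTF-8: %s  valid UTF-8: %s\n", bin2hex($utf8), var_export(is_valid_utf8($utf8), true));
```

The lone byte c0 is rejected as UTF-8, while the pair c3 80 is accepted.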

You might try copying the utf8 version of script and adding a meta tag to tell the browser that you've got UTF-8 output.

<?php // RAY_temp_herci_ansi.php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);
echo '<pre>';


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28354854.html
// THIS VERSION OF THE SCRIPT IS CREATED IN ANSI AND STORED IN ANSI


$str = 'ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜàáâãäåæç';

hexdump($str);

// SHOW A SHORT HEX STRING BYTE-BY-BYTE
function hexdump($str, $br=PHP_EOL)
{
    if (empty($str)) return FALSE;

    // GET THE HEX BYTE VALUES IN A STRING
    $hex = str_split(implode('', unpack('H*', $str)));

    // ALLOCATE BYTES INTO HI AND LO NIBBLES
    $hi  = NULL;
    $lo  = NULL;
    $mod = 0;
    foreach ($hex as $nib)
    {
        $mod++;
        $mod = $mod % 2;
        if ($mod)
        {
            $hi .= $nib;
        }
        else
        {
            $lo .= $nib;
        }
    }

    // SHOW THE SCALE, THE STRING AND THE HEX
    $num = substr('1...5...10...15...20...25...30...35...40...45...50...55...60...65...70...75...80...85...90...95..100..105..110..115..120..125..130', 0, strlen($str));
    echo $br . $num;
    echo $br . $str;
    echo $br . $hi;
    echo $br . $lo;
    echo $br;
}


<?php // RAY_temp_herci_utf8.php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);
echo '<pre>';


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28354854.html
// THIS VERSION OF THE SCRIPT IS CREATED IN UTF8 AND STORED IN UTF8


$str = 'ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜàáâãäåæç';

hexdump($str);

// SHOW A SHORT HEX STRING BYTE-BY-BYTE
function hexdump($str, $br=PHP_EOL)
{
    if (empty($str)) return FALSE;

    // GET THE HEX BYTE VALUES IN A STRING
    $hex = str_split(implode('', unpack('H*', $str)));

    // ALLOCATE BYTES INTO HI AND LO NIBBLES
    $hi  = NULL;
    $lo  = NULL;
    $mod = 0;
    foreach ($hex as $nib)
    {
        $mod++;
        $mod = $mod % 2;
        if ($mod)
        {
            $hi .= $nib;
        }
        else
        {
            $lo .= $nib;
        }
    }

    // SHOW THE SCALE, THE STRING AND THE HEX
    $num = substr('1...5...10...15...20...25...30...35...40...45...50...55...60...65...70...75...80...85...90...95..100..105..110..115..120..125..130', 0, strlen($str));
    echo $br . $num;
    echo $br . $str;
    echo $br . $hi;
    echo $br . $lo;
    echo $br;
}
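
The meta tag suggestion mentioned before the scripts might look something like the sketch below. The page skeleton, the function name render_form_page and the field name "title" are all hypothetical, since the asker's form was never posted. The idea is to declare UTF-8 in the HTTP header, in the meta tag, and on the form, so the browser has no reason to fall back to a single-byte character set.

```php
<?php
// Hypothetical page skeleton; the field name "title" is made up for illustration.
function render_form_page(): string
{
    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>UTF-8 form</title>
</head>
<body>
<form method="post" accept-charset="UTF-8">
<input type="text" name="title">
<input type="submit">
</form>
</body>
</html>
HTML;
}

// When served over HTTP, state the encoding in the Content-Type header as well.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo render_form_page();
```

The script file itself should also be saved as UTF-8, in line with the UTF-8 version of the test script.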


Author Closing Comment

by:Herci
ID: 39968004
I've still not figured out a solution for this yet and that's why it took a long time to give an update. I've decided to close this question but I will keep your answers in mind and carry on doing further research on this. Thanks a lot.
LVL 111

Expert Comment

by:Ray Paseur
ID: 39968258
A month and a half?  And you still could not respond, then you gave a bad grade?  What were you expecting?  Please read the grading guidelines then explain why you gave the bad grade without any response or explanation!  Nobody does this at EE.  What was wrong?
http://support.experts-exchange.com/customer/portal/articles/481419