BToTheAToTheBABA
asked on
Hyphen inside CDATA causes an error: XML not well-formed (works in IE, but NOT in Firefox and Chrome)
not well-formed
Line 132
The error is in the AJAX response XML.
On the mentioned line 132, the CDATA contains a hyphen. (I need that hyphen; I can't remove it.)
<?xml version="1.0" encoding="UTF-8"?>
<response>
<status>1</status>
<results>
<product>
<productItemId>457865</productItemId>
<productItemName><![CDATA[Whatever this is]]></productItemName>
<productPrice>1460</productPrice>
<productItemCode><![CDATA[AGAINSOMETHING]]></productItemCode>
<productType>1</productType>
</product>
<product>
<productItemId>457865</productItemId>
<productItemName><![CDATA[Whatever this is]]></productItemName>
<productPrice>1460</productPrice>
<productItemCode><![CDATA[Error-is-in-this-line.]]></productItemCode>
<productType>1</productType>
</product>
</results>
</response>
Where is line 132? Could you please post all the XML?
This works perfectly, parsing the XML into an object in PHP. So whatever is processing your AJAX response is where the problem lies. There is really nothing wrong with the XML - the error message appears to be a false positive.
Best regards, ~Ray
<?php // RAY_temp_XML_example_48.php
error_reporting(E_ALL);
echo "<pre>\n";
// TEST DATA FROM THE OP
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<response>
<status>1</status>
<results>
<product>
<productItemId>457865</productItemId>
<productItemName><![CDATA[Whatever this is]]></productItemName>
<productPrice>1460</productPrice>
<productItemCode><![CDATA[AGAINSOMETHING]]></productItemCode>
<productType>1</productType>
</product>
<product>
<productItemId>457865</productItemId>
<productItemName><![CDATA[Whatever this is]]></productItemName>
<productPrice>1460</productPrice>
<productItemCode><![CDATA[Error-is-in-this-line.]]></productItemCode>
<productType>1</productType>
</product>
</results>
</response>';
// MAKE AN OBJECT
$obj = SimpleXML_Load_String($xml);
// VISUALIZE THE OBJECT
var_dump($obj);
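When the same check is run against the real response rather than a pasted copy, it may help to ask libxml where parsing stopped. The following is a minimal sketch, assuming the failure comes from a character that a forum paste would not show; the sample string and the control character in it are made up for illustration.

```php
<?php
// Sketch: collect libxml's parse errors instead of letting them pass unnoticed.
// $xml is made-up sample data with an arbitrary control character in the CDATA.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     . "<response>\n"
     . "<productItemCode><![CDATA[12\x0B-34]]></productItemCode>\n"
     . "</response>";

libxml_use_internal_errors(true);   // buffer errors rather than emitting warnings
$obj = simplexml_load_string($xml);

$errors = libxml_get_errors();      // array of LibXMLError objects
libxml_clear_errors();

if ($obj === false) {
    foreach ($errors as $e) {
        // Each LibXMLError carries the line, column and message of the problem.
        printf("line %d, column %d: %s\n", $e->line, $e->column, trim($e->message));
    }
}
```

The reported line number refers to the string handed to the parser, so it should be compared against the raw response body, not against a copy that has passed through an editor or a web form.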
ASKER
I ran the response through an XML parser and it reported the following error.
===> An invalid XML character (Unicode: 0xb) was found in the CDATA section.
0xb is the value for VT (Vertical Tab) and has nothing to do with a hyphen. When you posted the XML, E-E drops such characters so we can't see them. Please post the actual file with the "Attach file" option.
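Since such characters are invisible once pasted, one way to locate them is to scan the raw string for the control bytes that XML 1.0 does not allow. This is only a sketch: `find_invalid_xml_bytes()` is a hypothetical helper name and the sample value is invented.

```php
<?php
// Sketch: list control characters that XML 1.0 forbids, with their byte offsets.
// find_invalid_xml_bytes() is a hypothetical helper, not part of PHP.
function find_invalid_xml_bytes(string $s): array
{
    $found = [];
    // Below 0x20, XML 1.0 allows only tab (0x09), LF (0x0A) and CR (0x0D).
    // No /u modifier: the match is byte-wise, and UTF-8 multibyte sequences
    // never contain bytes below 0x80, so Japanese text is not flagged.
    if (preg_match_all('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $s, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as [$char, $offset]) {
            $found[] = ['offset' => $offset, 'hex' => sprintf('0x%02X', ord($char))];
        }
    }
    return $found;
}

$sample = "12\x0B-34";   // made-up value: a vertical tab sitting before the hyphen
$hits   = find_invalid_xml_bytes($sample);
print_r($hits);
```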
ASKER
That's great. I've thought about doing something like this before, but there are a few differences.
The input to the function "return_only clean_xml_chars()" comes from the database, and the 'cleaned' output is sent to the browser, i.e. the input is not from the request (not a big difference, though).
The "known good values" are within the range of data in a particular column of a database table.
But unfortunately I'm working in a Japanese environment where most UTF-8 characters are in use.
Every kind of unclean data exists in that huge database.
But it's fine (in this case) to ignore the unclean data and send only the readable characters to the output.
What if some other bad data like 0xb exists that isn't handled by this strip_invalid_xml_chars function? I think I'll need to work on it again when I find such data (as you've mentioned).
Any other suggestions and advice are appreciated.
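For the concern about other bad characters turning up later, one option is to filter against the full set of characters XML 1.0 permits instead of removing known offenders one at a time. The sketch below assumes the database values are UTF-8; `keep_valid_xml_chars()` is a hypothetical name and is not the function posted earlier in this thread.

```php
<?php
// Sketch: keep every character the XML 1.0 "Char" production allows, drop the rest.
// keep_valid_xml_chars() is a hypothetical name chosen for this example.
function keep_valid_xml_chars(string $s): string
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so multibyte
    // Japanese text is matched per character and passes through untouched.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $s
    );
    // With /u, preg_replace() returns null when the input is not valid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

$row = "商品\x0B１２－３４";   // made-up database value with a vertical tab
echo keep_valid_xml_chars($row), "\n";
```

Because the pattern is a whitelist, any control character that shows up later is removed without further changes. Input that is not valid UTF-8 raises an exception here rather than being silently emptied; whether that is the right behaviour depends on how the data should be handled.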
Well, the choice of input - browser request versus data base - is pretty much limited by my ability to demonstrate a concept here. I don't have your data base, so I can only use what I know we both have, and that's usually browser input.
In addition to handwritten filters like what I have shown above, there is the PHP filter_var()
http://us.php.net/manual/en/function.filter-var.php
All of the PHP filter functions seem to be a little bit "in their infancy" but as the use of these gains popularity, you can be sure that they will mature, and that they will be the fastest way to clean up external data. More here:
http://us.php.net/manual/en/book.filter.php
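As a rough illustration of the filter functions, the sketch below strips low control bytes with `filter_var()`. The sample value is made up. Note that `FILTER_FLAG_STRIP_LOW` removes every byte below 0x20, so tabs and line breaks go as well, which may or may not be acceptable for a given column.

```php
<?php
// Sketch: strip bytes below 0x20 using the filter extension.
$raw = "12\x0B-34";   // made-up value containing a vertical tab

// FILTER_UNSAFE_RAW leaves the string as-is apart from what the flag strips.
$clean = filter_var($raw, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);

var_dump($clean);
```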
And finally, UTF-8 support has its share of quirks and OS-dependencies. Maybe some of these links will help. ;-)
http://lmgtfy.com/?q=php+regex+utf8
Best regards, ~Ray
ASKER
>But my data contains ONLY numbers and hyphen at that field.
As mentioned, I believe the ignored character could be a zenkaku (full-width Japanese) hyphen, a simple hyphen, or a number. Ignoring it looks fine in my case.
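If the field really should hold only numbers and hyphens, a whitelist for that one column is another possibility. The sketch below goes a step further than ignoring: it first maps full-width digits and hyphens to their ASCII forms, which is an assumption about what is wanted and not something stated in this thread. `normalize_item_code()` and the mapping table are illustrative only.

```php
<?php
// Sketch: normalise full-width digits and hyphens, then keep only 0-9 and "-".
// normalize_item_code() and the mapping table are illustrative assumptions.
function normalize_item_code(string $s): string
{
    $map = [
        "\u{FF0D}" => '-',   // FULLWIDTH HYPHEN-MINUS
        "\u{2212}" => '-',   // MINUS SIGN
    ];
    for ($i = 0; $i <= 9; $i++) {
        // U+FF10..U+FF19 are the full-width digits 0-9.
        $map[mb_chr(0xFF10 + $i, 'UTF-8')] = (string) $i;
    }
    $s = strtr($s, $map);
    // Whitelist: any byte that is not an ASCII digit or hyphen is dropped,
    // which also removes control characters such as 0x0B.
    return preg_replace('/[^0-9-]/', '', $s);
}

echo normalize_item_code("１２\x0B－３４"), "\n";
```

If simply dropping the full-width characters is preferred, the `strtr()` step can be left out and the final `preg_replace()` does the ignoring on its own.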
How does the problem actually occur?