BToTheAToTheBABA asked:

Hyphen inside CDATA causes an error: XML not well-formed (works in IE, but not in Firefox or Chrome)

Firefox and Chrome report the following error in my AJAX response XML:

not well-formed
Line 132

The CDATA section on line 132 contains a hyphen. (I need that hyphen; I can't remove it.)

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<status>1</status>
<results>
		<product>
				<productItemId>457865</productItemId>
				<productItemName><![CDATA[Whatever this is]]></productItemName>
				<productPrice>1460</productPrice>
				<productItemCode><![CDATA[AGAINSOMETHING]]></productItemCode>
				<productType>1</productType>
		</product>
                <product>
				<productItemId>457865</productItemId>
				<productItemName><![CDATA[Whatever this is]]></productItemName>
				<productPrice>1460</productPrice>
				<productItemCode><![CDATA[Error-is-in-this-line.]]></productItemCode>
				<productType>1</productType>
		</product>
</results>
</response>


BigRat

I put your XML into a file and opened it in Mozilla Firefox (3.0.10), and it displays the XML without error. The only odd thing is that the CDATA keywords are missing, either because the content does not contain any markup or because Firefox displays CDATA differently from IE. I have found no problem in IE or other tools either, so I don't believe that the problem lies exactly there.

How does the problem actually occur?
Where is line 132? Could you please post all the XML?

This works perfectly, parsing the XML into an object in PHP.  So whatever is processing your AJAX response is where the problem lies.  There is really nothing wrong with the XML - the error message appears to be a false positive.

Best regards, ~Ray
<?php // RAY_temp_XML_example_48.php
error_reporting(E_ALL);
echo "<pre>\n";
 
// TEST DATA FROM THE OP
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<response>
        <status>1</status>
<results>
                <product>
                                <productItemId>457865</productItemId>
                                <productItemName><![CDATA[Whatever this is]]></productItemName>
                                <productPrice>1460</productPrice>
                                <productItemCode><![CDATA[AGAINSOMETHING]]></productItemCode>
                                <productType>1</productType>
                </product>
                <product>
                                <productItemId>457865</productItemId>
                                <productItemName><![CDATA[Whatever this is]]></productItemName>
                                <productPrice>1460</productPrice>
                                <productItemCode><![CDATA[Error-is-in-this-line.]]></productItemCode>
                                <productType>1</productType>
                </product>
</results>
</response>';
 
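// MAKE AN OBJECT (simplexml_load_string() RETURNS FALSE IF THE XML CANNOT BE PARSED)
$obj = simplexml_load_string($xml);
if ($obj === false) die('XML DID NOT PARSE');
 
// VISUALIZE THE OBJECT
var_dump($obj);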


BToTheAToTheBABA (ASKER)

I ran the response through an XML parser, and it reported the following error:

===>  An invalid XML character (Unicode: 0xb) was found in the CDATA section.


0xb is the code for VT (Vertical Tab), which is not a legal character in XML 1.0, and it has nothing to do with a hyphen. When you posted the XML, E-E dropped such characters, so we can't see them. Please post the actual file with the "Attach file" option.
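Since invisible characters like VT do not survive copy and paste, one way to locate them is to scan the raw string before it is sent. The following is only a minimal sketch, assuming the data is UTF-8; the function name find_invalid_xml_chars() is made up for illustration and does not come from this thread. It lists every character outside the set that XML 1.0 allows, together with its byte offset.

```php
<?php
// Hypothetical helper (not from the thread): report each character that is
// not allowed in an XML 1.0 document, with its byte offset in the string.
function find_invalid_xml_chars(string $text): array
{
    // Allowed in XML 1.0: TAB, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    // With the /u modifier, preg_match_all() returns false on malformed UTF-8.
    if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $found = [];
    foreach ($matches[0] as [$char, $offset]) {
        // utf8_hex holds the UTF-8 bytes of the offending character.
        $found[] = ['offset' => $offset, 'utf8_hex' => '0x' . bin2hex($char)];
    }
    return $found;
}

// Made-up sample value: digits, an embedded VT (0x0B), and a hyphen.
print_r(find_invalid_xml_chars("4578\x0B65-01"));
```

For the sample value this reports a single entry at byte offset 4 with utf8_hex 0x0b; a plain hyphen is never reported.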
ASKER CERTIFIED SOLUTION
BToTheAToTheBABA

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION

This solution is only available to members.
That's great. I've thought about doing something like this before, but there are a few differences.


The input to the function "return_only clean_xml_chars()" comes from the database, and the 'cleaned' output is sent to the browser, i.e. the input is not from the request (not a big difference, though).

The "known good values" are whatever falls within the range of data in a particular column of a database table.
Unfortunately, I'm working in a Japanese environment where most UTF-8 characters are in use,
and every possible kind of unclean data exists in that huge database.

In this case, though, it's fine to ignore the unclean data and send only the readable characters to the output.

What if more bad data like 0xb exists that isn't handled by this strip_invalid_xml_chars function? I think I'll need to work on it again when I find such data (as you've mentioned).

Any other suggestions and advice are appreciated.
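One way to avoid revisiting the filter each time a new bad character turns up is to whitelist the characters XML 1.0 permits instead of blacklisting individual ones such as 0xb. The sketch below assumes the database values are valid UTF-8; strip_non_xml_chars() is a hypothetical name chosen for illustration, not the function from the solutions above, which are not visible here. Japanese text passes through unchanged because it falls inside the permitted ranges.

```php
<?php
// Hypothetical helper (not the thread's own function): keep only characters
// that XML 1.0 allows and silently drop everything else.
function strip_non_xml_chars(string $text): string
{
    // Allowed in XML 1.0: TAB, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    // preg_replace() returns null on error, e.g. when the input is not valid UTF-8.
    $clean = preg_replace($pattern, '', $text);
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// Made-up sample row: Japanese text, an embedded VT (0x0B), and a full-width hyphen.
$row = "商品4578\x0B65－01";
echo strip_non_xml_chars($row), "\n"; // prints 商品457865－01
```

The trade-off is the one already accepted above: offending characters are dropped without notice, so this only suits cases where ignoring them is fine.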
Well, the choice of input (browser request versus database) is pretty much limited by my ability to demonstrate a concept here.  I don't have your database, so I can only use what I know we both have, and that's usually browser input.

In addition to handwritten filters like what I have shown above, there is the PHP filter_var()
http://us.php.net/manual/en/function.filter-var.php

All of the PHP filter functions seem to be a little bit "in their infancy" but as the use of these gains popularity, you can be sure that they will mature, and that they will be the fastest way to clean up external data.  More here:
http://us.php.net/manual/en/book.filter.php
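As a rough illustration of the filter approach, here is a minimal sketch using filter_var() with FILTER_UNSAFE_RAW and FILTER_FLAG_STRIP_LOW, which removes every byte below 32. The wrapper name strip_low_ascii() is made up for this example. Note the assumption built in: this also removes TAB, LF and CR, which XML does allow, so it only fits fields that should never contain them. Multi-byte UTF-8 characters are left alone because their bytes are all 0x80 or higher.

```php
<?php
// Hypothetical wrapper (name is illustrative): drop all bytes below 0x20,
// including VT (0x0B) but also TAB, LF and CR.
function strip_low_ascii(string $text): string
{
    $clean = filter_var($text, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
    // filter_var() returns false if the filter fails; treat that as empty output.
    return $clean === false ? '' : $clean;
}

// Made-up sample value with an embedded VT.
echo strip_low_ascii("4578\x0B65-01"), "\n"; // prints 457865-01
```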

And finally, UTF-8 support has its share of quirks and OS-dependencies.  Maybe some of these links will help. ;-)
http://lmgtfy.com/?q=php+regex+utf8

Best regards, ~Ray
SOLUTION

This solution is only available to members.
>
>But my data contains ONLY numbers and hyphen at that field.
>
As mentioned, I believe the ignored character could be a zenkaku (Japanese full-width) hyphen, a simple hyphen, or a number. Ignoring it looks fine in my case.
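To settle which character is actually being dropped, the field can be dumped character by character. This is a minimal sketch, assuming the value is valid UTF-8; dump_utf8_chars() is a made-up name. It returns the UTF-8 bytes of each character in hex, so a plain hyphen shows up as 2D, a full-width hyphen such as U+FF0D as EFBC8D, and a vertical tab as 0B. Both hyphens are legal XML characters, so only the last one would trigger the parser error quoted earlier.

```php
<?php
// Hypothetical helper (not from the thread): split a UTF-8 string into
// characters and return the hex bytes of each one.
function dump_utf8_chars(string $text): array
{
    // An empty pattern with /u splits between characters, not between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return array_map(function ($c) {
        return strtoupper(bin2hex($c));
    }, $chars);
}

// Made-up sample: digit, plain hyphen, full-width hyphen, VT, digit.
echo implode(' ', dump_utf8_chars("1-－\x0B2")), "\n"; // prints 31 2D EFBC8D 0B 32
```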