Hyphen inside CDATA causes an error: XML not well-formed (works in IE, but not in Firefox or Chrome)

not well-formed
Line 132

The error is in the AJAX response XML.

On the reported line 132, the CDATA section contains a hyphen. (I need that hyphen; I can't remove it.)

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<status>1</status>
<results>
		<product>
				<productItemId>457865</productItemId>
				<productItemName><![CDATA[Whatever this is]]></productItemName>
				<productPrice>1460</productPrice>
				<productItemCode><![CDATA[AGAINSOMETHING]]></productItemCode>
				<productType>1</productType>
		</product>
                <product>
				<productItemId>457865</productItemId>
				<productItemName><![CDATA[Whatever this is]]></productItemName>
				<productPrice>1460</productPrice>
				<productItemCode><![CDATA[Error-is-in-this-line.]]></productItemCode>
				<productType>1</productType>
		</product>
</results>
</response>


BigRat

I put your XML into a file and opened it in Mozilla Firefox (3.0.10), and it displays the XML fine. The only odd thing is that the CDATA keywords are missing, either because the content does not contain any markup or because Firefox displays them differently from Microsoft's browser. I found no problem in IE or in other tools either, so I don't believe the problem lies exactly there.

How does the problem actually occur?

Ray Paseur

Where is line 132? Could you please post all the XML?

Ray Paseur

This works perfectly, parsing the XML into an object in PHP. So whatever is processing your AJAX response is where the problem lies. There is really nothing wrong with the XML - the error message appears to be a false positive.

Best regards, ~Ray

<?php // RAY_temp_XML_example_48.php
error_reporting(E_ALL);
echo "<pre>\n";
 
// TEST DATA FROM THE OP
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<response>
        <status>1</status>
<results>
                <product>
                                <productItemId>457865</productItemId>
                                <productItemName><![CDATA[Whatever this is]]></productItemName>
                                <productPrice>1460</productPrice>
                                <productItemCode><![CDATA[AGAINSOMETHING]]></productItemCode>
                                <productType>1</productType>
                </product>
                <product>
                                <productItemId>457865</productItemId>
                                <productItemName><![CDATA[Whatever this is]]></productItemName>
                                <productPrice>1460</productPrice>
                                <productItemCode><![CDATA[Error-is-in-this-line.]]></productItemCode>
                                <productType>1</productType>
                </product>
</results>
</response>';
 
// MAKE AN OBJECT
$obj = simplexml_load_string($xml);
 
// VISUALIZE THE OBJECT
var_dump($obj);


BToTheAToTheBABA

ASKER

I ran the response through an XML parser, and it reported the following error.

===> An invalid XML character (Unicode: 0xb) was found in the CDATA section.

BigRat

0xb is the value for VT (Vertical Tab) and has nothing to do with a hyphen. When you posted the XML, E-E drops such characters so we can't see them. Please post the actual file with the "Attach file" option.
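To see where such a character hides, one option is to scan the raw string for the control codes that XML 1.0 does not allow. This is only a sketch: the function name and the sample string are made up for illustration, and the real data would come from the actual response.

```php
<?php
// Hypothetical helper (not from this thread): list the C0 control characters
// that XML 1.0 forbids, i.e. everything below 0x20 except TAB, LF and CR.
function find_invalid_xml_control_chars(string $text): array
{
    $found = [];
    // PREG_OFFSET_CAPTURE records the byte offset of each match.
    preg_match_all('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $text, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$char, $offset]) {
        $found[] = ['offset' => $offset, 'hex' => sprintf('0x%02X', ord($char))];
    }
    return $found;
}

// Assumed sample data: a vertical tab sitting next to the hyphen.
$sample = "<![CDATA[123-\x0B456]]>";
print_r(find_invalid_xml_control_chars($sample));
```

The match is byte-wise, which is safe for UTF-8 input because bytes below 0x20 never occur inside a multi-byte sequence.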

ASKER CERTIFIED SOLUTION

BToTheAToTheBABA


SOLUTION

Ray Paseur


BToTheAToTheBABA

ASKER

That's great. I've thought about doing something like this before, but there are a few differences.

The input to the function "return_only clean_xml_chars()" comes from the database, and the cleaned output is sent to the browser; i.e. the input is not from the request (not a big difference, though).

"Known good values" are within the range of data in a particular column of a database table.
Unfortunately, I'm working in a Japanese environment where most UTF-8 characters are in use.
All kinds of unclean data exist in that huge database.

But it's fine (in this case) to ignore the unclean data and send only the readable characters to the output.

What if more bad data like 0xb exists that isn't handled by this strip_invalid_xml_chars function? I think I'll need to work on it again when I find such data (as you've mentioned).

Any other suggestions and advice are appreciated.
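One way to avoid chasing bad characters one at a time is to keep only what the XML 1.0 specification allows (its "Char" production) and drop everything else. The sketch below assumes the database values are UTF-8; the function name is hypothetical and is not the strip_invalid_xml_chars function discussed in this thread.

```php
<?php
// Hypothetical helper: remove every code point that XML 1.0 does not allow.
// Allowed: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
function keep_only_xml10_chars(string $utf8): string
{
    // Negated character class of the allowed ranges; /u treats input as UTF-8.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($pattern, '', $utf8);
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// The vertical tab is dropped; hyphens and Japanese text are kept.
echo keep_only_xml10_chars("12\x0B3-45"), "\n";
echo keep_only_xml10_chars("１２３－４５"), "\n";
```

Because the pattern is a whitelist of code point ranges rather than a list of known bad characters, full-width (zenkaku) characters pass through untouched while any other stray control code is removed as well.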

Ray Paseur

Well, the choice of input - browser request versus database - is pretty much limited by my ability to demonstrate a concept here. I don't have your database, so I can only use what I know we both have, and that's usually browser input.

In addition to handwritten filters like what I have shown above, there is PHP's filter_var() function:
http://us.php.net/manual/en/function.filter-var.php

All of the PHP filter functions seem to be a little bit "in their infancy" but as the use of these gains popularity, you can be sure that they will mature, and that they will be the fastest way to clean up external data. More here:
http://us.php.net/manual/en/book.filter.php
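As a rough illustration of filter_var(), here is a sketch that accepts a value only if it matches a "known good" pattern. The rule (ASCII digits and hyphens only) and the function name are assumptions for the example, not something taken from your data.

```php
<?php
// Assumed rule for illustration: a product code holds only ASCII digits and hyphens.
function validate_product_code(string $code)
{
    $options = ['options' => ['regexp' => '/\A[0-9-]+\z/']];
    // Returns the original string when it matches the pattern, false otherwise.
    return filter_var($code, FILTER_VALIDATE_REGEXP, $options);
}

var_dump(validate_product_code('123-456'));     // matches the pattern
var_dump(validate_product_code("123-\x0B456")); // hidden vertical tab, rejected
```

This only validates; it does not repair the value, so a rejected code would still need to be cleaned or skipped before it goes into the XML.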

And finally, UTF-8 support has its share of quirks and OS-dependencies. Maybe some of these links will help. ;-)
http://lmgtfy.com/?q=php+regex+utf8

Best regards, ~Ray

SOLUTION

BigRat


BToTheAToTheBABA

ASKER

>
>But my data contains ONLY numbers and hyphen at that field.
>
As mentioned, I believe the ignored character could be a zenkaku (full-width Japanese) hyphen, a plain hyphen, or a number. Ignoring it looks fine in my case.
