Decoding special characters while doing XML Parse

Hello All

I am looking for a quick and simple solution to an xml_parse issue I have. When I parse the following element:

<large_image>http://xxx.xxx.xxx.xxx/imagegallery/parax/Largeimg/All/L_D012039.Jpg</large_image>

All I receive is "D012039.Jpg", because the special character in the file name is causing my parser to chop the URL. I know that the special character should come back as an underline (_). How can I make it come back correctly as an underline while parsing?

I have tried htmlentities and that did not work. The site is running on PHP 4.4.8, so I need a solution that does not involve a PHP 5.0 code base.

Many thanks

ASKER CERTIFIED SOLUTION

Lordgobbledegook

This solution is only available to members.

SOLUTION

LordOfPorts

ParadyneDesigns

ASKER

Unfortunately, I am not in a position to modify the XML document, as it's being supplied by a third party, and the plan is simply to run a cron job once a day to parse any updates.

The original file (which contains all the initial import data) is 301 MB in size and is too big to open in any text editor I have (I tend to use Crimson Editor, which crashes when I attempt to open this file).

If there is a way to open this document, modify it, then resave and parse it, that would work (so long as it was automatic). The underscore issue is not the only non-ASCII value in the document. It appears to be full of them, causing issues on many fields.
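On the file size: a document this large probably does not need to be opened or resaved at all, because xml_parse() accepts the input in pieces. The sketch below, which should run on PHP 4, reads a file 8 KB at a time and passes each piece to the parser. The function names, the counter, and the choice to count large_image elements are all hypothetical and only there to make the example complete.

```php
<?php
// Sketch only: every function and variable name here is hypothetical.
$element_count = 0;

function count_start($parser, $name, $attrs)
{
    global $element_count;
    // Case folding is on by default, so element names arrive upper-cased.
    if ($name == 'LARGE_IMAGE') {
        $element_count++;
    }
}

function count_end($parser, $name)
{
    // Nothing to do at closing tags in this sketch.
}

// Feed the file to the parser 8 KB at a time so memory use stays small.
function parse_in_chunks($path)
{
    $fp = fopen($path, 'r');
    if (!$fp) {
        return false;
    }
    $parser = xml_parser_create();
    xml_set_element_handler($parser, 'count_start', 'count_end');
    $ok = true;
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        // The third argument tells the parser whether this is the last piece.
        if (!xml_parse($parser, $chunk, feof($fp))) {
            $ok = false;
            break;
        }
    }
    xml_parser_free($parser);
    fclose($fp);
    return $ok;
}
```

Calling parse_in_chunks() with the path to the supplied file (for example from the daily cron script) would walk the whole document without loading it into memory; on a parse failure, xml_get_error_code() and xml_error_string() can be used inside the loop to report what went wrong.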

Does anyone have any ideas how to parse this XML document in its current format?
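One possible cause worth checking, assuming the parser uses xml_set_character_data_handler(): the parser may deliver the text of a single element in several calls, typically splitting around an entity or character reference. If the handler assigns each piece to a variable instead of appending it, only the last piece ("D012039.Jpg") survives. The sketch below appends the pieces and stores the result at the closing tag. The handler names, the example.com host, and the &#95; reference in the sample string are assumptions for illustration; the actual encoding used in the supplied file is not known.

```php
<?php
// Sketch only: handler and variable names are hypothetical.
$current_text = '';
$large_images = array();

function start_element($parser, $name, $attrs)
{
    global $current_text;
    $current_text = '';             // start a fresh buffer at each opening tag
}

function character_data($parser, $data)
{
    global $current_text;
    $current_text .= $data;         // append (.=) rather than assign (=)
}

function end_element($parser, $name)
{
    global $current_text, $large_images;
    // Case folding is on by default, so element names arrive upper-cased.
    if ($name == 'LARGE_IMAGE') {
        $large_images[] = trim($current_text);
    }
    $current_text = '';
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'start_element', 'end_element');
xml_set_character_data_handler($parser, 'character_data');

// Assumed sample input: the underscore is written as the reference &#95;.
$xml = '<item><large_image>http://example.com/Largeimg/All/L&#95;D012039.Jpg</large_image></item>';

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}
xml_parser_free($parser);

print_r($large_images);
```

If this is indeed the cause, no change to the third-party document is needed; only the handler has to collect its input until the closing tag is seen.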

SOLUTION

LordOfPorts
