Solved

Clean XML file of non-UTF-8 characters

Posted on 2011-02-26
Last Modified: 2012-05-11
I have an XML file that I am loading in PHP using:

$xml = simplexml_load_file('test.xml');

foreach ($xml->event as $event) {
    do_something();
}

The XML file starts with <?xml version="1.0" encoding="UTF-8"?>; however, it contains various non-UTF-8 characters, such as umlauts.

How can I clean up the file and remove the offending characters?

Thanks

Mike
Question by:hungoveragain
Accepted Solution

by:
Lukasz Chmielewski earned 250 total points
ID: 34987745

Author Comment

by:hungoveragain
ID: 34987789
Can you please explain how I would insert that into my code?

$xml = iconv("UTF-8", "ISO-8859-1//TRANSLIT", simplexml_load_file('test.xml'));

??

Thanks

Mike

Assisted Solution

by:hernst42
hernst42 earned 250 total points
ID: 34988167
You can try something like:

$sx = simplexml_load_string(iconv('ISO-8859-1', 'UTF-8', iconv('UTF-8', "ISO-8859-1//TRANSLIT", file_get_contents('test.xml'))));
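
A minimal sketch of how this round trip could be wired into the loop from the question, assuming the iconv extension is available and the input is otherwise well-formed XML. The helper name load_cleaned_xml is hypothetical, and how //TRANSLIT treats characters with no ISO-8859-1 equivalent depends on the platform's iconv implementation.

```php
<?php
// Hypothetical helper: round-trips the raw text through ISO-8859-1 so that
// characters outside Latin-1 are transliterated, then parses the result.
function load_cleaned_xml(string $raw)
{
    // UTF-8 -> ISO-8859-1 with transliteration; iconv() returns false on failure.
    $latin1 = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $raw);
    if ($latin1 === false) {
        return false;
    }

    // Back to UTF-8 so the bytes match the encoding declared in the XML prolog.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

    // Collect parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);

    // Returns a SimpleXMLElement on success, false on a parse error.
    return simplexml_load_string($utf8);
}

// Usage with the file and loop from the question:
// $xml = load_cleaned_xml(file_get_contents('test.xml'));
// if ($xml !== false) {
//     foreach ($xml->event as $event) {
//         do_something();
//     }
// }
```

Checking the return value against false before the loop avoids iterating over a failed parse.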

Author Comment

by:hungoveragain
ID: 34991123
I can't seem to get that working either.

Here is the file

http://xml.betclick.com/odds_en.xml

I need to get that into

$xml = simplexml_load_file('http://xml.betclick.com/odds_en.xml');

However, there are multiple characters in there, such as é and ä, which make it fall over.

Thanks

Mike
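
Characters such as é and ä are valid in UTF-8, so a parse failure usually has one of two causes: the bytes are actually in another encoding (for example ISO-8859-1), or the text uses HTML named entities such as &eacute;, which XML does not define. A minimal sketch for telling these apart, assuming the mbstring extension is available; diagnose_xml is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: reports whether the input is valid UTF-8 and lists
// the parse errors libxml raises for it.
function diagnose_xml(string $raw): array
{
    $report = [
        'valid_utf8' => mb_check_encoding($raw, 'UTF-8'),
        'errors'     => [],
    ];

    // Collect libxml errors internally so they can be inspected afterwards.
    libxml_use_internal_errors(true);
    libxml_clear_errors();

    if (simplexml_load_string($raw) === false) {
        foreach (libxml_get_errors() as $error) {
            $report['errors'][] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }

    libxml_clear_errors();
    return $report;
}

// Usage with the feed from the question:
// print_r(diagnose_xml(file_get_contents('http://xml.betclick.com/odds_en.xml')));
```

If valid_utf8 is true and the errors mention undefined entities, the file needs entity handling rather than an encoding conversion.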

Author Comment

by:hungoveragain
ID: 34991311
Managed to do it.

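// Read the remote feed into an array of lines and open a local copy for writing.
$in = file("http://xml.betclick.com/odds_en.xml");
$out = fopen("today.xml", "w");

foreach ($in as $line) {
    // Replace accented-letter entities such as &eacute; with the bare letter (e).
    $line = preg_replace('/&(.)(acute|cedil|circ|lig|grave|ring|tilde|uml);/', '$1', $line);
    fputs($out, $line);
}
fclose($out);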

Thanks

Mike
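
The loop above drops the accent and keeps the bare letter. If the accented characters should survive, one alternative is to decode the same entities into real UTF-8 characters instead. This is a sketch under that assumption, not the method used in this thread; decode_accent_entities is a hypothetical helper name, and it relies on PHP's built-in html_entity_decode().

```php
<?php
// Hypothetical helper: turns accented-letter entities such as &eacute; into
// the real UTF-8 character. XML's own entities (&amp;, &lt;, ...) do not
// match the pattern and are left untouched.
function decode_accent_entities(string $line): string
{
    return preg_replace_callback(
        '/&[A-Za-z](?:acute|cedil|circ|lig|grave|ring|tilde|uml);/',
        function (array $m): string {
            // Unknown names are returned unchanged by html_entity_decode().
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $line
    );
}

echo decode_accent_entities('Caf&eacute; &amp; M&uuml;ller'), "\n";
// prints "Café &amp; Müller"
```

Calling this helper in place of the preg_replace() line would write a today.xml that matches its UTF-8 declaration while keeping the original spelling of names.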