Solved

Clean XML file of non-UTF-8 characters

Posted on 2011-02-26
Last Modified: 2012-05-11
I have an XML file which I am loading in PHP using:

$xml = simplexml_load_file('test.xml');

foreach ($xml->event as $event) {
    do_something();
}

The XML file starts with <?xml version="1.0" encoding="UTF-8"?>; however, it contains various non-UTF-8 characters, such as letters with umlauts.

How can I clean up the file and remove the offending characters?

Thanks

Mike
Question by:hungoveragain

Accepted Solution

by:Lukasz Chmielewski
Lukasz Chmielewski earned 250 total points
ID: 34987745

Author Comment

by:hungoveragain
ID: 34987789
Can you please explain how I would insert that into my code?

$xml = iconv("UTF-8", "ISO-8859-1//TRANSLIT", simplexml_load_file('test.xml'));

??

Thanks

Mike

Assisted Solution

by:hernst42
hernst42 earned 250 total points
ID: 34988167
You can try something like:

$sx = simplexml_load_string(iconv('ISO-8859-1', 'UTF-8', iconv('UTF-8', "ISO-8859-1//TRANSLIT", file_get_contents('test.xml'))));
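
A variation on the same idea, as a minimal sketch rather than a tested fix for this feed: convert the raw bytes only when they are not already valid UTF-8, then hand the result to simplexml_load_string(). The helper name to_utf8 and the inline sample are hypothetical, the fallback encoding ISO-8859-1 is an assumption about the source data, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: return $raw as valid UTF-8.
// Assumption: input that is not valid UTF-8 is encoded as ISO-8859-1.
function to_utf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

// Sample input: declares UTF-8 but contains a lone ISO-8859-1 byte (0xFC, "ü").
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     . "<events><event>M\xFCnchen</event></events>";

$xml = simplexml_load_string(to_utf8($raw));
echo (string) $xml->event[0], "\n";
```

For a real file, the sample string would be replaced with file_get_contents('test.xml').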

Author Comment

by:hungoveragain
ID: 34991123
I can't seem to get that working either.

Here is the file

http://xml.betclick.com/odds_en.xml

I need to get that into

$xml = simplexml_load_file('http://xml.betclick.com/odds_en.xml');

However, the file contains multiple characters such as é and ä, which make it fall over.

Thanks

Mike

Author Comment

by:hungoveragain
ID: 34991311
Managed to do it.

$in = file("http://xml.betclick.com/odds_en.xml");
$out = fopen("today.xml", "w");

foreach ($in as $line) {
    // Replace accent entities such as &eacute; or &uuml; with their base letter
    $line = preg_replace('/&(.)(acute|cedil|circ|lig|grave|ring|tilde|uml);/', '$1', $line);
    fputs($out, $line);
}
fclose($out);
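
The regex above works because the feed contains HTML named entities (such as &eacute;) that XML does not define, and it drops the accent in the process. If the accented characters should be kept, one possible alternative is to decode those entities to UTF-8 and re-escape the XML special characters. This is a sketch under assumptions: the function name html_entities_to_utf8 and the sample string are hypothetical, and it relies on ENT_HTML5 and ENT_XML1 (PHP 5.4+) plus scalar type hints (PHP 7+).

```php
<?php
// Hypothetical helper: turn HTML named entities into UTF-8 characters
// while keeping the result well-formed XML.
function html_entities_to_utf8(string $xml): string
{
    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function (array $m): string {
            // Decode the named entity (e.g. &eacute; becomes é).
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Re-escape so that &amp;, &lt; and similar stay valid XML.
            return htmlspecialchars($char, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $xml
    );
}

$sample = '<events><event>Atl&eacute;tico &amp; M&uuml;nchen</event></events>';
$clean  = html_entities_to_utf8($sample);
$xml    = simplexml_load_string($clean);
echo (string) $xml->event[0], "\n";
```

The same function could be applied to each $line in the loop in place of the preg_replace() call.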

Thanks

Mike