Solved

Clean XML file of non-UTF-8 characters

Posted on 2011-02-26
Last Modified: 2012-05-11
I have an XML file which I am loading in PHP using:

$xml = simplexml_load_file('test.xml');

foreach ($xml->event as $event) {
    do_something();
}

The XML file starts with <?xml version="1.0" encoding="UTF-8"?>; however, it contains various non-UTF-8 characters such as umlauts.

How can I clean up the file and remove the offending characters?

Thanks

Mike
Question by:hungoveragain
LVL 27

Accepted Solution

by:
Lukasz Chmielewski earned 250 total points
ID: 34987745
 

Author Comment

by:hungoveragain
ID: 34987789
Can you please explain how I would insert that into my code?

$xml = iconv("UTF-8", "ISO-8859-1//TRANSLIT", simplexml_load_file('test.xml'));

??

Thanks

Mike
 
LVL 48

Assisted Solution

by:hernst42
hernst42 earned 250 total points
ID: 34988167
You can try something like:

$sx = simplexml_load_string(iconv('ISO-8859-1', 'UTF-8', iconv('UTF-8', 'ISO-8859-1//TRANSLIT', file_get_contents('test.xml'))));
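
On where the conversion goes: iconv() works on strings, so it has to run on the raw file contents before parsing, not on the object returned by simplexml_load_file(). Below is a minimal sketch of that order of operations. It assumes the file is either valid UTF-8 already or entirely ISO-8859-1 (a file mixing both would be double-encoded by this approach), and that the mbstring and iconv extensions are available. The helper name to_valid_utf8() and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: return $raw as valid UTF-8.
function to_valid_utf8(string $raw): string
{
    // Already valid UTF-8: leave the bytes untouched.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1 (Latin-1).
    return iconv('ISO-8859-1', 'UTF-8', $raw);
}

// Sample input: declares UTF-8 but contains a Latin-1 u-umlaut byte (0xFC).
// In real use this would come from file_get_contents('test.xml').
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
     . "<events><event>Z\xFCrich</event></events>";

// Convert first, then parse the resulting string.
$xml = simplexml_load_string(to_valid_utf8($raw));

foreach ($xml->event as $event) {
    echo $event, "\n";
}
```

The check-then-convert step avoids touching input that is already correct, which the unconditional round trip through ISO-8859-1 would not.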
 

Author Comment

by:hungoveragain
ID: 34991123
I can't seem to get that working either.

Here is the file

http://xml.betclick.com/odds_en.xml

I need to get that into

$xml = simplexml_load_file('http://xml.betclick.com/odds_en.xml');

However, there are multiple characters in there, such as é, ä, etc., which make it fall over.

Thanks

Mike
 

Author Comment

by:hungoveragain
ID: 34991311
Managed to do it.

$in = file("http://xml.betclick.com/odds_en.xml");
$out = fopen("today.xml", "w");

foreach ($in as $line) {
      $line = preg_replace('/&(.)(acute|cedil|circ|lig|grave|ring|tilde|uml);/', "$1", $line);
      fputs($out, $line);
}

Thanks

Mike
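
A note on why this works: the pattern matches HTML named entities such as &eacute; or &uuml;, not raw bytes. XML itself predefines only five entities (&amp;, &lt;, &gt;, &quot;, &apos;), so any other named entity without a DTD declaration makes the parser fail. The replacement above keeps the base letter and drops the accent. If the accents should be preserved, something along these lines might work instead. It is a sketch only; the helper name html_entities_to_utf8() and the sample string are made up, and it assumes the entities in the feed are standard HTML 4.01 names.

```php
<?php
// Hypothetical helper: turn HTML named entities such as &eacute; into UTF-8
// characters, but leave XML's five predefined entities escaped.
function html_entities_to_utf8(string $line): string
{
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m): string {
            // These five are defined by XML itself and must stay escaped.
            if (in_array($m[1], ['amp', 'lt', 'gt', 'quot', 'apos'], true)) {
                return $m[0];
            }
            // Names that html_entity_decode() does not know are returned unchanged.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
        },
        $line
    );
}

echo html_entities_to_utf8('<team>Atl&eacute;tico &amp; M&uuml;nchen</team>'), "\n";
```

This could replace the preg_replace() call inside the foreach loop, with the rest of the loop unchanged.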