How not to lose accents from an XML RSS feed that is ISO-8859-1 encoded when reading with ? (I'm getting "US-ASCII"...)

Hi All,

I am trying to use ROME (Rss and atOM utilitiEs) to build a Java program that reads an RSS feed that is ISO-8859-1 encoded.

I use to read the remote file, but all the accents ("´", "`", "^", "~", etc.) are being lost, probably because the encoding is not being properly recognized.

Here is my example code:


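import java.net.URL;

import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

            String feed = "http://somedomain/some_rss_feed.xml";
            URL feedUrl = new URL(feed);
            // XmlReader tries to detect the charset from the HTTP headers and the XML declaration
            XmlReader reader = new XmlReader(feedUrl);
            SyndFeedInput input = new SyndFeedInput();
            // Assumed call: the right-hand side is missing in the post; build(Reader) is ROME's usual entry point
            SyndFeed result = input.build(reader);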

The structure of the RSS feed (which is NOT under my control, so I have no way to correct anything wrong with it...) is like below:

<?xml version="1.0" encoding="ISO-8859-1"?><!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "">
<rss version="2.0">
<title>Some title already wíth ãny àccênts intö it</title>





When variable "reader" gets the result of "new XmlReader(feedUrl)", it already shows me a property named "_encoding" filled with value US-ASCII instead of ISO-8859-1.

And when I check the contents of the variable "result", all its attributes are already filled with the values read from the XML feed, but with all my accents already corrupted...
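
The symptom can be reproduced without ROME or a network connection. Below is a minimal sketch using only the JDK; the class and method names are made up for this illustration and are not part of ROME. It assumes the feed bytes really are ISO-8859-1 and shows what happens when they are decoded as US-ASCII instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of ROME): shows how accents are lost
// when ISO-8859-1 bytes are decoded with the wrong charset.
final class AccentLossDemo {

    // Decode raw bytes with the given charset. The JDK's String constructor
    // replaces every byte the charset cannot map with U+FFFD.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "wíth ãny àccênts", written with escapes so the encoding of this
        // source file does not matter.
        String original = "w\u00EDth \u00E3ny \u00E0cc\u00EAnts";
        byte[] wire = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset the bytes were written in keeps the accents.
        System.out.println(decode(wire, StandardCharsets.ISO_8859_1).equals(original));
        // Decoding as US-ASCII turns every accented letter into U+FFFD.
        System.out.println(decode(wire, StandardCharsets.US_ASCII).equals(original));
    }
}
```

Once the text has been decoded with the wrong charset, the original bytes are gone, which is why the fix has to happen before or during the read and not afterwards on the "result" object.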

Plz help...!
Did you try using Replace() function?

teufelsdaumen (Author) Commented:
Hello YZlat,

The problem is that it does not help to change the attribute that holds the encoding inside the "reader" object. That object is already the result of transferring an HTTP stream coming from the machine serving the feed, and in order to transfer the XML file, the XmlReader class does some "magic" (or "Voodo", according to the ROME website...) to detect the encoding of the file *before* transferring it.

The "US-ASCII" encoding is thus what XmlReader understood to be the encoding of the document (and so when I get the document *it is already "corrupted", since it was transferred as US-ASCII* and not as ISO-8859-1).

Here is an excerpt from the ROME website about the XmlReader class:

public class XmlReader

Character stream that handles (or at least attemtps to) all the necessary Voodo to figure out the charset encoding of the XML document within the stream.

IMPORTANT: This class is not related in any way to the org.xml.sax.XMLReader. This one IS a character stream.

All this has to be done without consuming characters from the stream, if not the XML parser will not recognized the document as a valid XML. This is not 100% true, but it's close enough (UTF-8 BOM is not handled by all parsers right now, XmlReader handles it and things work in all parsers).

The XmlReader class handles the charset encoding of XML documents in Files, raw streams and HTTP streams by offering a wide set of constructors.

By default the charset encoding detection is lenient, the constructor with the lenient flag can be used for an script (following HTTP MIME and XML specifications). All this is nicely explained by Mark Pilgrim in his blog, Determining the character encoding of a feed.


I ended up searching for online RSS validators to see if the feed had any problems. I found this excellent validator by Mark Pilgrim and Sam Ruby and discovered the reason why!!! (Well, now that I have figured out what is happening, I still have to find a way to work around the problem...)

As I expected, the feed had not one but several problems, most of them regarding non-compliance with the DTD. Here is just an excerpt of the reported errors. The first error is the reason why I am getting US-ASCII:

This feed does not validate.

Your feed appears to be encoded as "ISO-8859-1", but your server is reporting "US-ASCII" [help]

line 1, column 164: XML parsing error: No declaration for element publisher (2 occurrences) [help]

...">                                            ^
line 1, column 164: XML parsing error: Element channel content does not follow the DTD, Misplaced publisher [help]




Looking at the HELP page, I found the following information:

Your feed appears to be encoded as “foo”, but your server is reporting “bar”

The XML appears to be using one encoding, but the HTTP headers from the web server indicate a different charset. Internet standards require that the web server's version takes preference, but many aggregators ignore this. Note that, if you are serving content as 'text/*', then the default charset is US-ASCII, which is probably not what you want. (See RFC 3023 for technical details.)

RSS feeds should be served as application/rss+xml (RSS 1.0 is an RDF format, so it may be served as application/rdf+xml instead). Atom feeds should use application/atom+xml. Alternatively, for compatibility with widely-deployed web browsers, any of these feeds can use one of the more general XML types - preferably application/xml.

Another possible cause is the use of single quotes to delimit the charset parameter in the http header, whereas the http definition of Basic Rules only permits the use of double quotes. The result is somewhat confusing messages such as:

Your feed appears to be encoded as “utf-8”, but your server is reporting “'utf-8'”

Either ensure that the charset parameter of the HTTP Content-Type header matches the encoding declaration, or ensure that the server makes no claims about the encoding. Serving the feed as application/xml means that the encoding will be taken from the file's declaration.

The W3C has published information on how to set the HTTP charset parameter with various popular web servers.

If you are unable to control your server's charset declaration, Character and Entity References may be used to specify the full range of Unicode characters in an feed served as US-ASCII.

Not clear? Disagree?
Let us know on the feedvalidator-users discussion list!
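
The precedence rules quoted above can be sketched in a few lines of Java. This is a simplified reading of the help text, not code from ROME or from the validator; the class and method names are hypothetical, and the UTF-8 fallback for a document without an encoding declaration is an assumption that ignores byte order marks.

```java
import java.util.Locale;

// Hypothetical helper (not ROME or feedvalidator code): a simplified reading
// of the rules quoted above for an XML feed served over HTTP.
final class FeedCharsetRules {

    // contentType: value of the HTTP Content-Type header, may be null.
    // declaredEncoding: encoding named in the XML declaration, may be null.
    static String effectiveCharset(String contentType, String declaredEncoding) {
        String type = "";
        String charset = null;
        if (contentType != null) {
            String[] parts = contentType.split(";");
            type = parts[0].trim().toLowerCase(Locale.ROOT);
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    charset = param.substring("charset=".length()).trim();
                    // Only double quotes are stripped; single quotes stay
                    // part of the value, as in the 'utf-8' message above.
                    if (charset.length() >= 2 && charset.startsWith("\"") && charset.endsWith("\"")) {
                        charset = charset.substring(1, charset.length() - 1);
                    }
                }
            }
        }
        if (charset != null && !charset.isEmpty()) {
            return charset;            // the server's claim takes precedence
        }
        if (type.startsWith("text/")) {
            return "US-ASCII";         // default charset for text/* types
        }
        // application/xml and similar: fall back to the XML declaration.
        // Assumption: UTF-8 when the document declares nothing.
        return declaredEncoding != null ? declaredEncoding : "UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset("text/xml", "ISO-8859-1"));
        System.out.println(effectiveCharset("application/xml", "ISO-8859-1"));
    }
}
```

With these rules, a feed served as text/xml without a charset parameter comes out as US-ASCII even though the file declares ISO-8859-1, which matches the validator message above.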


And going through the "feedvalidator-users" mailing list at SourceForge, I found the following reply by Sam Ruby:


> However, some news items have ASCII characters such as the copyright
 > symbol, trademark symbol etc. These stop the XML feed from validating,
 > and the validator says "Your feed appears to be encoded as "iso-8859-1"
 > but your server is reporting "US-ASCII". It sends me to the following
 > page: which
 > then links to another page of techy stuff, but it is way over my head,
 > far too technical for me.


2) The message you cited is only a warning
3) Adding either or both of these lines to your Apache server config,
    virtual host, directory, or .htaccess files will eliminate this
      AddCharset iso-8859-1 .xml
      AddType application/xml .xml


So the solution on the feed provider's side is clearly the one above!

I'll try to solve things here first, though, before contacting the folks there and asking them to change anything about the feed... (although I think it would help to draw their attention to the problems with their feed...).

So if anyone has any other suggestions....
teufelsdaumen (Author) Commented:
I have found the solution myself. Taking the information about setting the content type to "application/xml" into account ("alternatively, for compatibility with widely-deployed web browsers, any of these feeds can use one of the more general XML types - preferably application/xml."), I changed my code from

            XmlReader reader = new XmlReader(feedUrl);

to

            InputStream is = feedUrl.openStream();
            XmlReader reader = new XmlReader(is, "application/xml");

And now XmlReader treats the HTTP stream as being "ISO-8859-1" and the accents are preserved.
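
For anyone wondering why this works: with "application/xml" and no charset parameter, the encoding is taken from the XML declaration instead of the server's default. Below is a rough JDK-only sketch of that idea. It is not ROME's implementation (which also deals with byte order marks and more encodings); the class and method names are hypothetical, and the UTF-8 fallback is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not ROME's implementation: pick the charset named in
// the XML declaration and use it to decode the document bytes.
final class XmlDeclarationSniffer {

    // Matches encoding="..." or encoding='...' inside the <?xml ... ?> declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // Returns the declared charset, or UTF-8 when nothing is declared
    // (an assumption that ignores byte order marks and UTF-16 documents).
    static Charset declaredCharset(byte[] document) {
        // The declaration is plain ASCII in the encodings covered here, so
        // the first bytes can safely be read one-to-one as ISO-8859-1.
        int length = Math.min(document.length, 200);
        String head = new String(document, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = ENCODING.matcher(head);
        return matcher.find() ? Charset.forName(matcher.group(1)) : StandardCharsets.UTF_8;
    }

    // Decode the whole document with the charset its own declaration names.
    static String decode(byte[] document) {
        return new String(document, declaredCharset(document));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>w\u00EDth</title>";
        byte[] wire = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredCharset(wire));
        System.out.println(decode(wire).equals(xml));
    }
}
```

In real code the two-argument XmlReader constructor shown above does this job; the sketch only illustrates where the ISO-8859-1 comes from once the server's US-ASCII default is out of the picture.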

Thanks anyway.
Closed, 500 points refunded.
Community Support Moderator (Graveyard shift)
