• Status: Solved
  • Priority: Medium

UTF-8 encoding problem in Java with the Transformer class

I receive some UTF-8 XML, process it (removing some nodes), and then write it out again to another XML file.
However, some of the non-ASCII characters (accented French letters) come out corrupted.

Is there a way around this?

I don't have to use the Transformer class.

    File origFile = new File(dataFile.getCanonicalFile() + ".orig");
    File origFilex = new File(dataFile.getCanonicalFile() + ".orig1x");

    // Keep the original input under a new name
    dataFile.renameTo(origFile);

    DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = dBuilder.parse(origFile);

    // Remove the unwanted nodes
    modifyxml(doc, "Contributor");
    modifyxml(doc, "Author");

    // Write the modified document back out
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    DOMSource source = new DOMSource(doc);
    StreamResult result = new StreamResult(origFilex.getPath());
    transformer.transform(source, result);


Asked by: sniger
1 Solution
 
zzynx (Software engineer) commented:
Does this help?

        FileInputStream in = new FileInputStream(origFile);
        Document doc = dBuilder.parse(in, "UTF-8");



So, in effect, replace

    Document doc = dBuilder.parse(origFile);

with

    Document doc = dBuilder.parse(new FileInputStream(origFile), "UTF-8");

 
CEHJ commented:
Use an InputSource instead:

Document doc = dBuilder.parse(new InputSource(new InputStreamReader(in, "UTF-8")));



To serialize the result:

http://technojeeves.com/index.php/aliasjava1/96-serialize-xml-to-file-in-java
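
Putting the two halves together, a minimal sketch of reading and writing with the encoding stated explicitly on both sides might look like the following. The class and method names (`Utf8RoundTrip`, `parseUtf8`, `writeUtf8`) are made up for illustration, and the sketch assumes the input really is UTF-8 and carries no byte order mark.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original code.
class Utf8RoundTrip {

    // Decode the bytes as UTF-8 ourselves and hand the parser characters.
    // Assumption: the file is UTF-8 without a byte order mark.
    static Document parseUtf8(File in) throws Exception {
        try (Reader reader = new InputStreamReader(new FileInputStream(in), StandardCharsets.UTF_8)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
        }
    }

    // Serialize through a Writer that encodes as UTF-8, and declare the
    // same encoding in the XML declaration.
    static void writeUtf8(Document doc, File out) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(out), StandardCharsets.UTF_8)) {
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Utf8RoundTrip <input.xml> <output.xml>
        writeUtf8(parseUtf8(new File(args[0])), new File(args[1]));
    }
}
```

Any node removal would go between the two calls. If the output still looks wrong after this, it is worth checking the viewer or editor used to inspect the file, since that also has to read it as UTF-8.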
 
sniger (Author) commented:
Unfortunately, it did not.
 
CEHJ commented:
If my code doesn't help (you will obviously need to make sure that you write UTF-8 too, if that's appropriate), then please attach a problematic input file.
 
sniger (Author) commented:
  <FullName LanguageAndScriptCode="en">Carlos  Fauré</FullName> 


It gets converted to:

</ResourceContributor>
                <ResourceContributor SequenceNumber="2">
                    <PartyName LanguageAndScriptCode="en">
                        <FullName LanguageAndScriptCode="en"> Carlos  Fauré</FullName>
                    </PartyName>
                    <PartyId>8293</PartyId>
                    <ResourceContributorRole Namespace="PA-DP-2007032-I" UserDefinedValue="Composer">UserDefined</ResourceContributorRole>
                </ResourceContributor>

 
zzynx (Software engineer) commented:
Maybe you should post your complete code (or a simplified version) so that we can run it.
 
CEHJ commented:
"then please attach an input file"
Quoting from one won't help. Of course, my code will only fix the problem if your input actually is encoded as UTF-8. Otherwise, the actual encoding should be specified instead.
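
One way to test that condition before blaming the parser is to decode the raw bytes strictly. This is only a sketch: `Utf8Check` and `isValidUtf8` are names invented here. A strict `CharsetDecoder` reports malformed input instead of silently substituting replacement characters.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original code.
class Utf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8Check <file.xml>
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

A file that fails this check is probably in some other encoding (ISO-8859-1 or windows-1252 are common candidates for French text). A file that passes can still contain text that was already corrupted before it was saved, so passing is necessary but not sufficient.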
 
zzynx (Software engineer) commented:
Thanks for accepting.
However, an explanation of why you closed the question the way you did is always welcome.
 
CEHJ commented:
I too would welcome an explanation, especially since I'm almost certain the accepted comment would not have helped ;)
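
For context on that doubt: the second parameter of `DocumentBuilder.parse(InputStream, String)` is documented as a system ID (a base for resolving relative URIs), not a character encoding. The sketch below, with the made-up class name `SystemIdDemo`, feeds the parser ISO-8859-1 bytes that declare their own encoding while passing "UTF-8" as that second argument. The parser should still follow the XML declaration, which suggests the argument has no say over how the bytes are decoded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original code.
class SystemIdDemo {

    // Parses ISO-8859-1 bytes while passing "UTF-8" as the second argument,
    // and returns the text of the root element.
    static String parseLatin1WithUtf8SystemId() throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<FullName>Faur\u00e9</FullName>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);

        DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The String here is a system ID, so it does not select a decoder.
        Document doc = dBuilder.parse(new ByteArrayInputStream(latin1), "UTF-8");
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Compares against the expected name using an escape to avoid
        // depending on the source file's own encoding.
        System.out.println("Faur\u00e9".equals(parseLatin1WithUtf8SystemId()));
    }
}
```

To actually override the encoding for a byte stream, `InputSource.setEncoding` or a `Reader` wrapped in an `InputSource` are the usual routes.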