Solved

utf 8 encoding problem in Java  in Transformer class

Posted on 2015-02-18
9
120 Views
Last Modified: 2015-03-13
I'm getting some UTF-8 xml which I am processing (Removing some nodes) and then writing out again to another xml file
However some of the UTF-8 characters (French letters) are screwed up

Is there a way around it ?

I don't need to use Transformer class

	File origFile = new File(dataFile.getCanonicalFile() + ".orig");
                File origFilex = new File(dataFile.getCanonicalFile() + ".orig1x");  
                
            
		dataFile.renameTo(origFile);
                
                DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                Document doc = dBuilder.parse(origFile);
                              
                modifyxml(doc,"Contributor");
                modifyxml(doc,"Author");
                
                TransformerFactory transformerFactory = TransformerFactory.newInstance();
                Transformer transformer = transformerFactory.newTransformer();
                transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                DOMSource source = new DOMSource(doc);
                StreamResult result = new StreamResult(origFilex.getPath());
                transformer.transform(source, result);

Open in new window

0
Comment
Question by:sniger
  • 4
  • 3
  • 2
9 Comments
 
LVL 37

Accepted Solution

by:
zzynx earned 500 total points
ID: 40618394
Does this help?

        FileInputStream in = new FileInputStream(origFile);
        Document doc = dBuilder.parse(in, "UTF-8");

Open in new window


So in fact, replacing
 Document doc = dBuilder.parse(origFile);

Open in new window

by
Document doc = dBuilder.parse(new FileInputStream(origFile), "UTF-8");

Open in new window

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 40618579
Use an InputSource instead

Document doc = dBuilder.parse(new InputSource(new InputStreamReader(in, "UTF-8")));

Open in new window


To serialize the result:

http://technojeeves.com/index.php/aliasjava1/96-serialize-xml-to-file-in-java
0
 

Author Comment

by:sniger
ID: 40618858
unfortunately it did not
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 40618880
If my code doesn't (obviously you'll need to ensure that you write UTF-8 too if that's appropriate) then please attach an input file that is problematic
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 

Author Comment

by:sniger
ID: 40618888
  <FullName LanguageAndScriptCode="en">Carlos  Fauré</FullName> 

Open in new window

It gets converted to:

</ResourceContributor>
                <ResourceContributor SequenceNumber="2">
                    <PartyName LanguageAndScriptCode="en">
                        <FullName LanguageAndScriptCode="en"> Carlos  Fauré</FullName>
                    </PartyName>
                    <PartyId>8293</PartyId>
                    <ResourceContributorRole Namespace="PA-DP-2007032-I" UserDefinedValue="Composer">UserDefined</ResourceContributorRole>
                </ResourceContributor>

Open in new window

0
 
LVL 37

Expert Comment

by:zzynx
ID: 40618929
Maybe you should post your complete code (or a simplified version) so that we can run it.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 40619007
then please attach an input file
Quoting from one won't help. Of course, my code will only fix the problem if your input actually is encoded as UTF-8. Otherwise the actual encoding should be specified instead.
0
 
LVL 37

Expert Comment

by:zzynx
ID: 40662789
Thanx 4 axxepting.
However, some explanation about why you do close the question as you do is always welcome.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 40662861
I too would welcome an explanation, especially since i'm almost certain the accepted comment would not have helped ;)
0

Featured Post

Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
sumDigits  challenge 7 60
computer science syllabus 3 52
HashMap Vs TreeMap 12 49
JList custom Cell Renderer refresh 15 43
After being asked a question last year, I went into one of my moods where I did some research and code just for the fun and learning of it all.  Subsequently, from this journey, I put together this article on "Range Searching Using Visual Basic.NET …
Introduction This article is the second of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers the basic installation and configuration of the test automation tools used by…
Viewers learn about the third conditional statement “else if” and use it in an example program. Then additional information about conditional statements is provided, covering the topic thoroughly. Viewers learn about the third conditional statement …
Viewers will learn about the regular for loop in Java and how to use it. Definition: Break the for loop down into 3 parts: Syntax when using for loops: Example using a for loop:

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now