Solved

UTF-8 encoding problem in Java in the Transformer class

Posted on 2015-02-18
Last Modified: 2015-03-13
I'm receiving some UTF-8 XML, which I process (removing some nodes) and then write out to another XML file.
However, some of the UTF-8 characters (French letters) come out corrupted.

Is there a way around this?

I don't have to use the Transformer class.

    File origFile = new File(dataFile.getCanonicalFile() + ".orig");
    File origFilex = new File(dataFile.getCanonicalFile() + ".orig1x");

    dataFile.renameTo(origFile);

    DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = dBuilder.parse(origFile);

    modifyxml(doc, "Contributor");
    modifyxml(doc, "Author");

    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    DOMSource source = new DOMSource(doc);
    StreamResult result = new StreamResult(origFilex.getPath());
    transformer.transform(source, result);

Question by:sniger
9 Comments
 
LVL 37

Accepted Solution

by:
zzynx earned 500 total points
ID: 40618394
Does this help?

    FileInputStream in = new FileInputStream(origFile);
    Document doc = dBuilder.parse(in, "UTF-8");

So, in effect, replace

    Document doc = dBuilder.parse(origFile);

with

    Document doc = dBuilder.parse(new FileInputStream(origFile), "UTF-8");
 
LVL 86

Expert Comment

by:CEHJ
ID: 40618579
Use an InputSource instead

Document doc = dBuilder.parse(new InputSource(new InputStreamReader(in, "UTF-8")));

To serialize the result:

http://technojeeves.com/index.php/aliasjava1/96-serialize-xml-to-file-in-java
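
For reference, here is a minimal sketch of the whole read-and-write round trip with the encoding stated explicitly on both sides. The class name `Utf8RoundTrip` and the method `copyXml` are made up for illustration, and the node-removal step from the question is left out. It assumes the input really is UTF-8 and has no byte-order mark (a BOM read through a `Reader` would reach the parser as a stray character). Note that the second argument of `DocumentBuilder.parse(InputStream, String)` is a system ID (a base URI), not a charset name, which is why the sketch wraps a `Reader` in an `InputSource` instead.

```java
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original thread.
class Utf8RoundTrip {

    /** Parses the file at {@code in} as UTF-8 and serializes the DOM to {@code out} as UTF-8. */
    static void copyXml(Path in, Path out) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        Document doc;
        // Decode the bytes ourselves so the parser receives characters, not bytes.
        try (Reader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
            doc = builder.parse(new InputSource(reader));
        }

        // (Any DOM modifications, such as removing nodes, would go here.)

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // Hand the transformer a byte stream so it applies the declared encoding itself.
        try (OutputStream os = Files.newOutputStream(out)) {
            transformer.transform(new DOMSource(doc), new StreamResult(os));
        }
    }
}
```

If the output still looks wrong after this, it is worth checking the program used to view the file: a correct UTF-8 file opened as Latin-1 shows the same kind of garbage.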
 

Author Comment

by:sniger
ID: 40618858
Unfortunately, it did not.
 
LVL 86

Expert Comment

by:CEHJ
ID: 40618880
If my code doesn't help (obviously, you'll also need to make sure that you write UTF-8, if that's appropriate), then please attach a problematic input file.
 

Author Comment

by:sniger
ID: 40618888
  <FullName LanguageAndScriptCode="en">Carlos  Fauré</FullName> 

It gets converted to:

</ResourceContributor>
                <ResourceContributor SequenceNumber="2">
                    <PartyName LanguageAndScriptCode="en">
                        <FullName LanguageAndScriptCode="en"> Carlos  Fauré</FullName>
                    </PartyName>
                    <PartyId>8293</PartyId>
                    <ResourceContributorRole Namespace="PA-DP-2007032-I" UserDefinedValue="Composer">UserDefined</ResourceContributorRole>
                </ResourceContributor>
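
The pattern in this output ("é" turning into "Ã©") is what appears when the UTF-8 bytes of a character are decoded with a single-byte charset such as ISO-8859-1. The excerpt alone cannot show where that happens (during parsing, during writing, or only in the viewer), but the effect itself is easy to reproduce. A small sketch, with the class name `MojibakeDemo` and method `misdecode` invented for illustration:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
class MojibakeDemo {

    /** Encodes the text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "é" is the two bytes C3 A9 in UTF-8; read as ISO-8859-1 they become "Ã" and "©".
        System.out.println(misdecode("Faur\u00e9"));
    }
}
```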

 
LVL 37

Expert Comment

by:zzynx
ID: 40618929
Maybe you should post your complete code (or a simplified version) so that we can run it.
 
LVL 86

Expert Comment

by:CEHJ
ID: 40619007
"then please attach an input file"
Quoting from one won't help. Of course, my code will only fix the problem if your input actually is encoded as UTF-8. Otherwise, the actual encoding should be specified instead.
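
One way to test the assumption that the input is UTF-8 is to run the raw bytes through a strict decoder. This is only a sketch: the class name `Utf8Check` and method `isValidUtf8` are invented for illustration. It also has a limit worth knowing: text that was already double-encoded (such as "Ã©") is still valid UTF-8, so the check only catches files that are in some other encoding, for example ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original thread.
class Utf8Check {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Fail on bad input instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A typical call would be `Utf8Check.isValidUtf8(Files.readAllBytes(origFile.toPath()))`, using the `origFile` from the question.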
 
LVL 37

Expert Comment

by:zzynx
ID: 40662789
Thanks for accepting.
However, some explanation of why you closed the question the way you did is always welcome.
 
LVL 86

Expert Comment

by:CEHJ
ID: 40662861
I too would welcome an explanation, especially since I'm almost certain the accepted comment would not have helped ;)