Solved

UTF-8 encoding problem in Java in Transformer class

Posted on 2015-02-18
Last Modified: 2015-03-13
I'm receiving some UTF-8 XML, which I process (removing some nodes) and then write out to another XML file.
However, some of the UTF-8 characters (French letters) come out corrupted.

Is there a way around this?

I don't have to use the Transformer class.

    File origFile = new File(dataFile.getCanonicalFile() + ".orig");
    File origFilex = new File(dataFile.getCanonicalFile() + ".orig1x");

    dataFile.renameTo(origFile);

    DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = dBuilder.parse(origFile);

    modifyxml(doc, "Contributor");
    modifyxml(doc, "Author");

    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    DOMSource source = new DOMSource(doc);
    StreamResult result = new StreamResult(origFilex.getPath());
    transformer.transform(source, result);

Question by:sniger
9 Comments
 
LVL 37

Accepted Solution

by:
zzynx earned 2000 total points
ID: 40618394
Does this help?

        FileInputStream in = new FileInputStream(origFile);
        Document doc = dBuilder.parse(in, "UTF-8");



So in fact, replacing
 Document doc = dBuilder.parse(origFile);


by
Document doc = dBuilder.parse(new FileInputStream(origFile), "UTF-8");
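
Note that in `DocumentBuilder.parse(InputStream, String)` the second argument is a system ID (a base URI used to resolve relative references), not a character encoding, so passing "UTF-8" there should not change how the bytes are decoded. When handed a byte stream, the parser works out the encoding itself from the byte order mark or the XML declaration. A minimal sketch of that behaviour follows; the class name `ParseBytesDemo`, the helper `rootText` and the sample element are made up for illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseBytesDemo {

    /**
     * Parses raw XML bytes and returns the text of the root element.
     * No encoding is passed in: the parser reads it from the BOM or
     * the XML declaration at the start of the byte stream.
     */
    static String rootText(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document; \u00e9 is the letter e with an acute accent.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<FullName>Carlos Faur\u00e9</FullName>";
        String text = rootText(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(text.equals("Carlos Faur\u00e9")); // prints true
    }
}
```

If the bytes and the declaration agree, parsing from a plain byte stream is usually enough; trouble tends to start when the declared encoding and the real one differ.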

 
LVL 86

Expert Comment

by:CEHJ
ID: 40618579
Use an InputSource instead

Document doc = dBuilder.parse(new InputSource(new InputStreamReader(in, "UTF-8")));



To serialize the result:

http://technojeeves.com/index.php/aliasjava1/96-serialize-xml-to-file-in-java
 

Author Comment

by:sniger
ID: 40618858
Unfortunately, it did not.

 
LVL 86

Expert Comment

by:CEHJ
ID: 40618880
If my code doesn't help either (obviously you'll need to ensure that you write UTF-8 too, if that's appropriate), then please attach a problematic input file.
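
For the writing side, one way to make sure the output really is UTF-8 is to hand the Transformer a byte stream rather than a path string, so that the transformer does the character encoding itself. This is a rough sketch under that assumption, not a confirmed fix for the file in question; the class name `Utf8XmlWriter` and the method `write` are invented for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class Utf8XmlWriter {

    /**
     * Serializes the DOM tree to the target file as UTF-8.
     * The Transformer writes bytes directly to the stream, so the
     * platform default charset is never involved.
     */
    static void write(Document doc, File target) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (OutputStream out = new FileOutputStream(target)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```

With the code from the question this would be called as `Utf8XmlWriter.write(doc, origFilex);`. If a `Writer` is used instead of an `OutputStream`, the writer's own charset decides the bytes on disk, so it would have to be created with UTF-8 explicitly.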
 

Author Comment

by:sniger
ID: 40618888
  <FullName LanguageAndScriptCode="en">Carlos  Fauré</FullName> 


It gets converted to:

</ResourceContributor>
                <ResourceContributor SequenceNumber="2">
                    <PartyName LanguageAndScriptCode="en">
                        <FullName LanguageAndScriptCode="en"> Carlos  Fauré</FullName>
                    </PartyName>
                    <PartyId>8293</PartyId>
                    <ResourceContributorRole Namespace="PA-DP-2007032-I" UserDefinedValue="Composer">UserDefined</ResourceContributorRole>
                </ResourceContributor>

 
LVL 37

Expert Comment

by:zzynx
ID: 40618929
Maybe you should post your complete code (or a simplified version) so that we can run it.
 
LVL 86

Expert Comment

by:CEHJ
ID: 40619007
"then please attach an input file"
Quoting from one won't help. Of course, my code will only fix the problem if your input actually is encoded as UTF-8. Otherwise the actual encoding should be specified instead.
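
Whether the input file really is UTF-8 can be checked on the raw bytes before any XML parsing. The sketch below uses a strict `CharsetDecoder` that reports malformed input instead of silently replacing it; the class name `Utf8Check` and the method `isValidUtf8` are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /**
     * Returns true if the bytes decode as UTF-8 without any malformed
     * sequence, false otherwise.
     */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException ex) {
            // At least one byte sequence is not legal UTF-8.
            return false;
        }
    }
}
```

It could be run as `Utf8Check.isValidUtf8(Files.readAllBytes(origFile.toPath()))`. A `false` result suggests the file is in another encoding such as ISO-8859-1 or windows-1252. A `true` result is weaker evidence, since pure ASCII content passes as well.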
 
LVL 37

Expert Comment

by:zzynx
ID: 40662789
Thanks for accepting.
However, some explanation of why you closed the question the way you did is always welcome.
 
LVL 86

Expert Comment

by:CEHJ
ID: 40662861
I too would welcome an explanation, especially since I'm almost certain the accepted comment would not have helped ;)