Solved

XML Parsing for special characters

Posted on 2004-04-12
Last Modified: 2008-02-01
Hello
I have an XML file that contains special characters such as the cent sign and the pound sign. My Java program parses these characters and replaces them with #xA2; (for cent). When I run it on a Unix box, the cent symbol is interpreted as '?', whereas the same code on a Windows machine interprets it as the cent sign and works fine. Can you please tell me why this code fails on Unix but works on Windows? I would really appreciate some quick replies.

InputStream is = new FileInputStream("some.xml");
// No charset is given here, so the platform default encoding is used.
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
while ((line = br.readLine()) != null) {
    if (line.indexOf('\u00A2') != -1) {
        // or: if (line.indexOf("#xA2;") != -1) {
        System.out.println(" cent found");
    }
}

I didn't get any replies on the other forums where I posted. Please help me with a solution.
Also, when I pasted the question into this forum, the cent symbol was replaced by '?'.
Can you please treat this as a priority and help me?
Thanks a lot.
Question by:sharma_kv123
7 Comments
 
LVL 1

Expert Comment

by:Evlich
ID: 10807275
What I would look at is the character set that is being used. There is a class in java.nio.charset (Charset) that deals with character sets, along with byte buffers and related classes in java.nio. Normally machines replace unknown characters with '?'. If you explicitly pick a character set, it should work. Hope this helps, but if it doesn't, it should at least give you a place to start.
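To make the '?' behaviour concrete, here is a minimal sketch (the class name CharsetDemo and its helper are made up for illustration, and StandardCharsets needs Java 7 or later). It assumes the cent sign is stored as the single byte 0xA2, as it would be in an ISO-8859-1 file, and shows that the same byte decodes differently depending on the charset, and that encoding to a charset without the character yields a literal '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: decode raw bytes using an explicitly chosen charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // This is the charset InputStreamReader uses when none is specified;
        // it typically differs between a Windows machine and a Unix box.
        System.out.println("Default charset: " + Charset.defaultCharset());

        byte[] cent = { (byte) 0xA2 }; // cent sign as stored in an ISO-8859-1 file

        // Decoded as ISO-8859-1 the byte becomes U+00A2 (cent sign).
        String ok = decode(cent, StandardCharsets.ISO_8859_1);
        System.out.println(Integer.toHexString(ok.charAt(0)));

        // Decoded as US-ASCII the byte is invalid and becomes U+FFFD,
        // which many consoles display as '?'.
        String bad = decode(cent, StandardCharsets.US_ASCII);
        System.out.println(Integer.toHexString(bad.charAt(0)));

        // Encoding the cent sign to US-ASCII substitutes the byte for '?' (0x3F).
        byte[] encoded = "\u00A2".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Integer.toHexString(encoded[0]));
    }
}
```

Running it on both machines and comparing the first line of output is a quick way to check whether the default charsets really differ.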
 
LVL 86

Expert Comment

by:CEHJ
ID: 10807323
Try

BufferedReader br = new BufferedReader(new InputStreamReader(is, "ISO8859-1"));
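For completeness, a self-contained sketch of the same idea follows. The class name CentFinder and the method containsCent are hypothetical, and it assumes the input really is ISO-8859-1 encoded; it reads from any InputStream so it can be tried without a file on disk.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CentFinder {
    // Returns true if any line of the stream, decoded as ISO-8859-1,
    // contains the cent sign (U+00A2).
    static boolean containsCent(InputStream in) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.indexOf('\u00A2') != -1) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            // Rethrow unchecked so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "<p>" + cent sign byte + "</p>", standing in for some.xml
        byte[] sample = { '<', 'p', '>', (byte) 0xA2, '<', '/', 'p', '>' };
        System.out.println(containsCent(new ByteArrayInputStream(sample)));
    }
}
```

For a real file, pass new FileInputStream("some.xml") instead of the in-memory sample. Because the charset is fixed in the code, the result no longer depends on the platform default.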
 

Author Comment

by:sharma_kv123
ID: 10808158
Thanks a lot. The problem has been solved with the solution you gave:
BufferedReader br = new BufferedReader(new InputStreamReader(inputFileName,"ISO-8859-1"));

and it works perfectly fine.
I have explicitly checked for the cent and the pound symbols and then parsed them.
Is there any generic code with which I can detect and parse all the special characters that are not on the keyboard? We don't know what the XML contains, because I am getting it from a legacy system. Since I knew the XML might contain the cent symbol, I parsed it explicitly, and that saved me this time. Can you please give me some pseudocode for generic checking and parsing? That would really be a great help to me.
Thanks a lot for the previous answer.
You have been a great help to me.
 
LVL 86

Accepted Solution

by:
CEHJ earned 125 total points
ID: 10808194
There's no such thing as "generic checking and parsing", unfortunately. It's the responsibility of those providing the input to ensure that it is encoded in a specific way that's suitable for the target program. Effectively, it's not practicable for the receiving program to guess encodings.
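That said, once the encoding is known and the text has been decoded correctly, the replacement step itself does not need one check per symbol. The following is only a sketch under that assumption: the class NonAsciiEscaper and the method escapeNonAscii are made-up names, and it emits the standard XML numeric character reference form (&#xA2;), whereas the thread shows the shorter #xA2; form.

```java
public class NonAsciiEscaper {
    // Replaces every code point above 0x7F with a hexadecimal
    // numeric character reference such as &#xA2;.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // cent, pound and registered signs written as Unicode escapes
        System.out.println(escapeNonAscii("5\u00A2 \u00A3 \u00AE"));
    }
}
```

This does not guess anything about the input: if the bytes were decoded with the wrong charset, the references it produces will be wrong as well.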
 

Author Comment

by:sharma_kv123
ID: 10808335
Thanks a lot Mr CEHJ.
I have decided to check for all the special characters that may come in the xml file. At least this would solve the problem to some extent.
Thanks again.
I really appreciate the way you have responded.
 
LVL 86

Expert Comment

by:CEHJ
ID: 10808342
8-)
 

Author Comment

by:sharma_kv123
ID: 10834684
Hi
It is working fine with special characters like cent and pound. When I added more special characters such as ® (registered sign) and © (copyright sign), they are parsed properly. But when I replace them back just before I update the database, they are stored as '?', although it works fine for the cent and pound symbols.
String title = replace(title,"#xAE;", "®"); // for ® (registered sign)
I have a function called "replace" that replaces the old value (#xAE;) in the text with the new value (®).
It works fine for cent and pound but not for the other special characters.
Does it have anything to do with the encoding? I have used ISO8859-1.
Can you please tell me what I should do? It's very urgent.
Thanks a lot.
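The thread does not record an answer to this follow-up, so the cause is not confirmed. Two things that may be worth checking are the encoding javac assumes for the source file (a literal "®" is read with the platform default unless -encoding is given, whereas a Unicode escape such as "\u00AE" is not affected) and the character set used by the database and its connection. The sketch below covers only the first point and is an assumption-laden illustration: ReferenceDecoder and decodeReferences are hypothetical names, and it decodes any #xHH; or &#xHH; reference without a non-ASCII literal in the source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReferenceDecoder {
    // Matches "#xAE;" as used in the thread, and the standard "&#xAE;" form.
    private static final Pattern REF = Pattern.compile("&?#x([0-9A-Fa-f]{1,6});");

    // Replaces each hexadecimal character reference with the character it names.
    static String decodeReferences(String text) {
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = Integer.parseInt(m.group(1), 16);
            // Leave the reference untouched if it is not a valid code point.
            String replacement = Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String title = decodeReferences("Acme#xAE; #xA9; 2004");
        // Print code points rather than glyphs, so the console encoding
        // cannot hide the result.
        for (int i = 0; i < title.length(); i++) {
            System.out.print(Integer.toHexString(title.charAt(i)) + " ");
        }
        System.out.println();
    }
}
```

If the string holds the right code points at this stage but the database still shows '?', the loss is happening later, in the driver or the database character set, rather than in the replace step.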