Solved

XML Parsing for special characters

Posted on 2004-04-12
Last Modified: 2008-02-01
Hello,
I have an XML file that contains some special characters such as the cent symbol and the pound symbol. My Java program parses the special characters and replaces them with #xA2; (for cent). When I run it on the Unix box, the cent symbol is interpreted as '?', whereas when I run the same code on a Windows machine, it is interpreted as cent and works fine. Can you please tell me why this code works on Windows but not on Unix? I would really appreciate some quick replies.

InputStream is = new FileInputStream("some.xml");
// No charset is passed here, so the platform default encoding is used
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
while ((line = br.readLine()) != null) {
    if (line.indexOf('\u00A2') != -1) {
        // or: if (line.indexOf("#xA2;") != -1) {
        System.out.println(" cent found");
    }
}

I didn't get any replies from the various forums I have posted on. Please help me with a solution.
Also, when I pasted the question into this forum, the cent symbol was replaced by '?'.
Can you please treat this as a priority and help me?
Thanks a lot.
Question by:sharma_kv123
7 Comments
 
LVL 1

Expert Comment

by:Evlich
ID: 10807275
What I would look at is the character set that is being used. There is a class in java.nio (java.nio.charset.Charset) that deals with character sets, byte buffers, and related things. Normally machines replace unknown characters with '?'s. If you explicitly pick a character set, it should work. Hope this helps, but if it doesn't, it should give you a place to start.
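
As a rough illustration of the point above, here is a minimal sketch of how the same cent character behaves under different character sets. The class and method names (CharsetDemo, encode, decode) are made up for this example; only the java.nio.charset classes are real. It assumes the Unix box defaulted to a charset that cannot represent the cent symbol, which the thread does not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Encode text to bytes; String.getBytes substitutes the charset's
    // replacement byte ('?' for US-ASCII) for characters it cannot map.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    // Decode bytes to text; bytes that are invalid in the charset
    // become the Unicode replacement character U+FFFD.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // This is what InputStreamReader uses when no charset is given.
        System.out.println("Default charset: " + Charset.defaultCharset());

        byte[] latin1 = encode("\u00A2", StandardCharsets.ISO_8859_1);
        byte[] ascii = encode("\u00A2", StandardCharsets.US_ASCII);
        System.out.printf("ISO-8859-1 byte: 0x%02X%n", latin1[0] & 0xFF);
        System.out.printf("US-ASCII byte:   0x%02X%n", ascii[0] & 0xFF);

        // The same byte read back with two different charsets.
        String good = decode(latin1, StandardCharsets.ISO_8859_1);
        String bad = decode(latin1, StandardCharsets.US_ASCII);
        System.out.println("Decoded as ISO-8859-1 is cent: " + good.equals("\u00A2"));
        System.out.println("Decoded as US-ASCII is cent:   " + bad.equals("\u00A2"));
    }
}
```

Printing Charset.defaultCharset() on both machines is a quick way to see whether they disagree.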
 
LVL 86

Expert Comment

by:CEHJ
ID: 10807323
Try

BufferedReader br = new BufferedReader(new InputStreamReader(is, "ISO8859-1"));
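
For anyone who wants to try this in isolation, here is a self-contained sketch of the same idea. The class name, the helper methods, and the sample file contents are all invented for the example, and it assumes the file really is ISO-8859-1 encoded, as it turned out to be in this thread.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    // Counts the lines containing the wanted character, decoding the
    // file with the charset passed in instead of the platform default.
    static int countLinesContaining(Path file, Charset cs, char wanted) {
        int hits = 0;
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), cs))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.indexOf(wanted) != -1) {
                    hits++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    // Writes a small ISO-8859-1 encoded sample (hypothetical content).
    static Path writeSample() {
        try {
            Path tmp = Files.createTempFile("sample", ".xml");
            Files.write(tmp, "<price>5\u00A2</price>\n".getBytes(StandardCharsets.ISO_8859_1));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path sample = writeSample();
        // Matching charset: the cent sign is found.
        System.out.println(countLinesContaining(sample, StandardCharsets.ISO_8859_1, '\u00A2'));
        // Mismatched charset: the byte decodes to U+FFFD, so nothing is found.
        System.out.println(countLinesContaining(sample, StandardCharsets.US_ASCII, '\u00A2'));
        Files.delete(sample);
    }
}
```

Passing a Charset object rather than a name string also avoids the checked UnsupportedEncodingException.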
 

Author Comment

by:sharma_kv123
ID: 10808158
Thanks a lot. The problem has been solved with the solution you gave:
BufferedReader br = new BufferedReader(new InputStreamReader(inputFileName,"ISO-8859-1"));

and it works perfectly.
I have explicitly checked for the cent and the pound symbols and then parsed them.
Is there any generic code with which I can parse all the special characters that are not on the keyboard? We don't know what the XML contains, because I am getting it from a legacy system. I knew the XML might contain the cent symbol, so I parsed it, and that saved me this time. Can you please give me some pseudocode for generic checking and parsing? That would really be a great help to me.
Thanks a lot for the previous answer.
You have been a great help to me.
LVL 86

Accepted Solution

by:
CEHJ earned 125 total points
ID: 10808194
There's no such thing as "generic checking and parsing", unfortunately. It's the responsibility of those providing the input to ensure that the input is encoded in a specific way that's suitable for the target program. Effectively, it's not practicable for the receiving program to guess encodings.
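
Guessing the encoding is one thing; escaping characters once the encoding is known is another. As a hedged sketch of the latter only, the following replaces every non-ASCII character in an already correctly decoded string with a numeric character reference, so no per-symbol check is needed. The class and method names are hypothetical, and it assumes the standard XML form &#xA2; is what is wanted (the posts above show it as #xA2;).

```java
public class NonAsciiEscaper {
    // Replaces each code point above 0x7F with an XML numeric character
    // reference such as &#xA2; and leaves ASCII characters untouched.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Cent, pound, registered and copyright signs, written as escapes
        // so the source file's own encoding does not matter.
        System.out.println(escapeNonAscii("5\u00A2 \u00A3 \u00AE \u00A9"));
    }
}
```

This only helps if the input was decoded with the right charset in the first place; it does nothing to detect a wrong one.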
 

Author Comment

by:sharma_kv123
ID: 10808335
Thanks a lot Mr CEHJ.
I have decided to check for all the special characters that may come in the xml file. At least this would solve the problem to some extent.
Thanks again.
I really appreciate the way you have responded.
 
LVL 86

Expert Comment

by:CEHJ
ID: 10808342
8-)
 

Author Comment

by:sharma_kv123
ID: 10834684
Hi
It is working fine with special characters like cent and pound. When I added more special characters like ® (registered sign) and © (copyright sign), they are parsed properly. But when I replace them back just before I update the database, they are stored as '?', although the same step works fine for the cent and pound symbols.
title = replace(title, "#xAE;", "®"); // for ® (registered sign)
I have a function called "replace" that replaces the old value (#xAE;) in the text with the new value (®).
It works fine for cent and pound but not for the other special characters.
Does it have anything to do with the encoding? I have used ISO8859-1.
Can you please tell me what I should do? It's very urgent.
Thanks a lot.
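
The thread does not record an answer to this follow-up. One way to narrow it down, offered only as a diagnostic sketch, is to check which characters a given target charset can actually encode, and to write the symbols as Unicode escapes so that the encoding of the Java source file is ruled out. The class and method names here are hypothetical, and the charset used by the database connection is not stated in the posts, so the two charsets below are only stand-ins.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class EncodableCheck {
    // Returns the code points of text (formatted as U+XXXX) that the
    // target charset cannot encode; those are the ones that would end
    // up as '?' when written out in that charset.
    static List<String> unencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        List<String> bad = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String s = new String(Character.toChars(cp));
            if (!encoder.canEncode(s)) {
                bad.add(String.format("U+%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return bad;
    }

    public static void main(String[] args) {
        // Cent, pound, registered, copyright, written as escapes.
        String symbols = "\u00A2\u00A3\u00AE\u00A9";
        System.out.println(unencodable(symbols, StandardCharsets.ISO_8859_1));
        System.out.println(unencodable(symbols, StandardCharsets.US_ASCII));
    }
}
```

Running the check with the charset the database or driver actually uses (via Charset.forName) would show whether ® and © are the characters it cannot represent.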