Solved

XML Parsing for special characters

Posted on 2004-04-12
Last Modified: 2008-02-01
Hello
I have an XML file that contains some special characters such as the cent symbol and the pound symbol. My Java program parses the special characters and replaces them with #xA2; (for cent). When I run it on the Unix box, the cent symbol is interpreted as '?', whereas when I run the same code on a Windows machine, it is interpreted as cent and works fine. Can you please tell me why the code does not work on the Unix box but works on Windows? I would really appreciate some quick replies.

// Read the file line by line and report any line containing the cent sign
InputStream is = new FileInputStream("some.xml");
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
while ((line = br.readLine()) != null) {
    if (line.indexOf('\u00A2') != -1) {
        // or: if (line.indexOf("#xA2;") != -1) {
        System.out.println(" cent found");
    }
}

I didn't get any replies on the various other forums where I posted. Please help me with the solution.
Also, when I pasted the question into this forum, the cent symbol was replaced by '?'.
Can you please treat this as a priority and help me?
Thanks a lot
Question by:sharma_kv123
7 Comments
 
LVL 1

Expert Comment

by:Evlich
What I would look at is the character set that is being used. There is a class in java.nio.charset (Charset) that deals with character sets and byte buffers and a bunch of stuff like that. Normally machines replace unknown characters with '?'s. If you explicitly pick a character set, it should work. Hope this helps, but if it doesn't, it should give you a place to start.
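A minimal sketch of what picking a character set explicitly looks like. The class and method names (CharsetDemo, decode, encode) are made up for illustration, and StandardCharsets assumes Java 7 or later; on older versions Charset.forName("ISO-8859-1") does the same job.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decode raw bytes using an explicitly chosen character set.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Encode text using an explicitly chosen character set.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        // The platform default can differ between machines, which is one
        // reason the same code may behave differently on Unix and Windows.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // In ISO-8859-1 the single byte 0xA2 is the cent sign (U+00A2).
        byte[] centInLatin1 = { (byte) 0xA2 };
        String cent = decode(centInLatin1, StandardCharsets.ISO_8859_1);
        System.out.println(cent.charAt(0) == '\u00A2'); // prints true

        // US-ASCII has no cent sign, so encoding substitutes '?'.
        byte[] ascii = encode("\u00A2", StandardCharsets.US_ASCII);
        System.out.println((char) ascii[0]); // prints ?
    }
}
```

The last two lines reproduce the '?' substitution mentioned above: it appears whenever a character is pushed through a charset that cannot represent it.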
0
 
LVL 86

Expert Comment

by:CEHJ
Try

BufferedReader br = new BufferedReader(new InputStreamReader(is, "ISO8859-1"));
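For context, here is a self-contained sketch of the whole read loop with the encoding passed explicitly. It assumes the file really is ISO-8859-1 encoded; CentFinder and countCentLines are hypothetical names, and a ByteArrayInputStream stands in for the FileInputStream on "some.xml" so the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CentFinder {
    // Counts the lines that contain the cent sign, decoding as ISO-8859-1
    // regardless of the platform default encoding.
    static int countCentLines(InputStream in) {
        int count = 0;
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.indexOf('\u00A2') != -1) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Two lines of input; only the first contains byte 0xA2 (cent).
        byte[] data = { 'a', (byte) 0xA2, '\n', 'b', '\n' };
        System.out.println(countCentLines(new ByteArrayInputStream(data))); // prints 1
    }
}
```

Writing the cent sign as '\u00A2' rather than as a literal character keeps the source file itself pure ASCII, so the check does not depend on the compiler's source encoding.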
 

Author Comment

by:sharma_kv123
Thanks a lot. The problem has been solved with the solution you gave,
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(inputFileName), "ISO-8859-1"));

and it works perfectly.
I have explicitly checked for the cent and the pound symbols and then parsed them.
Is there any generic code with which I can parse all the special characters that are not on the keyboard? We don't know what the XML contains, as I am getting it from a legacy system. Since I knew the XML might contain the cent symbol, I parsed it, and that saved me this time. Can you please give me pseudocode for generic checking and parsing? That would really be a great help to me.
Thanks a lot for the previous answer.
You have been a great help to me.

 
LVL 86

Accepted Solution

by:
CEHJ earned 125 total points
There's no such thing as "generic checking and parsing", unfortunately. It's the responsibility of those providing the input to ensure that it is encoded in a specific way that's suitable for the target program. In practice, the receiving program cannot reliably guess encodings.
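One standard way for the provider to state the encoding is the encoding declaration in the XML prolog, which an XML parser reads before decoding the rest of the document. The sketch below assumes the legacy system can emit such a declaration (the thread does not say whether it does); DeclaredEncodingDemo, sample, and rootText are hypothetical names, and the parser is the JDK's built-in javax.xml.parsers DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DeclaredEncodingDemo {
    // Builds a tiny document whose bytes are ISO-8859-1 and whose prolog says so.
    static byte[] sample() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<price>5\u00A2</price>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses raw bytes; the parser takes the encoding from the declaration,
    // so the caller never has to guess it.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(sample()).equals("5\u00A2")); // prints true
    }
}
```

Handing the parser bytes (an InputStream) rather than an already decoded Reader is what lets the declaration take effect.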
 

Author Comment

by:sharma_kv123
Thanks a lot Mr CEHJ.
I have decided to check for all the special characters that may appear in the XML file. At least this would solve the problem to some extent.
Thanks again.
I really appreciate the way you have responded.
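Once the input has been decoded with the right character set, checking for every special character does not need a list of known symbols. A rough sketch, assuming "special" means anything outside 7-bit ASCII and that the #xHH; token format from the question is the desired output; NonAsciiEscaper and escape are hypothetical names. This does not guess the encoding; it only works on text that was decoded correctly.

```java
class NonAsciiEscaper {
    // Replaces every code point above U+007F with a "#xHH;" token and
    // leaves plain ASCII untouched.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                out.append("#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Cent, pound, and registered sign written as Unicode escapes.
        System.out.println(escape("5\u00A2 and \u00A31 \u00AE"));
        // prints 5#xA2; and #xA3;1 #xAE;
    }
}
```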
 
LVL 86

Expert Comment

by:CEHJ
8-)
 

Author Comment

by:sharma_kv123
Hi
It is working fine with special characters like cent and pound. When I added more special characters such as ® (registered sign) and © (copyright sign), they are parsed properly. But when I replace them back just before I update the database, they are stored as '?', although it works fine for the cent and pound symbols.
title = replace(title, "#xAE;", "®"); // for ® (registered sign)
I have a function called "replace" that replaces the old value (#xAE;) in the text with the new value (®).
It works fine for cent and pound but not for the other special characters.
Does it have anything to do with the encoding? I have used ISO8859-1.
Can you please tell me what I should do? It's very urgent.
Thanks a lot
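The thread does not show where the '?' is introduced, so the following is only a sketch of the replace-back step, not a diagnosis. ISO-8859-1 does contain both ® (0xAE) and © (0xA9), so the charset used for reading is unlikely to be the cause on its own; the source file encoding at compile time and the database or connection encoding are other places worth checking. TokenUnescaper and unescape are hypothetical names, and the replacement characters are built from the hex value so that no non-ASCII literal appears in the source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenUnescaper {
    // Matches tokens such as "#xAE;" and captures the hex digits.
    private static final Pattern TOKEN = Pattern.compile("#x([0-9A-Fa-f]{1,6});");

    // Turns every "#xHH;" token back into the character it stands for.
    static String unescape(String text) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = Integer.parseInt(m.group(1), 16);
            // Leave the token as-is if the value is not a valid code point.
            String replacement = Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String title = unescape("Acme#xAE; #xA9; 2004");
        System.out.println(title.equals("Acme\u00AE \u00A9 2004")); // prints true
    }
}
```

Comparing against "\u00AE" in code, rather than printing the result, avoids mistaking a console encoding problem for a data problem.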