XML Parsing for special characters

sharma_kv123 asked
Medium Priority
832 Views
Last Modified: 2008-02-01
Hello,
I have an XML file that contains some special characters, such as the cent symbol and the pound symbol. My Java program parses the special characters and replaces them with &#xA2; (for cent). When I run it on the Unix box, the cent symbol is interpreted as '?', whereas when I run the same code on a Windows machine, it is interpreted as a cent sign and works fine. Can you please tell me why this code does not work on the Unix box but works on Windows? I would really appreciate a quick reply.

import java.io.*;

InputStream is = new FileInputStream("some.xml");
// No charset given, so InputStreamReader uses the platform default encoding
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
while ((line = br.readLine()) != null) {
    if (line.indexOf('\u00A2') != -1) {
    // or: if (line.indexOf("&#xA2;") != -1) {
        System.out.println("cent found");
    }
}

I didn't get any replies on the various forums where I posted. Please help me with a solution.
Also, when I pasted the question into this forum, the cent symbol was replaced by '?'.
Can you please treat this as a priority and help me?
Thanks a lot
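To see which encoding the legacy file actually uses, it can help to look at the raw bytes rather than the decoded characters. A minimal sketch, assuming Java 7 or later: it prints the offset and hex value of every byte at or above 0x80. A cent sign stored as the single byte 0xA2 points to ISO-8859-1 (or windows-1252), while the pair 0xC2 0xA2 points to UTF-8. The class name ByteInspector is hypothetical; some.xml is the file name from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ByteInspector {
    // Returns "offset: 0xHH" for every byte at or above 0x80 (i.e. outside ASCII)
    public static List<String> nonAsciiBytes(byte[] data) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // Java bytes are signed, so mask to 0..255
            if (b >= 0x80) {
                found.add(i + ": 0x" + String.format("%02X", b));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the question
        String fileName = args.length > 0 ? args[0] : "some.xml";
        byte[] data = Files.readAllBytes(Paths.get(fileName));
        for (String entry : nonAsciiBytes(data)) {
            System.out.println(entry);
        }
    }
}
```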

Commented:
What I would look at is the character set being used. There is a class in java.nio.charset (Charset) that deals with character sets, byte buffers and the conversions between them. Characters that cannot be represented in the character set in use are normally replaced with '?'. If you pick a character set explicitly, it should work. I hope this helps; if it doesn't, it should at least give you a place to start.
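The '?' substitution can be reproduced in a few lines. A minimal sketch, assuming Java 7 or later for StandardCharsets (the class name CharsetDemo is made up for illustration): String.getBytes(Charset) puts '?' in place of characters the target charset cannot represent, and the default charset is what InputStreamReader uses when no encoding is passed. On Windows that default is typically a code page such as windows-1252, while on Unix it usually follows the locale (and since Java 18 it is UTF-8 everywhere), which is one plausible reason the same code behaves differently on the two machines.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDemo {
    // Encodes text with the given charset; unmappable characters become '?'
    public static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        // The charset used by InputStreamReader and getBytes when none is specified
        System.out.println("Default charset: " + Charset.defaultCharset());

        String cent = "\u00A2"; // the cent sign, written as an escape
        // ISO-8859-1 stores the cent sign as the single byte 0xA2
        System.out.println(Arrays.toString(encode(cent, StandardCharsets.ISO_8859_1))); // [-94]
        // US-ASCII has no cent sign, so it is replaced by '?' (byte 63)
        System.out.println(Arrays.toString(encode(cent, StandardCharsets.US_ASCII)));   // [63]
        // Decoding the lone byte 0xA2 as ISO-8859-1 gives the cent sign back
        byte[] raw = { (byte) 0xA2 };
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1).equals(cent)); // true
    }
}
```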
CERTIFIED EXPERT
Top Expert 2016

Commented:
Try

BufferedReader br = new BufferedReader(new InputStreamReader(is, "ISO8859-1"));
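For reference, a fuller version of that suggestion as a self-contained program. It is a sketch, assuming Java 7 or later: StandardCharsets.ISO_8859_1 names the same encoding as the "ISO8859-1" string but cannot throw UnsupportedEncodingException, and try-with-resources closes the stream. The class and method names are hypothetical, and it also assumes the legacy file really is ISO-8859-1; if its XML declaration says UTF-8, pass StandardCharsets.UTF_8 instead.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CentFinder {
    // Counts lines containing the cent sign, decoding the bytes as ISO-8859-1
    public static int countCentLines(InputStream in) throws IOException {
        int count = 0;
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.indexOf('\u00A2') != -1) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the question
        String fileName = args.length > 0 ? args[0] : "some.xml";
        System.out.println(countCentLines(new FileInputStream(fileName)) + " line(s) contain a cent sign");
    }
}
```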

Author

Commented:
Thanks a lot. The problem has been solved with the solution you gave:
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(inputFileName), "ISO-8859-1"));

and it works perfectly fine.
I explicitly checked for the cent and pound symbols and then parsed them.
Is there any generic code that can handle all the special characters that are not on the keyboard? We don't know what the XML contains because I am getting it from a legacy system. Since I knew the XML might contain the cent symbol, I handled that case, and that saved me this time. Can you please give me some pseudocode for doing a generic check and parse? That would be a great help.
Thanks a lot for the previous answer.
You have been a great help to me.
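One generic approach, offered as a sketch rather than as the answer given in this thread: escape every character outside 7-bit ASCII as a hexadecimal character reference (&#xHH;), the same form already used for the cent sign. An XML parser turns each reference back into the original character, so nothing is lost, and the escaped file is pure ASCII, which decodes identically under ISO-8859-1, UTF-8 or windows-1252. It assumes Java 8 or later (String.codePoints) and that non-ASCII characters appear only in text and attribute values, since element names cannot be written as character references. NcrEscaper is a hypothetical name.

```java
public class NcrEscaper {
    // Replaces every non-ASCII code point with an XML hexadecimal character reference
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // codePoints() keeps characters outside the BMP (surrogate pairs) together
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp); // plain ASCII, including markup, is copied as-is
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: <price>5&#xA2; or &#xA3;1</price>
        System.out.println(escapeNonAscii("<price>5\u00A2 or \u00A31</price>"));
    }
}
```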
CERTIFIED EXPERT
Top Expert 2016
Commented:

Author

Commented:
Thanks a lot Mr CEHJ.
I have decided to check for all the special characters that may come in the xml file. At least this would solve the problem to some extent.
Thanks again.
I really appreciate the way you have responded.
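If a fixed list is the way to go, a lookup table keeps it in one place, and it is worth flagging anything the table does not cover so that new characters from the legacy system get noticed instead of silently passing through. A minimal sketch; the table contents and the SpecialCharTable name are illustrative assumptions, covering only the four characters mentioned in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SpecialCharTable {
    // Hypothetical table of the special characters expected from the legacy feed
    private static final Map<Character, String> REFS = new HashMap<>();
    static {
        REFS.put('\u00A2', "&#xA2;"); // cent sign
        REFS.put('\u00A3', "&#xA3;"); // pound sign
        REFS.put('\u00A9', "&#xA9;"); // copyright sign
        REFS.put('\u00AE', "&#xAE;"); // registered sign
    }

    // Replaces listed characters; any other non-ASCII character is kept and recorded in 'unknown'
    public static String escapeKnown(String s, Set<Character> unknown) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            String ref = REFS.get(c);
            if (ref != null) {
                sb.append(ref);
            } else {
                if (c >= 0x80) {
                    unknown.add(c); // not in the table: flag it for review
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<Character> unknown = new TreeSet<>();
        // U+00E9 (e with acute accent) is deliberately missing from the table
        String out = escapeKnown("5\u00A2 \u00AE caf\u00E9", unknown);
        System.out.println(out.startsWith("5&#xA2; &#xAE; caf")); // true
        for (char c : unknown) {
            System.out.printf("unlisted: U+%04X%n", (int) c);      // unlisted: U+00E9
        }
    }
}
```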
CERTIFIED EXPERT
Top Expert 2016

Commented:
8-)

Author

Commented:
Hi
It works fine with special characters like cent and pound. When I added more special characters like ® (registered sign) and © (copyright sign), they were parsed properly. But when I replace them back just before updating the database, they are stored as '?', while cent and pound are stored correctly.
title = replace(title, "&#xAE;", "®"); // for ® (registered sign)
I have a function called "replace" that replaces the old value (&#xAE;) with the new value (®).
It works fine for cent and pound but not for the other special characters.
Does it have anything to do with the encoding? I have used ISO8859-1.
Can you please tell me what I should do? It's very urgent.
Thanks a lot
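Both ® (U+00AE) and © (U+00A9) exist in ISO-8859-1, so reading the file as ISO-8859-1 is unlikely to be what turns them into '?'. Two possible causes, neither confirmed in this thread: a literal "®" in the Java source being compiled with a different encoding than the file was saved in (see javac's -encoding option), or a database column or connection character set that cannot store the character. The sketch below sidesteps the first by computing each character from its &#xHH; reference instead of writing it literally; the second has to be checked on the database side. NcrUnescaper is a hypothetical name; it handles only hexadecimal references and assumes they are well formed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcrUnescaper {
    // Matches hexadecimal character references such as &#xAE;
    private static final Pattern HEX_REF = Pattern.compile("&#x([0-9A-Fa-f]+);");

    // Turns every &#xHH; reference back into the character it names
    public static String unescapeHexRefs(String s) {
        Matcher m = HEX_REF.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int cp = Integer.parseInt(m.group(1), 16);
            String ch = new String(Character.toChars(cp));
            // quoteReplacement stops '$' or a backslash from being treated as group syntax
            m.appendReplacement(sb, Matcher.quoteReplacement(ch));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String title = unescapeHexRefs("Brand&#xAE; &#xA9;2008");
        // Compare against Unicode escapes so the check does not depend on source-file encoding
        System.out.println(title.equals("Brand\u00AE \u00A92008")); // true
    }
}
```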