sharma_kv123
XML Parsing for special characters
Hello
I have an XML file that contains some special characters such as the cent symbol and the pound symbol. My Java program parses the special characters and replaces them with #xA2; (for cent). When I run it on the Unix box, the cent symbol is interpreted as '?', whereas when I run the same code on a Windows machine, it is interpreted as cent and works fine. Can you please tell me why the same code works on Windows but not on the Unix box? I would really appreciate some quick replies.
InputStream is = new FileInputStream("some.xml");
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
while ((line = br.readLine()) != null) {
    if (line.indexOf('\u00A2') != -1) {
        // or: if (line.indexOf("#xA2;") != -1) {
        System.out.println(" cent found");
    }
}
I didn't get any replies on the various forums where I have posted. Please help me with the solution.
Also, when I pasted the question in this forum, the cent symbol was replaced by '?'.
Can you please treat this as a priority and help me?
Thanks a lot
What I would look at is the character set that is being used. There is a class in java.nio.charset (Charset) that deals with character sets, along with byte buffers and related classes in java.nio. Normally, unknown characters get replaced with '?'. If you explicitly pick a character set, it should work. Hope this helps, but if it doesn't, it should give you a place to start.
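To illustrate the point about '?' replacement, here is a minimal sketch (the class and method names are made up for this example). It encodes the cent sign with a charset that cannot represent it and with one that can. It assumes nothing about the asker's actual Unix locale; US-ASCII simply stands in for "a default charset without the cent sign".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encodes the text to bytes with the given charset, then decodes
    // those bytes again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String cent = "\u00A2"; // cent sign, written as an escape

        // US-ASCII has no cent sign, so getBytes substitutes '?'.
        System.out.println(roundTrip(cent, StandardCharsets.US_ASCII).equals("?"));

        // ISO-8859-1 can represent it, so the character survives.
        System.out.println(roundTrip(cent, StandardCharsets.ISO_8859_1).equals(cent));
    }
}
```

Both lines print true. The results are printed as booleans so that the console's own encoding cannot blur the outcome.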
Try
BufferedReader br = new BufferedReader(new InputStreamReader(is, "ISO8859-1"));
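As a self-contained sketch of that suggestion: the snippet below reads from an in-memory byte array instead of some.xml, so it can run anywhere. The class name, the method name, and the sample bytes are assumptions for illustration; the byte 0xA2 is the cent sign in ISO-8859-1.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {
    // Returns true if any line of the stream contains the cent sign,
    // decoding the bytes as ISO-8859-1 regardless of the platform default.
    static boolean containsCent(InputStream in) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.indexOf('\u00A2') != -1) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "<p>", the ISO-8859-1 byte for the cent sign, "</p>", newline
        byte[] data = {'<', 'p', '>', (byte) 0xA2, '<', '/', 'p', '>', '\n'};
        System.out.println(containsCent(new ByteArrayInputStream(data)));
    }
}
```

Because the charset is passed explicitly, the result should not depend on whether the program runs on Windows or Unix. This only helps if the file really is ISO-8859-1; a file in another encoding needs that encoding's name instead.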
ASKER
Thanks a lot. The problem has been solved with the solution you gave:
BufferedReader br = new BufferedReader(new InputStreamReader(inputFileName, "ISO-8859-1"));
and it works perfectly fine.
I have explicitly checked for the cent and the pound symbols and then parsed them.
Is there any generic code with which I can parse all the special characters that are not on the keyboard? We don't know what the XML contains, because I am getting it from a legacy system. Since I knew the XML may contain the cent symbol, I parsed it explicitly, which saved me this time. Can you please give me pseudocode for generic checking and parsing? That would really be a great help to me.
Thanks a lot for the previous answer.
You have been a great help to me
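One possible shape for such a generic check, offered as a sketch rather than the accepted answer: instead of testing for each symbol, treat every character outside the ASCII range as "special" and rewrite it as a hexadecimal numeric character reference. The class and method names are hypothetical, and the "&#x" prefix assumes the "#xA2;" form quoted above is an XML character reference; change the PREFIX constant if the legacy format really has no ampersand.

```java
import java.util.Locale;

public class NonAsciiEscaper {
    // Assumed prefix of an XML hexadecimal character reference.
    static final String PREFIX = "&#x";

    // Replaces every code point above 0x7F with PREFIX + hex value + ";".
    // ASCII characters are copied through unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                out.append(PREFIX)
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // cent sign and registered sign, written as escapes
        System.out.println(escapeNonAscii("5\u00A2 \u00AE"));
    }
}
```

The example prints 5&#xA2; &#xAE;. The output is pure ASCII, so it is not affected by the platform's default encoding, but the input still has to be decoded with the right charset before this method sees it.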
ASKER
Thanks a lot Mr CEHJ.
I have decided to check for all the special characters that may come in the xml file. At least this would solve the problem to some extent.
Thanks again.
I really appreciate the way you have responded.
8-)
ASKER
Hi
It is working fine with special characters like cent and pound. When I added more special characters like ® (registered sign) and © (copyright sign), they are parsed properly. But when I replace them back just before I update the database, they are stored as '?', whereas it works fine for the cent and pound symbols.
title = replace(title, "#xAE;", "®"); // for ® (registered sign)
I have a function called "replace" that replaces the old value (#xAE;) in the text with the new value (®).
It's working fine for cent and pound but not for the other special characters.
Does it have anything to do with the encoding? I have used ISO8859-1.
Can you please tell me what I should do? It's very urgent.
Thanks a lot
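The thread does not record what caused this, so the following is only one thing that might be worth checking. The earlier cent check used the escape '\u00A2', while this replacement puts a literal ® in the source file, and a literal can be misread if the compiler decodes the source with a different charset. A minimal sketch using Unicode escapes instead (hypothetical class and method names, and the standard String.replace rather than the asker's own replace function):

```java
public class RegisteredSignReplace {
    // Unicode escapes keep the source file pure ASCII, so the compiled
    // constants do not depend on the encoding the compiler assumes.
    static final String REGISTERED = "\u00AE"; // registered sign
    static final String COPYRIGHT = "\u00A9";  // copyright sign

    // Replaces the placeholder text with the actual symbols.
    static String restoreSymbols(String title) {
        return title.replace("#xAE;", REGISTERED)
                    .replace("#xA9;", COPYRIGHT);
    }

    public static void main(String[] args) {
        String restored = restoreSymbols("Acme#xAE;");
        // Print the numeric value of the fifth character rather than the
        // character itself, to sidestep the console encoding.
        System.out.println((int) restored.charAt(4));
    }
}
```

This prints 174, the code point of the registered sign. If the string is correct at this point and the stored value is still '?', the database column or connection encoding would be the next place to look.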