  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2510

Encoding ISO-8859-1/UTF-8 using the com.thoughtworks.xstream.io.xml.DomDriver API

Hi Expert,
I am using the XStream API to write an XML file, but if I have ISO-8859-1 characters in the values of the bean
that I use to write it, they are not decoded properly. Please help me ASAP.
I tried both of these ways in the code below:
1) XStream xs = new XStream(new DomDriver(" ISO-8859-1"));
and
fis.write("<?xml version='1.0'  encoding=' ISO-8859-1' standalone='yes'?>".getBytes());

and similarly
2) XStream xs = new XStream(new DomDriver(" "));
and
fis.write("<?xml version='1.0'  encoding=' ISO-8859-1' standalone='yes'?>".getBytes());

XStream xs = new XStream(new DomDriver(""));

FileOutputStream fis = new FileOutputStream(s);
// header, stylesheet PI and wrapper elements; getBytes() uses the platform default charset
fis.write("<?xml version='1.0'  encoding='UTF-8' standalone='yes'?>".getBytes());
fis.write("<?xml-stylesheet type='text/xsl'  href='mainpage.xsl'?>".getBytes());
fis.write("<resellerlist xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>".getBytes());
fis.write("<ruleparam>".getBytes());

// serialize the form bean stored in the session, aliased to <param>
RuleGroupParamForm obj = (RuleGroupParamForm) session.getAttribute("rgparam");
xs.alias("param", RuleGroupParamForm.class);
xs.toXML(obj, fis);
fis.write("</ruleparam>".getBytes());

Asked by: Dehankar
abel commented:
What you need is to pass the encoding string to the DomDriver constructor:

XStream xs = new XStream(new DomDriver("ISO-8859-1"));

That should do it.
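For reference, here is a JDK-only sketch (no XStream) of the rule the encoding argument serves: the charset used to turn characters into bytes and the encoding named in the XML declaration must be the same. As far as I know, `DomDriver`'s encoding is what XStream uses when it wraps a byte stream itself. The class `DeclaredEncoding` and its methods are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class DeclaredEncoding {

    // Writes <?xml ... encoding='X'?><param>text</param>, where X is the SAME
    // charset the Writer uses to produce the bytes.
    // Assumption: text contains no markup characters (no escaping is done).
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write("<?xml version='1.0' encoding='" + charset.name() + "'?>");
            out.write("<param>" + text + "</param>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Parses the bytes; the parser takes the charset from the XML declaration.
    static String readBack(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String latin = "caf\u00e9";                              // "café"
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        System.out.println(readBack(write(latin, StandardCharsets.ISO_8859_1)).equals(latin));
        System.out.println(readBack(write(russian, StandardCharsets.UTF_8)).equals(russian));
    }
}
```

Writing UTF-8 bytes under an ISO-8859-1 declaration (or the reverse) is what breaks this round trip.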

Dehankar (author) commented:
Hi,

I tried that too... see line 1) XStream xs = new XStream(new DomDriver(" ISO-8859-1"));
But in my XSLT and XML I am using <?xml version='1.0'  encoding='UTF-8' standalone='yes'?>
...
abel commented:
Ah, maybe I misunderstood the question. Do you mean you actually do not want ISO-8859-1, but instead want the (much easier and more general) UTF-8 Unicode encoding?

Btw, if you use XSLT anyway, the encoding should not really be a problem for you. You can specify the output encoding in the XSLT with the <xsl:output /> instruction. The encoding used for the source (or for the XSLT itself) is then of minor importance, and all your control goes into one location: the XSLT.
Dehankar (author) commented:
My problem is this: I am developing a web application using JSP + Struts + XSLT + XML,
and the requirement is UTF-8 support. Now there is certain text in the Russian language
which does not come through as UTF-8.
I am able to store it in the database using
  String h2 = new String(str.getHeadernote().getBytes("ISO-8859-1"), "UTF-8");
and that works, but when writing the XML with XStream it writes ??????,
and the same shows up in the browser. I know the problem is in writing bean ---> XML with the XStream API.
What should I do about it?

FileOutputStream fis = new FileOutputStream(s);
fis.write("<?xml version='1.0'  encoding='UTF-8' standalone='yes'?>".getBytes());
fis.write("<?xml-stylesheet type='text/xsl'  href='mainpage.xsl'?>".getBytes());

fis.write("<List xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>".getBytes());

fis.write("<param>".getBytes());
ParamForm obj = (ParamForm) session.getAttribute("rgparam");
xs.alias("param", ParamForm.class);
xs.toXML(obj, fis);
fis.write("</param>".getBytes());
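In the code above, the header lines go through `getBytes()`, which uses the platform default charset, while XStream encodes its own part according to however its driver is configured, so one file can end up holding bytes in different encodings. A sketch of the usual alternative, assuming UTF-8 throughout: open one `Writer` with an explicit charset and send every piece through it. The `paramXml` argument is a hypothetical stand-in for XStream's output; with XStream itself you would call `xs.toXML(obj, out)` on the same `Writer` at that point. The class name `ParamPage` is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class ParamPage {

    // Writes the whole document through ONE UTF-8 writer, so the declared
    // encoding and the actual bytes always agree. Closing the writer also
    // closes the target stream.
    static void write(OutputStream target, String paramXml) {
        try (Writer out = new OutputStreamWriter(target, StandardCharsets.UTF_8)) {
            out.write("<?xml version='1.0' encoding='UTF-8' standalone='yes'?>");
            out.write("<?xml-stylesheet type='text/xsl' href='mainpage.xsl'?>");
            out.write("<List xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>");
            out.write("<param>");
            out.write(paramXml); // hypothetical placeholder: with XStream, xs.toXML(obj, out) here
            out.write("</param>");
            out.write("</List>"); // close the root element so the file is well-formed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(buffer, "<headernote>\u041f\u0440\u0438\u0432\u0435\u0442</headernote>");
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the real page the target would be the `FileOutputStream` for `s`.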
abel commented:
There are some things here that are weird, and I think it's wise to get them straightened out before we go further. I've worked a lot with many encodings and with transformations/translations from one into another.

First issue: encoding
One (of many) traditional encodings for Russian is ISO-8859-5, not ISO-8859-1. If you use ISO-8859-1 (Latin-1) instead of ISO-8859-5 (Cyrillic), you should expect to see many strange characters.
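A quick way to check which charsets can represent the text at all is `CharsetEncoder.canEncode`. A minimal sketch, assuming the JDK's standard charsets are available; `CyrillicCheck` is a hypothetical name.

```java
import java.nio.charset.Charset;

final class CyrillicCheck {

    // True if every character of text has a mapping in the named charset.
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        System.out.println("ISO-8859-1: " + canEncode("ISO-8859-1", russian)); // false: Latin-1 has no Cyrillic
        System.out.println("ISO-8859-5: " + canEncode("ISO-8859-5", russian)); // true: Cyrillic
        System.out.println("UTF-8:      " + canEncode("UTF-8", russian));      // true: all of Unicode
    }
}
```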

Second issue: the question marks
Depending on the viewer you use, the question marks can mean many things. For instance, they can mean that the viewer (browser, text editor) does not recognize the encoding (even if it is correctly UTF-8), shows the text using some other encoding, say US-ASCII, and replaces anything it does not know with a "?" mark.

But it may also be that the source was read with the wrong encoding (ISO-8859-1) and that the transformation produced real question marks (the XML specification allows unknown characters to be replaced with a question mark or a replacement character).
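The second case is easy to reproduce: encoding Cyrillic text as ISO-8859-1 turns every letter into a literal `?` byte, which matches the `??????` output described earlier in the thread and cannot be undone afterwards. A minimal sketch; the class name `QuestionMarks` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class QuestionMarks {

    // String.getBytes(Charset) replaces every character the charset cannot
    // represent with the charset's replacement byte, which is '?' for ISO-8859-1.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        System.out.println(throughLatin1(russian)); // prints ??????
    }
}
```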

To find the cause, make sure the viewer you use can show you the difference, or inspect the bytes in detail with a hex viewer.
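If no hex viewer is at hand, a few lines of Java will do. This sketch (hypothetical `HexView` class) prints bytes as hex: a real question mark is `3F`, while correctly encoded UTF-8 for the basic Russian letters shows up as two-byte sequences starting with `D0` or `D1`.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

final class HexView {

    // Formats bytes as space-separated upper-case hex, like a hex viewer.
    static String hex(byte[] bytes) {
        StringJoiner joined = new StringJoiner(" ");
        for (byte b : bytes) {
            joined.add(String.format("%02X", b & 0xFF));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String word = "\u041f\u0440"; // "Пр"
        System.out.println(hex(word.getBytes(StandardCharsets.UTF_8)));      // D0 9F D1 80
        System.out.println(hex(word.getBytes(StandardCharsets.ISO_8859_1))); // 3F 3F
    }
}
```

To inspect a generated file, pass it the result of `Files.readAllBytes(path)`.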

Third issue: understanding the roles of the different phases + solution
You say that you must store it using UTF-8. That only matters at the very end of the chain. My suggestion is to let the tools you have do what they are good at. Use the original Russian encoding in the XML file (whether that's ISO-8859-5, KOI8-U, KOI8-R, KOI8-Unified, CP1251 or some Unicode encoding) and put that encoding in the XML header. Put the UTF-8 encoding in the xsl:output instruction of the XSLT; the output must then (as per the XSLT specification) be encoded in UTF-8, your problems are solved, and you don't need to find out who the real culprit is in this "alphabet soup" ;)
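A sketch of that division of labour with the JDK's built-in `javax.xml.transform` API: the source bytes are ISO-8859-5 and say so in their declaration, the stylesheet's `xsl:output` asks for UTF-8, and the serializer produces UTF-8. The identity stylesheet and the class name `ReencodeWithXslt` are illustrative assumptions; a real `mainpage.xsl` would have its own templates and would only need the same `xsl:output` line.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ReencodeWithXslt {

    // Identity transform; only the xsl:output line matters for the encoding.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' encoding='UTF-8'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static byte[] toUtf8(byte[] sourceXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // The parser reads the source encoding from the source's own XML declaration;
            // the serializer encodes the result as xsl:output says (UTF-8).
            t.transform(new StreamSource(new ByteArrayInputStream(sourceXml)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Charset cyrillic = Charset.forName("ISO-8859-5");
        String xml = "<?xml version='1.0' encoding='ISO-8859-5'?>"
                   + "<note>\u041f\u0440\u0438\u0432\u0435\u0442</note>";
        byte[] utf8 = toUtf8(xml.getBytes(cyrillic));
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

XSLT 1.0 requires every processor to support UTF-8 and UTF-16 as output encodings, which is why putting UTF-8 in `xsl:output` is the safe choice.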

Hope this helps
-- Abel --
Dehankar (author) commented:
Yes, it makes sense, but given the way I encode the string
 String h2 = new String(str.getHeadernote().getBytes("ISO-8859-1"), "UTF-8"); (i.e. so that Java can read it),
I am looking for a solution that can decode it, probably using the XStream API,
which is what actually writes the XML.
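For what it's worth, that line undoes one specific mistake: UTF-8 bytes that were decoded as ISO-8859-1 somewhere earlier (for example, form data decoded with ISO-8859-1 because no request charset was set). ISO-8859-1 maps every byte to exactly one char, so the original bytes can be recovered and decoded again as UTF-8. A JDK-only sketch of that round trip, with hypothetical names; the point it illustrates is that afterwards the Java `String` already holds correct Unicode, so nothing is left to decode when writing, and the writer only has to use a charset that can represent Cyrillic.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {

    // Simulates the earlier mistake: UTF-8 bytes decoded as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The repair from the question: recover the raw bytes, decode them as UTF-8.
    // (StandardCharsets avoids the checked UnsupportedEncodingException of the String-name overload.)
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        String garbled = misdecode(russian);
        System.out.println(garbled.length());                // 12: two Latin-1 chars per letter
        System.out.println(repair(garbled).equals(russian)); // true
    }
}
```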
abel commented:
Can you please check your encoding? You say "Russian", but in all your examples you use "Latin" (ISO-8859-1).
abel commented:
Does that mean that it works now? You graded it a B, which usually means that you need more clarification. Do you need some additional help?
Dehankar (author) commented:
How can I restrict users from entering invalid UTF-8 characters?
abel commented:
Users do not enter invalid characters; UTF-8 is an encoding. The receiving application (the application where the users type their messages) must write in UTF-8, and you should be done. Do you want to reopen the question?
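To follow that through in code: what can actually be invalid is a byte sequence that claims to be UTF-8 but isn't, and the place to catch it is where the application decodes its input. In a servlet application you would normally just call `request.setCharacterEncoding("UTF-8")` before reading any parameters. If you do have the raw bytes, a minimal sketch (hypothetical `Utf8Check` class) is a `CharsetDecoder` set to `CodingErrorAction.REPORT`, which rejects malformed input instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {

    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes)); // throws on the first bad sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "\u041f\u0440\u0438\u0432\u0435\u0442".getBytes(StandardCharsets.UTF_8);
        byte[] truncated = {(byte) 0xD0};           // first half of a two-byte sequence
        System.out.println(isValidUtf8(good));      // true
        System.out.println(isValidUtf8(truncated)); // false
    }
}
```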