  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1816

URGENT: Convert Special Characters in XML

Hi, I'm working on a project.
When a user submits a form, a CGI script generates an email in XML format, something like this:

----------------------------------------------------------------------
<?xml version='1.0' encoding='ISO-8859-1'?>
<!-- start xml -->
<DATA>
<ACTION>
<CALL key="event-registration"></CALL>
</ACTION>
<CORE>
<ELEMENT name="euid">982</ELEMENT>
<ELEMENT name="stage">thankyou</ELEMENT>
<ELEMENT name="share_info">A & B & C </ELEMENT>
<ELEMENT name="special1">No<-----Agree  </ELEMENT>
</CORE>
</DATA>
<!---end of xml--->
----------------------------------------------------------------------
After the system receives the email, my parser uses SAXParser and a custom DefaultHandler to parse the XML data:

-----------------------------
                SAXHandler p = new SAXHandler(mail);
                SAXParserFactory factory = SAXParserFactory.newInstance();
                SAXParser parser = factory.newSAXParser();
                parser.parse(tempXML, p);
-----------------------------
As you can see, the element content contains special characters (& and <).
For a special reason, I can't handle the special characters in the CGI, so I have to take care of them in my Java parser.
Does anyone know how to do this in a simple way using SAXParser?
Assume the XML is available as a String or a file...
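To make the failure concrete, here is a minimal sketch of a check that feeds a single element to a SAX parser and reports whether it was accepted. The class and method names (WellFormedCheck, isWellFormed) are made up for illustration and are not part of the project above. An unescaped & in element content is a well-formedness error, so the parser should reject the first sample and accept the second.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
class WellFormedCheck {

    /** Returns true if the SAX parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler rethrows fatal errors as SAXParseException.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Raised for malformed input (and, rarely, for parser setup problems).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<ELEMENT name=\"share_info\">A & B & C </ELEMENT>"));
        System.out.println(isWellFormed("<ELEMENT name=\"share_info\">A &amp; B &amp; C </ELEMENT>"));
    }
}
```

A check like this can also be used to confirm that whatever repair step is chosen really produces input the parser will accept.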

Thanks
Asked by: joeyoungkc
1 Solution
 
CEHJCommented:
You can use this class to escape the xml:

      static class HTMLEscape {
            /**
             *  Description of the Method
             *
             * @param  s  Description of the Parameter
             * @return    Description of the Return Value
             */
            public static String escape(String s) {
                  int len = s.length();
                  StringBuffer sb = new StringBuffer(len * 5 / 4);

                  for (int i = 0; i < len; i++) {
                        char c = s.charAt(i);
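                        // Only look up characters that have a table entry; masking with 0xff
                        // would map characters above U+00FF onto unrelated entries.
                        String elem = c < htmlchars.length ? htmlchars[c] : null;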

                        sb.append(elem == null ? "" + c : elem);
                  }
                  return sb.toString();
            }


            private static String htmlchars[] = new String[256];

            static {
                  String entry[] = {
                              "nbsp", "iexcl", "cent", "pound", "curren", "yen", "brvbar",
                              "sect", "uml", "copy", "ordf", "laquo", "not", "shy", "reg",
                              "macr", "deg", "plusmn", "sup2", "sup3", "acute", "micro",
                              "para", "middot", "cedil", "sup1", "ordm", "raquo", "frac14",
                              "frac12", "frac34", "iquest",
                              "Agrave", "Aacute", "Acirc", "Atilde", "Auml", "Aring", "AElig",
                              "Ccedil", "Egrave", "Eacute", "Ecirc", "Euml", "Igrave", "Iacute",
                              "Icirc", "Iuml", "ETH", "Ntilde", "Ograve", "Oacute", "Ocirc",
                              "Otilde", "Ouml", "times", "Oslash", "Ugrave", "Uacute", "Ucirc",
                              "Uuml", "Yacute", "THORN", "szlig",
                              "agrave", "aacute", "acirc", "atilde", "auml", "aring", "aelig",
                              "ccedil", "egrave", "eacute", "ecirc", "euml", "igrave", "iacute",
                              "icirc", "iuml", "eth", "ntilde", "ograve", "oacute", "ocirc",
                              "otilde", "ouml", "divide", "oslash", "ugrave", "uacute", "ucirc",
                              "uuml", "yacute", "thorn", "yuml"
                              };

                  htmlchars['&'] = "&amp;";
                  htmlchars['<'] = "&lt;";
                  htmlchars['>'] = "&gt;";

                  for (int c = '\u00A0', i = 0; c <= '\u00FF'; c++, i++) {
                        htmlchars[c] = "&" + entry[i] + ";";
                  }

                  for (int c = '\u0083', i = 131; c <= '\u009f'; c++, i++) {
                        htmlchars[c] = "&#" + i + ";";
                  }

                  htmlchars['\u0088'] = htmlchars['\u008D'] = htmlchars['\u008E'] = null;
                  htmlchars['\u008F'] = htmlchars['\u0090'] = htmlchars['\u0098'] = null;
                  htmlchars['\u009D'] = null;
            }

      }

}

0
 
joeyoungkcAuthor Commented:
It seems way too complicated.
Don't I only need to replace the predefined XML characters?
&     -> &amp;
 
<    ->  &lt;
 
>     ->   &gt;
 
"   ->    &quot;
 
'  ->   &apos;

???

 
0
 
CEHJCommented:
>>It seems way too complicate.

Why? All you need to do is call

s = HTMLEscape.escape(s);
0
 
objectsCommented:
Try this, or use the replaceAll() method of String:

class XMLUtil{

      private static String escapeChar(char c){
         switch(c){
            case('<')  : return "&lt;";
            case('>')  : return "&gt;";  
            case('&')  : return "&amp;";
            case('\'') : return "&apos;";
            case('\"') : return "&quot;";                        
        }
        return null;    
      }
     
      public static String encodeChars(String string){
     
         if(string==null)
         return "null";
         int length = string.length();
         char[] characters = new char[length];
         string.getChars(0, length, characters, 0);
         StringBuffer encoded = new StringBuffer();
         String escape;
         for(int i = 0;i<length;i++){
            escape = escapeChar(characters[i]);
            if(escape == null) encoded.append(characters[i]);
               else encoded.append(escape);
         }
         return encoded.toString();
      }
               
      public static void main(String[] args){
         String test = "AP = ' QT = \" AMP = & LT = < GT = > ";
         System.out.println(encodeChars(test));
      }
}
0
 
objectsCommented:
eg.

s = s.replaceAll("\"", "&quot;");
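One detail matters if you chain replacements like this: the & replacement has to run first, otherwise the ampersands introduced by the other replacements get escaped a second time. A small sketch, assuming only the five predefined XML entities are needed. ReplaceOrder and its methods are illustrative names. It uses String.replace(), which treats its arguments literally, rather than replaceAll(), which takes a regular expression.

```java
// Illustrative sketch; class and method names are not from the thread.
class ReplaceOrder {

    /** Escapes the five predefined XML characters, ampersand first. */
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must run before the others
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Same idea with the ampersand handled last, to show the double escaping. */
    static String escapeWrongOrder(String s) {
        return s.replace("<", "&lt;")
                .replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escape("A & B < C"));        // A &amp; B &lt; C
        System.out.println(escapeWrongOrder("A < B"));  // A &amp;lt; B
    }
}
```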
0
 
CEHJCommented:
Not sure why you'd want to remove the functionality of replacing other characters that need replacing ...
0
 
objectsCommented:
If you need to add support for more characters, simply add them to the escapeChar switch statement, or add another replaceAll() call if using that.
0
 
CEHJCommented:
>>If you need to add support for more characters

You do need to, or you won't find out until it's too late ;-) That's why the class i posted is written in that way
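A caveat for anyone reusing either approach for XML rather than HTML: XML predefines only five named entities (amp, lt, gt, quot, apos). Names such as nbsp or eacute are HTML entities and a parser will reject them unless a DTD declares them. Numeric character references are always available, so one option is to emit those for anything outside plain ASCII. The following is a sketch under that assumption; NumericRefEscaper is a hypothetical name, and it does not filter control characters that XML 1.0 forbids outright.

```java
// Hypothetical escaper for illustration; not the class posted above.
class NumericRefEscaper {

    /** Escapes XML markup characters and writes non-ASCII characters as &#NNN; */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // handles surrogate pairs
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp > 0x7E) {
                        // Decimal character reference, valid in any XML document.
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9 & cr\u00e8me"));
    }
}
```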
0
 
joeyoungkcAuthor Commented:
The problem is that the original line is
<ELEMENT name="share_info">A & B & C </ELEMENT>
and I want to change it to
<ELEMENT name="share_info">A &amp; B &amp; C </ELEMENT>

but your method changes the tags as well, giving something like

&lt;ELEMENT name="postal_zip"&gt;A & B &gt; C&lt;/ELEMENT&gt;

....
0
 
objectsCommented:
you need to parse the line, and only convert the value, not the tags.
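One way to leave the tags alone without parsing is to escape only the characters that cannot be markup: an & that does not start one of the known references, and a < that is not followed by a character that can begin a tag, comment, processing instruction or CDATA section. This is a heuristic sketch, not a safe general solution. LooseXmlRepair is a made-up name, and text such as "A <B" would still be mistaken for a tag.

```java
import java.util.regex.Pattern;

// Heuristic pre-processing sketch; names are illustrative.
class LooseXmlRepair {

    // '&' that does not begin a predefined entity or a numeric character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // '<' that is not followed by a letter, '_', '/', '!' or '?'.
    private static final Pattern BARE_LT =
            Pattern.compile("<(?![A-Za-z_/!?])");

    /** Escapes stray '&' and '<' in otherwise tag-shaped input. */
    static String repair(String xml) {
        String s = BARE_AMP.matcher(xml).replaceAll("&amp;");
        return BARE_LT.matcher(s).replaceAll("&lt;");
    }

    public static void main(String[] args) {
        System.out.println(repair("<ELEMENT name=\"share_info\">A & B & C </ELEMENT>"));
        System.out.println(repair("<ELEMENT name=\"special1\">No<-----Agree  </ELEMENT>"));
    }
}
```

The repaired text can then be handed to the SAXParser as a String. Because already-escaped references are skipped, running the repair twice does not double-escape anything.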
0
 
CEHJCommented:
Try
String[] text = line.split("<[^>]+>");
if (text.length == 1) {
    String s = HTMLEscape(text[0]);
}
0
 
CEHJCommented:
>>String s = HTMLEscape(text[0]);

was meant to be

String s = HTMLEscape.escape(text[0]);
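Before relying on the split() call, it is worth checking what it returns for the sample lines. A quick sketch (SplitDemo is an illustrative name): a line that starts with a tag yields an empty first element, so the length is usually 2 rather than 1, and a stray < in the value is swallowed by the tag pattern.

```java
import java.util.Arrays;

// Illustrative demo of String.split() with the tag pattern from the post above.
class SplitDemo {

    /** Splits a line on anything that looks like a tag. */
    static String[] textBetweenTags(String line) {
        return line.split("<[^>]+>");
    }

    public static void main(String[] args) {
        // Leading empty string is kept, trailing empty strings are dropped.
        System.out.println(Arrays.toString(
                textBetweenTags("<ELEMENT name=\"stage\">thankyou</ELEMENT>")));
        // The text after the stray '<' is consumed as if it were part of a tag.
        System.out.println(Arrays.toString(
                textBetweenTags("<ELEMENT name=\"special1\">No<-----Agree  </ELEMENT>")));
        // A line that is only a tag produces an empty array.
        System.out.println(textBetweenTags("<CORE>").length);
    }
}
```

So the text.length == 1 test would need adjusting, and the escaped value still has to be put back between the original tags before the line is parsed.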
0
 
objectsCommented:
> For some special reason, I can't handle the special characters in cgi,

whats the special reason :)
0
 
joeyoungkcAuthor Commented:
The reason is that there are so many CGI scripts, and I don't want to change them one by one =)
So the ideal solution is to solve it during parsing. I used
String[] text = line.split("<[^>]+>");
if (text.length == 1) {
    String s = HTMLEscape(text[0]);
}

and modified it a lot, and it works.

Thanks a lot.
Joe
0
 
CEHJCommented:
joeyoungkc, can you tell me how

>>whats the special reason :)

can be an answer to this question?
0
 
joeyoungkcAuthor Commented:
The reason is that there are so many CGI scripts creating the same kind of XML, and I don't want to change them one by one, so the best way is to handle the problem in the parser.
0
 
joeyoungkcAuthor Commented:
>>whats the special reason :)
That was a typo...
0
 
objectsCommented:
> So the ideal solution is to solve it during parsing.i used
> String[] text = line.split("<[^>]+>");
> if (text.length == 1) {
>     String s = HTMLEscape(text[0]);
> }

That's not ideal; in fact, it's not even safe. You'll still end up with corrupted XML data.

How many cgi's are involved?  The change required to each is fairly minor, and then you won't have to worry about anything during parsing.
0
 
objectsCommented:
And the time spent changing the cgi's is going to save you time sorting out problems in the future (not to mention time spent trying to work out how to handle it during parsing). And in the long run you may find you have to change the cgi's anyway :)

How is the cgi currently generating the xml, and how does it get passed for parsing?
0
 
objectsCommented:
If you can't be convinced then the following regex will work a lot better for pulling the value out. You may need to tweak it a little depending on exactly what you need to deal with and how it is delivered to you but you should get the idea. Give me a yell if you have any questions :)

Pattern p = Pattern.compile("<ELEMENT name=\"(.+?)\">(.*?)</ELEMENT>");
Matcher m = p.matcher(s);
if (m.matches())
{
   String name = m.group(1);
   String  value = m.group(2);
   System.out.println(name+"="+value);
   // XMLUtil handles all the encoding that you need to worry about
   // though depending on what you're doing with the parsed data you may not need to worry about it at all.
   value = XMLUtil.encodeChars(value);

}

Though as I've already mentioned, I'd strongly suggest biting the bullet and fixing your CGIs, for example by wrapping each value in a CDATA section:

<ELEMENT name="share_info"><![CDATA[A & B & C ]]></ELEMENT>
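To illustrate why the CDATA form is convenient on the parsing side, here is a small sketch that wraps a value and reads it back through SAX. The names CdataDemo, wrapInCdata and elementText are hypothetical, and the wrapping is written in Java only for illustration, since the real change would live in the CGI scripts. The one sequence that needs care is ]]>, which has to be split across two CDATA sections.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch; names are not from the thread.
class CdataDemo {

    /** Wraps a raw value in CDATA, splitting any "]]>" so it cannot end the section early. */
    static String wrapInCdata(String value) {
        return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Parses the document with SAX and returns all character data it reports. */
    static String elementText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);  // may be called several times per element
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<ELEMENT name=\"share_info\">" + wrapInCdata("A & B & C ") + "</ELEMENT>";
        System.out.println(elementText(xml));
    }
}
```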
0
