  • Status: Solved
  • Priority: Medium
  • Security: Public

Easy points. Question on XML Escape character.

Hi!

Again, as always, I am too lazy to research and I like giving out points:

I have a method that generates an XML Document.

Here's just a portion of it:

FactorModel fm = (FactorModel)itFactor.next();
Element factor_name = doc.createElement(FACTOR_NAME_ELEMENT);
factor_name.setAttribute(ID_ATTRIBUTE,String.valueOf(fm.getFactor_Id_Pk()));
factor_name.setAttribute(NAME_ATTRIBUTE,fm.getFactor_Name());
if (fm.getAnswer().getAnswerValue()!=null)
      factor_name.appendChild(doc.createTextNode(fm.getAnswer().getAnswerValue()));

So if fm.getAnswer().getAnswerValue() contains invalid characters like "/" that screw up the XML when it is returned as a String, what's the best way to escape them?


A+ Answer to first working solution!  thanks for helping....

0
gaetansavoie
Asked:
gaetansavoie
1 Solution
 
Tommy BraasCommented:
Add as a CDATA element
0
 
CEHJCommented:
Just call

s = HTMLEscape.escape(s);

on the below:

SNIP====================================================================

static class HTMLEscape {
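            /**
             *  Escape xml/html characters.
             *  Note: the named entities below (e.g. &nbsp;) are HTML entities;
             *  XML itself predefines only lt, gt, amp, apos and quot.
             *
             * @param  s  The string to escape
             * @return      The escaped String
             */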
            public static String escape(String s) {
                  int len = s.length();
                  StringBuffer sb = new StringBuffer(len * 5 / 4);

                  for (int i = 0; i < len; i++) {
                        char c = s.charAt(i);
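                        // Look up only chars inside the table; masking with 0xff would
                        // wrongly map e.g. '\u0126' onto the entry for '&'.
                        String elem = c < htmlchars.length ? htmlchars[c] : null;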

                        sb.append(elem == null ? "" + c : elem);
                  }
                  return sb.toString();
            }


            private static String htmlchars[] = new String[256];

            static {
                  String entry[] = {
                              "nbsp", "iexcl", "cent", "pound", "curren", "yen", "brvbar",
                              "sect", "uml", "copy", "ordf", "laquo", "not", "shy", "reg",
                              "macr", "deg", "plusmn", "sup2", "sup3", "acute", "micro",
                              "para", "middot", "cedil", "sup1", "ordm", "raquo", "frac14",
                              "frac12", "frac34", "iquest",
                              "Agrave", "Aacute", "Acirc", "Atilde", "Auml", "Aring", "AElig",
                              "Ccedil", "Egrave", "Eacute", "Ecirc", "Euml", "Igrave", "Iacute",
                              "Icirc", "Iuml", "ETH", "Ntilde", "Ograve", "Oacute", "Ocirc",
                              "Otilde", "Ouml", "times", "Oslash", "Ugrave", "Uacute", "Ucirc",
                              "Uuml", "Yacute", "THORN", "szlig",
                              "agrave", "aacute", "acirc", "atilde", "auml", "aring", "aelig",
                              "ccedil", "egrave", "eacute", "ecirc", "euml", "igrave", "iacute",
                              "icirc", "iuml", "eth", "ntilde", "ograve", "oacute", "ocirc",
                              "otilde", "ouml", "divide", "oslash", "ugrave", "uacute", "ucirc",
                              "uuml", "yacute", "thorn", "yuml"
                              };

                  htmlchars['&'] = "&amp;";
                  htmlchars['<'] = "&lt;";
                  htmlchars['>'] = "&gt;";

                  for (int c = '\u00A0', i = 0; c <= '\u00FF'; c++, i++) {
                        htmlchars[c] = "&" + entry[i] + ";";
                  }

                  for (int c = '\u0083', i = 131; c <= '\u009f'; c++, i++) {
                        htmlchars[c] = "&#" + i + ";";
                  }

                  htmlchars['\u0088'] = htmlchars['\u008D'] = htmlchars['\u008E'] = null;
                  htmlchars['\u008F'] = htmlchars['\u0090'] = htmlchars['\u0098'] = null;
                  htmlchars['\u009D'] = null;
            }

      }
0
 
Tommy BraasCommented:
oh, if you're using a home grown DOM model, you might have found a bug.
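
For context on the comment above: with the standard JAXP DOM, text passed to createTextNode is normally escaped by the serializer, and "/" needs no escaping at all. Here is a minimal sketch, assuming the document is turned into a String with javax.xml.transform; the class name DomEscapeDemo and the element and attribute names are made up for illustration and are not from the original code.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: builds one element with a text node and serializes it.
class DomEscapeDemo {

    static String build(String name, String answer) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element factor = doc.createElement("factor"); // illustrative element name
            factor.setAttribute("name", name);
            // The raw, unescaped value goes in; no manual escaping here.
            factor.appendChild(doc.createTextNode(answer));
            doc.appendChild(factor);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The serializer is the component that escapes '<' and '&' on output.
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build or serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("ratio", "1/2 < 3 & 4"));
    }
}
```

If the String comes out unescaped, the DOM implementation or the way it is converted to a String is the first place to look.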

0
 
Tommy BraasCommented:
hehe, too slow with the post...
0
 
CEHJCommented:
8-)
0
 
gaetansavoieAuthor Commented:
I accepted answer from CEHJ because you cannot nest CDATA and that caused problems for me! :(  


Thanks CEHJ
0
 
Tommy BraasCommented:
Btw, the escaping proposed by CEHJ should not be used. There are only a couple of characters that ever need to be escaped in XML. If you always save XML documents in UTF-8 encoding, there is never any need to escape anything else. The following characters should be escaped:
<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">
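
A minimal sketch of an escaper limited to the five predefined entities listed above, for cases where the XML is assembled by hand rather than through a DOM. The class name XmlEscape is hypothetical; everything outside those five characters is passed through unchanged, on the assumption that the output is written as UTF-8.

```java
// Hypothetical helper: escapes only the five characters XML predefines entities for.
class XmlEscape {

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);        // '/', accented letters etc. stay as they are
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \"c\" / d"));
    }
}
```

Unlike the HTML table posted earlier, this never emits named entities such as &nbsp;, which an XML parser would reject unless a DTD declares them.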
0
 
CEHJCommented:
>>I accepted answer from CEHJ because you cannot nest CDATA

You certainly can't insert CDATA arbitrarily

>>There are only a couple of characters...

Basically the same ones as html AFAIK
0
 
gaetansavoieAuthor Commented:
Thanks for the comment CEHJ.  It's hard to explain but we have a special case where an element's content is actually an XML document provided by the client with CDATA already in it!  

And according to MSSoft, you cannot nest them which was my problem:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk30/htm/xmconcdatamarkedsections.asp

"Note   Content within CDATA sections must be within the range of characters permitted for XML content; control characters and compatibility characters cannot be escaped this way. In addition, the sequence ]]> cannot appear within a CDATA section because this sequence signals the end of the section. This means that CDATA sections cannot be nested. The sequence also appears in some scripts. Within scripts, it is usually possible to substitute ] ]> for ]]>."


Escaping worked so thanks again.  I am sure it's not trapping for every scenario but it did in my case!
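
For anyone who does need CDATA around content that already contains CDATA: a common workaround (not something proposed in this thread) is to end the section in the middle of each ]]> and open a new one straight away. A minimal sketch, with a hypothetical class CdataSplit and a round-trip check through the standard JAXP parser; the wrapper element name r is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: wraps arbitrary text in CDATA, splitting every "]]>".
class CdataSplit {

    static String toCdata(String text) {
        // "]]>" becomes "]]" + end of section + start of new section + ">"
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Parses <r>...</r> around the CDATA and returns the text the parser sees.
    static String roundTrip(String text) {
        try {
            String xml = "<r>" + toCdata(text) + "</r>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent concatenates the adjacent CDATA sections.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse generated XML", e);
        }
    }

    public static void main(String[] args) {
        String clientXml = "<a><![CDATA[x]]></a>";
        System.out.println(toCdata(clientXml));
        System.out.println(roundTrip(clientXml).equals(clientXml));
    }
}
```

The sections are adjacent rather than nested, so the rule quoted above still holds; the reader simply sees the pieces joined back together.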
0
 
CEHJCommented:
>>And according to MSSoft, you cannot nest them which was my problem:

Yes that's right. Might even be wise to add '[' and ']' to that table ;-) Although the parser should probably ignore them if they don't occur together
0
 
Tommy BraasCommented:
The XML standard, available at http://www.w3c.org, states the same thing. The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA. By definition, CDATA is content which is not parsed.

Like I said above, no need to escape anything but: < > ' " &
0
 
CEHJCommented:
>>The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA.

That actually seems like a contradiction in terms. If it's noticed a CDATA element that's not going to be parsed, why should it worry how much nesting there is?
0
