Solved

Easy points. Question on XML Escape character.

Posted on 2004-04-07
Last Modified: 2013-11-19
Hi!

Again, as always, I am too lazy to research and I like giving out points:

I have a method that generates an XML Document.

Here's just a portion of it:

FactorModel fm = (FactorModel)itFactor.next();
Element factor_name = doc.createElement(FACTOR_NAME_ELEMENT);
factor_name.setAttribute(ID_ATTRIBUTE,String.valueOf(fm.getFactor_Id_Pk()));
factor_name.setAttribute(NAME_ATTRIBUTE,fm.getFactor_Name());
if (fm.getAnswer().getAnswerValue()!=null)
      factor_name.appendChild(doc.createTextNode(fm.getAnswer().getAnswerValue()));

So if fm.getAnswer().getAnswerValue() contains invalid characters like "/" that screw up the XML when it is returned as a String, what's the best way to escape them?


A+ Answer to first working solution!  thanks for helping....

Question by:gaetansavoie
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777506
Add as a CDATA element
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 10777509
Just call

s = HTMLEscape.escape(s);

using the class below:

SNIP====================================================================

static class HTMLEscape {
            /**
             *  Escape xml/html characters
             *
             * @param  s  The string to escape
             * @return      The escaped String
             */
            public static String escape(String s) {
                  int len = s.length();
                  StringBuffer sb = new StringBuffer(len * 5 / 4);

                  for (int i = 0; i < len; i++) {
                        char c = s.charAt(i);
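                        // Only Latin-1 is tabulated. Masking with 0xff would mis-map
                        // higher code points (e.g. U+0126 onto '&'), so pass them through.
                        String elem = c < htmlchars.length ? htmlchars[c] : null;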

                        sb.append(elem == null ? "" + c : elem);
                  }
                  return sb.toString();
            }


            private static String htmlchars[] = new String[256];

            static {
                  // Named entities for U+00A0..U+00FF. These are HTML names; a plain XML
                  // parser only knows them if a DTD declares them.
                  String entry[] = {
                              "nbsp", "iexcl", "cent", "pound", "curren", "yen", "brvbar",
                              "sect", "uml", "copy", "ordf", "laquo", "not", "shy", "reg",
                              "macr", "deg", "plusmn", "sup2", "sup3", "acute", "micro",
                              "para", "middot", "cedil", "sup1", "ordm", "raquo", "frac14",
                              "frac12", "frac34", "iquest",
                              "Agrave", "Aacute", "Acirc", "Atilde", "Auml", "Aring", "AElig",
                              "Ccedil", "Egrave", "Eacute", "Ecirc", "Euml", "Igrave", "Iacute",
                              "Icirc", "Iuml", "ETH", "Ntilde", "Ograve", "Oacute", "Ocirc",
                              "Otilde", "Ouml", "times", "Oslash", "Ugrave", "Uacute", "Ucirc",
                              "Uuml", "Yacute", "THORN", "szlig",
                              "agrave", "aacute", "acirc", "atilde", "auml", "aring", "aelig",
                              "ccedil", "egrave", "eacute", "ecirc", "euml", "igrave", "iacute",
                              "icirc", "iuml", "eth", "ntilde", "ograve", "oacute", "ocirc",
                              "otilde", "ouml", "divide", "oslash", "ugrave", "uacute", "ucirc",
                              "uuml", "yacute", "thorn", "yuml"
                              };

                  htmlchars['&'] = "&amp;";
                  htmlchars['<'] = "&lt;";
                  htmlchars['>'] = "&gt;";

                  for (int c = '\u00A0', i = 0; c <= '\u00FF'; c++, i++) {
                        htmlchars[c] = "&" + entry[i] + ";";
                  }

                  for (int c = '\u0083', i = 131; c <= '\u009f'; c++, i++) {
                        htmlchars[c] = "&#" + i + ";";
                  }

                  htmlchars['\u0088'] = htmlchars['\u008D'] = htmlchars['\u008E'] = null;
                  htmlchars['\u008F'] = htmlchars['\u0090'] = htmlchars['\u0098'] = null;
                  htmlchars['\u009D'] = null;
            }

      }
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777549
oh, if you're using a home grown DOM model, you might have found a bug.
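
To illustrate that point, here is a minimal sketch of what a standard DOM does, assuming the JAXP classes bundled with the JDK. The serializer escapes text nodes and attribute values on its own, and "/" needs no escaping at all. The class name DomEscapeDemo and the element and attribute names are made up for the example; they are not from the question.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: builds a one-element document and serializes it.
class DomEscapeDemo {

    /** Returns the serialized form of <factor name="...">answer</factor>. */
    static String serialize(String name, String answer) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element factor = doc.createElement("factor");
            factor.setAttribute("name", name);               // raw, unescaped value
            factor.appendChild(doc.createTextNode(answer));  // raw, unescaped text
            doc.appendChild(factor);

            // The identity transformer writes the DOM out as markup and
            // escapes markup characters in text and attribute values.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("a&b", "1 < 2 / 3 & 4"));
    }
}
```

If the text is escaped by hand before createTextNode is called, a serializer like this one will escape it a second time, so manual escaping only makes sense when the XML string is assembled without a DOM.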

 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777555
hehe, too slow with the post...
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777575
8-)
 
LVL 2

Author Comment

by:gaetansavoie
ID: 10777616
I accepted answer from CEHJ because you cannot nest CDATA and that caused problems for me! :(  


Thanks CEHJ
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777632
Btw, the escaping proposed by CEHJ should not be used. There are only a couple of characters that ever need to be escaped in XML. If you always save XML documents in UTF-8 encoding, there is no need to escape anything else. The only characters that should be escaped are the ones behind the five predefined entities:
<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777670
>>I accepted answer from CEHJ because you cannot nest CDATA

You certainly can't insert CDATA arbitrarily.

>>There are only a couple of characters...

Basically the same ones as html AFAIK
 
LVL 2

Author Comment

by:gaetansavoie
ID: 10777802
Thanks for the comment, CEHJ.  It's hard to explain, but we have a special case where an element's content is actually XML provided by a client, with CDATA already in it!

And according to MSSoft, you cannot nest them which was my problem:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk30/htm/xmconcdatamarkedsections.asp

"Note   Content within CDATA sections must be within the range of characters permitted for XML content; control characters and compatibility characters cannot be escaped this way. In addition, the sequence ]]> cannot appear within a CDATA section because this sequence signals the end of the section. This means that CDATA sections cannot be nested. The sequence also appears in some scripts. Within scripts, it is usually possible to substitute ] ]> for ]]>."


Escaping worked so thanks again.  I am sure it's not trapping for every scenario but it did in my case!
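
For cases where CDATA is still wanted, a common workaround for the ]]> restriction quoted above is to close the section right after ]] and open a new one before the >. The sketch below assumes the XML is being assembled as a plain string; CdataWrap is a hypothetical helper name, not something from this thread.

```java
// Hypothetical helper: wraps arbitrary text in one or more CDATA sections.
class CdataWrap {

    /**
     * Wraps text in a CDATA section. Every "]]>" inside the text is split
     * so that "]]" ends one section and ">" starts the next.
     */
    static String wrap(String text) {
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("plain text"));
        System.out.println(wrap("a]]>b"));
    }
}
```

A parser reads the adjacent sections back as one run of character data, so the original text comes out unchanged. It does not make nesting possible; it only keeps an embedded ]]> from ending the section early.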
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777848
>>And according to MSSoft, you cannot nest them which was my problem:

Yes that's right. Might even be wise to add '[' and ']' to that table ;-) Although the parser should probably ignore them if they don't occur together
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10778165
The XML standard available at http://www.w3c.org states the same thing. The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA. By definition, a CDATA section is content that is not parsed.

Like I said above, no need to escape anything but: < > ' " &
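
A minimal sketch of escaping just those five characters, as a smaller alternative to the HTML entity table. XmlEscape is a hypothetical helper name, and the sketch assumes the output is written in an encoding such as UTF-8 that can represent every other character directly.

```java
// Hypothetical helper: escapes only the five characters XML predefines entities for.
class XmlEscape {

    /** Returns s with the characters < > & ' " replaced by entity references. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);        // everything else, including "/", is left alone
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & \"c\" / 'd'"));
    }
}
```

Strictly speaking, the quote characters only matter inside attribute values, but escaping all five everywhere is harmless and keeps the helper simple.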
 
LVL 86

Expert Comment

by:CEHJ
ID: 10778220
>>The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA.

That actually seems like a contradiction in terms. If it's noticed a CDATA element that's not going to be parsed, why should it worry how much nesting there is?