Easy points. Question on XML Escape character.

Posted on 2004-04-07
Last Modified: 2013-11-19
Hi!

Again, as always, I am too lazy to research and I like giving out points:

I have a method that generates an XML Document.

Here's just a portion of it:

FactorModel fm = (FactorModel)itFactor.next();
Element factor_name = doc.createElement(FACTOR_NAME_ELEMENT);
factor_name.setAttribute(ID_ATTRIBUTE,String.valueOf(fm.getFactor_Id_Pk()));
factor_name.setAttribute(NAME_ATTRIBUTE,fm.getFactor_Name());
if (fm.getAnswer().getAnswerValue()!=null)
      factor_name.appendChild(doc.createTextNode(fm.getAnswer().getAnswerValue()));

So if fm.getAnswer().getAnswerValue() contains invalid characters like "/" that screw up the XML when it is returned as a String, what's the best way to escape them?


A+ to the first working solution! Thanks for helping....

Question by: gaetansavoie
12 Comments
 
Expert Comment by: Tommy Braas (LVL 14), ID: 10777506
Add it as a CDATA section.
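A minimal sketch of the CDATA approach with the standard org.w3c.dom API, assuming a DOM Document like the one in the question. The class, helper names (appendAsCdata, newDocument) and the element name are hypothetical. Because a CDATA section ends at the first "]]>", the sketch starts a new section between "]]" and ">" wherever that sequence occurs in the text.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CdataExample {

    /**
     * Appends text to parent as one or more adjacent CDATA sections.
     * No section contains "]]>": each occurrence is split so that one
     * section ends with "]]" and the next one starts with ">".
     */
    static void appendAsCdata(Document doc, Element parent, String text) {
        int start = 0;
        int idx;
        while ((idx = text.indexOf("]]>", start)) != -1) {
            parent.appendChild(doc.createCDATASection(text.substring(start, idx + 2)));
            start = idx + 2;
        }
        parent.appendChild(doc.createCDATASection(text.substring(start)));
    }

    /** Creates an empty DOM document with the JDK's default factory. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        // Hypothetical element name, standing in for FACTOR_NAME_ELEMENT.
        Element factorName = doc.createElement("factor_name");
        doc.appendChild(factorName);
        appendAsCdata(doc, factorName, "a/b & <c> ]]> d");
        // getTextContent() joins the sections back into the original string.
        System.out.println(factorName.getTextContent());
    }
}
```

Splitting at the DOM level keeps every section well-formed no matter what the value contains, at the cost of several sibling nodes instead of one.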
 
Accepted Solution by: CEHJ (LVL 86; earned 2000 total points), ID: 10777509
Just call

String escaped = HTMLEscape.escape(s);

on the class below (it is declared as a static nested class, so put it inside one of your own classes):

SNIP====================================================================

static class HTMLEscape {
      /**
       * Escapes XML/HTML special characters.
       *
       * @param  s  the string to escape
       * @return    the escaped string
       */
      public static String escape(String s) {
            int len = s.length();
            StringBuilder sb = new StringBuilder(len * 5 / 4);

            for (int i = 0; i < len; i++) {
                  char c = s.charAt(i);
                  // The table only covers U+0000..U+00FF; anything above is copied
                  // as is. (Masking with c & 0xff would map e.g. U+0126 onto '&'.)
                  String elem = c < htmlchars.length ? htmlchars[c] : null;

                  if (elem == null) {
                        sb.append(c);
                  } else {
                        sb.append(elem);
                  }
            }
            return sb.toString();
      }


      // Replacement text indexed by character code; null means "copy unchanged".
      private static final String[] htmlchars = new String[256];

      static {
            // HTML entity names for U+00A0..U+00FF, in code-point order. Note that
            // XML predefines only amp, lt, gt, quot and apos, so an XML parser
            // rejects names such as &eacute; unless a DTD declares them.
            String[] entry = {
                        "nbsp", "iexcl", "cent", "pound", "curren", "yen", "brvbar",
                        "sect", "uml", "copy", "ordf", "laquo", "not", "shy", "reg",
                        "macr", "deg", "plusmn", "sup2", "sup3", "acute", "micro",
                        "para", "middot", "cedil", "sup1", "ordm", "raquo", "frac14",
                        "frac12", "frac34", "iquest",
                        "Agrave", "Aacute", "Acirc", "Atilde", "Auml", "Aring", "AElig",
                        "Ccedil", "Egrave", "Eacute", "Ecirc", "Euml", "Igrave", "Iacute",
                        "Icirc", "Iuml", "ETH", "Ntilde", "Ograve", "Oacute", "Ocirc",
                        "Otilde", "Ouml", "times", "Oslash", "Ugrave", "Uacute", "Ucirc",
                        "Uuml", "Yacute", "THORN", "szlig",
                        "agrave", "aacute", "acirc", "atilde", "auml", "aring", "aelig",
                        "ccedil", "egrave", "eacute", "ecirc", "euml", "igrave", "iacute",
                        "icirc", "iuml", "eth", "ntilde", "ograve", "oacute", "ocirc",
                        "otilde", "ouml", "divide", "oslash", "ugrave", "uacute", "ucirc",
                        "uuml", "yacute", "thorn", "yuml"
                        };

            htmlchars['&'] = "&amp;";
            htmlchars['<'] = "&lt;";
            htmlchars['>'] = "&gt;";

            for (int c = '\u00A0', i = 0; c <= '\u00FF'; c++, i++) {
                  htmlchars[c] = "&" + entry[i] + ";";
            }

            // C1 range U+0083..U+009F as decimal character references.
            for (int c = '\u0083'; c <= '\u009f'; c++) {
                  htmlchars[c] = "&#" + c + ";";
            }

            // Leave these code points unescaped.
            htmlchars['\u0088'] = htmlchars['\u008D'] = htmlchars['\u008E'] = null;
            htmlchars['\u008F'] = htmlchars['\u0090'] = htmlchars['\u0098'] = null;
            htmlchars['\u009D'] = null;
      }

}
 
Expert Comment by: Tommy Braas (LVL 14), ID: 10777549
Oh, and if you're using a home-grown DOM implementation, you might have found a bug: a standard DOM serializer escapes the text content of nodes for you.
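To check what a standard serializer does, here is a sketch using the JDK's own DOM and the javax.xml.transform identity Transformer; the class, method and element names are hypothetical. The raw value goes into createTextNode unescaped, and the serializer writes < and & as &lt; and &amp;. One consequence worth knowing: text that was already escaped by hand gets escaped a second time.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomSerializeExample {

    /** Builds {@code <answer>value</answer>} with a text node and serializes it. */
    static String toXml(String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element answer = doc.createElement("answer"); // hypothetical name
            // The raw, unescaped value goes straight into the text node.
            answer.appendChild(doc.createTextNode(value));
            doc.appendChild(answer);

            // Identity transform: DOM tree in, serialized XML text out.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("a < b & c / d"));
        System.out.println(toXml("&lt;")); // already escaped by hand
    }
}
```

Note that "/" passes through untouched: it has no special meaning in XML text.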

 
Expert Comment by: Tommy Braas (LVL 14), ID: 10777555
hehe, too slow with the post...
 
Expert Comment by: CEHJ (LVL 86), ID: 10777575
8-)
 
Author Comment by: gaetansavoie (LVL 2), ID: 10777616
I accepted the answer from CEHJ because you cannot nest CDATA sections, and that caused problems for me! :(


Thanks, CEHJ
 
Expert Comment by: Tommy Braas (LVL 14), ID: 10777632
Btw, the escaping proposed by CEHJ should not be used. Only a handful of characters ever need to be escaped in XML. Always save XML documents in UTF-8 encoding; then there is never a need to escape anything else. The following characters should be escaped (these are the predefined entity declarations from the XML specification):
<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">
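A minimal sketch of the escaping described above, for cases where XML is assembled as a string rather than written by a DOM serializer. The class and method names are hypothetical. It maps each of the five characters to its predefined entity and copies everything else, including "/" and non-ASCII letters, unchanged; that assumes the output is saved in UTF-8 (or another encoding that can represent every character used).

```java
class XmlEscape {

    /**
     * Replaces the five characters XML predefines entities for and copies
     * every other character as is.
     */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, escape("Tom & Jerry's") yields "Tom &amp; Jerry&apos;s". Strictly, quotes only need escaping inside attribute values, but escaping them everywhere is harmless.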
 
Expert Comment by: CEHJ (LVL 86), ID: 10777670
>>I accepted answer from CEHJ because you cannot nest CDATA

You certainly can't insert CDATA arbitrarily

>>There are only a couple of characters...

Basically the same ones as HTML, AFAIK
 
Author Comment by: gaetansavoie (LVL 2), ID: 10777802
Thanks for the comment, CEHJ. It's hard to explain, but we have a special case where an element's content is actually XML provided by a client, with CDATA already in it!

And according to Microsoft, you cannot nest them, which was my problem:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk30/htm/xmconcdatamarkedsections.asp

"Note: Content within CDATA sections must be within the range of characters permitted for XML content; control characters and compatibility characters cannot be escaped this way. In addition, the sequence ]]> cannot appear within a CDATA section because this sequence signals the end of the section. This means that CDATA sections cannot be nested. The sequence also appears in some scripts. Within scripts, it is usually possible to substitute ] ]> for ]]>."


Escaping worked, so thanks again. I'm sure it doesn't catch every scenario, but it did in my case!
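The usual workaround for the limitation in the quoted note is to end the CDATA section just before the ">" of each "]]>" and open a new one. A minimal string-level sketch, assuming the XML is assembled as text; the class and method names are hypothetical. The parse step uses the JDK's DocumentBuilder to confirm that client XML with its own CDATA section survives the round trip unchanged.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CdataSplit {

    /** Wraps text in CDATA, closing and reopening the section at every "]]>". */
    static String wrapInCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Parses an XML string and returns the root element's text content. */
    static String parseText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Client-supplied XML that already contains its own CDATA section.
        String client = "<note><![CDATA[x < y]]></note>";
        String xml = "<answer>" + wrapInCdata(client) + "</answer>";
        System.out.println(xml);
        System.out.println(parseText(xml).equals(client)); // true
    }
}
```

The inner "<![CDATA[" is plain text inside the outer section; only the "]]>" needs the split.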
 
Expert Comment by: CEHJ (LVL 86), ID: 10777848
>>And according to MSSoft, you cannot nest them which was my problem:

Yes, that's right. It might even be wise to add '[' and ']' to that table ;-) although the parser should probably ignore them if they don't occur together.
 
Expert Comment by: Tommy Braas (LVL 14), ID: 10778165
The XML standard, available at http://www.w3c.org, says so too. The reason you cannot nest CDATA sections is simply that the parser never even tries to parse the contents of a CDATA section: by definition, CDATA is character data that is not parsed for markup, so the first "]]>" it meets ends the section.

Like I said above, no need to escape anything but: < > ' " &
 
Expert Comment by: CEHJ (LVL 86), ID: 10778220
>>The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA.

That actually seems like a contradiction in terms. If it has noticed a CDATA section that isn't going to be parsed, why should it worry how much nesting there is?