Solved

Easy points. Question on XML Escape character.

Posted on 2004-04-07
Last Modified: 2013-11-19
Hi!

Again, as always, I am too lazy to research and I like giving out points:

I have a method that generates an XML Document.

Here's just a portion of it:

FactorModel fm = (FactorModel)itFactor.next();
Element factor_name = doc.createElement(FACTOR_NAME_ELEMENT);
factor_name.setAttribute(ID_ATTRIBUTE,String.valueOf(fm.getFactor_Id_Pk()));
factor_name.setAttribute(NAME_ATTRIBUTE,fm.getFactor_Name());
if (fm.getAnswer().getAnswerValue()!=null)
      factor_name.appendChild(doc.createTextNode(fm.getAnswer().getAnswerValue()));

So if fm.getAnswer().getAnswerValue() contains invalid characters like "/" that screw up the XML when it is returned as a String, what's the best way to escape it?


A+ Answer to first working solution!  thanks for helping....

Question by:gaetansavoie
12 Comments
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777506
Add it as a CDATA section.
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 10777509
Just call

s = HTMLEscape.escape(s);

on the below:

SNIP====================================================================

static class HTMLEscape {
            /**
             *  Escape xml/html characters
             * @param  s  The string to escape
             * @return      The escaped String
             */
            public static String escape(String s) {
                  int len = s.length();
                  StringBuffer sb = new StringBuffer(len * 5 / 4);

                  for (int i = 0; i < len; i++) {
                        char c = s.charAt(i);
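                        // The table only covers Latin-1; masking with 0xff would mis-map higher characters
                        String elem = c < htmlchars.length ? htmlchars[c] : null;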

                        sb.append(elem == null ? "" + c : elem);
                  }
                  return sb.toString();
            }


            private static String htmlchars[] = new String[256];

            static {
                  String entry[] = {
                              "nbsp", "iexcl", "cent", "pound", "curren", "yen", "brvbar",
                              "sect", "uml", "copy", "ordf", "laquo", "not", "shy", "reg",
                              "macr", "deg", "plusmn", "sup2", "sup3", "acute", "micro",
                              "para", "middot", "cedil", "sup1", "ordm", "raquo", "frac14",
                              "frac12", "frac34", "iquest",
                              "Agrave", "Aacute", "Acirc", "Atilde", "Auml", "Aring", "AElig",
                              "Ccedil", "Egrave", "Eacute", "Ecirc", "Euml", "Igrave", "Iacute",
                              "Icirc", "Iuml", "ETH", "Ntilde", "Ograve", "Oacute", "Ocirc",
                              "Otilde", "Ouml", "times", "Oslash", "Ugrave", "Uacute", "Ucirc",
                              "Uuml", "Yacute", "THORN", "szlig",
                              "agrave", "aacute", "acirc", "atilde", "auml", "aring", "aelig",
                              "ccedil", "egrave", "eacute", "ecirc", "euml", "igrave", "iacute",
                              "icirc", "iuml", "eth", "ntilde", "ograve", "oacute", "ocirc",
                              "otilde", "ouml", "divide", "oslash", "ugrave", "uacute", "ucirc",
                              "uuml", "yacute", "thorn", "yuml"
                              };

                  htmlchars['&'] = "&amp;";
                  htmlchars['<'] = "&lt;";
                  htmlchars['>'] = "&gt;";

                  for (int c = '\u00A0', i = 0; c <= '\u00FF'; c++, i++) {
                        htmlchars[c] = "&" + entry[i] + ";";
                  }

                  for (int c = '\u0083', i = 131; c <= '\u009f'; c++, i++) {
                        htmlchars[c] = "&#" + i + ";";
                  }

                  htmlchars['\u0088'] = htmlchars['\u008D'] = htmlchars['\u008E'] = null;
                  htmlchars['\u008F'] = htmlchars['\u0090'] = htmlchars['\u0098'] = null;
                  htmlchars['\u009D'] = null;
            }

      }
0
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777549
oh, if you're using a home grown DOM model, you might have found a bug.
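
To illustrate the point about standard DOM implementations: a minimal sketch, assuming the stock JAXP classes that ship with the JDK, in which the serializer escapes text nodes and attribute values on output, so nothing has to be escaped by hand before createTextNode. The class name DomEscapeSketch and the literal element and attribute names are made up for the example and stand in for the constants in the question.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of the original code in the question.
final class DomEscapeSketch {

    /** Builds a single element holding raw, unescaped values and serializes it to a String. */
    static String serialize(String name, String answer) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element factor = doc.createElement("factor_name");
        factor.setAttribute("name", name);               // raw value, no manual escaping
        factor.appendChild(doc.createTextNode(answer));  // raw value, no manual escaping
        doc.appendChild(factor);

        // Identity transform: copies the DOM to text, escaping markup characters on the way out.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // '<' and '&' come out as &lt; and &amp;; '/' is left alone because it is legal in text.
        System.out.println(serialize("Tom & Jerry", "1 < 2 && a/b"));
    }
}
```

If the String is produced some other way (for example by concatenating tags by hand), this automatic escaping does not apply and the values have to be escaped explicitly.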

0

 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777555
hehe, too slow with the post...
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777575
8-)
0
 
LVL 2

Author Comment

by:gaetansavoie
ID: 10777616
I accepted answer from CEHJ because you cannot nest CDATA and that caused problems for me! :(  


Thanks CEHJ
0
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777632
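Btw, the escaping proposed by CEHJ should not be used. There are only a couple of characters that ever need to be escaped in XML. HTML entity names such as &nbsp; are not predefined in XML, so a parser will reject them unless a DTD declares them. If you always save XML documents in UTF-8 encoding, there is never any need to escape non-ASCII characters. Only the following characters should be escaped: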
<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777670
>>I accepted answer from CEHJ because you cannot nest CDATA

You certainly can't insert CDATA arbitrarily

>>There are only a couple of characters...

Basically the same ones as html AFAIK
0
 
LVL 2

Author Comment

by:gaetansavoie
ID: 10777802
Thanks for the comment CEHJ.  It's hard to explain but we have a special case where an element's content is actually an XML document provided by the client, with CDATA already in it!

And according to MSSoft, you cannot nest them which was my problem:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk30/htm/xmconcdatamarkedsections.asp

"Note   Content within CDATA sections must be within the range of characters permitted for XML content; control characters and compatibility characters cannot be escaped this way. In addition, the sequence ]]> cannot appear within a CDATA section because this sequence signals the end of the section. This means that CDATA sections cannot be nested. The sequence also appears in some scripts. Within scripts, it is usually possible to substitute ] ]> for ]]>."


Escaping worked so thanks again.  I am sure it doesn't trap every scenario, but it worked in my case!
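
For anyone who still wants CDATA for content that already contains CDATA, one common workaround is to end the section in the middle of every "]]>" and immediately open a new one. A rough sketch under that assumption follows; the class name CdataSplitSketch and both method names are invented for the example, and the round trip uses the JDK's standard parser only to check the result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the "split on ]]>" workaround.
final class CdataSplitSketch {

    /** Wraps text in CDATA, splitting every "]]>" so it never appears inside one section. */
    static String wrap(String text) {
        // "]]>" becomes "]]" + end of section + start of new section + ">"
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Parses <r>wrapped</r> and returns the text content the parser reports. */
    static String roundTrip(String text) throws Exception {
        String xml = "<r>" + wrap(text) + "</r>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Client-supplied XML that already carries its own CDATA section.
        String clientXml = "<a><![CDATA[x < y]]></a>";
        System.out.println(wrap(clientXml));
        System.out.println(roundTrip(clientXml).equals(clientXml)); // prints true
    }
}
```

The receiving side sees two adjacent CDATA sections, so code that reads only the first child node instead of the full text content would lose the tail. Plain escaping avoids that, which is one reason it was the simpler fix here.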
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777848
>>And according to MSSoft, you cannot nest them which was my problem:

Yes that's right. Might even be wise to add '[' and ']' to that table ;-) Although the parser should probably ignore them if they don't occur together
0
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10778165
The XML standard, available at http://www.w3c.org, states the same thing. The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA. The definition of CDATA is an element which is not parsed.

Like I said above, no need to escape anything but: < > ' " &
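
A minimal sketch of an escaper limited to those five characters, for cases where the XML string is assembled by hand. The class name XmlEscape is made up for the example; it is not the HTMLEscape class from the accepted answer, and it assumes the output is written in an encoding such as UTF-8 that can represent every other character directly.

```java
// Hypothetical helper: escapes only the five characters XML predefines entities for.
final class XmlEscape {

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);        // everything else, including '/', passes through
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("AT&T <a/b>")); // prints AT&amp;T &lt;a/b&gt;
    }
}
```

The escaper must run exactly once on raw text. Running it over an already escaped string turns &lt; into &amp;lt;.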
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10778220
>>The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA.

That actually seems like a contradiction in terms. If it's noticed a CDATA element that's not going to be parsed, why should it worry how much nesting there is?
0
