Solved

Easy points. Question on XML Escape character.

Posted on 2004-04-07
Last Modified: 2013-11-19
Hi!

Again, as always, I am too lazy to research and I like giving out points:

I have a method that generates an XML Document.

Here's just the portion of it:

FactorModel fm = (FactorModel)itFactor.next();
Element factor_name = doc.createElement(FACTOR_NAME_ELEMENT);
factor_name.setAttribute(ID_ATTRIBUTE,String.valueOf(fm.getFactor_Id_Pk()));
factor_name.setAttribute(NAME_ATTRIBUTE,fm.getFactor_Name());
if (fm.getAnswer().getAnswerValue()!=null)
      factor_name.appendChild(doc.createTextNode(fm.getAnswer().getAnswerValue()));

So if fm.getAnswer().getAnswerValue() contains invalid characters like "/" that screw up the XML when it is returned as a String, what's the best way to escape it?


A+ Answer to first working solution!  thanks for helping....

Question by:gaetansavoie
12 Comments
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777506
Add as a CDATA element
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 10777509
Just call

s = HTMLEscape.escape(s);

on the below:

SNIP====================================================================

static class HTMLEscape {
            /**
             *  Escape xml/html characters
             *
             * @param  s  The string to escape
             * @return      The escaped String
             */
            public static String escape(String s) {
                  int len = s.length();
                  StringBuffer sb = new StringBuffer(len * 5 / 4);

                  for (int i = 0; i < len; i++) {
                        char c = s.charAt(i);
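                        // Only chars inside the 256-entry table have a mapping. Masking
                        // with 0xff would wrongly map e.g. U+0126 onto the '&' entry.
                        String elem = c < htmlchars.length ? htmlchars[c] : null;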

                        sb.append(elem == null ? "" + c : elem);
                  }
                  return sb.toString();
            }


            private static String htmlchars[] = new String[256];

            static {
                  String entry[] = {
                              "nbsp", "iexcl", "cent", "pound", "curren", "yen", "brvbar",
                              "sect", "uml", "copy", "ordf", "laquo", "not", "shy", "reg",
                              "macr", "deg", "plusmn", "sup2", "sup3", "acute", "micro",
                              "para", "middot", "cedil", "sup1", "ordm", "raquo", "frac14",
                              "frac12", "frac34", "iquest",
                              "Agrave", "Aacute", "Acirc", "Atilde", "Auml", "Aring", "AElig",
                              "Ccedil", "Egrave", "Eacute", "Ecirc", "Euml", "Igrave", "Iacute",
                              "Icirc", "Iuml", "ETH", "Ntilde", "Ograve", "Oacute", "Ocirc",
                              "Otilde", "Ouml", "times", "Oslash", "Ugrave", "Uacute", "Ucirc",
                              "Uuml", "Yacute", "THORN", "szlig",
                              "agrave", "aacute", "acirc", "atilde", "auml", "aring", "aelig",
                              "ccedil", "egrave", "eacute", "ecirc", "euml", "igrave", "iacute",
                              "icirc", "iuml", "eth", "ntilde", "ograve", "oacute", "ocirc",
                              "otilde", "ouml", "divide", "oslash", "ugrave", "uacute", "ucirc",
                              "uuml", "yacute", "thorn", "yuml"
                              };

                  htmlchars['&'] = "&amp;";
                  htmlchars['<'] = "&lt;";
                  htmlchars['>'] = "&gt;";

                  for (int c = '\u00A0', i = 0; c <= '\u00FF'; c++, i++) {
                        htmlchars[c] = "&" + entry[i] + ";";
                  }

                  for (int c = '\u0083', i = 131; c <= '\u009f'; c++, i++) {
                        htmlchars[c] = "&#" + i + ";";
                  }

                  htmlchars['\u0088'] = htmlchars['\u008D'] = htmlchars['\u008E'] = null;
                  htmlchars['\u008F'] = htmlchars['\u0090'] = htmlchars['\u0098'] = null;
                  htmlchars['\u009D'] = null;
            }

      }
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777549
oh, if you're using a home grown DOM model, you might have found a bug.
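
For comparison, here is a minimal sketch of what a standard (not home-grown) DOM does with such text. It assumes the JAXP classes that ship with the JDK; the class and method names `DomEscapeSketch` and `serialize` are made up for illustration and are not from the question's code. The text goes into the text node raw, and the serializer escapes it on the way out.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, for illustration only.
final class DomEscapeSketch {
    private DomEscapeSketch() {}

    // Builds <elementName>text</elementName> with the JDK's DOM and returns it as a String.
    static String serialize(String elementName, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element el = doc.createElement(elementName);
            // The raw, unescaped value goes into the text node.
            el.appendChild(doc.createTextNode(text));
            doc.appendChild(el);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The serializer escapes '<' and '&' while writing the text node.
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }
}
```

Calling `DomEscapeSketch.serialize("factor_name", "a/b < c & d")` should give back markup in which `<` and `&` are written as `&lt;` and `&amp;` while `/` is left alone. If the text is escaped by hand before `createTextNode`, it will probably end up escaped twice.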

 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777555
hehe, too slow with the post...
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777575
8-)
 
LVL 2

Author Comment

by:gaetansavoie
ID: 10777616
I accepted answer from CEHJ because you cannot nest CDATA and that caused problems for me! :(  


Thanks CEHJ
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10777632
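Btw, the escaping proposed by CEHJ should not be used. There are only a couple of characters that ever need to be escaped in XML. HTML entity names such as &nbsp; are not predefined in XML, so a parser will reject them unless a DTD declares them. If you always save XML documents in UTF-8 encoding, nothing else ever needs escaping. The following characters should be escaped: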
<!ENTITY lt     "&#38;#60;">
<!ENTITY gt     "&#62;">
<!ENTITY amp    "&#38;#38;">
<!ENTITY apos   "&#39;">
<!ENTITY quot   "&#34;">
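
A minimal sketch of escaping just those five characters, in case it has to be done by hand. The names `XmlEscapeSketch` and `escapeXml` are made up for illustration; everything else is passed through unchanged on the assumption that the document is written out as UTF-8.

```java
// Hypothetical helper, for illustration only.
final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    // Replaces only the five characters that have predefined entities in XML.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);        // '/' and non-ASCII text need no escaping
            }
        }
        return sb.toString();
    }
}
```

For example, `XmlEscapeSketch.escapeXml("a < b & \"c\"")` returns `a &lt; b &amp; &quot;c&quot;`, and a value like `x/y` comes back untouched.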
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777670
>>I accepted answer from CEHJ because you cannot nest CDATA

You certainly can't insert CDATA arbitrarily

>>There are only a couple of characters...

Basically the same ones as html AFAIK
 
LVL 2

Author Comment

by:gaetansavoie
ID: 10777802
Thanks for the comment, CEHJ.  It's hard to explain, but we have a special case where an element's content is actually an XML document provided by the client, with CDATA already in it!

And according to MSSoft, you cannot nest them which was my problem:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk30/htm/xmconcdatamarkedsections.asp

"Note   Content within CDATA sections must be within the range of characters permitted for XML content; control characters and compatibility characters cannot be escaped this way. In addition, the sequence ]]> cannot appear within a CDATA section because this sequence signals the end of the section. This means that CDATA sections cannot be nested. The sequence also appears in some scripts. Within scripts, it is usually possible to substitute ] ]> for ]]>."


Escaping worked so thanks again.  I am sure it's not trapping for every scenario but it did in my case!
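
For anyone who still needs CDATA around content that may itself contain ]]>, here is a minimal sketch of the usual workaround: end the section just before the ">" and open a new one straight after. The names `CdataSketch` and `wrapInCdata` are made up for illustration; this is not what was used in this thread, where plain escaping did the job.

```java
// Hypothetical helper, for illustration only.
final class CdataSketch {
    private CdataSketch() {}

    // Wraps text in CDATA, splitting every "]]>" across two adjacent sections
    // so that the terminator never appears inside a single section.
    static String wrapInCdata(String text) {
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }
}
```

With this, `CdataSketch.wrapInCdata("a]]>b")` gives `<![CDATA[a]]]]><![CDATA[>b]]>`, which a parser reads back as the single string `a]]>b`.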
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777848
>>And according to MSSoft, you cannot nest them which was my problem:

Yes that's right. Might even be wise to add '[' and ']' to that table ;-) Although the parser should probably ignore them if they don't occur together
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 10778165
The XML standard, available at http://www.w3c.org, states the same thing. The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA. The definition of CDATA is an element which is not parsed.

Like I said above, no need to escape anything but: < > ' " &
 
LVL 86

Expert Comment

by:CEHJ
ID: 10778220
>>The reason why you cannot nest CDATA elements is simply that the parser never even tries to parse the CDATA.

That actually seems like a contradiction in terms. If it's noticed a CDATA element that's not going to be parsed, why should it worry how much nesting there is?