Solved

SAX gurus:  DTD is being lost between custom filters - how to correct?

Posted on 2004-04-14
Last Modified: 2013-11-23
Long form:

I have inherited an application and framework built around SAX, which is an elegant solution for our particular business problem.  I would like it to be a little easier to automate testing the produced output.  Part of the solution to making it more robust is to allow the <!DOCTYPE ... > tag to make it into the produced output.

In the middle of the process, a custom XSLT transformation filter is producing XML that is given to a custom HTTP filter.  When I check the XML produced by the XSLT transformation filter, it is correct.  When I check the XML being transmitted by the custom HTTP filter, the <!DOCTYPE ... > tag has been stripped, but the XML is otherwise correct.

Here is the transform code from the XSLT transformer, with all debugging code in place but commented out:

    public void parse(InputSource inputSource)
           throws java.io.IOException, SAXException {

        XMLReader parent = getParent();

        if(parent != null) {
            parent.setEntityResolver(this);
            parent.setDTDHandler(this);
            parent.setErrorHandler(this);
        }

        try {

            this.transformer.clearParameters();

            TransformationPathContext context = getContext();

            SAXSource source = new SAXSource(parent, inputSource);
            SAXResult result = new SAXResult(this);


            ByteArrayOutputStream os = new ByteArrayOutputStream ();
            StreamResult intermediate = new StreamResult (os);

            /* Do the transformation */

            //this.transformer.setOutputProperty("indent", "yes");
            //this.transformer.setOutputProperty("method", "xml");

            /* Debug purposes only */
            Properties op = this.transformer.getOutputProperties();
            if(op != null) {
                op.list(System.out);
            }




/*
            if(log.isDebugEnabled()) {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                InputStream is = inputSource.getByteStream();
                int c;
                while((c = is.read()) != -1) {
                    baos.write(c);
                }
                byte[] tmpBuffer = baos.toByteArray();
//                log.debug("InputSource bytes(" + tmpBuffer.length + "):" + new String(tmpBuffer));
                System.out.println(this.getClass().getName()+":\r\nInputSource bytes(" + tmpBuffer.length + "):" + new String(tmpBuffer));
                ByteArrayInputStream bais = new ByteArrayInputStream(tmpBuffer);
                inputSource.setByteStream(bais);
            }
*/

/* For test purposes only */

/*
            this.transformer.transform(source, intermediate);
            System.out.println("----------------------------------------");
            System.out.println("  XML stream produced by XSL transform");
            System.out.println("    (may not represent final output)");
            System.out.println("----------------------------------------");
            System.out.println(new String (os.toByteArray()));
            System.out.println("----------------------------------------");

*/
           this.transformer.transform(source, result);

        } catch(TransformerException te) {
          if(log.isDebugEnabled()) {
            te.printStackTrace();
          }
          throw new SAXException("", te);
        }
    }


Here are the pieces that I believe are pertinent from the HTTP filter:

public class ConfigurableHTTPFilter
    extends XMLFilterImpl
    implements Configurable, TransformationPathFilter {

  public ConfigurableHTTPFilter(Element element) throws ConfigurationException {

    try {
      setRequestMethod(GET_METHOD);
      configure(element);

      /* Create a content handler to serialize the data received
         by this filter.  This serialized data will then be
         sent as the data for the HTTP request. */

      this.requestBuffer = new ByteArrayOutputStream();

      SerializerToXML serializer = new SerializerToXML();
      serializer.setOutputStream(this.requestBuffer);

      this.requestSerializerContentHandler = serializer.asContentHandler();
      this.setDTDHandler( this);

      this.fileNameMap = new HTTPFileNameMap(this.contentType);

    }
    catch (Exception e) {
      throw new ConfigurationException(e);
    }
  }

  public void parse(InputSource input) throws java.io.IOException, SAXException {

    this.requestBuffer.reset();

    /* Reroute the content handler to the serializer, if
       it is not already */

    ContentHandler contentHandler = getContentHandler();

    if (contentHandler != this.requestSerializerContentHandler) {

      /* The original content handler should be replaced with
         the serializing content handler, and the original
         content handler will be used as the reply parser's content
         handler. */

      setContentHandler(this.requestSerializerContentHandler);
      this.replyParser.setContentHandler(contentHandler);
    }

    super.parse(input);
  }

  public void endDocument() throws SAXException {

    long bts = System.currentTimeMillis();

    HttpURLConnection.setFileNameMap(this.fileNameMap);

    try {
      /* Collect the received XML for the request data */
      this.requestSerializerContentHandler.endDocument();

      String content = new String(this.requestBuffer.toByteArray());

// uncomment to preview transmitted file
      System.out.println("Content: " + content);
      log.debug("Record at endDocument:" + content);

     /* *** ... SNIP ... *** */

    }
  }


Question by:swift99
30 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 10823291
Shouldn't you be using the methods of DTDHandler?

http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/DTDHandler.html
0
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10823378
Is the DOCTYPE of the source document constant, or do you need to dynamically retrieve it?

If you know it (it's a constant), then for the XSLT transform, you can force the DOCTYPE emission by adding this line after your comment /* Do the transformation */:

    this.transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "DTDNAME";

This changes the SYSTEM doctype -- if it's a PUBLIC doctype, it's a bit different. Look at the members of OutputKeys in the JavaDoc for OutputKeys:

http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/transform/OutputKeys.html

Again, if you need to grab the DOCTYPE from the incoming XML, you should implement a DTDHandler, as CEHJ says.
0
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10823383
Oops, forgot a paren:

this.transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "DTDNAME");
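For the PUBLIC case mentioned above, here is a minimal sketch. It assumes a plain identity Transformer standing in for this.transformer and a StreamResult as the target; the class and method names (DoctypeOutputSketch, withDoctype) are made up for illustration, and the IDs in main are only sample values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DoctypeOutputSketch {

    /** Serializes the given XML with a PUBLIC doctype forced through output properties. */
    static String withDoctype(String xml, String publicId, String systemId) {
        try {
            // Assumption: an identity transformer stands in for the real stylesheet.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Setting both properties requests a PUBLIC doctype; DOCTYPE_SYSTEM alone gives SYSTEM.
            transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
            transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withDoctype(
                "<Maptuit><OwnerID>CYTEST</OwnerID></Maptuit>",
                "-//Maptuit//DTD AttractionML 1.0//EN",
                "http://fleetnav.maptuit.com/am/dtd/AttractionML.dtd"));
    }
}
```

As far as I know these properties describe how the transformer serializes its own output, so this sketch only covers the case where the transformer writes the stream itself.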

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10823513
>>Is the DOCTYPE of the source document constant, or do you need to dynamically retrieve it?

Surely a better way of handling it is to use the DTDHandler, then you don't need to worry whether it's constant? - it'll get reproduced anyway.
0
 
LVL 6

Author Comment

by:swift99
ID: 10823557
john-at-7FFF - The XSLT transformation is producing the XML with the correct <!DOCTYPE ...> tag.  The tag is built in the XSLT script.  The problem is that the correctly generated tag is being lost between the successful XSLT transform and the start of the HTTP piece.

CEHJ - the class implements the DTDHandler interface, but the methods are never called.  If they were called I would have included those methods.  Note that we are setting the DTDHandler to "this".  Am I possibly setting the wrong object's DTD handler?

If I understand the paradigm correctly, when I feed my transformer result to a SAXResult, the next stage filter events are being fired directly as the XML is being generated.  I suspect, as you indicated, that I have not hooked up some handler or event that I need to.  I am a relative newbie to SAX, and I am still not quite comfortable with the blend of forward and backward chaining logic that it entails.
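The push behaviour described here can be sketched in isolation. This is only an illustration under assumptions: an identity Transformer replaces the real stylesheet, and SaxResultPushSketch and eventsFor are invented names. The handler behind the SAXResult receives its callbacks while transform() is still running.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxResultPushSketch {

    /** Runs an identity transform into a SAXResult and records the events the next stage sees. */
    static List<String> eventsFor(String xml) {
        List<String> events = new ArrayList<>();
        // Stands in for the next filter in the chain (hypothetical).
        DefaultHandler nextStage = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // The transformer calls nextStage directly as it produces output.
            identity.transform(new StreamSource(new StringReader(xml)), new SAXResult(nextStage));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(eventsFor("<Maptuit><OwnerID>CYTEST</OwnerID></Maptuit>"));
    }
}
```

Note that ContentHandler has no callback for the DOCTYPE at all. SAXResult does offer setLexicalHandler for lexical events, and its documentation says a transformer should otherwise try to cast the ContentHandler to a LexicalHandler.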
0
 
LVL 6

Author Comment

by:swift99
ID: 10823797
I am most suspicious of the connection between the SerializerToXML in the HTTP piece and the SAXResult in the XSLT transformer piece.  To my eyes it looks like it drops into black magic land, but it works except for the detail that the DTD disappears.

If I follow the logic correctly, in part this class acts as a decorator for the SerializerToXML class - its methods tend to fire the serializer methods.  

SerializerToXML has startDTD and endDTD events, but my classes have no such events.  Do I need to add a LexicalHandler so the DTDs will pass through, maybe?
0
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10824070
Probably the parent DOES implement DTDHandler. So when you say:

parent.setDTDHandler(this);

You're just setting the parent's DTD Handler to its own DTDHandler, since YOU don't implement it in "this"

So:

Add the methods from the DTDHandler interface

    http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/DTDHandler.html

to the class that has that parse method, and see what you get from what is passed in to the DTDHandler methods.
0
 
LVL 6

Author Comment

by:swift99
ID: 10824165
Did that.  The DTDHandler methods are not called.
0
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10824197
OK.

You're also saying this.setDTDHandler( this); in ConfigurableHTTPFilter, where again the parent class -- XMLFilterImpl -- may be getting the DTDHandler calls.

Why don't you implement the DTDHandler methods in ConfigurableHTTPFilter, and check it there as well?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10824254
I'm not sure i've completely got my head around what's going on either, but if you've got configurable XSLT, are you matching the DTD node?
0
 
LVL 6

Author Comment

by:swift99
ID: 10824371
The output of the XSLT phase has the correct DTD node.  From the data, you can see what stage of the process this printout was intercepted from.  See following:

----------------------------------------

  XML stream produced by XSL transform

    (may not represent final output)

----------------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Maptuit PUBLIC "-//Maptuit//DTD AttractionML 1.0//EN" "http://fleetnav.maptuit.com/am/dtd/AttractionML.dtd">

<Maptuit>
<UpdateAddressReq ClientId="jbhuntdev">
<UpdateLocationReq>
<OwnerID>CYTEST</OwnerID>
<Name>J B HUNT CHICAGO YARD</Name>
<Address>
<Street>3642 W 47TH STREET TEST</Street>
<City>CHICAGO</City>
<State>IL</State>
<Zip>60632-3510</Zip>
<Country>US</Country>
</Address>
</UpdateLocationReq>
</UpdateAddressReq>
</Maptuit>

This is intercepted at the "endDocument" method of the HTTP filter piece:
Content:

<?xml version="1.0" encoding="UTF-8"?>

<Maptuit><UpdateAddressReq ClientId="jbhuntdev"><UpdateLocationReq><OwnerID>CYTEST</OwnerID><Name>J B HUNT CHICAGO YARD</Name><Address><Street>3642 W 47TH STREET TEST</Street><City>CHICAGO</City><State>IL</State><Zip>60632-3510</Zip><Country>US</Country></Address></UpdateLocationReq></UpdateAddressReq></Maptuit>


Between the two pieces are a SAXResult and a SerializerToXML.
0
 
LVL 6

Author Comment

by:swift99
ID: 10824389
john: we already tried that.  The methods are never called.
0
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10824614
Swift99 -- Wow.

Do you have the code or the inheritance hierarchy for ConfigurableHTTPFilter ?
0
 
LVL 6

Author Comment

by:swift99
ID: 10824644
Yes, but I'm not at liberty to release it in its entirety.  The hierarchy is noted in the code snippets already submitted.  

public class ConfigurableHTTPFilter
    extends XMLFilterImpl
    implements Configurable, TransformationPathFilter
0
 
LVL 6

Author Comment

by:swift99
ID: 10824673
I did say that this was one for SAX gurus   :o)
0
 
LVL 6

Author Comment

by:swift99
ID: 10824744
Would this be pertinent, do you think?  The implication is that SAX readers do not recognize these "optional" sections, without specifically setting a lexical handler to handle the startDTD, endDTD, and so on.

from http://217.31.71.131/xalan-j_2_5_D1-docs/apidocs/org/xml/sax/ext/LexicalHandler.html

public interface LexicalHandler
SAX2 extension handler for lexical events.

This module, both source code and documentation, is in the Public Domain, and comes with NO WARRANTY. See http://www.saxproject.org for further information.
This is an optional extension handler for SAX2 to provide lexical information about an XML document, such as comments and CDATA section boundaries. XML readers are not required to recognize this handler, and it is not part of core-only SAX2 distributions.

The events in the lexical handler apply to the entire document, not just to the document element, and all lexical handler events must appear between the content handler's startDocument and endDocument events.

To set the LexicalHandler for an XML reader, use the setProperty method with the property name http://xml.org/sax/properties/lexical-handler and an object implementing this interface (or null) as the value. If the reader does not report lexical events, it will throw a SAXNotRecognizedException when you attempt to register the handler.


Since:
SAX 2.0 (extensions 1.0)
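The registration described in that excerpt might look like the sketch below. It assumes the JDK's default SAX parser, uses DefaultHandler2 as a convenient base class, and answers every entity lookup with an empty stream so the external DTD is never fetched; DoctypeCaptureSketch and describeDoctype are invented names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DoctypeCaptureSketch {

    /** Parses the XML and returns "name|publicId|systemId" as reported by startDTD. */
    static String describeDoctype(String xml) {
        StringBuilder seen = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                seen.append(name).append('|').append(publicId).append('|').append(systemId);
            }

            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                // Assumption for the demo: never load the external DTD over the network.
                return new InputSource(new StringReader(""));
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // The lexical handler is a property of the reader, not a setter on it.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.setContentHandler(handler);
            reader.setEntityResolver(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeDoctype(
                "<!DOCTYPE Maptuit PUBLIC \"-//Maptuit//DTD AttractionML 1.0//EN\" "
                + "\"http://fleetnav.maptuit.com/am/dtd/AttractionML.dtd\"><Maptuit/>"));
    }
}
```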
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10824803
>>Would this be pertinent, do you think?  

I think you could be onto something. btw i'm not 'posing' as a SAX guru - you probably know more than i do by some way. But it's always good to have a pair of 'informed' eyes ;-)
0
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10824876
Awww, I was WAY off base. DTDHandler is for getting notation and entity declarations -- things like this:

<!NOTATION jpeg SYSTEM "images/jpeg">
<!ENTITY stars_logo SYSTEM "http://www.nhl.com/img/team/dal38.gif"
                    NDATA jpeg>

In any case, you are absolutely right about using a LexicalHandler. From the SAX FAQ (http://www.saxproject.org/?selected=faq):

Does SAX support comments/CDATA sections/DOCTYPE declarations, etc.?
Not in the core API. These kinds of things are pure lexical details, and are not relevant to most kinds of XML processing, so it doesn't make sense to put them in the core and force all implementors to support them.

However, SAX2 is designed to be extensible, and the LexicalHandler interface is supported by most SAX parsers. SAX2 parsers are not required to support this handler, but they are required to report an error if you try to use handlers they don't support.
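To make the DTDHandler point concrete, here is a small sketch that parses a document whose internal subset contains the two declarations quoted above. It assumes the JDK's default parser; DtdHandlerScopeSketch and declarationsIn are invented names. The handler hears about the notation and the unparsed entity, but DTDHandler has no callback for the DOCTYPE declaration itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DtdHandlerScopeSketch {

    /** Returns the declarations a DTDHandler is told about while parsing the XML. */
    static List<String> declarationsIn(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                events.add("notation:" + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                events.add("unparsedEntity:" + name + "/" + notationName);
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setDTDHandler(handler);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [\n"
                + "<!NOTATION jpeg SYSTEM \"images/jpeg\">\n"
                + "<!ENTITY stars_logo SYSTEM \"http://www.nhl.com/img/team/dal38.gif\" NDATA jpeg>\n"
                + "]>\n<doc/>";
        System.out.println(declarationsIn(xml));
    }
}
```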


0
 
LVL 6

Author Comment

by:swift99
ID: 10825001
That gives me ammo for another swing then.

Thanks, both of you.  Between the three of us I think we've tumbled in the right direction here.

I'm going to follow this up with some tests, and if this works I'll split the points.  If not, then I'll follow up with more feedback.
0
 
LVL 4

Accepted Solution

by:
john-at-7fff earned 125 total points
ID: 10825063
OK, the LexicalHandler definitely works for grabbing the DTD (I have a lot of SAX code lying around, but whenever I've needed to produce the DOCTYPE, I've always used a straight XSLT transform).

Here's what you need to do for that piece (the rest of the problem is then getting the values into your serializer: You might need to post the code for your SerializerToXML class).

Create a new class called MyLexicalHandler, and it should implement org.xml.sax.ext.LexicalHandler. You could use this:

import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;

/* Records the DOCTYPE details reported by the parser; all other lexical events are ignored. */
class MyLexicalHandler implements LexicalHandler {
      private String name;
      private String publicId;
      private String systemId;

      /* Called when the parser reaches the <!DOCTYPE ...> declaration. */
      public void startDTD(String name, String publicId, String systemId) throws SAXException {
            this.name = name;
            this.publicId = publicId;
            this.systemId = systemId;
            System.out.println("name: " + name);
            System.out.println("publicId: " + publicId);
            System.out.println("systemId: " + systemId);
      }

      public void endDTD() throws SAXException {}
      public void startEntity(String name) throws SAXException {}
      public void endEntity(String name) throws SAXException {}
      public void startCDATA() throws SAXException {}
      public void endCDATA() throws SAXException {}
      public void comment(char[] ch, int start, int length) throws SAXException {}
}

Where in ConfigurableHTTPFilter you're saying

    this.setDTDHandler( this);

You will want additionally

    this.setProperty("http://xml.org/sax/properties/lexical-handler", new MyLexicalHandler());

Try this and see, at least, if the DTD gets dumped to the console.
0
 
LVL 6

Author Comment

by:swift99
ID: 10826629
We get an exception instantiating the class.

org.xml.sax.SAXNotRecognizedException: Property: http://xml.org/sax/properties/lexical-handler
      at org.xml.sax.helpers.XMLFilterImpl.setProperty(Unknown Source)

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10826705
AFAIK you need to set that property on the reader
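A small sketch of the difference, assuming the JDK's default SAX parser (LexicalPropertySketch and its methods are invented names): a bare XMLFilterImpl with no parent rejects the property, a real parser accepts it, and a filter that has a parent simply hands the property on to that parent.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.XMLFilterImpl;

class LexicalPropertySketch {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Returns true if the reader accepts a lexical handler, false if it rejects the property. */
    static boolean acceptsLexicalHandler(XMLReader reader) {
        try {
            reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    /** Creates a real parser from the platform default factory. */
    static XMLReader newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Filter without a parent: the property has nowhere to go.
        System.out.println(acceptsLexicalHandler(new XMLFilterImpl()));
        // Real parser: the property is recognized.
        System.out.println(acceptsLexicalHandler(newParser()));
        // Filter with a parent: the call is forwarded to the parent.
        System.out.println(acceptsLexicalHandler(new XMLFilterImpl(newParser())));
    }
}
```

In the third case the handler ends up registered on the parent, so the filter itself still never sees the lexical events.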
0
 
LVL 86

Assisted Solution

by:CEHJ
CEHJ earned 125 total points
ID: 10826710
0
 
LVL 6

Author Comment

by:swift99
ID: 10826781
CEHJ:  That's got it (property on the reader)!

The DTD is still not getting through, but the events are being fired, so I have several options open.

I'm no longer looking for a nice solid brick wall to bang my head on.   :o)


0
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10827186
Oops. Sorry about trying to put that instance in the wrong place.
0
 
LVL 6

Author Comment

by:swift99
ID: 10827210
LOL ... I got the event firing on the _output_ of the HTTP filter.  :o)

The exception is being raised because the default setProperty method of the reader only passes the property on to the parent.  If the parent is null, as in this case, an exception is raised.  The parent is the output side of this equation, so when I do this I get the event firing on the wrong end of the process.

Someone or something has to actually call the LexicalHandler interface.  It's time to dig into the source code.
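One way a filter could do that calling itself is sketched below. This is a hypothetical class, not the framework's code: it extends XMLFilterImpl, keeps the lexical handler it is given instead of passing the property upstream, registers itself with its parent when parsing starts, and forwards each lexical event. The demo assumes the JDK's default parser and supplies an empty stream for the external DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: XMLFilterImpl plus pass-through of lexical (DOCTYPE) events. */
class LexicalForwardingFilter extends XMLFilterImpl implements LexicalHandler {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";
    private LexicalHandler downstream = new DefaultHandler2(); // no-op until one is registered

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (LEXICAL_HANDLER.equals(name)) {
            // Keep the handler here rather than handing it to the parent.
            downstream = (value != null) ? (LexicalHandler) value : new DefaultHandler2();
        } else {
            super.setProperty(name, value);
        }
    }

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        // Ask the upstream reader to send its lexical events to this filter.
        if (getParent() != null) {
            getParent().setProperty(LEXICAL_HANDLER, this);
        }
        super.parse(input);
    }

    @Override public void startDTD(String name, String publicId, String systemId) throws SAXException {
        downstream.startDTD(name, publicId, systemId);
    }
    @Override public void endDTD() throws SAXException { downstream.endDTD(); }
    @Override public void startEntity(String name) throws SAXException { downstream.startEntity(name); }
    @Override public void endEntity(String name) throws SAXException { downstream.endEntity(name); }
    @Override public void startCDATA() throws SAXException { downstream.startCDATA(); }
    @Override public void endCDATA() throws SAXException { downstream.endCDATA(); }
    @Override public void comment(char[] ch, int start, int length) throws SAXException {
        downstream.comment(ch, start, length);
    }

    /** Demo: returns the "name|publicId|systemId" strings seen downstream of the filter. */
    static List<String> demo(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            LexicalForwardingFilter filter = new LexicalForwardingFilter();
            filter.setParent(factory.newSAXParser().getXMLReader());
            // Assumption for the demo: never load the external DTD over the network.
            filter.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            filter.setProperty(LEXICAL_HANDLER, new DefaultHandler2() {
                @Override
                public void startDTD(String name, String publicId, String systemId) {
                    seen.add(name + "|" + publicId + "|" + systemId);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Whether this fits the real ConfigurableHTTPFilter depends on what feeds it; when the upstream stage is a transformer writing to a SAXResult rather than a parser, the lexical handler would have to be supplied on the SAXResult side instead.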
0
 
LVL 6

Author Comment

by:swift99
ID: 10827658
The FAQ link you gave me directed me to the final part of the answer.

We need to update our SAX implementation.  That has to go through the architecture committee, so I'm at a standstill for now on this.

Once that happens, I believe that we'll be in business.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10827694
OK 8-)
0
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10827798
Cheers, and good luck!
0
 
LVL 6

Author Comment

by:swift99
ID: 10832781
Thanks ... I'll need it!
0
