• Status: Solved
  • Priority: Medium
  • Security: Public

SAX gurus: DTD is being lost between custom filters - how to correct?

Long form:

I have inherited an application and framework built around SAX, which is an elegant solution for our particular business problem.  I would like it to be a little easier to automate testing the produced output.  Part of the solution to making it more robust is to allow the <!DOCTYPE ... > tag to make it into the produced output.

In the middle of the process, a custom XSLT transformation filter is producing XML that is given to a custom HTTP filter.  When I check the XML produced by the XSLT transformation filter, it is correct.  When I check the XML being transmitted by the custom HTTP filter, the <!DOCTYPE ... > tag has been stripped, but the XML is otherwise correct.

Here is the transform code from the XSLT transformer, with all debugging code in place but commented out:

    public void parse(InputSource inputSource)
           throws java.io.IOException, SAXException {

        XMLReader parent = getParent();

        if(parent != null) {
            parent.setEntityResolver(this);
            parent.setDTDHandler(this);
            parent.setErrorHandler(this);
        }

        try {

            this.transformer.clearParameters();

            TransformationPathContext context = getContext();

            SAXSource source = new SAXSource(parent, inputSource);
            SAXResult result = new SAXResult(this);


            ByteArrayOutputStream os = new ByteArrayOutputStream ();
            StreamResult intermediate = new StreamResult (os);

            /* Do the transformation */

            //this.transformer.setOutputProperty("indent", "yes");
            //this.transformer.setOutputProperty("method", "xml");

            /* Debug purposes only */
            Properties op = this.transformer.getOutputProperties();
            if(op != null) {
                op.list(System.out);
            }




/*
            if(log.isDebugEnabled()) {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                InputStream is = inputSource.getByteStream();
                int c;
                while((c = is.read()) != -1) {
                    baos.write(c);
                }
                byte[] tmpBuffer = baos.toByteArray();
//                log.debug("InputSource bytes(" + tmpBuffer.length + "):" + new String(tmpBuffer));
                System.out.println(this.getClass().getName()+":\r\nInputSource bytes(" + tmpBuffer.length + "):" + new String(tmpBuffer));
                ByteArrayInputStream bais = new ByteArrayInputStream(tmpBuffer);
                inputSource.setByteStream(bais);
            }
*/

/* For test purposes only */

/*
            this.transformer.transform(source, intermediate);
            System.out.println("----------------------------------------");
            System.out.println("  XML stream produced by XSL transform");
            System.out.println("    (may not represent final output)");
            System.out.println("----------------------------------------");
            System.out.println(new String (os.toByteArray()));
            System.out.println("----------------------------------------");

*/
           this.transformer.transform(source, result);

        } catch(TransformerException te) {
          if(log.isDebugEnabled()) {
            te.printStackTrace();
          }
          throw new SAXException("", te);
        }
    }


Here are the pieces that I believe are pertinent from the HTTP filter:

public class ConfigurableHTTPFilter
    extends XMLFilterImpl
    implements Configurable, TransformationPathFilter {

  public ConfigurableHTTPFilter(Element element) throws ConfigurationException {

    try {
      setRequestMethod(GET_METHOD);
      configure(element);

      /* Create a content handler to serialize the data received
         by this filter.  This serialized data will then be
         sent as the data for the HTTP request. */

      this.requestBuffer = new ByteArrayOutputStream();

      SerializerToXML serializer = new SerializerToXML();
      serializer.setOutputStream(this.requestBuffer);

      this.requestSerializerContentHandler = serializer.asContentHandler();
      this.setDTDHandler( this);

      this.fileNameMap = new HTTPFileNameMap(this.contentType);

    }
    catch (Exception e) {
      throw new ConfigurationException(e);
    }
  }

  public void parse(InputSource input) throws java.io.IOException, SAXException {

    this.requestBuffer.reset();

    /* Reroute the content handler to the serializer, if
       it is not already */

    ContentHandler contentHandler = getContentHandler();

    if (contentHandler != this.requestSerializerContentHandler) {

      /* The original content handler should be replaced with
         the serializing content handler, and the original
         content handler will be used as the reply parser's content
         handler. */

      setContentHandler(this.requestSerializerContentHandler);
      this.replyParser.setContentHandler(contentHandler);
    }

    super.parse(input);
  }

  public void endDocument() throws SAXException {

    long bts = System.currentTimeMillis();

    HttpURLConnection.setFileNameMap(this.fileNameMap);

    try {
      /* Collect the received XML for the request data */
      this.requestSerializerContentHandler.endDocument();

      String content = new String(this.requestBuffer.toByteArray());

// uncomment to preview transmitted file
      System.out.println("Content: " + content);
      log.debug("Record at endDocument:" + content);

     /* *** ... SNIP ... *** */

    }
  }


0
swift99
Asked:
swift99
 
CEHJCommented:
Shouldn't you be using the methods of DTDHandler?

http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/DTDHandler.html
0
 
john-at-7fffCommented:
Is the DOCTYPE of the source document constant, or do you need to dynamically retrieve it?

If you know it (it's a constant), then for the XSLT transform, you can force the DOCTYPE emission by adding this line after your comment /* Do the transformation */:

    this.transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "DTDNAME";

This changes the SYSTEM doctype -- if it's a PUBLIC doctype, it's a bit different. Look at the members of OutputKeys in the JavaDoc for OutputKeys:

http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/transform/OutputKeys.html

Again, if you need to grab the DOCTYPE from the incoming XML, you should implement a DTDHandler, as CEHJ says.
0
 
john-at-7fffCommented:
Oops, forgot a paren:

this.transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "DTDNAME");
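
For the PUBLIC case mentioned above, a minimal sketch follows. It assumes a plain JAXP identity transformer standing in for the thread's XSLT transformer, and the class name DoctypeOutputSketch is hypothetical. Both OutputKeys.DOCTYPE_PUBLIC and OutputKeys.DOCTYPE_SYSTEM are set, since the public identifier is only emitted together with a system identifier. Note that these are serialization properties: the sketch writes to a StreamResult, and whether they have any effect when the result is a SAXResult (as in the question) depends on the processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of the application discussed in the thread.
class DoctypeOutputSketch {

    /** Serializes the given XML, forcing a PUBLIC doctype through output properties. */
    static String withPublicDoctype(String xml, String publicId, String systemId)
            throws TransformerException {
        // An identity transformer stands in for the real XSLT transformer (assumption).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
        transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The identifiers are the ones shown in the asker's XSLT output.
        System.out.println(withPublicDoctype(
                "<Maptuit/>",
                "-//Maptuit//DTD AttractionML 1.0//EN",
                "http://fleetnav.maptuit.com/am/dtd/AttractionML.dtd"));
    }
}
```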

0
 
CEHJCommented:
>>Is the DOCTYPE of the source document constant, or do you need to dynamically retrieve it?

Surely a better way of handling it is to use the DTDHandler, then you don't need to worry whether it's constant? - it'll get reproduced anyway.
0
 
swift99Author Commented:
john-at-7FFF - The XSLT transformation is producing the XML with the correct <!DOCTYPE ...> tag.  The tag is built in the XSLT script.  The problem is that the correctly generated tag is being lost between the successful XSLT transform and the start of the HTTP piece.

CEHJ - the class implements the DTDHandler interface, but the methods are never called.  If they were called I would have included those methods.  Note that we are setting the DTDHandler to "this".  Am I possibly setting the wrong object's DTD handler?

If I understand the paradigm correctly, when I feed my transformer result to a SAXResult, the next stage filter events are being fired directly as the XML is being generated.  I suspect, as you indicated, that I have not hooked up some handler or event that I need to.  I am a relative newbie to SAX, and I am still not quite comfortable with the blend of forward and backward chaining logic that it entails.
0
 
swift99Author Commented:
I am most suspicious of the connection between the SerializerToXML in the HTTP piece and the SAXResult in the XSLT transformer piece.  To my eyes it looks like it drops into black magic land, but it works except for the detail that the DTD disappears.

If I follow the logic correctly, in part this class acts as a decorator for the SerializerToXML class - its methods tend to fire the serializer methods.  

SerializerToXML has StartDTD and EndDTD events, but my classes have no such events.  Do I need to add a lexicalHandler so the DTD's will pass through, maybe?
0
 
john-at-7fffCommented:
Probably the parent DOES implement DTDHandler. So when you say:

parent.setDTDHandler(this);

You're just setting the parent's DTD Handler to its own DTDHandler, since YOU don't implement it in "this"

So:

Add the methods from the DTDHandler interface

    http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/DTDHandler.html

to the class that has that parse method, and see what you get from what is passed in to the DTDHandler methods.
0
 
swift99Author Commented:
Did that.  The DTDHandler methods are not called.
0
 
john-at-7fffCommented:
OK.

You're also saying this.setDTDHandler( this); in ConfigurableHTTPFilter, where again the parent class -- XMLFilterImpl -- may be getting the DTDHandler calls.

Why don't you implement the DTDHandler methods in ConfigurableHTTPFilter, and check it there as well?
0
 
CEHJCommented:
I'm not sure i've completely got my head around what's going on either, but if you've got configurable XSLT, are you matching the DTD node?
0
 
swift99Author Commented:
The output of the XSLT phase has the correct DTD node.  From the data, you can see what stage of the process this printout was intercepted from.  See following:

----------------------------------------

  XML stream produced by XSL transform

    (may not represent final output)

----------------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Maptuit PUBLIC "-//Maptuit//DTD AttractionML 1.0//EN" "http://fleetnav.maptuit.com/am/dtd/AttractionML.dtd">

<Maptuit>
<UpdateAddressReq ClientId="jbhuntdev">
<UpdateLocationReq>
<OwnerID>CYTEST</OwnerID>
<Name>J B HUNT CHICAGO YARD</Name>
<Address>
<Street>3642 W 47TH STREET TEST</Street>
<City>CHICAGO</City>
<State>IL</State>
<Zip>60632-3510</Zip>
<Country>US</Country>
</Address>
</UpdateLocationReq>
</UpdateAddressReq>
</Maptuit>

This is intercepted at the "endDocument" method of the HTTP filter piece:
Content:

<?xml version="1.0" encoding="UTF-8"?>

<Maptuit><UpdateAddressReq ClientId="jbhuntdev"><UpdateLocationReq><OwnerID>CYTEST</OwnerID><Name>J B HUNT CHICAGO YARD</Name><Address><Street>3642 W 47TH STREET TEST</Street><City>CHICAGO</City><State>IL</State><Zip>60632-3510</Zip><Country>US</Country></Address></UpdateLocationReq></UpdateAddressReq></Maptuit>


Between the two pieces are a SAXResult and a SerializerToXML.
0
 
swift99Author Commented:
john: we already tried that.  The methods are never called.
0
 
john-at-7fffCommented:
Swift99 -- Wow.

Do you have the code or the inheritance hierarchy for ConfigurableHTTPFilter ?
0
 
swift99Author Commented:
Yes, but I'm not at liberty to release it in its entirety.  The hierarchy is noted in the code snippets already submitted.  

public class ConfigurableHTTPFilter
    extends XMLFilterImpl
    implements Configurable, TransformationPathFilter
0
 
swift99Author Commented:
I did say that this was one for SAX gurus   :o)
0
 
swift99Author Commented:
Would this be pertinent, do you think?  The implication is that SAX readers do not recognize these "optional" sections without specifically setting a lexical handler to handle the startDTD, endDTD, and so on.

from http://217.31.71.131/xalan-j_2_5_D1-docs/apidocs/org/xml/sax/ext/LexicalHandler.html

public interface LexicalHandler
SAX2 extension handler for lexical events.

This module, both source code and documentation, is in the Public Domain, and comes with NO WARRANTY. See http://www.saxproject.org for further information.
This is an optional extension handler for SAX2 to provide lexical information about an XML document, such as comments and CDATA section boundaries. XML readers are not required to recognize this handler, and it is not part of core-only SAX2 distributions.

The events in the lexical handler apply to the entire document, not just to the document element, and all lexical handler events must appear between the content handler's startDocument and endDocument events.

To set the LexicalHandler for an XML reader, use the setProperty method with the property name http://xml.org/sax/properties/lexical-handler and an object implementing this interface (or null) as the value. If the reader does not report lexical events, it will throw a SAXNotRecognizedException when you attempt to register the handler.


Since:
SAX 2.0 (extensions 1.0)
0
 
CEHJCommented:
>>Would this be pertinent, do you think?  

I think you could be onto something. btw i'm not 'posing' as a SAX guru - you probably know more than i do by some way. But it's always good to have a pair of 'informed' eyes ;-)
0
 
john-at-7fffCommented:
Awww, I was WAY off base. DTDHandler is for getting notation and entity declarations -- things like this:

<!NOTATION jpeg SYSTEM "images/jpeg">
<!ENTITY stars_logo SYSTEM "http://www.nhl.com/img/team/dal38.gif"
                    NDATA jpeg>

In any case, you are absolutely right about using a LexicalHandler. From the SAX FAQ (http://www.saxproject.org/?selected=faq):

Does SAX support comments/CDATA sections/DOCTYPE declarations, etc.?
Not in the core API. These kinds of things are pure lexical details, and are not relevant to most kinds of XML processing, so it doesn't make sense to put them in the core and force all implementors to support them.

However, SAX2 is designed to be extensible, and the LexicalHandler interface is supported by most SAX parsers. SAX2 parsers are not required to support this handler, but they are required to report an error if you try to use handlers they don't support.
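
The split between the two handlers can be seen with a small sketch. It assumes the JDK's built-in SAX parser, which recognizes the lexical-handler property; the class name DtdVersusLexicalSketch and the sample document are made up for illustration. The DTDHandler only hears about the notation and unparsed entity declarations, while the DOCTYPE boundary itself arrives through the LexicalHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; assumes the JDK's bundled parser.
class DtdVersusLexicalSketch {

    // Internal subset only, so nothing has to be fetched from the network.
    static final String SAMPLE =
            "<!DOCTYPE doc [\n"
          + "  <!NOTATION jpeg SYSTEM \"images/jpeg\">\n"
          + "  <!ENTITY stars_logo SYSTEM \"http://www.nhl.com/img/team/dal38.gif\" NDATA jpeg>\n"
          + "]>\n"
          + "<doc/>";

    /** Parses the document and records which handler received which event. */
    static List<String> recordEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Core SAX: notation and unparsed entity declarations only.
        reader.setDTDHandler(new DTDHandler() {
            public void notationDecl(String name, String publicId, String systemId) {
                events.add("DTDHandler.notationDecl " + name);
            }
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                events.add("DTDHandler.unparsedEntityDecl " + name);
            }
        });

        // SAX2 extension: the DOCTYPE declaration is reported here.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler",
                new DefaultHandler2() {
                    @Override
                    public void startDTD(String name, String publicId, String systemId) {
                        events.add("LexicalHandler.startDTD " + name);
                    }
                });

        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : recordEvents(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```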


0
 
swift99Author Commented:
That gives me ammo for another swing then.

Thanks, both of you.  Between the three of us I think we've tumbled in the right direction here.

I'm going to follow this up with some tests, and if this works I'll split the points.  If not, then I'll follow up with more feedback.
0
 
john-at-7fffCommented:
OK, the LexicalHandler definitely works for grabbing the DTD (I have a lot of SAX code lying around, but whenever I've needed to produce the DOCTYPE, I've always used a straight XSLT transform).

Here's what you need to do for that piece (the rest of the problem is then getting the values into your serializer: You might need to post the code for your SerializerToXML class).

Create a new class called MyLexicalHandler, and it should implement org.xml.sax.ext.LexicalHandler. You could use this:

import org.xml.sax.SAXException;

class MyLexicalHandler implements org.xml.sax.ext.LexicalHandler {
      // Values reported by the most recent startDTD event
      private String name;
      private String publicId;
      private String systemId;

      public void endCDATA() throws SAXException {}
      public void endDTD() throws SAXException {}
      public void startCDATA() throws SAXException {}
      public void comment(char[] ch, int start, int length) throws SAXException {}
      public void endEntity(String name) throws SAXException {}
      public void startEntity(String name) throws SAXException {}

      // Called when the parser reaches the <!DOCTYPE ...> declaration
      public void startDTD(String name, String publicId, String systemId) throws SAXException {
            this.name = name;
            this.publicId = publicId;
            this.systemId = systemId;
            System.out.println("name: " + name);
            System.out.println("publicId: " + publicId);
            System.out.println("systemId: " + systemId);
      }
}

Where in ConfigurableHTTPFilter you're saying

    this.setDTDHandler( this);

You will want additionally

    this.setProperty("http://xml.org/sax/properties/lexical-handler", new MyLexicalHandler());

Try this and see, at least, if the DTD gets dumped to the console.
0
 
swift99Author Commented:
We get an exception instantiating the class.

org.xml.sax.SAXNotRecognizedException: Property: http://xml.org/sax/properties/lexical-handler
      at org.xml.sax.helpers.XMLFilterImpl.setProperty(Unknown Source)

0
 
CEHJCommented:
AFAIK you need to set that property on the reader
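
The difference can be checked in isolation with a small sketch, assuming the JDK's built-in parser (the class name LexicalPropertySketch is hypothetical). An XMLFilterImpl has no lexical-handler support of its own: its setProperty only delegates to the parent reader and throws SAXNotRecognizedException when there is no parent, which matches the stack trace above.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; assumes the JDK's bundled parser.
class LexicalPropertySketch {

    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Returns true if the reader accepts a lexical handler, false if it rejects the property. */
    static boolean acceptsLexicalHandler(XMLReader reader) throws SAXException {
        try {
            // DefaultHandler2 is a no-op LexicalHandler implementation.
            reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());
            return true;
        } catch (SAXNotRecognizedException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // A filter constructed without a parent has nowhere to pass the property.
        System.out.println("filter without parent: " + acceptsLexicalHandler(new XMLFilterImpl()));

        // The underlying reader handles the property itself.
        System.out.println("parser: " + acceptsLexicalHandler(parser));

        // A filter with a parent simply hands the property to that parent.
        System.out.println("filter with parent: " + acceptsLexicalHandler(new XMLFilterImpl(parser)));
    }
}
```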
0
 
swift99Author Commented:
CEHJ:  That's got it (property on the reader)!

The DTD is still not getting through, but the events are being fired, so I have several options open.

I'm no longer looking for a nice solid brick wall to bang my head on.   :o)


0
 
john-at-7fffCommented:
Oops. Sorry about trying to put that instance in the wrong place.
0
 
swift99Author Commented:
LOL ... I got the event firing on the _output_ of the HTTP filter.  :o)

The exception is being raised because the default setProperty method of the reader only passes the property on to the parent.  If the parent is null, as in this case, an exception is raised.  The parent is the output side of this equation, so when I do this I get the event firing on the wrong end of the process.

Someone or something has to actually call the LexicalHandler interface.  It's time to dig into the source code.
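
One way a filter could take on that job is sketched here. It assumes a parent reader that supports the lexical-handler property (the JDK's bundled parser does); DoctypeForwardingFilter is a hypothetical class, not part of the framework in question. The filter registers itself with its parent before parsing and relays every lexical event to whatever handler was set on it. In the real chain the downstream handler would be the serializer, provided it implements LexicalHandler, which has not been verified here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that relays lexical events (including the DOCTYPE) downstream. */
class DoctypeForwardingFilter extends XMLFilterImpl implements LexicalHandler {

    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";
    static final String SAMPLE =
            "<!DOCTYPE Maptuit PUBLIC \"-//Maptuit//DTD AttractionML 1.0//EN\" "
          + "\"http://fleetnav.maptuit.com/am/dtd/AttractionML.dtd\"><Maptuit/>";

    private LexicalHandler downstream = new DefaultHandler2(); // no-op until one is set

    DoctypeForwardingFilter(XMLReader parent) { super(parent); }

    /** Keeps the lexical handler locally; every other property goes to the parent. */
    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (LEXICAL_HANDLER.equals(name)) {
            downstream = (value != null) ? (LexicalHandler) value : new DefaultHandler2();
        } else {
            super.setProperty(name, value);
        }
    }

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        // XMLFilterImpl wires up the core handlers only, so hook the lexical one here.
        getParent().setProperty(LEXICAL_HANDLER, this);
        super.parse(input);
    }

    public void startDTD(String name, String publicId, String systemId) throws SAXException {
        downstream.startDTD(name, publicId, systemId);
    }
    public void endDTD() throws SAXException { downstream.endDTD(); }
    public void startEntity(String name) throws SAXException { downstream.startEntity(name); }
    public void endEntity(String name) throws SAXException { downstream.endEntity(name); }
    public void startCDATA() throws SAXException { downstream.startCDATA(); }
    public void endCDATA() throws SAXException { downstream.endCDATA(); }
    public void comment(char[] ch, int start, int length) throws SAXException {
        downstream.comment(ch, start, length);
    }

    /** Demo: returns {name, publicId, systemId} as seen on the output side of the filter. */
    static String[] captureDoctype(String xml) throws Exception {
        final String[] seen = new String[3];
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DoctypeForwardingFilter filter = new DoctypeForwardingFilter(parser);
        // Supply an empty DTD so the external subset is never fetched over the network.
        filter.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        filter.setProperty(LEXICAL_HANDLER, new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                seen[0] = name;
                seen[1] = publicId;
                seen[2] = systemId;
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String[] seen = captureDoctype(SAMPLE);
        System.out.println("name: " + seen[0]);
        System.out.println("publicId: " + seen[1]);
        System.out.println("systemId: " + seen[2]);
    }
}
```

Overriding setProperty lets callers register a handler on the filter the same way they would on a plain reader, so the filter can be chained without the next stage knowing about it.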
0
 
swift99Author Commented:
The FAQ link you gave me directed me to the final part of the answer.

We need to update our SAX implementation.  That has to go through the architecture committee, so I'm at a standstill for now on this.

Once that happens, I believe that we'll be in business.
0
 
CEHJCommented:
OK 8-)
0
 
john-at-7fffCommented:
Cheers, and good luck!
0
 
swift99Author Commented:
Thanks ... I'll need it!
0
