henrikpettersen

Performing filtering on a SAX XML stream in Java

1. I have an InputStream/InputSource from a servlet, after doing a file upload of an XML file.

2. After I perform my filtering, I need to end up with another InputSource, as I am attempting to integrate a 3rd party API, which only accepts an InputSource:

     //3rd party API
     public void importData(InputSource pContent) throws SAXException, IOException

3. Between getting the inputstream from the file upload servlet, and passing an InputSource to the 3rd party importData API above, I need to change the text content of a given element's attribute, like this:

    XPath xpath= DocumentHelper.createXPath("/io/multimediaGroup/multimedia/filename");
    List<Element> candidateElements = xpath.selectNodes(inputDoc);
    for (Element element : candidateElements){
        element.setText("Replacement Value");
    }

4. The above example uses XPath and DOM, but we don't really want this. The incoming / uploaded XML document could possibly be huge, so we would like a SAX based implementation. Something like this:

InputSource -> read sax events -> filter -> write sax events -> OutputSource(?) -> InputSource

so that we have a stream coming in, a filter that changes some values, and a result stream we can pass to our 3rd party API.

I've looked at the StAX API, as well as JAXP and the XMLReader interface, but I'm not able to find a solution.

I can see one solution, where I write my own XMLReader that appends to an outputstream as the input is parsed - but this seems like a lot of work, with a good chance of screwing things up by forgetting to append items to the outputstream properly.
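
For the filtering step itself, extending `XMLFilterImpl` avoids writing a full `XMLReader`: every event that is not overridden is forwarded unchanged. The following is only a minimal sketch under assumptions. The class name `FilenameRewriteFilter` and the helper `rewrite` are hypothetical, the target path is taken from the XPath in step 3, the target element is assumed to contain text only, and the parser is assumed to report qualified names. The helper serializes to a `String` purely so the effect can be inspected.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: replaces the text of every element found at TARGET_PATH.
class FilenameRewriteFilter extends XMLFilterImpl {
    private static final String TARGET_PATH = "/io/multimediaGroup/multimedia/filename";

    private final Deque<String> path = new ArrayDeque<>(); // open elements, innermost first
    private final String replacement;
    private boolean insideTarget = false;

    FilenameRewriteFilter(XMLReader parent, String replacement) {
        super(parent);
        this.replacement = replacement;
    }

    // Builds "/root/child/..." by walking the stack from the root element inwards.
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = path.descendingIterator(); it.hasNext();) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        path.push(qName);
        if (TARGET_PATH.equals(currentPath())) {
            insideTarget = true;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Swallow the original text of the target element, pass everything else through.
        if (!insideTarget) {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (insideTarget && TARGET_PATH.equals(currentPath())) {
            // Emit the replacement text just before the target element closes.
            super.characters(replacement.toCharArray(), 0, replacement.length());
            insideTarget = false;
        }
        path.pop();
        super.endElement(uri, localName, qName);
    }

    // Demo helper only: runs the filter and serializes the result to a String.
    static String rewrite(String xml, String replacement) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader filter =
                new FilenameRewriteFilter(factory.newSAXParser().getXMLReader(), replacement);
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

Only one element path and a small amount of state are held in memory at any time, so the document size does not matter to the filter.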

Are you a Java XML/jaxp/stax/streaming/io/nio guru? Can you help?

Please let me know if you have any questions, I'll be happy to clarify.

henrikpettersen

ASKER

I also looked at using XMLFilter and chaining XMLReaders together, but it seems impossible to get an InputSource back from the XMLReader.

CEHJ


ASKER

As I mentioned, I can see how you could chain filters together, and how you could end up with an XMLReader at the end of your chain, where all the filtering has taken place.

My problem is, how do I go from an XMLReader / XMLFilter instance, to generate an InputSource to be used in my 3rd party API? Am I missing something obvious?


ASKER

I would be tempted to do: "new InputSource(myXmlReaderInstance)", but the javadoc for XMLReader says:

http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/XMLReader.html:
> Note: despite its name, this interface does not extend the standard Java Reader interface,
> because reading XML is a fundamentally different activity than reading character data.

Mick Barry

Your problem is that your filter produces *output*, but you're after *input*. You could try using a transformation at the end of your filter chain which writes to, say, a PipedOutputStream, and connect a PipedInputStream to it that you can pass to your 3rd party API.

http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPXSLT8.html

ASKER

Objects, this is right on the money:
>> your problem is that your filter produces *output*, but you're after *input*

And I can certainly see how you would use the piped input and output streams from the Java I/O API to convert from an OutputStream to an InputStream. No problem.

As for using XSL transformations, I _should_ be able to get an OutputStream (from what I can tell) after a transformation. However, I'm not really applying any stylesheets to my filter chain at the moment. Instead, I have one custom XMLFilter which changes the value of a single attribute (stylesheets are not that great for small updates to documents, which is why some people came up with XUpdate).

If I can obtain an OutputStream from a Transformer, why can I not get one from an XMLReader? It seems a little much to apply a stylesheet with a single instruction in order to get my hands on an InputStream instance.

So the desired chain of conversions probably looks something like this now:
InputSource -> XMLReader -> OutputStream -> PipedOutputStream -> PipedInputStream -> InputSource

Almost there, but I would still like to get an OutputStream from the XMLReader without having to apply a stylesheet transformation at the end of the filter chain (perhaps my assumptions about stylesheets are incorrect? If so, please feel free to correct me).
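
On the question of writing SAX events to an OutputStream without a stylesheet, one possible route is JAXP's `SAXTransformerFactory.newTransformerHandler()`. Its no-argument form is documented as an identity (copy) transformation, and the returned `TransformerHandler` is a `ContentHandler`, so it can sit at the end of a filter chain and serialize whatever events reach it. This is a sketch under assumptions: `SaxToStream` and its method names are hypothetical, and the cast assumes the default `TransformerFactory` supports the SAX extensions (the same assumption the snippets in this thread make).

```java
import java.io.OutputStream;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper, not taken from the thread.
class SaxToStream {

    // Builds a namespace-aware parser wrapped in a pass-through filter,
    // standing in for a custom XMLFilter.
    static XMLReader newPassThroughFilter() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return new XMLFilterImpl(factory.newSAXParser().getXMLReader());
    }

    // Parses 'in' with the given reader or filter and writes the resulting
    // SAX events as XML text to 'out'. No stylesheet is involved.
    static void serialize(XMLReader readerOrFilter, InputSource in, OutputStream out)
            throws Exception {
        SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = stf.newTransformerHandler(); // identity transformation
        serializer.setResult(new StreamResult(out));
        readerOrFilter.setContentHandler(serializer);
        readerOrFilter.parse(in);
    }
}
```

Calling `newTransformer()` with no arguments and passing a `SAXSource` that wraps the filter is an equivalent way to get the same identity behaviour.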

Thanks for your help so far, objects and CEHJ :-)

ASKER CERTIFIED SOLUTION
Mick Barry


ASKER

Hi objects / CEHJ:

My apologies for not getting back sooner: I was whisked away on another task.

Thank you, objects. I think I understand now how the Transformer class works, and I was able to create a little function to do what I wanted. Almost.

In my implementation, the program halts and never returns when I attempt to perform the transformation (see the code snippet below my signature).

Also, here is the stack trace I get when I suspend the blocking thread using remote debug:
===================================================================
Object.wait(long) line: not available [native method]      
PipedInputStream.awaitSpace() line: 204      
PipedInputStream.receive(byte[], int, int) line: 161      
PipedOutputStream.write(byte[], int, int) line: 129      
StreamEncoder$CharsetSE.writeBytes() line: 336      
StreamEncoder$CharsetSE.implWrite(char[], int, int) line: 395      
StreamEncoder$CharsetSE(StreamEncoder).write(char[], int, int) line: 136      
OutputStreamWriter.write(char[], int, int) line: 191      
BufferedWriter.flushBuffer() line: 111      
BufferedWriter.flush() line: 235      
XMLEmitter.endDocument() line: 148      
UncommittedEmitter(ProxyEmitter).endDocument() line: 70      
UncommittedEmitter.endDocument() line: 36      
NamespaceEmitter(ProxyEmitter).endDocument() line: 70      
ContentEmitter.endDocument() line: 76      
XMLFilterImpl.endDocument() line: 473      
SAXParser.endDocument() line: 1230      
XMLValidator.callEndDocument() line: 1146      
XMLDocumentScanner$EndOfInputDispatcher.dispatch(boolean) line: 1499      
XMLDocumentScanner.parseSome(boolean) line: 381      
SAXParser(XMLParser).parse(InputSource) line: 1098      
XMLFilterImpl.parse(InputSource) line: 333      
IdentityTransformer.transform(Source, Result) line: 90      
ImportAction.reolveUrls(InputSource, TimmyImporterFeedback) line: 195      
ImportAction.doImport(InputStream) line: 80      
ImportServlet.doPut(HttpServletRequest, HttpServletResponse) line: 56      
ImportServlet(HttpServlet).service(HttpServletRequest, HttpServletResponse) line: 713      
ImportServlet(HttpServlet).service(ServletRequest, ServletResponse) line: 803      
ApplicationFilterChain.internalDoFilter(ServletRequest, ServletResponse) line: 269      
ApplicationFilterChain.doFilter(ServletRequest, ServletResponse) line: 188      
StandardWrapperValve.invoke(Request, Response) line: 213      
StandardContextValve.invoke(Request, Response) line: 174      
StandardHostValve.invoke(Request, Response) line: 127      
ErrorReportValve.invoke(Request, Response) line: 117      
FastCommonAccessLogValve.invoke(Request, Response) line: 482      
StandardEngineValve.invoke(Request, Response) line: 108      
CoyoteAdapter.service(Request, Response) line: 174      
Http11Processor.process(InputStream, OutputStream) line: 874      
Http11Protocol$JmxHttp11ConnectionHandler(Http11BaseProtocol$Http11ConnectionHandler).processConnection(TcpConnection, Object[]) line: 665      
PoolTcpEndpoint.processSocket(Socket, TcpConnection, Object[]) line: 528      

Hope you can help me with this.

Thank you for all your help so far!

Sincerely,
Henrik Pettersen

...
//BROKEN: Removed our custom XmlFilterImpl extension in order to pin down this problem
//BROKEN: Issue seems to be with how we deal with XML transformations instead
//BROKEN: XMLFilter filter = new UrlResolverXmlFilter();
//BROKEN: So just used the base implementation class. This should always work...
XMLFilter filter = new XMLFilterImpl();
 
XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
filter.setParent(xmlReader);
 
//then we create a java io pipe, to send the results from the filtering to the escenic import function
PipedInputStream pipedInputStream = new PipedInputStream();
PipedOutputStream pipedOutputStream = new PipedOutputStream(pipedInputStream);
StreamResult streamResult = new StreamResult(pipedOutputStream);
 
//apply the filter to the inputstream
SAXTransformerFactory saxTransformerFactory = (SAXTransformerFactory) TransformerFactory.newInstance();
Transformer transformer = saxTransformerFactory.newTransformer();
SAXSource transformSource = new SAXSource(filter, originalInputSource);
 
//BROKEN: The program never returns from this call!
transformer.transform(transformSource, streamResult);
 
return new InputSource(pipedInputStream);	
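
The stack trace above shows the writing side parked in `PipedInputStream.awaitSpace()`. A `PipedInputStream` has a small fixed buffer (1024 bytes by default), and `transform()` runs to completion on the calling thread, so once the buffer fills nothing is left to drain it. The `PipedInputStream` Javadoc recommends using separate threads for the two ends for this reason. A minimal sketch of that arrangement follows, under assumptions: `PipedTransform` and `filtered` are hypothetical names, and the error handling is deliberately crude.

```java
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, not taken from the thread.
class PipedTransform {

    // Returns an InputSource whose bytes are produced, on demand, by running
    // the reader or filter on a background thread.
    static InputSource filtered(XMLReader readerOrFilter, InputSource original)
            throws Exception {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Transformer identity = TransformerFactory.newInstance().newTransformer();

        Thread writer = new Thread(() -> {
            // try-with-resources closes the pipe, which signals end-of-stream to the reader
            try (PipedOutputStream o = out) {
                identity.transform(new SAXSource(readerOrFilter, original),
                                   new StreamResult(o));
            } catch (Exception e) {
                // Sketch only: a real implementation should report this to the consumer.
                e.printStackTrace();
            }
        }, "xml-filter-writer");
        writer.setDaemon(true);
        writer.start();

        // The caller reads from this on its own thread while the writer fills the pipe.
        return new InputSource(in);
    }
}
```

With this shape the writer blocks only until the consumer reads, so memory use stays bounded by the pipe buffer regardless of document size.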


SOLUTION
CEHJ


ASKER

Thank you, CEHJ and Objects: I finally got this to work! Couldn't have done it without you :-)

I have attached an outline of my solution. I hope someone will find this useful!

Many thanks!

Henrik

1. The original input stream is coming from an HttpServletRequest. First of all, this is how I am generating the request using curl:

    curl -X PUT -H 'Content-type: multipart/mixed stream' -d @src/xml/sample.xml http://dev-server:8080/myapp/import

2. In my servlet class:

    public void doPut(HttpServletRequest request, HttpServletResponse response){
        ...snip...
        ImportAction.doImport(request.getInputStream());
        ...snip...
    }

3. Perform the import, using pipes, XML transformers and SAX parsers:
 
    class ImportAction {
        ...snip...
        public void doImport(InputStream httpRequestInputStream) throws Exception {
            ReaderThread readerThread = translateUrls(httpRequestInputStream);
            readerThread.start();
            try {
                importHandler.importData(new InputSource(readerThread.getTransformedInStream()));
            } finally {
                //always release the reader thread, even if the import fails
                readerThread.finish();
            }
        }
        ...snip...

        private static ReaderThread translateUrls(InputStream originalInputStream) throws Exception {
            //First we create a filter chain
            XMLFilter filter = new UrlResolverXmlFilter();
            XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
            filter.setParent(xmlReader);

            //then we create a java io pipe, to send the results from the filtering to the escenic import function
            PipedInputStream transformedPipedInputStream = new PipedInputStream();
            PipedOutputStream pipedOutputStream = new PipedOutputStream(transformedPipedInputStream);
            StreamResult streamResult = new StreamResult(pipedOutputStream);

            //apply the filter to the input stream
            SAXTransformerFactory saxTransformerFactory = (SAXTransformerFactory) TransformerFactory.newInstance();
            Transformer transformer = saxTransformerFactory.newTransformer();
            InputSource originalInputSource = new InputSource(originalInputStream);
            SAXSource originalInputSaxSource = new SAXSource(filter, originalInputSource);

            return new ReaderThread(originalInputSaxSource, streamResult, transformer, transformedPipedInputStream);
        }

        ...snip...
    }
	
    class ReaderThread extends Thread {

        private SAXSource            originalInputSaxSource;
        private StreamResult         streamResult;
        private Transformer          transformer;
        private InputStream          transformedInStream;
        //volatile, because finish() is called from a different thread than run()
        private volatile boolean     finished = false;

        ...snip...
        public void run() {
            try {
                transformer.transform(originalInputSaxSource, streamResult);
                streamResult.getOutputStream().close();

                //Hang around until we are sure we are finished, otherwise we will most likely
                //get a broken pipe exception from the Escenic parser
                while (!finished) {
                    try {
                        Thread.sleep(500);
                    } catch (InterruptedException e) {
                        //keep waiting until finish() is called
                    }
                }
            } catch (TransformerException e) {
                ...snip...
            } catch (IOException e) {
                ...snip...
            } finally {
                try {
                    //try it again, just to make sure. The stream could be closed already (see above)!
                    streamResult.getOutputStream().close();
                } catch (IOException e) {
                    //ignore
                }
            }
        }

        public InputStream getTransformedInStream() {
            return transformedInStream;
        }

        public void finish() {
            finished = true;
        }

        ...snip...
    }//end class
    
