condor888
asked on
How do I use Java to process an XML data stream efficiently?
I have a massive amount of XML-formatted data coming in over a TCP connection. I need to process it in Java so that it can then be processed further by Hadoop. What is the most efficient way to do this in Java?
Try http://docs.oracle.com/javase/8/docs/api/javax/xml/stream/XMLStreamReader.html
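To illustrate the suggestion above, here is a minimal sketch of pulling events from an XMLStreamReader one at a time, so the whole document never has to sit in memory. The class name StaxCount, the method countElements, and the <event> element are made up for the example; a byte array stands in for the TCP connection, where you would pass socket.getInputStream() instead. It also assumes the connection delivers one well-formed XML document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCount {

    // Counts start tags with the given local name, reading one event at a time.
    // Hypothetical helper for illustration only.
    static int countElements(InputStream in, String elementName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            // Closes the reader only; the underlying stream is left to the caller.
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for socket.getInputStream(); the XML content is invented.
        String xml = "<events><event id=\"1\"/><event id=\"2\"/></events>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "event"));
    }
}
```

Because the reader is a pull parser, memory use stays roughly constant regardless of how large the incoming document is.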
When you say "process it", what exactly do you want to do with it before Hadoop gets it?
ASKER
Hi gurpsbassi, I just want to either convert the XML to Java objects or use Java to store the XML in a database so that Hadoop can continue to process it. Any idea how I can do that efficiently in Java?
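For the "convert the XML to Java objects" option, one possible approach is to build each object as the StAX reader passes over its element and hand it off immediately, rather than collecting everything first. The sketch below assumes a hypothetical record layout of <event id="..."><message>...</message></event>; the names EventParser, Event, and parse are invented for the example, and the Consumer stands in for whatever writes to the database or to a file for Hadoop.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EventParser {

    // Hypothetical value object; the real fields depend on the actual XML schema.
    static final class Event {
        final String id;
        final String message;

        Event(String id, String message) {
            this.id = id;
            this.message = message;
        }
    }

    // Reads the stream once and passes each completed Event to the sink.
    static void parse(Reader source, Consumer<Event> sink) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        String currentId = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("event".equals(name)) {
                    // A null namespace means the attribute is matched by local name only.
                    currentId = reader.getAttributeValue(null, "id");
                } else if ("message".equals(name)) {
                    // getElementText() consumes the text up to the matching end tag.
                    sink.accept(new Event(currentId, reader.getElementText()));
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<events>"
                + "<event id=\"1\"><message>first</message></event>"
                + "<event id=\"2\"><message>second</message></event>"
                + "</events>";
        // Collecting into a list is only for the demo; a real sink would write each Event out.
        List<Event> collected = new ArrayList<>();
        parse(new StringReader(xml), collected::add);
        for (Event e : collected) {
            System.out.println(e.id + ": " + e.message);
        }
    }
}
```

Passing a sink instead of returning a list keeps only one record in memory at a time, which matters if the stream is as large as described.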
Can you not store the files directly into HDFS?
ASKER
How can I continue to analyze the XML after I store the files in HDFS?
Do you need Hadoop to process this in real time, or as a batch job?
ASKER
Either way is fine. So my question is: do I need to use Java to pre-process it before passing it on to Hadoop?