condor888

How do I use Java to process an XML data stream efficiently?

I have a massive amount of XML-formatted data coming in over a TCP connection. I need to process it with Java so that it can be further processed by Hadoop. What is the most efficient way to do this in Java?
CEHJ

When you say "process it", what exactly do you want to do with it before Hadoop gets it?
condor888

ASKER

Hi gurpsbassi, I just want to either convert the XML to Java objects or use Java to store the XML in a database so that Hadoop can continue to process it. Any idea how I can do that efficiently in Java?
Can you not store the files directly into HDFS?
How can I continue to analyze the XML after I store the files to HDFS?
Do you need Hadoop to process this in real time, or as a batch job?
Either way is fine. So my question is: do I need to use Java to pre-process it before passing it on to Hadoop?
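
For anyone weighing the "convert the XML to Java objects" option mentioned above, here is a minimal sketch of one way such pre-processing could look, using the JDK's built-in StAX pull parser (javax.xml.stream). It reads one element at a time, so memory use does not grow with the length of the stream. Everything specific in it is an assumption for illustration, not something from the thread: the element names `readings`, `reading` and `value`, the `id` attribute, the `Reading` class, and the idea that the whole stream is a single well-formed document with one root element.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XmlStreamSketch {

    /** Hypothetical record type; the real fields depend on the actual feed. */
    public static final class Reading {
        public final String id;
        public final String value;

        public Reading(String id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    /**
     * Pulls <reading id="..."><value>...</value></reading> elements off the
     * stream one at a time and hands each to the sink as soon as it is complete.
     */
    public static void parse(InputStream in, Consumer<Reading> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The data arrives from the network, so switch off DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            String id = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only start tags matter for this sketch
                }
                String name = reader.getLocalName();
                if ("reading".equals(name)) {
                    id = reader.getAttributeValue(null, "id");
                } else if ("value".equals(name)) {
                    // getElementText() reads up to the matching end tag.
                    sink.accept(new Reading(id, reader.getElementText()));
                }
            }
        } finally {
            reader.close(); // closes the parser, not the underlying stream
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<readings>"
                + "<reading id=\"1\"><value>20.5</value></reading>"
                + "<reading id=\"2\"><value>21.0</value></reading>"
                + "</readings>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Emit one tab-separated line per record.
        parse(in, r -> System.out.println(r.id + "\t" + r.value));
    }
}
```

In a real setup the `InputStream` would presumably come from `Socket.getInputStream()` rather than a byte array, and the sink would write to a file or database instead of standard output. One tab-separated line per record is just one possible output format; it is chosen here because line-oriented text is easy for a later batch job to split.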
ASKER CERTIFIED SOLUTION
gurpsbassi