MIHIR KAR

asked on

Oracle to Hadoop HDFS and vice versa!

Hi Expert,

Could anybody please guide me how to load data from Oracle DB to Hadoop HDFS and the result back to Oracle DB.

Thanks in Advance!
Sean Stuber

That's a much bigger question than you might think.

At the simplest level, if your Oracle server can talk to HDFS, you can use UTL_FILE to write data from the database into the Hadoop filesystem.
Any other data unloader that produces a format your Hadoop applications can consume would also work; common formats are CSV, JSON, and XML.
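To illustrate the UTL_FILE approach, here is a minimal sketch. It assumes a directory object (here called HDFS_OUT) that points at a path the Oracle server can write to and that is visible to Hadoop (for example, an HDFS NFS-gateway mount); the table and column names are made up for the example.

```sql
-- Hypothetical directory object; '/hdfs_mount/staging' is assumed to be
-- an HDFS-backed mount visible to the Oracle server.
CREATE DIRECTORY hdfs_out AS '/hdfs_mount/staging';

DECLARE
  f  UTL_FILE.FILE_TYPE;
BEGIN
  -- Open a CSV file for writing in the directory object
  f := UTL_FILE.FOPEN('HDFS_OUT', 'orders.csv', 'w');
  FOR r IN (SELECT order_id, customer_id, amount FROM orders) LOOP
    UTL_FILE.PUT_LINE(f, r.order_id || ',' || r.customer_id || ',' || r.amount);
  END LOOP;
  UTL_FILE.FCLOSE(f);
END;
/
```

For large volumes you would typically parallelize this (or use a bulk unload), but the shape of the code stays the same.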

Going the other direction, from HDFS to Oracle, if the file formats are compatible with your loading tools, then SQL*Loader or external tables can suffice nicely.
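As a sketch of the external-table route: assuming the processed results land as a CSV file in a directory the Oracle server can read (the directory object, table, and column names below are illustrative), you can expose the file as a queryable table and insert from it.

```sql
-- External table over a CSV file produced by the Hadoop batch job.
-- HDFS_OUT and the column list are assumptions for this example.
CREATE TABLE processed_results_ext (
  result_id    NUMBER,
  metric_value NUMBER,
  run_date     DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY hdfs_out
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    (result_id, metric_value, run_date DATE "YYYY-MM-DD")
  )
  LOCATION ('results.csv')
);

-- Then pull the results into a real table:
INSERT INTO processed_results
SELECT result_id, metric_value, run_date
FROM   processed_results_ext;
```

The equivalent SQL*Loader control file would name the same file and field list; external tables are usually more convenient when the load is a recurring batch step because the file can be re-read with plain SQL.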

If you have more complicated and/or varied data structures in either Oracle or Hadoop, then you may need more specialized tools to parse and propagate the data between them consistently.
Also, both of the previous examples assume the Oracle server has read/write access to the HDFS.  If not, then you'll need tools to handle that as well.

What will work for YOUR data in YOUR environment is impossible to say.
MIHIR KAR (Asker)
Thank you @Sdstuber for the comment.

Here, data comes from multiple source systems into Oracle, and nowadays I feel Oracle alone is not enough to handle the large volume of records.

On top of Oracle, I'm looking at Hadoop HDFS to store and process the data with batch jobs, and then move only the processed (result) data back into Oracle for my transactional requirements.

Does that make sense? Please leave a comment if there is any other way to process the data with a batch job!
>>> if there is any other way to process the data

Again, that's a really big question.  

There are, of course, many ways to process data.
There's a good chance you could still process your data all within Oracle and never use Hadoop.  Maybe not, but it's worth considering.

Within Hadoop itself, your batch process could be written any number of ways with any number of varied tools.
I'm not a Hadoop expert, but even if I were, it would be impossible to provide specifics for such an open-ended question.
I agree with sdstuber's comment about possibly doing all of this work in Oracle and not needing Hadoop at all. A couple of options that could help with that: global temporary tables, and/or a separate Oracle database that is not in ARCHIVELOG mode to do the processing, after which you import just the processed results into your main Oracle database.
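To make the global-temporary-table option concrete, here is a minimal sketch (table and column names are invented for the example). A GTT holds intermediate batch results per session or transaction and generates far less redo than a regular table, which is the point of using it for heavy intermediate processing.

```sql
-- Session-private scratch table for intermediate batch results.
-- ON COMMIT PRESERVE ROWS keeps the rows for the whole session.
CREATE GLOBAL TEMPORARY TABLE batch_work (
  result_id    NUMBER,
  metric_value NUMBER
) ON COMMIT PRESERVE ROWS;

-- Do the heavy intermediate crunching into the GTT...
INSERT INTO batch_work (result_id, metric_value)
SELECT src_id, SUM(amount)
FROM   staging_data
GROUP  BY src_id;

-- ...then persist only the final results to the permanent table.
INSERT INTO processed_results (result_id, metric_value, run_date)
SELECT result_id, metric_value, SYSDATE
FROM   batch_work;
```

Each session sees only its own rows in the GTT, so concurrent batch runs don't interfere with one another.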