kshathrya

asked on

Oracle to Hive using Sqoop - Efficient way to load huge tables (300 Million plus)

I am trying to load data from Oracle to Hive using Sqoop.

The Oracle source table has 300 million records.
I also need to load only 6 months' worth of data from another table, which has 180 million records.
The table has a lot of columns, but I am using the row_id column for the split; it is a varchar column with values like 7-1XYZ.
There is a created-date column, and I use a WHERE condition with a 6-month clause.

Neither column has an index.
The row_id column will have unique values.
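
For context, what I am planning is roughly the following (the connection string, table, and column names are placeholders):

# 6-month filter is pushed down to Oracle; the copy is parallelized on row_id
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/me/oracle.pwd \
  --table BIG_TABLE \
  --where "CREATED_DT >= ADD_MONTHS(SYSDATE, -6)" \
  --split-by ROW_ID \
  --num-mappers 8 \
  --hive-import \
  --hive-table big_table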

What is the most efficient way to load these?
Am P

Hope you are doing well.

Your table must have some column on which the data can be divided into ranges. If that is the case, use -m (or --num-mappers) together with the --split-by option, passing that column. Please note that this is different from grouping in Oracle: Sqoop computes the minimum and maximum values of the --split-by column, divides that range evenly among the mappers, and each mapper then imports its own slice in parallel.
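
For example (connection string and names are placeholders), the relevant options look like this; Sqoop first runs a boundary query, roughly SELECT MIN(row_id), MAX(row_id) FROM the table, and then hands each mapper one sub-range:

# each of the 8 mappers imports one range of row_id values
sqoop import \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table BIG_TABLE \
  --split-by ROW_ID \
  --num-mappers 8

One caveat, since your row_id is a varchar: Sqoop splits text columns lexicographically, which can produce very uneven ranges for values like 7-1XYZ, and Sqoop 1.4.7 and later refuse to split on a text column unless the allow_text_splitter property shown above is passed. If a numeric, evenly distributed column is available, it is usually a better --split-by choice.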

Please refer to https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_controlling_parallelism

Hope this helps.
 