Link to home
Start Free TrialLog in
Avatar of sue Tons
sue Tons

asked on

Load files from a directory into two tables automatically.

Data.txt

I am trying to read the files from a directory and load the data into these two table and automate this process
Avatar of Alex [***Alex140181***]
Alex [***Alex140181***]
Flag of Germany image

Why do you want / have to use C# for the data load?! Databases do have neat built-in tools for this purpose ;-) Oracle for instance has the SQL Loader...
Avatar of sue Tons
sue Tons

ASKER

So how can I automate it though?  I don't want to load it manually. I want it to be able to load into the database every time I have a new file in that folder.
There may be several ways to Rome here ;-)
You could setup the scheduling wihtin your OS (Windows task scheduler or crontab on Linux) and call some sort of batch/shell/powershell script, similar to this one here:
https://www.dbasupport.com/forums/showthread.php?32572-automating-amp-scheduling-sqlloader-process

Or you may also setup this directly within the database. For example, Oracle is capable of scheduling jobs that call "external" commands....
That's what I am trying to figure out, how to do that Alex.  Can you   You mean like a stored proc?  i don't know exactly what to do.  Please?
Ok, this setup is untested (I simply don't have the time to set this up properly), but should be fixed with ease in case of problems or errors.
This setup assumes you have set ORACLE_HOME correctly and you have the following folders in addition:
C:\DMC\CFG
C:\DMC\DONE

Put the following 2 files into the CFG folder:
1. PARFILE for titles
OPTIONS 
(
  SKIP=1,
  ROWS=1000,
  PARALLEL=true,
  DIRECT=true,
  SKIP_INDEX_MAINTENANCE=true
)
LOAD DATA
APPEND INTO TABLE title
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS
(OLD_TITLE_NO, 
 OLD_PARCEL_NO, 
 OLD_PARCEL_AREA, 
 TITLE_NO, 
 PARCEL_NO, 
 PARCEL_AREA, 
 INTEREST_REGISTER_NO, 
 INTEREST_EFFECTIVE_DATE "to_date(substr(:INTEREST_EFFECTIVE_DATE, 1, 10), 'MM/DD/YYYY')",    
 OLD_INT_NO, 
 INTEREST_NO, 
 INT_TYP_CODE, 
 INT_TYPE)

Open in new window

2. PARFILE for interests
OPTIONS 
(
  SKIP=1,
  ROWS=1000,
  PARALLEL=true,
  DIRECT=true,
  SKIP_INDEX_MAINTENANCE=true
)
LOAD DATA
APPEND INTO TABLE interest
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS
(INTEREST_TRN_TYP, 
 INT_TYPE, 
 INTEREST_NO, 
 INTEREST_REGISTER_NO, 
 TITLE_NO, 
 PARCEL_NO, 
 PARCEL_AREA, 
 INTEREST_EFFECTIVE_DATE "to_date(substr(:INTEREST_EFFECTIVE_DATE, 1, 10), 'MM/DD/YYYY')")

Open in new window


The following batch files may be put "anywhere" you like (or you can combine them):
Keep in mind: in both files the part "username/password@db" has to be changed accordingly ;-)
1. for titles
@ECHO OFF
ECHO  Finding Data to be loaded...
CD C:\DMC
FOR  %%V in (Title*) DO %ORACLE_HOME%\bin\sqlldr.exe username/password@db CONTROL=C:\DMC\CFG\titles.CTL DATA=%%V ERRORS=1500
MOVE Title* C:\DMC\DONE

Open in new window

2. for interests
@ECHO OFF
ECHO  Finding Data to be loaded...
CD C:\DMC
FOR  %%V in (interest*) DO %ORACLE_HOME%\bin\sqlldr.exe username/password@db CONTROL=C:\DMC\CFG\interest.CTL DATA=%%V ERRORS=1500
MOVE interest* C:\DMC\DONE

Open in new window

These batch files can be used within the Windows Task Scheduler to perform these jobs on a daily base (or whatever window you need)...
So  on the PARFILES, instead of using Option, can I  create or replace PROCEDURE Load Title etc.  I am not understanding the Option  part, not familiar with it.   Alex I really appreciate your help, I don't  understand what I am doing.  So when I add these files into the C:\DMC\CFG , and run the batch file, then it logs into C:\DMC\DONE , right once completed?   I need to understand the Option, I don't understand, I am lost.  I have no clue to be honest.  


The 2 PARFILES (parameter files) "just" contain the "code" from above. Create 2 files in the (new) folder
C:\DMC\CFG\

Open in new window

The 2 files will be "titles.CTL" and "interests.CTL"

These 2 files contain processing information for the SQL Loader tool which is beuing called in conjunction with these later on...

The batch file will call the SQLLDR.EXE (SQL Loader) and moves the input files (CSV data) into the "DONE"-folder (to be created) after importing the data into the corresponding tables...

No need for any procedure or such ;-) SQL Loader is THE (built-in) tool for importing that kind of data into the database!
I am so lost Alex I don't mean to be funny, This is too complicated for me as a new developer.  Can you list in steps as to what is needed to be done first please, so i follow your steps.  I am really lost to be honest.   Walk me through please, please? 
Ok, let's step back: what database are you trying to load that data into?
ClientData is a database in Oracle and in that database I have two tables named title and interest.  So I am trying to load data in those two tables.  Interest files  into Interest Table and title files into Title Table.
Excellent :-))
Then, the setup I mentioned above is exactly what you'll need!
Oracle's built-in tool "SQL Loader" is the weapon og choice when it comes to loading data from (flat) files like CSV or such!
It can be completely be command-line driven or can make use of a parameter file, as I suggested above. You just "tell" Oracle with the help of this tool to load some data in some specific tables.

I'm trying to break down some stuff:

PARFILE:

OPTIONS (
  SKIP=1, -------------------------------------------- skip the first row (usually this contains the header names)
  ROWS=1000, ------------------------------------ commit every 1000 rows (default is 2 I guess)
  PARALLEL=true, -------------------------------- make use of parallelism (if applicable/possible)
  DIRECT=true, ------------------------------------ "directly" write data into tables, avoiding the buffer cache (-> faster)
  SKIP_INDEX_MAINTENANCE=true ------- make things even more faster ;-)
)
LOAD DATA
APPEND INTO TABLE title --------------------- where should I put the data & how (method): here APPEND
FIELDS ----------------------------------------------- fields in the file are:
TERMINATED BY "," ------------------------------ terminated/delimted by the character ","
OPTIONALLY ENCLOSED BY '"' -------------- could be enclosed with the quote character
TRAILING NULLCOLS --------------------------- and treat missing fields at the end as null
(OLD_TITLE_NO,  OLD_PARCEL_NO,  OLD_PARCEL_AREA,  TITLE_NO,  PARCEL_NO,  PARCEL_AREA,  INTEREST_REGISTER_NO,  INTEREST_EFFECTIVE_DATE "to_date(substr(:INTEREST_EFFECTIVE_DATE, 1, 10), 'MM/DD/YYYY')",    
 OLD_INT_NO,  INTEREST_NO,  INT_TYP_CODE,  INT_TYPE)
----------------------------------------------------------- the last part is the definition/mapping of the fields/columnn


BATCH FILE:

@ECHO OFF ------------------------------- suppress unwanted output
ECHO  Finding Data to be loaded... -- display some text fyi ;-)
CD C:\DMC --------------------------------- change to the folder where the data (CSV files) reside
FOR  %%V in (Title*) DO %ORACLE_HOME%\bin\sqlldr.exe username/password@db CONTROL=C:\DMC\CFG\titles.CTL DATA=%%V ERRORS=1500
-------------------------------------------------- "simple" for loop which iterates through all files that match "Title*" and call for each of them the sqlldr command
MOVE Title* C:\DMC\DONE ------------ when import is done, move the processed files into the DONE subfolder
I see, WOW thank you for the explanation, I greatly appreciate you so much.  I have learned something I never knew.  
Now the question is, I want to set up a  job in task scheduler, so in the job scheduler do I add the PARFILE CODE: section with The Option(-------  or  the batch file.    Now the question is where do I plug in the code?  Do I run them in command prompt and store the batch file somewhere?  
I'm glad I could help or at least show you some directions ;-)

With this setup, you may just use the Windows Task Scheduler and create one or two simple tasks, let's say with daily or hourly schedules as trigger. For the "action", you'd have to tell the scheduler to call/execute the batch file from above, that's it.
With this setup you'll end up having 2 tasks, one for "Titles" and one for "Interests", though you could combine/merge them into just one if you like...
Using the Windows Task Scheduler is quite straight forward, so you should easily be able to apply all from above :-)

I tried to setup the whole stuff a bit ;-)
Please be cautions when extracting this as it might overwrite your stuff (but shouldn't as these files are all new)...
Due to some restrictions on this site regarding possible malicious file extensions, you need to rename the 2 ".rename2bat" to actual ".bat" files.
The 1 xml file is an export of the Task Scheduler just to show you, how you might setup this in the Scheduler...

DMC.zip
Awesome, I will set it up and test it and see what happens.  I will give you feedback.
@Alex
I realized some of the data I am loading is not consistent for example
In the title table, I have these headings below
Old_title_no ,Old_parcel_no,Old_parcel_area,Title_no,Parcel_no,Parcel_area,Interest Register No,Interest Effective Date, Old_int_no, Interest_no, Int_typ_code, Int_type
152995607,146378391,581,153382833,146378391,10/26/2020 14:51:30,,,,  --data  is missing here on this one but I would like date to go under date no matter what is missing.
152995708,146378414,290,153382855,146378414,290,,10/26/2020 14:51:30,,,, -- this one is good
152075347,111827763,,153382237,111827763,,108661181,10/26/2020 14:13:14,185613666,189112035,43,"Mortgage " -- this one is good.

How can I automatically match the  columns with the data I have despite the format of the data.  , its one of the challenges I am now facing.  





152995607,146378391,581,153382833,146378391,10/26/2020 14:51:30,,,,  --data  is missing here on this one but I would like date to go under date no matter what is missing.
This problem lies within the data itself! The line should read:
152995607,146378391,581,153382833,146378391,,10/26/2020 14:51:30,,,,

How can Oracle (or any other vendor) ever tell that there's a missing value?! Impossible ;-)
If you have a certain file specification, the files should comply with it!!
what about C# where you can call column by column  like  Old_title_no  is always index position 0 and has 9 characters, etc and date will always be date, When the data is inconsistent, its always a challenge.  Not sure how go about it anymore.
ASKER CERTIFIED SOLUTION
Avatar of Alex [***Alex140181***]
Alex [***Alex140181***]
Flag of Germany image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial