Link to home
Start Free TrialLog in
Avatar of jcneil4
jcneil4

asked on

Daily import using DTS- Only want new records.

I have a DTS package that imports several tables from an ODBC compilant DB into my SQL data warehouse.  The rows are deleted in the destination tables before import.
 
A few of the tables are large (over 1,000,000 rows).  These are history tables such as INVOICE_HISTORY_DETAILS.  I want to speed up the daily import by only brining in data that doesen't exist in the destination table or at a minimum use a query to minimize the date range (I don't want to import the last 6 years of history every day).  

Of course if I do import every row where inv_date > "some recent date" how do I make sure that only data that doesn't exist is appended to the table?

Thanks
Avatar of nmcdermaid
nmcdermaid

This is what I do:

1. Create a 'control table' that contains a single field holding the beginning date of the range that I want to import
2. Create a DTS which
   1. Loads this start date into a global variable
   2. Deletes all records in the destination table that exist after this date
   3. Transfers data with a data pump which contains a select statement which only selects records after this date.

Every day the date in the control table gets shuffled along one day. You can also manually change it to reload data if required.


>> how do I make sure that only data that doesn't exist is appended to the table

Unless you have a unique key which you can use to match then then you can't.

You need to investigate your source system and see if you can rely on this date method to correctly load data.

i.e. do people back date their transactions?

You can reduce this risk by making the load date always a couple of months behind.
Avatar of jcneil4

ASKER

I'm sorry, I should have mentioned that I'm a SQL newbie....
 
I created a table with one column called start_date, data type is smalldatetime, then I entered a recent date as the first row. Is this what you meant by
"Create a 'control table' that contains a single field holding the beginning date of the range that I want to import"?

Assuming that is what you meant then on to the next step...

I have a DTS that deletes data from destination (SQL) table then imports entire table from source- it does this for about 12 tables.  I only want to use the date filter on 3 of these tables- can you roughly describe how to can I modify my existing DTS to:
 
   1. Loads this start date into a global variable
   2. Deletes all records in the destination table that exist after this date
   3. Transfers data with a data pump which contains a select statement which only selects records after this date.

Thanks a lot..
ASKER CERTIFIED SOLUTION
Avatar of nmcdermaid
nmcdermaid

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of jcneil4

ASKER

Everything worked great- Now my problem is that I am trying to use the GV in the Transform Data Task to only retrieve data after the date from the source DB with something like:

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > ?

And when I click on Parameters to select the GV I am getting the error "the SQL statement does not contain any parameters."  I've tried "?", '?', (?), etc... any ideas?
hmmm this has worked for me but only in Sybase and SQL Server databases. What type of source database do you have?

What happens if you:

A. Remove the where clause and hit 'Parse'
B. Add the where clause with a constant and hit parse?
C. Add the where clause with your parameter placeholder and hit parse?

Here is a link to a workaround method if you can't get it to work. This is a good site generally.

http://www.sqldts.com/default.aspx?205


If you don't want to use a global variable you can also use native date functions for example:

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > DATEADD(dd,-10,GETDATE())

Will get all records from ten days ago. The only thing wrong with that is its stuck on ten days - its difficult to reload history if you need to.


Avatar of jcneil4

ASKER

The source database is Sage MAS200- providex 32bit ODBC driver

What happens if you:

A. Remove the where clause and hit 'Parse'-
it is successful.

B. Add the where clause with a constant and hit parse?
It works only in this format: WHERE ARN_InvHistoryHeader."InvoiceDate" > {d '2005-07-26'}

C. Add the where clause with your parameter placeholder and hit parse?
I'm not sure what you mean by this...
Try this

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > {d ? }


Its possible that the ODBC driver doesn't support parameters. Do you have any doco with it?




Avatar of jcneil4

ASKER

Failed.  

It doesn't have any documentation.. the ODBC driver is a really old version but we are forced to use it because of our version of the database.  I'm sure that's what is wrong (no support of parameters).  If I ever get to figure out how to support parameters, I'll be able to get it working based on what you've told me so far.  

Thanks for the help.. I'll close and award you the points.

Can you create a view in the source database? That view could have a hard coded two month window or something?

You still won't be able to use GV's but at least you could reduce the data set
Avatar of jcneil4

ASKER

No- I cant.  I'm just going to have to import the whole table.. I'll live.  Thanks again.
Actually there is one more thing you could do.... you could dynamically assign the SQL in the source query everytime.

This link:

http://www.sqldts.com/default.aspx?213


Shows roughly how to do it, except you want to use the SourceSQLStatement property instead of the SourceObjectName property.