Solved

Daily import using DTS- Only want new records.

Posted on 2006-11-15
11
440 Views
Last Modified: 2008-01-09
I have a DTS package that imports several tables from an ODBC compilant DB into my SQL data warehouse.  The rows are deleted in the destination tables before import.
 
A few of the tables are large (over 1,000,000 rows).  These are history tables such as INVOICE_HISTORY_DETAILS.  I want to speed up the daily import by only brining in data that doesen't exist in the destination table or at a minimum use a query to minimize the date range (I don't want to import the last 6 years of history every day).  

Of course if I do import every row where inv_date > "some recent date" how do I make sure that only data that doesn't exist is appended to the table?

Thanks
0
Comment
Question by:jcneil4
  • 6
  • 5
11 Comments
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 17954679
This is what I do:

1. Create a 'control table' that contains a single field holding the beginning date of the range that I want to import
2. Create a DTS which
   1. Loads this start date into a global variable
   2. Deletes all records in the destination table that exist after this date
   3. Transfers data with a data pump which contains a select statement which only selects records after this date.

Every day the date in the control table gets shuffled along one day. You can also manually change it to reload data if required.


>> how do I make sure that only data that doesn't exist is appended to the table

Unless you have a unique key which you can use to match then then you can't.

You need to investigate your source system and see if you can rely on this date method to correctly load data.

i.e. do people back date their transactions?

You can reduce this risk by making the load date always a couple of months behind.
0
 
LVL 1

Author Comment

by:jcneil4
ID: 17965190
I'm sorry, I should have mentioned that I'm a SQL newbie....
 
I created a table with one column called start_date, data type is smalldatetime, then I entered a recent date as the first row. Is this what you meant by
"Create a 'control table' that contains a single field holding the beginning date of the range that I want to import"?

Assuming that is what you meant then on to the next step...

I have a DTS that deletes data from destination (SQL) table then imports entire table from source- it does this for about 12 tables.  I only want to use the date filter on 3 of these tables- can you roughly describe how to can I modify my existing DTS to:
 
   1. Loads this start date into a global variable
   2. Deletes all records in the destination table that exist after this date
   3. Transfers data with a data pump which contains a select statement which only selects records after this date.

Thanks a lot..
0
 
LVL 30

Accepted Solution

by:
nmcdermaid earned 500 total points
ID: 17970652
Yep thats what I meant by a control table.

To create your DTS:

1. Go to Data Transformation Sevices, Local Packages, Right click and create a new package
2. Press Connection then MS OLE DB connection (the first one). Enter detail to connect to the destination database, give it a decent name and press OK
3. Create your globl variable: Ensure the connection you just created is NOT selected then press Package/Properties. Press the global variables tab. In the first row here type your globl variable name i.e. gStartDate, pick a data type of Date, and type 1900-01-01 in the value column and press OK.
4. Now load the value from the table into the global variable: Press Task then Execute SQL Task. In here type SELECT YourField FROM YourControlTable. Press Parse query to ensure its valid. Press the parameters button. Press the Output Parameters tab. Select Row Value. You'll see your table field name below. In the output global variable column select your global variable. Press OK

Now test it:

1. Press Packages/Properties go to the global variables tab and note that the value of your GV is 1900-01-01
2. Right click in the Execute SQL task (yellow cylinder) and press Execute
3. Go back the GV viewer and note that it has loaded in your date value. Hooray!!


Unfortunately I have to run now but I will post back in a few hours. In the meantime see if you can work out how to use an execute SQL task to delete data from your destination table based on the GV.

Basically you want to drop a new Execute SQL task on the page then put some SQL like this in there:

DELETE YourTable WHERE StartDate > ?


the ? is a placeholder for the GV and this time should allow you to fill in details on the input parameters tab.

You may want to do this on a test system in case it deletes the wrong stuff!!


0
 
LVL 1

Author Comment

by:jcneil4
ID: 17989742
Everything worked great- Now my problem is that I am trying to use the GV in the Transform Data Task to only retrieve data after the date from the source DB with something like:

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > ?

And when I click on Parameters to select the GV I am getting the error "the SQL statement does not contain any parameters."  I've tried "?", '?', (?), etc... any ideas?
0
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 17994152
hmmm this has worked for me but only in Sybase and SQL Server databases. What type of source database do you have?

What happens if you:

A. Remove the where clause and hit 'Parse'
B. Add the where clause with a constant and hit parse?
C. Add the where clause with your parameter placeholder and hit parse?

Here is a link to a workaround method if you can't get it to work. This is a good site generally.

http://www.sqldts.com/default.aspx?205


If you don't want to use a global variable you can also use native date functions for example:

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > DATEADD(dd,-10,GETDATE())

Will get all records from ten days ago. The only thing wrong with that is its stuck on ten days - its difficult to reload history if you need to.


0
Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

 
LVL 1

Author Comment

by:jcneil4
ID: 17995717
The source database is Sage MAS200- providex 32bit ODBC driver

What happens if you:

A. Remove the where clause and hit 'Parse'-
it is successful.

B. Add the where clause with a constant and hit parse?
It works only in this format: WHERE ARN_InvHistoryHeader."InvoiceDate" > {d '2005-07-26'}

C. Add the where clause with your parameter placeholder and hit parse?
I'm not sure what you mean by this...
0
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 18001518
Try this

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > {d ? }


Its possible that the ODBC driver doesn't support parameters. Do you have any doco with it?




0
 
LVL 1

Author Comment

by:jcneil4
ID: 18020006
Failed.  

It doesn't have any documentation.. the ODBC driver is a really old version but we are forced to use it because of our version of the database.  I'm sure that's what is wrong (no support of parameters).  If I ever get to figure out how to support parameters, I'll be able to get it working based on what you've told me so far.  

Thanks for the help.. I'll close and award you the points.

0
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 18035945
Can you create a view in the source database? That view could have a hard coded two month window or something?

You still won't be able to use GV's but at least you could reduce the data set
0
 
LVL 1

Author Comment

by:jcneil4
ID: 18036810
No- I cant.  I'm just going to have to import the whole table.. I'll live.  Thanks again.
0
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 18050169
Actually there is one more thing you could do.... you could dynamically assign the SQL in the source query everytime.

This link:

http://www.sqldts.com/default.aspx?213


Shows roughly how to do it, except you want to use the SourceSQLStatement property instead of the SourceObjectName property.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Nowadays, some of developer are too much worried about data. Who is using data, who is updating it etc. etc. Because, data is more costlier in term of money and information. So security of data is focusing concern in days. Lets' understand the Au…
Introduction In my previous article (http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/A_9150-Loading-XML-Using-SSIS.html) I showed you how the XML Source component can be used to load XML files into a SQL Server database, us…
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now