Solved

Daily import using DTS- Only want new records.

Posted on 2006-11-15
11
458 Views
Last Modified: 2008-01-09
I have a DTS package that imports several tables from an ODBC compilant DB into my SQL data warehouse.  The rows are deleted in the destination tables before import.
 
A few of the tables are large (over 1,000,000 rows).  These are history tables such as INVOICE_HISTORY_DETAILS.  I want to speed up the daily import by only brining in data that doesen't exist in the destination table or at a minimum use a query to minimize the date range (I don't want to import the last 6 years of history every day).  

Of course if I do import every row where inv_date > "some recent date" how do I make sure that only data that doesn't exist is appended to the table?

Thanks
0
Comment
Question by:jcneil4
  • 6
  • 5
11 Comments
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 17954679
This is what I do:

1. Create a 'control table' that contains a single field holding the beginning date of the range that I want to import
2. Create a DTS which
   1. Loads this start date into a global variable
   2. Deletes all records in the destination table that exist after this date
   3. Transfers data with a data pump which contains a select statement which only selects records after this date.

Every day the date in the control table gets shuffled along one day. You can also manually change it to reload data if required.


>> how do I make sure that only data that doesn't exist is appended to the table

Unless you have a unique key which you can use to match then then you can't.

You need to investigate your source system and see if you can rely on this date method to correctly load data.

i.e. do people back date their transactions?

You can reduce this risk by making the load date always a couple of months behind.
0
 
LVL 1

Author Comment

by:jcneil4
ID: 17965190
I'm sorry, I should have mentioned that I'm a SQL newbie....
 
I created a table with one column called start_date, data type is smalldatetime, then I entered a recent date as the first row. Is this what you meant by
"Create a 'control table' that contains a single field holding the beginning date of the range that I want to import"?

Assuming that is what you meant then on to the next step...

I have a DTS that deletes data from destination (SQL) table then imports entire table from source- it does this for about 12 tables.  I only want to use the date filter on 3 of these tables- can you roughly describe how to can I modify my existing DTS to:
 
   1. Loads this start date into a global variable
   2. Deletes all records in the destination table that exist after this date
   3. Transfers data with a data pump which contains a select statement which only selects records after this date.

Thanks a lot..
0
 
LVL 30

Accepted Solution

by:
nmcdermaid earned 500 total points
ID: 17970652
Yep thats what I meant by a control table.

To create your DTS:

1. Go to Data Transformation Sevices, Local Packages, Right click and create a new package
2. Press Connection then MS OLE DB connection (the first one). Enter detail to connect to the destination database, give it a decent name and press OK
3. Create your globl variable: Ensure the connection you just created is NOT selected then press Package/Properties. Press the global variables tab. In the first row here type your globl variable name i.e. gStartDate, pick a data type of Date, and type 1900-01-01 in the value column and press OK.
4. Now load the value from the table into the global variable: Press Task then Execute SQL Task. In here type SELECT YourField FROM YourControlTable. Press Parse query to ensure its valid. Press the parameters button. Press the Output Parameters tab. Select Row Value. You'll see your table field name below. In the output global variable column select your global variable. Press OK

Now test it:

1. Press Packages/Properties go to the global variables tab and note that the value of your GV is 1900-01-01
2. Right click in the Execute SQL task (yellow cylinder) and press Execute
3. Go back the GV viewer and note that it has loaded in your date value. Hooray!!


Unfortunately I have to run now but I will post back in a few hours. In the meantime see if you can work out how to use an execute SQL task to delete data from your destination table based on the GV.

Basically you want to drop a new Execute SQL task on the page then put some SQL like this in there:

DELETE YourTable WHERE StartDate > ?


the ? is a placeholder for the GV and this time should allow you to fill in details on the input parameters tab.

You may want to do this on a test system in case it deletes the wrong stuff!!


0
Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

 
LVL 1

Author Comment

by:jcneil4
ID: 17989742
Everything worked great- Now my problem is that I am trying to use the GV in the Transform Data Task to only retrieve data after the date from the source DB with something like:

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > ?

And when I click on Parameters to select the GV I am getting the error "the SQL statement does not contain any parameters."  I've tried "?", '?', (?), etc... any ideas?
0
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 17994152
hmmm this has worked for me but only in Sybase and SQL Server databases. What type of source database do you have?

What happens if you:

A. Remove the where clause and hit 'Parse'
B. Add the where clause with a constant and hit parse?
C. Add the where clause with your parameter placeholder and hit parse?

Here is a link to a workaround method if you can't get it to work. This is a good site generally.

http://www.sqldts.com/default.aspx?205


If you don't want to use a global variable you can also use native date functions for example:

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > DATEADD(dd,-10,GETDATE())

Will get all records from ten days ago. The only thing wrong with that is its stuck on ten days - its difficult to reload history if you need to.


0
 
LVL 1

Author Comment

by:jcneil4
ID: 17995717
The source database is Sage MAS200- providex 32bit ODBC driver

What happens if you:

A. Remove the where clause and hit 'Parse'-
it is successful.

B. Add the where clause with a constant and hit parse?
It works only in this format: WHERE ARN_InvHistoryHeader."InvoiceDate" > {d '2005-07-26'}

C. Add the where clause with your parameter placeholder and hit parse?
I'm not sure what you mean by this...
0
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 18001518
Try this

SELECT ARN_InvHistoryHeader.*
FROM ARN_InvHistoryHeader
WHERE ARN_InvHistoryHeader.invoiceDate > {d ? }


Its possible that the ODBC driver doesn't support parameters. Do you have any doco with it?




0
 
LVL 1

Author Comment

by:jcneil4
ID: 18020006
Failed.  

It doesn't have any documentation.. the ODBC driver is a really old version but we are forced to use it because of our version of the database.  I'm sure that's what is wrong (no support of parameters).  If I ever get to figure out how to support parameters, I'll be able to get it working based on what you've told me so far.  

Thanks for the help.. I'll close and award you the points.

0
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 18035945
Can you create a view in the source database? That view could have a hard coded two month window or something?

You still won't be able to use GV's but at least you could reduce the data set
0
 
LVL 1

Author Comment

by:jcneil4
ID: 18036810
No- I cant.  I'm just going to have to import the whole table.. I'll live.  Thanks again.
0
 
LVL 30

Expert Comment

by:nmcdermaid
ID: 18050169
Actually there is one more thing you could do.... you could dynamically assign the SQL in the source query everytime.

This link:

http://www.sqldts.com/default.aspx?213


Shows roughly how to do it, except you want to use the SourceSQLStatement property instead of the SourceObjectName property.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Some sers suddenly getting error popup msg 28 89
t-sql Joining Issue 10 37
convert in derived column 7 29
Creating SQL script using SQL data and SQL script 8 38
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question