Incremental load after first initial Load

Posted on 2013-06-05
Last Modified: 2016-02-11
Hi experts,

Here's my concerns.I have initially loaded the huge data from the source.It took about 2 hrs,and also did the lookup on some column so as to compare with the source just to load the new rows.But my concern is my lookup will be fetching all the records from the source and it will take the same 2 hrs and will load the new records,but is there a a way to bypass the previous load and check just the new incoming rows in the source so that way load will be much faster while doing the incremental.Also there's no unique identifier like createddate,modifieddate on the source.Attached is my data as well.

Thanks in advance.
Question by:Josh2442
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
  • 2
  • +1
LVL 34

Expert Comment

ID: 39225317
the data come from a sql server database? or, is it a .csv or .xslx file you read?

a csv should be much smaller (therefore faster) and could be compressed.

also you could read it in big chunks rather than reading record by record.

you could make a csv out of the xlsx by automation or out of database. and you could add some kind of load id/load timestamp that could help for incremental load.

LVL 21

Expert Comment

by:Jason Yousef, MS
ID: 39225329
so your source is an excel file? or a DB?
and how many records have you loaded in these 2 hours?

typically you would use any unique ID from the source, or date column against your load, also if your source is sql server you would use change data tracking or if you're using sql server enterprise edition you would use CDC (change data capture)

if none of these applicable to you, then optimize your package that loads from the excel file, load these into a staging table, apply CDC or change tracking, or use merge join, lookup, slowly changing dimension or even straight T-sql to do the upserts.

let me know if you need more info
LVL 19

Expert Comment

ID: 39225342
Hi Josh2442,

it would be difficult to do without a primary key. Are you sure there is none? Can you ask to include additional identifiers to the load?

You could try to use CHECKSUM function. If you create indexed computed column, you could then compare checksum of incoming data against what is already inserted. Check this script for illustration:
--Create sample table
	SomeNumber INT,
	SomeString VARCHAR(MAX),
	RowCheckSum AS CHECKSUM(SomeNumber, SomeString)

--Create index on checksum
CREATE UNIQUE NONCLUSTERED INDEX [IX_RowCheckSum] ON [dbo].[YourTable]([RowCheckSum] ASC)

--Insert some test data data
INSERT INTO YourTable VALUES (1,'A'),(2,'B'),(3,NULL)


--Prepare some more
CREATE TABLE #new_data 
	SomeNumber INT,
	SomeString VARCHAR(MAX),
INSERT INTO #new_data VALUES (3,NULL),(4,'New Data') 

--Try to insert it. Only one row should be inserted
INSERT INTO YourTable(SomeNumber, SomeString)
SELECT SomeNumber, SomeString 
FROM #new_data 
WHERE CHECKSUM(SomeNumber, SomeString) NOT IN (SELECT RowCheckSum FROM YourTable)


Open in new window

Ransomware: The New Cyber Threat & How to Stop It

This infographic explains ransomware, type of malware that blocks access to your files or your systems and holds them hostage until a ransom is paid. It also examines the different types of ransomware and explains what you can do to thwart this sinister online threat.  


Author Comment

ID: 39225373
Source is from another db and i cannot do any dml on that source like doing identity insert on.Have you heard about cache monitor which use csql somewhat similar to that of tsql but not a whole lot, this is where source is coming from. I m loading that into regular sql server which is ssms.
LVL 19

Expert Comment

ID: 39225587
Can you try CHECKSUM approach then?

Author Comment

ID: 39226135
Hi expert,sorry I didnt analyze the data properly.Now when I went through all those data,rowid was incrementing but in delimited format.So let me know how we do insert based on that rowid.My approach was to compare the maxid row from destination to the source and load the data greater than that maxid.But its really hard to figure out how data is incrementing based on that delimited format.Here's how data is incrementing--i have listed the data how the first 3 numbers changed from 209 to 210 in rowid.
LVL 21

Accepted Solution

Jason Yousef, MS earned 500 total points
ID: 39226159
Yeah I was also wondering  when I saw that column !! anyway you can write to a control table what rowID and load date, is the latest rowID you have on your destination,
and in your ETL, grab that rowID in a variable, and load greater than that rowID.

or use the "lookup transformation" based on the RowID.

Author Comment

ID: 39229297
Hi expert,
I was able to do the incremental load using that rowid,but I have a question,like what if someone changes the particular column  within the source as my incremental will look only for maxrowid,and will insert the new rows.But I need to do the update or delete based on the changed column which is obviously will be the row less than that maxrowid.Does it have to fectch the whole record?

Author Comment

ID: 39233460
Hi expert,,
I really need big favor. I m stuck @ sql command. Since the source is i can't directly pass the parameter like in oledb sql comman.i have created 2 variables maxdate n mindate.Mindate is 60 days less than maxdate. How do i write a query where columnname is between these 2 variables.i need this query in sql command in ssis expression buider .

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
CAST issue with SQL 2000 15 44
how to solve "recursive" database tables 2 53
SQL 2012 Report Builder 3.0 query 2 21
SQL Convert rows to columns 5 32
A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
Recently we ran in to an issue while running some SQL jobs where we were trying to process the cubes.  We got an error saying failure stating 'NT SERVICE\SQLSERVERAGENT does not have access to Analysis Services. So this is a way to automate that wit…
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question