Solved

Incremental load after first initial Load

Posted on 2013-06-05
50 Views
Last Modified: 2016-02-11
Hi experts,

Here's my concern. I initially loaded a huge data set from the source; it took about 2 hours. I also did a lookup on some columns to compare against the source, so that only new rows get loaded. My concern is that the lookup will fetch all the records from the source again, take the same 2 hours, and then load the new records. Is there a way to bypass the previously loaded data and check just the new incoming rows in the source, so that the incremental load is much faster? Also, there's no unique identifier like CreatedDate or ModifiedDate on the source. My data is attached as well.

Thanks in advance.
kk.xlsx
Question by:Josh2442
9 Comments
 
LVL 32

Expert Comment

by:sarabande
ID: 39225317
does the data come from a SQL Server database? or is it a .csv or .xlsx file you read?

a csv would be much smaller (and therefore faster) and could be compressed.

also, you could read it in big chunks rather than reading record by record.

you could make a csv out of the xlsx by automation, or out of the database, and you could add some kind of load ID / load timestamp that would help with the incremental load.

Sara
 
LVL 21

Expert Comment

by:huslayer
ID: 39225329
Hi,
so is your source an Excel file, or a DB?
and how many records did you load in those 2 hours?

typically you would use a unique ID from the source, or a date column, against your load. also, if your source is SQL Server you could use Change Tracking, or, if you're on SQL Server Enterprise Edition, you could use CDC (Change Data Capture):

http://msdn.microsoft.com/en-us/library/cc280519(v=sql.105).aspx

if none of these is applicable to you, then optimize the package that loads from the Excel file: load the rows into a staging table, apply CDC or Change Tracking, or use a Merge Join, a Lookup, the Slowly Changing Dimension transformation, or even straight T-SQL to do the upserts.
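As a rough illustration of the staging-table upsert idea, here is a minimal T-SQL MERGE sketch. The table and column names (`dbo.Staging`, `dbo.Target`, `BusinessKey`, `SomeColumn`) are assumptions, not taken from the attached workbook:

```sql
-- Stage the Excel load into dbo.Staging first, then upsert into the target.
-- All object/column names here are hypothetical placeholders.
MERGE dbo.Target AS t
USING dbo.Staging AS s
    ON t.BusinessKey = s.BusinessKey
WHEN MATCHED AND t.SomeColumn <> s.SomeColumn THEN
    UPDATE SET t.SomeColumn = s.SomeColumn      -- changed rows: update in place
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, SomeColumn)
    VALUES (s.BusinessKey, s.SomeColumn);       -- brand-new rows: insert
```

Without a reliable business key on the source, though, the MATCHED clause has nothing to join on, which is why the identifier question above matters.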

let me know if you need more info
 
LVL 19

Expert Comment

by:Rimvis
ID: 39225342
Hi Josh2442,

it would be difficult to do without a primary key. Are you sure there is none? Can you ask for additional identifiers to be included in the load?

You could try the CHECKSUM function. If you create an indexed computed column, you can then compare the checksum of incoming data against what is already inserted. See this script for illustration:
--Create sample table
CREATE TABLE YourTable
(
	SomeNumber INT,
	SomeString VARCHAR(MAX),
	RowCheckSum AS CHECKSUM(SomeNumber, SomeString)
)	

--Create index on checksum
CREATE UNIQUE NONCLUSTERED INDEX [IX_RowCheckSum] ON [dbo].[YourTable]([RowCheckSum] ASC)


--Insert some test data
INSERT INTO YourTable VALUES (1,'A'),(2,'B'),(3,NULL)

SELECT * FROM YourTable

--Prepare some more
CREATE TABLE #new_data 
(
	SomeNumber INT,
	SomeString VARCHAR(MAX)
)
INSERT INTO #new_data VALUES (3,NULL),(4,'New Data') 


--Try to insert it. Only one row should be inserted
INSERT INTO YourTable(SomeNumber, SomeString)
SELECT SomeNumber, SomeString 
FROM #new_data 
WHERE CHECKSUM(SomeNumber, SomeString) NOT IN (SELECT RowCheckSum FROM YourTable)

SELECT * FROM YourTable
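One caveat: CHECKSUM can return the same value for different rows (hash collisions), and with the unique index above a colliding new row would fail to insert. If that risk matters, a set-based comparison on the actual columns avoids it. A sketch against the same sample tables:

```sql
-- Insert only rows that don't already exist, comparing full column values.
-- EXCEPT treats NULLs as equal, so the duplicate (3, NULL) row is skipped.
INSERT INTO YourTable (SomeNumber, SomeString)
SELECT SomeNumber, SomeString FROM #new_data
EXCEPT
SELECT SomeNumber, SomeString FROM YourTable;
```

This is slower than a checksum lookup on wide tables, but it is exact.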


 

Author Comment

by:Josh2442
ID: 39225373
The source is another DB, and I cannot do any DML on that source, such as turning identity insert on. Have you heard of Cache Monitor? It uses CSQL, which is somewhat similar to T-SQL but not entirely; this is where the source is coming from. I'm loading that into a regular SQL Server instance, which I work with through SSMS.
 
LVL 19

Expert Comment

by:Rimvis
ID: 39225587
Can you try CHECKSUM approach then?
 

Author Comment

by:Josh2442
ID: 39226135
Hi experts, sorry, I didn't analyze the data properly. Now that I've gone through all the data, I see the RowID is incrementing, but in a delimited format. So let me know how to do the insert based on that RowID. My approach was to compare the max RowID from the destination against the source and load the rows greater than that max RowID, but it's really hard to figure out how the data increments in that delimited format. Here's how the data is incrementing: I have listed how the first 3 numbers changed from 209 to 210 in the RowID.
 
LVL 21

Accepted Solution

by:
huslayer earned 500 total points
ID: 39226159
Yeah, I was also wondering when I saw that column! Anyway, you can write the latest RowID on your destination, along with the load date, to a control table. Then, in your ETL, grab that RowID into a variable and load only the rows greater than it.

Or use the Lookup transformation based on the RowID.
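A minimal sketch of that control-table (high-water-mark) pattern, assuming the RowID can be compared numerically; if it is a delimited string like the one you describe, you would first need to parse it into comparable parts. All names here are hypothetical:

```sql
-- Control table remembers the high-water mark from the last load.
CREATE TABLE dbo.LoadControl
(
    LastRowID BIGINT   NOT NULL,
    LoadDate  DATETIME NOT NULL DEFAULT GETDATE()
);

-- In the ETL: read the latest mark into a variable...
DECLARE @MaxRowID BIGINT =
    (SELECT MAX(LastRowID) FROM dbo.LoadControl);

-- ...pull only the rows above it from the source...
SELECT *
FROM SourceTable              -- hypothetical source name
WHERE RowID > ISNULL(@MaxRowID, 0);

-- ...and record the new high-water mark after a successful load.
INSERT INTO dbo.LoadControl (LastRowID)
SELECT MAX(RowID) FROM dbo.DestinationTable;
```

The `ISNULL` handles the very first run, when the control table is still empty.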
 

Author Comment

by:Josh2442
ID: 39229297
Hi expert,
I was able to do the incremental load using that RowID, but I have a question: what if someone changes a particular column within the source? My incremental load looks only for the max RowID and inserts the new rows, but I need to do an update or delete based on the changed column, which will obviously be in a row below that max RowID. Does it have to fetch the whole record set?
 

Author Comment

by:Josh2442
ID: 39233460
Hi experts,
I really need a big favor. I'm stuck at the ADO.NET SQL command. Since the source is ADO.NET, I can't directly pass a parameter like in an OLE DB SQL command. I have created two variables, MaxDate and MinDate; MinDate is 60 days less than MaxDate. How do I write a query where a column is between these two variables? I need this query in the ADO.NET SQL command in the SSIS Expression Builder.
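Since the ADO.NET source doesn't support `?` placeholders, one common workaround is to build the whole query as a string expression on the Data Flow task's `[ADO NET Source].[SqlCommand]` property. A hedged sketch of such an expression, assuming DateTime variables `User::MinDate` / `User::MaxDate` and a hypothetical table and column:

```
"SELECT * FROM SourceTable WHERE SomeDateColumn BETWEEN '"
+ (DT_WSTR, 30) @[User::MinDate]
+ "' AND '"
+ (DT_WSTR, 30) @[User::MaxDate]
+ "'"
```

If the variables are already strings in a format your source accepts (e.g. yyyy-MM-dd), drop the `(DT_WSTR, 30)` casts; and be aware that casting a DateTime this way produces a locale-dependent string, so test the generated SQL against your source first.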
