
How to check for duplicates before an insert?

Posted on 2014-04-12
Last Modified: 2016-02-10
Every day I receive a set of files (Excel data, text file data) that I need to insert into a database.

Is there an automatic way to check whether the data has already been inserted into the database before running an insert? I want to avoid duplicates in the database. Alternatively, is there an automatic way to check whether a file's data has already been loaded? If it has, skip the file; otherwise, run the insert.

I want to use SSIS/ETL for this. Can someone tell me whether there is an automatic method for this purpose?


Thanks,
Question by:wasabi3689
6 Comments
 
LVL 12

Assisted Solution

by:Harish Varghese
Harish Varghese earned 120 total points
ID: 39996960
Hello,

Do you really want to compare the records in the file with the records in the database? If so, the best way may be to load the data into a temporary (staging) table in the database first, then compare its records with the actual table and insert only the new ones. Alternatively, if you can distinguish the Excel/text files by some means (such as the file name), you can store the names of the processed files in a table and check that table to see whether a file was processed earlier.
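The file-name check described above can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real server, and the table `processed_files`, its columns, the helper names, and the sample file name are all hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
# Hypothetical log table: one row per file that has already been loaded.
conn.execute(
    "CREATE TABLE processed_files (file_name TEXT PRIMARY KEY, loaded_at TEXT)"
)


def already_processed(conn, file_name):
    """Return True if file_name is already recorded in the log table."""
    row = conn.execute(
        "SELECT 1 FROM processed_files WHERE file_name = ?", (file_name,)
    ).fetchone()
    return row is not None


def process_file(conn, file_name, load):
    """Call load(file_name) only for files not seen before; return True if it ran."""
    if already_processed(conn, file_name):
        return False
    with conn:  # commits on success, rolls back if load() or the insert raises
        load(file_name)
        conn.execute(
            "INSERT INTO processed_files (file_name, loaded_at) "
            "VALUES (?, datetime('now'))",
            (file_name,),
        )
    return True


loaded = []  # stands in for the real load step
first = process_file(conn, "sales_2014-04-12.xlsx", loaded.append)
second = process_file(conn, "sales_2014-04-12.xlsx", loaded.append)
print(first, second, loaded)
```

The second call is skipped because the name is already in the log. Note that this only detects a repeated file name; the same data arriving under a different name would still need the row-level comparison.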

-Harish
 

Author Comment

by:wasabi3689
ID: 39996964
Can we set up an automatic way to insert the processed file name into a table?

Does ETL have a function to do that?
 
LVL 34

Accepted Solution

by:ste5an
ste5an earned 60 total points
ID: 39997229
Create an appropriate UNIQUE INDEX on your columns, and use the NOT EXISTS predicate:

INSERT INTO destinationTable ( columnList )
	SELECT	columnList
	FROM	stagingTable S
	WHERE NOT EXISTS (
		SELECT	*
		FROM	destinationTable D
		WHERE	D.uniqueColumnList = S.uniqueColumnList
	);
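As a concrete, runnable illustration of the template above, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The table names `destination` and `staging`, the key column `order_id`, and the sample rows are assumptions made up for the example; the INSERT ... SELECT ... WHERE NOT EXISTS statement itself has the same shape as the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, not SQL Server
conn.executescript("""
    CREATE TABLE destination (order_id INTEGER, amount REAL);
    CREATE UNIQUE INDEX ux_destination_order ON destination (order_id);
    CREATE TABLE staging (order_id INTEGER, amount REAL);
    INSERT INTO destination VALUES (1, 10.0), (2, 20.0);
    INSERT INTO staging VALUES (2, 20.0), (3, 30.0);
""")

# Copy only the staging rows whose key is not yet in the destination table.
insert_new = """
    INSERT INTO destination (order_id, amount)
    SELECT S.order_id, S.amount
    FROM staging AS S
    WHERE NOT EXISTS (
        SELECT 1 FROM destination AS D WHERE D.order_id = S.order_id
    )
"""


def destination_count(conn):
    return conn.execute("SELECT COUNT(*) FROM destination").fetchone()[0]


conn.execute(insert_new)
after_first = destination_count(conn)
conn.execute(insert_new)  # running it again adds nothing
after_second = destination_count(conn)
ids = [r[0] for r in conn.execute(
    "SELECT order_id FROM destination ORDER BY order_id")]
print(after_first, after_second, ids)
```

The NOT EXISTS filter only compares staging against the destination. If the staging table itself contains the same key twice, the unique index is what stops the second copy, and it does so by raising an error.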


 
LVL 48

Assisted Solution

by:Dale Fye (Access MVP)
Dale Fye (Access MVP) earned 60 total points
ID: 39997251
I generally maintain a second table (tbl_Uploads) which contains an AutoNumber field, the date of the upload, and the name of the file used to perform the upload.

Before I even allow my users to select the file to upload from, I check to see whether an upload has already been performed on that day, and provide a warning which the user can use to either exit the process or continue.

If they continue, I allow them to select the file to upload and then I check to see whether that particular file has been uploaded.  If so, I provide another warning message and allow them to select to exit the process or continue.  In that warning, I tell them that if they choose to continue, the previously uploaded data will be deleted.

When I actually perform the upload, I begin a transaction, write the info to the uploads table, capture the Upload_ID from that table, and write that Upload_ID along with all of the other data from the upload file into my database table.  By writing the Upload_ID to the main data table, it ensures that I can easily delete those records if need be.
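The upload step described above might look roughly like this. It is a sketch only, using Python's sqlite3 module as a stand-in database. `tbl_Uploads` and `Upload_ID` come from the description; the data table `tbl_Data`, its columns, and the sample file are hypothetical. The interactive warnings are left out, so the sketch always takes the "continue" branch and replaces an earlier upload of the same file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database
conn.executescript("""
    CREATE TABLE tbl_Uploads (
        Upload_ID   INTEGER PRIMARY KEY AUTOINCREMENT,
        Upload_Date TEXT,
        File_Name   TEXT
    );
    -- Hypothetical main data table; every row carries the Upload_ID it came from.
    CREATE TABLE tbl_Data (Upload_ID INTEGER, Item TEXT, Qty INTEGER);
""")


def upload(conn, file_name, rows):
    """Delete any earlier upload of file_name, then load rows under a new Upload_ID."""
    with conn:  # the deletes and inserts commit together or roll back together
        old_ids = [r[0] for r in conn.execute(
            "SELECT Upload_ID FROM tbl_Uploads WHERE File_Name = ?", (file_name,))]
        for old_id in old_ids:
            conn.execute("DELETE FROM tbl_Data WHERE Upload_ID = ?", (old_id,))
            conn.execute("DELETE FROM tbl_Uploads WHERE Upload_ID = ?", (old_id,))
        cur = conn.execute(
            "INSERT INTO tbl_Uploads (Upload_Date, File_Name) "
            "VALUES (date('now'), ?)",
            (file_name,))
        upload_id = cur.lastrowid  # capture the new Upload_ID
        conn.executemany(
            "INSERT INTO tbl_Data (Upload_ID, Item, Qty) VALUES (?, ?, ?)",
            [(upload_id, item, qty) for item, qty in rows])
    return upload_id


first_id = upload(conn, "inventory.txt", [("bolt", 5), ("nut", 7)])
second_id = upload(conn, "inventory.txt", [("bolt", 6)])  # replaces the first load
rows = conn.execute("SELECT Upload_ID, Item, Qty FROM tbl_Data").fetchall()
print(first_id, second_id, rows)
```

Because each data row stores its Upload_ID, removing a bad or repeated load is a single DELETE by that ID.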
 
LVL 12

Assisted Solution

by:Harish Varghese
Harish Varghese earned 120 total points
ID: 39997315
@fyed has provided all the steps in detail. All of this can be done in an SSIS package, provided the package is executed manually by the user (i.e. it is not run automatically by a job or a scheduler), since the warnings need someone to respond to them. You may use an "Execute SQL Task" to perform any database query (checking whether an upload has happened today, whether the same file was already uploaded, saving the currently processed file info, moving data from the staging table to the actual table, etc.). You may use a "Script Task" to get the name of the file to be processed, display a message box, etc., and a "Data Flow Task" to move data from the input file to the database table.

-Harish
 
LVL 27

Assisted Solution

by:skullnobrains
skullnobrains earned 60 total points
ID: 39997482
a modified @ste5an's solution

- create the unique index
- set IGNORE_DUP_KEY = ON on that index (it is an index option)
- perform regular inserts : any row that already exists will be ignored

you can also insert into a temporary table and perform a MERGE statement
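IGNORE_DUP_KEY and MERGE are SQL Server features, so they cannot be run in a quick standalone example. To show the behaviour of "regular inserts where existing rows are ignored", here is a rough analogue using SQLite's INSERT OR IGNORE through Python's sqlite3 module. This is a swapped-in technique, not the SQL Server option itself, and the table, index, and sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
conn.executescript("""
    CREATE TABLE destination (order_id INTEGER, amount REAL);
    CREATE UNIQUE INDEX ux_destination_order ON destination (order_id);
    INSERT INTO destination VALUES (1, 10.0);
""")

# Key 1 already exists in the table, and key 2 appears twice in the batch.
incoming = [(1, 10.0), (2, 20.0), (2, 20.0)]

with conn:
    # OR IGNORE silently skips any row that would violate the unique index.
    conn.executemany(
        "INSERT OR IGNORE INTO destination (order_id, amount) VALUES (?, ?)",
        incoming,
    )

ids = [r[0] for r in conn.execute(
    "SELECT order_id FROM destination ORDER BY order_id")]
print(ids)
```

Only one row per key survives, and the batch does not fail on the duplicates.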
