wdbates
asked on
CSV Flat File Source to OLE DB Destination SSIS
Dear Experts;
My brain just ran out of go juice, but I need to query a .csv file to return only DISTINCT records into a SQL 2008 table using SSIS. I'm cool on using the Data Flow Task to pull all of the records from the .csv file, but can I query only the DISTINCT records from the .csv flat file? I have included a screen shot of the Flat File Source Editor and the OLE DB Destination Editor.
FlatandOLE.docx
ASKER
Hello Iochan;
This is just one file of many very large files. Presently I am loading the .csv files into a staging table as seen in the attachments. After the staging files are loaded I run an editing process checking for errors, etc., and then I use MERGE to UPDATE or INSERT the records into the final table. The client performs very little checking and is known for sending duplicate records. I thought if I removed them even prior to the staging table that would save some processing time.
As far as I know there is nothing built in that eliminates duplicates during the insert while reading the CSV file. You may need an intermediate staging table where you put the current csv and then select the DISTINCT records from it to insert into the real staging table. Alternatively, if you can add a PK on the staging table to ignore the duplicates (although this may be costly if many columns are part of the PK), IGNORE_DUP_KEY = ON may help you; please see the code sample below:
CREATE TABLE dbo.foo (col1 int,col2 sysname PRIMARY KEY WITH (FILLFACTOR=90, IGNORE_DUP_KEY = ON))
GO
INSERT dbo.foo VALUES (1,'Fname')
GO
INSERT dbo.foo VALUES (1,'Fname')
GO
INSERT dbo.foo VALUES (1,'Fname')
GO
--gives only
(1 row(s) affected)
Duplicate key was ignored.
(0 row(s) affected)
Duplicate key was ignored.
(0 row(s) affected)
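If the duplicates are exact row-for-row copies, another option is to strip them out of the .csv before SSIS ever reads it, so the staging load only sees distinct rows. A minimal Python sketch of that idea (the file names and sample data here are assumptions, not from your actual feed):

```python
import csv

def dedupe_csv(src_path, dst_path):
    # Copy src_path to dst_path, keeping only the first
    # occurrence of each identical row.
    seen = set()
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                writer.writerow(row)

# Build a small sample file with one duplicate row, then dedupe it.
with open("students.csv", "w", newline="") as f:
    f.write("1,James,Smith\n2,Meggie,Smith\n1,James,Smith\n")
dedupe_csv("students.csv", "students_distinct.csv")
```

SSIS would then point its Flat File Source at the deduplicated file; for very large files this keeps only one set-of-rows-seen in memory rather than sorting the whole file.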
By DISTINCT records, do you mean no duplicate STUD_ID's should be entered into the database? If so, use a Lookup Component to check for STUD_ID before importing into the DB.
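The Lookup Component is configured in the SSIS designer rather than in code, but its keep-first-row-per-key behavior can be sketched outside SSIS as well. A hypothetical Python version (the key column position and file names are assumptions):

```python
import csv

def first_per_key(src_path, dst_path, key_col=0):
    # Mimic a Lookup on STUD_ID: pass through only the first
    # row seen for each value of the key column.
    seen_ids = set()
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row[key_col] not in seen_ids:
                seen_ids.add(row[key_col])
                writer.writerow(row)

# Sample input: STUD_ID 10 appears twice with different names,
# so only the first of the two rows is kept.
with open("stud.csv", "w", newline="") as f:
    f.write("10,Ann\n11,Bob\n10,Ann Marie\n")
first_per_key("stud.csv", "stud_distinct.csv")
```

Note the difference from full-row dedup: here rows with the same STUD_ID but different data are treated as duplicates, which matches the Lookup-on-key approach.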
ASKER CERTIFIED SOLUTION
ASKER
Dear Vikas Garg;
Your solution was great and thank you for the screen shot. I forgot all about the Sort Transformation.
--IMPORT
CREATE TABLE dbo.[Sample-Output]
(ID INT,
FirstName VARCHAR(40),
LastName VARCHAR(40),
BirthDate SMALLDATETIME)
GO
--Create a CSV file in drive C: named Sample-Output.csv with the following content. The location of the file is C:\Sample-Output.csv
1,James,Smith,19750101
2,Meggie,Smith,19790122
3,Robert,Smith,20071101
4,Alex,Smith,20040202
--Now run the following script to load all the data from the CSV into the database table. If there is an error in any row, that row will not be inserted but the other rows will be.
BULK
INSERT dbo.[Sample-Output]
FROM 'C:\Sample-Output.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
--Check the content of the table.
SELECT DISTINCT * --or your own criteria for DISTINCT here
FROM dbo.[Sample-Output]
GO