ITMikeK
asked on
How would one write a set based query to replace a cursor to handle this situation?
Given a flat file that looks like this:
AcctNum Date1 Value1 Date2 Value2 Date3 Value3 ...... through Value 13
This is a rolling data file supplied by a customer for the last 13 months of data. For example: in the August file Date1 would probably be something like 20130811, Date2 would be the previous month, 20130705, etc.
When we would get the September file, Date1 would be 20130915, Date2, 20130811, Date3 20130705. See the pattern?
What I am trying to do is create a small subset of Value records for each account/date. Example : AcctNum: 0001, TransDate 201308, Value 5555 AcctNum: 0001, TransDate 201309, Value 6666
I've been able to do this with a cursor, going through each Date/Value pair from the input file (13 times per record), determining if the AcctNum/TransDate already exists in our subset, if not, insert it.
Needless to say, this is extremely slow and we have at least 750K records per month to process. Is there any way to handle this without using a cursor?
I'm trying to be as concise as possible, but let me know if I can clarify anything.
Thank you!
sql-server-2012
AcctNum Date1 Value1 Date2 Value2 Date3 Value3 ...... through Value 13
This is a rolling data file supplied by a customer for the last 13 months of data. For example: in the August file Date1 would probably be something like 20130811, Date2 would be the previous month, 20130705, etc.
When we would get the September file, Date1 would be 20130915, Date2, 20130811, Date3 20130705. See the pattern?
What I am trying to do is create a small subset of Value records for each account/date. Example : AcctNum: 0001, TransDate 201308, Value 5555 AcctNum: 0001, TransDate 201309, Value 6666
I've been able to do this with a cursor, going through each Date/Value pair from the input file (13 times per record), determining if the AcctNum/TransDate already exists in our subset, if not, insert it.
Needless to say, this is extremely slow and we have at least 750K records per month to process. Is there any way to handle this without using a cursor?
I'm trying to be as concise as possible, but let me know if I can clarify anything.
Thank you!
sql-server-2012
ASKER
In other words I want to transform :
AccountNum Date1 Value1 Date2 Value2 Date3 Value3 .......Date13 Value3 into
New Usage File
Rec1: AccountNum
Contents of Date1
Contents of Value 1
Rec2: AccountNum
Contents of Date 2
Contents of Value2
.................
Remember, every month Date/Value pairs will roll to the right with the newest months data in the Date 1/Value1 columns. The previous Date1/Value1 pair will now be in Date2/Value2 with the previous information in Date13/Value13 "dropping off" (hence, a 13 month history)
There are 400K+ records in the flat file to per processed every month. Also, if a new customer was added, we (might) have their entire 13 month history record added to the flat file.
Hope this helps.
AccountNum Date1 Value1 Date2 Value2 Date3 Value3 .......Date13 Value3 into
New Usage File
Rec1: AccountNum
Contents of Date1
Contents of Value 1
Rec2: AccountNum
Contents of Date 2
Contents of Value2
.................
Remember, every month Date/Value pairs will roll to the right with the newest months data in the Date 1/Value1 columns. The previous Date1/Value1 pair will now be in Date2/Value2 with the previous information in Date13/Value13 "dropping off" (hence, a 13 month history)
There are 400K+ records in the flat file to per processed every month. Also, if a new customer was added, we (might) have their entire 13 month history record added to the flat file.
Hope this helps.
@ITMikeK,
I was editing my post when you were typing, so please re-read above. I understood the movement of the data in my original post. The point is it will be computationally more difficult if you are doing 13 operations versus flattening the data and performing one.
Hence, my recommendation is to look at UNPIVOT either in SSIS or T-SQL.
I was editing my post when you were typing, so please re-read above. I understood the movement of the data in my original post. The point is it will be computationally more difficult if you are doing 13 operations versus flattening the data and performing one.
Hence, my recommendation is to look at UNPIVOT either in SSIS or T-SQL.
ASKER
I do have access to SSIS for SQL Server 2012. Any suggestions for best examples before I start googling everything?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Correction to my note/code above.
Yes, LEFT JOIN would work, but the flaw is that ROW_NUMBER() will sequence the non-NULL values, making your matching invalid. Hence, if you do have NULL values, you need to change more than the JOIN type. Therefore, a better solution to get "rn" to match dates and values is...
This works if the suffix behind 'date' and 'value' are the same by pair and not just numbers based on the order they appear.
Let me know if you have any questions.
If you can have NULL values, you may want a LEFT JOIN here.
Yes, LEFT JOIN would work, but the flaw is that ROW_NUMBER() will sequence the non-NULL values, making your matching invalid. Hence, if you do have NULL values, you need to change more than the JOIN type. Therefore, a better solution to get "rn" to match dates and values is...
rn = REPLACE(date_num, 'date', '') /* REPLACE(value_num, 'value', '') */
This works if the suffix behind 'date' and 'value' are the same by pair and not just numbers based on the order they appear.
Let me know if you have any questions.
ASKER
Kevin,
The UNION option worked great! Now what were you saying about MERGE? We are using 2012, but I haven't tried that yet. The ultimate goal is to take next months file run this same UNION query and whatever doesn't exist in the first set from the new file, go ahead and insert it based on AccountNum and First_Date, like you mentioned. Is there a quick way to complete this?
The UNION option worked great! Now what were you saying about MERGE? We are using 2012, but I haven't tried that yet. The ultimate goal is to take next months file run this same UNION query and whatever doesn't exist in the first set from the new file, go ahead and insert it based on AccountNum and First_Date, like you mentioned. Is there a quick way to complete this?
ASKER
Actually, I looked it up and it is very easy to understand. Thank you for your help!
You are most welcome! Yes, I found MERGE easy to adopt.
Best regards and happy coding,
Kevin
Best regards and happy coding,
Kevin
P.S. How are you bringing in the flat file? If it is via SSIS, you can use the unpivot transformation, so that the data is normalized in your staging table (before insert into production table).
SSIS transform: http://technet.microsoft.com/en-us/library/ms141723.aspx
T-SQL operator: http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx