Solved

How would one write a set based query to replace a cursor to handle this situation?

Posted on 2013-12-17
9
411 Views
Last Modified: 2013-12-17
Given a flat file that looks like this:

AcctNum Date1 Value1 Date2 Value2 Date3 Value3 ...... through Value 13
This is a rolling data file supplied by a customer for the last 13 months of data. For example: in the August file Date1 would probably be something like 20130811, Date2 would be the previous month, 20130705, etc.

When we would get the September file, Date1 would be 20130915, Date2, 20130811, Date3 20130705. See the pattern?

What I am trying to do is create a small subset of Value records for each account/date. Example : AcctNum: 0001, TransDate 201308, Value 5555 AcctNum: 0001, TransDate 201309, Value 6666

I've been able to do this with a cursor, going through each Date/Value pair from the input file (13 times per record), determining if the AcctNum/TransDate already exists in our subset, if not, insert it.

Needless to say, this is extremely slow and we have at least 750K records per month to process. Is there any way to handle this without using a cursor?

I'm trying to be as concise as possible, but let me know if I can clarify anything.

Thank you!

sql-server-2012
0
Comment
Question by:ITMikeK
  • 5
  • 4
9 Comments
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 39724125
Have you tried this using UNPIVOT, or another such approach like UNION SELECT AcctNum, DateX, ValueX?  Both avoid the cursor, so may reduce the overhead of the process.

P.S. How are you bringing in the flat file?  If it is via SSIS, you can use the unpivot transformation, so that the data is normalized in your staging table (before insert into production table).

SSIS transform: http://technet.microsoft.com/en-us/library/ms141723.aspx
T-SQL operator: http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
0
 

Author Comment

by:ITMikeK
ID: 39724132
In other words I want to transform :

     AccountNum  Date1 Value1 Date2 Value2 Date3 Value3 .......Date13 Value3 into

     New Usage File

     Rec1: AccountNum
               Contents of Date1
               Contents of Value 1

     Rec2: AccountNum
               Contents of Date 2
               Contents of Value2

                 .................

Remember, every month Date/Value pairs will roll to the right with the newest months data in the Date 1/Value1 columns.  The previous Date1/Value1 pair will now be in Date2/Value2 with the previous information in Date13/Value13 "dropping off" (hence, a 13 month history)

There are 400K+ records in the flat file to per processed every month.  Also, if a new customer was added, we (might) have their entire 13 month history record added to the flat file.

Hope this helps.
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 39724143
@ITMikeK,

I was editing my post when you were typing, so please re-read above.  I understood the movement of the data in my original post.  The point is it will be computationally more difficult if you are doing 13 operations versus flattening the data and performing one.

Hence, my recommendation is to look at UNPIVOT either in SSIS or T-SQL.
0
Three Reasons Why Backup is Strategic

Backup is strategic to your business because your data is strategic to your business. Without backup, your business will fail. This white paper explains why it is vital for you to design and immediately execute a backup strategy to protect 100 percent of your data.

 

Author Comment

by:ITMikeK
ID: 39724253
I do have access to SSIS for SQL Server 2012.  Any suggestions for best examples before I start googling everything?
0
 
LVL 59

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 39724315
How are you currently bringing in the data?

Regardless of the import method, you can do one of the following once you have the data in a staging table.  

(1) Use the UNPIVOT relational operator:
/* 
normalize data via UNPIVOT of n date-value pairs.
use two subqueries/unpivots then JOIN to get date-value combo in one row.
*/
SELECT dte.accountnum, dte.the_date, val.the_value
FROM (

    SELECT accountnum, the_date
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY date_num)
    /* best to use a subquery selecting only accountnum and date columns. */
    FROM #YourImportedData
    UNPIVOT (
        the_date
        FOR date_num
        IN (date1, date2, date3)
    ) upvt

) dte
/* If you can have NULL values, you may want a LEFT JOIN here. */
JOIN (

    SELECT accountnum, the_value
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY value_num)
    /* best to use a subquery selecting only accountnum and value columns. */
    FROM #YourImportedData
    UNPIVOT (
        the_value
        FOR value_num
        IN (value1, value2, value3)
    ) upvt

) val ON val.accountnum = dte.accountnum AND val.rn = dte.rn
;

Open in new window


(2) Use a UNION:
/*
normalize date via UNION of n queries for separate date-value pairs.
*/
SELECT accountnum, the_date = date1, the_value = value1
FROM #YourImportedData
UNION
SELECT accountnum, the_date = date2, the_value = value2
FROM #YourImportedData
UNION
SELECT accountnum, the_date = date3, the_value = value3
FROM #YourImportedData
;

Open in new window


Starting with this sample data for example:
0001 20130811 5555 20130705 7897 20130604 4444 
0003 20130811 5565 20130705 7237 20130604 1114
0007 20130811 5755 20130705 1247 20130604 1235
0009 20130811 5995 20130705 7564 20130604 2356
0012 20130811 8555 20130705 7112 20130604 3344
0301 20130811 5765 20130705 9547 20130604 5566
2005 20130811 4325 20130705 1234 20130604 7788

Open in new window


The above queries will yield:
accountnum           the_date   the_value
-------------------- ---------- -----------
0001                 2013-08-11 5555
0001                 2013-07-05 7897
0001                 2013-06-04 4444
0003                 2013-08-11 5565
0003                 2013-07-05 7237
0003                 2013-06-04 1114
0007                 2013-08-11 5755
0007                 2013-07-05 1247
0007                 2013-06-04 1235
0009                 2013-08-11 5995
0009                 2013-07-05 7564
0009                 2013-06-04 2356
0012                 2013-08-11 8555
0012                 2013-07-05 7112
0012                 2013-06-04 3344
0301                 2013-08-11 5765
0301                 2013-07-05 9547
0301                 2013-06-04 5566
2005                 2013-08-11 4325
2005                 2013-07-05 1234
2005                 2013-06-04 7788

Open in new window


Therefore, you will have data in a format that more easily compares and INSERTs, assuming your production data is in the normalized format and not the more wide imported one.  With SQL 2012, you can use MERGE to handle the UPDATE or INSERT part.

If the target table is in the same format, I would think you only need to compare the account number and the first date.  In other words, when the existing record's first date is not the same as the first date in the file, you have a new record and will be replacing the entire row.  If nothing exists in target for the account number, you INSERT new row.

I hope that helps!

Kevin
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 39724341
Correction to my note/code above.

If you can have NULL values, you may want a LEFT JOIN here.

Yes, LEFT JOIN would work, but the flaw is that ROW_NUMBER() will sequence the non-NULL values, making your matching invalid.  Hence, if you do have NULL values, you need to change more than the JOIN type.  Therefore, a better solution to get "rn" to match dates and values is...

rn = REPLACE(date_num, 'date', '') /* REPLACE(value_num, 'value', '') */

Open in new window


This works if the suffix behind 'date' and 'value' are the same by pair and not just numbers based on the order they appear.

Let me know if you have any questions.
0
 

Author Comment

by:ITMikeK
ID: 39724551
Kevin,

   The UNION option worked great!  Now what were you saying about MERGE?  We are using 2012, but I haven't tried that yet.  The ultimate goal is to take next months file run this same UNION query and whatever doesn't exist in the first set from the new file, go ahead and insert it based on AccountNum and First_Date, like you mentioned.  Is there a quick way to complete this?
0
 

Author Comment

by:ITMikeK
ID: 39724576
Actually, I looked it up and it is very easy to understand.  Thank you for your help!
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 39724599
You are most welcome!  Yes, I found MERGE easy to adopt.  

Best regards and happy coding,

Kevin
0

Featured Post

Ransomware-A Revenue Bonanza for Service Providers

Ransomware – malware that gets on your customers’ computers, encrypts their data, and extorts a hefty ransom for the decryption keys – is a surging new threat.  The purpose of this eBook is to educate the reader about ransomware attacks.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

There have been several questions about Large Transaction Log Files in SQL Server 2008, and how to get rid of them when disk space has become critical. This article will explain how to disable full recovery and implement simple recovery that carries…
In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
This tutorial gives a high-level tour of the interface of Marketo (a marketing automation tool to help businesses track and engage prospective customers and drive them to purchase). You will see the main areas including Marketing Activities, Design …
Microsoft Active Directory, the widely used IT infrastructure, is known for its high risk of credential theft. The best way to test your Active Directory’s vulnerabilities to pass-the-ticket, pass-the-hash, privilege escalation, and malware attacks …

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question