How would one write a set based query to replace a cursor to handle this situation?

Posted on 2013-12-17
Last Modified: 2013-12-17
Given a flat file that looks like this:

AcctNum Date1 Value1 Date2 Value2 Date3 Value3 ...... through Value 13
This is a rolling data file supplied by a customer for the last 13 months of data. For example: in the August file Date1 would probably be something like 20130811, Date2 would be the previous month, 20130705, etc.

When we would get the September file, Date1 would be 20130915, Date2, 20130811, Date3 20130705. See the pattern?

What I am trying to do is create a small subset of Value records for each account/date. Example : AcctNum: 0001, TransDate 201308, Value 5555 AcctNum: 0001, TransDate 201309, Value 6666

I've been able to do this with a cursor, going through each Date/Value pair from the input file (13 times per record), determining if the AcctNum/TransDate already exists in our subset, if not, insert it.

Needless to say, this is extremely slow and we have at least 750K records per month to process. Is there any way to handle this without using a cursor?

I'm trying to be as concise as possible, but let me know if I can clarify anything.

Thank you!

Question by:ITMikeK
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 4
LVL 60

Expert Comment

by:Kevin Cross
ID: 39724125
Have you tried this using UNPIVOT, or another such approach like UNION SELECT AcctNum, DateX, ValueX?  Both avoid the cursor, so may reduce the overhead of the process.

P.S. How are you bringing in the flat file?  If it is via SSIS, you can use the unpivot transformation, so that the data is normalized in your staging table (before insert into production table).

SSIS transform:
T-SQL operator:

Author Comment

ID: 39724132
In other words I want to transform :

     AccountNum  Date1 Value1 Date2 Value2 Date3 Value3 .......Date13 Value3 into

     New Usage File

     Rec1: AccountNum
               Contents of Date1
               Contents of Value 1

     Rec2: AccountNum
               Contents of Date 2
               Contents of Value2


Remember, every month Date/Value pairs will roll to the right with the newest months data in the Date 1/Value1 columns.  The previous Date1/Value1 pair will now be in Date2/Value2 with the previous information in Date13/Value13 "dropping off" (hence, a 13 month history)

There are 400K+ records in the flat file to per processed every month.  Also, if a new customer was added, we (might) have their entire 13 month history record added to the flat file.

Hope this helps.
LVL 60

Expert Comment

by:Kevin Cross
ID: 39724143

I was editing my post when you were typing, so please re-read above.  I understood the movement of the data in my original post.  The point is it will be computationally more difficult if you are doing 13 operations versus flattening the data and performing one.

Hence, my recommendation is to look at UNPIVOT either in SSIS or T-SQL.
Online Training Solution

Drastically shorten your training time with WalkMe's advanced online training solution that Guides your trainees to action. Forget about retraining and skyrocket knowledge retention rates.


Author Comment

ID: 39724253
I do have access to SSIS for SQL Server 2012.  Any suggestions for best examples before I start googling everything?
LVL 60

Accepted Solution

Kevin Cross earned 500 total points
ID: 39724315
How are you currently bringing in the data?

Regardless of the import method, you can do one of the following once you have the data in a staging table.  

(1) Use the UNPIVOT relational operator:
normalize data via UNPIVOT of n date-value pairs.
use two subqueries/unpivots then JOIN to get date-value combo in one row.
SELECT dte.accountnum, dte.the_date, val.the_value

    SELECT accountnum, the_date
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY date_num)
    /* best to use a subquery selecting only accountnum and date columns. */
    FROM #YourImportedData
        FOR date_num
        IN (date1, date2, date3)
    ) upvt

) dte
/* If you can have NULL values, you may want a LEFT JOIN here. */

    SELECT accountnum, the_value
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY value_num)
    /* best to use a subquery selecting only accountnum and value columns. */
    FROM #YourImportedData
        FOR value_num
        IN (value1, value2, value3)
    ) upvt

) val ON val.accountnum = dte.accountnum AND val.rn = dte.rn

Open in new window

(2) Use a UNION:
normalize date via UNION of n queries for separate date-value pairs.
SELECT accountnum, the_date = date1, the_value = value1
FROM #YourImportedData
SELECT accountnum, the_date = date2, the_value = value2
FROM #YourImportedData
SELECT accountnum, the_date = date3, the_value = value3
FROM #YourImportedData

Open in new window

Starting with this sample data for example:
0001 20130811 5555 20130705 7897 20130604 4444 
0003 20130811 5565 20130705 7237 20130604 1114
0007 20130811 5755 20130705 1247 20130604 1235
0009 20130811 5995 20130705 7564 20130604 2356
0012 20130811 8555 20130705 7112 20130604 3344
0301 20130811 5765 20130705 9547 20130604 5566
2005 20130811 4325 20130705 1234 20130604 7788

Open in new window

The above queries will yield:
accountnum           the_date   the_value
-------------------- ---------- -----------
0001                 2013-08-11 5555
0001                 2013-07-05 7897
0001                 2013-06-04 4444
0003                 2013-08-11 5565
0003                 2013-07-05 7237
0003                 2013-06-04 1114
0007                 2013-08-11 5755
0007                 2013-07-05 1247
0007                 2013-06-04 1235
0009                 2013-08-11 5995
0009                 2013-07-05 7564
0009                 2013-06-04 2356
0012                 2013-08-11 8555
0012                 2013-07-05 7112
0012                 2013-06-04 3344
0301                 2013-08-11 5765
0301                 2013-07-05 9547
0301                 2013-06-04 5566
2005                 2013-08-11 4325
2005                 2013-07-05 1234
2005                 2013-06-04 7788

Open in new window

Therefore, you will have data in a format that more easily compares and INSERTs, assuming your production data is in the normalized format and not the more wide imported one.  With SQL 2012, you can use MERGE to handle the UPDATE or INSERT part.

If the target table is in the same format, I would think you only need to compare the account number and the first date.  In other words, when the existing record's first date is not the same as the first date in the file, you have a new record and will be replacing the entire row.  If nothing exists in target for the account number, you INSERT new row.

I hope that helps!

LVL 60

Expert Comment

by:Kevin Cross
ID: 39724341
Correction to my note/code above.

If you can have NULL values, you may want a LEFT JOIN here.

Yes, LEFT JOIN would work, but the flaw is that ROW_NUMBER() will sequence the non-NULL values, making your matching invalid.  Hence, if you do have NULL values, you need to change more than the JOIN type.  Therefore, a better solution to get "rn" to match dates and values is...

rn = REPLACE(date_num, 'date', '') /* REPLACE(value_num, 'value', '') */

Open in new window

This works if the suffix behind 'date' and 'value' are the same by pair and not just numbers based on the order they appear.

Let me know if you have any questions.

Author Comment

ID: 39724551

   The UNION option worked great!  Now what were you saying about MERGE?  We are using 2012, but I haven't tried that yet.  The ultimate goal is to take next months file run this same UNION query and whatever doesn't exist in the first set from the new file, go ahead and insert it based on AccountNum and First_Date, like you mentioned.  Is there a quick way to complete this?

Author Comment

ID: 39724576
Actually, I looked it up and it is very easy to understand.  Thank you for your help!
LVL 60

Expert Comment

by:Kevin Cross
ID: 39724599
You are most welcome!  Yes, I found MERGE easy to adopt.  

Best regards and happy coding,


Featured Post

[Webinar] Learn How Hackers Steal Your Credentials

Do You Know How Hackers Steal Your Credentials? Join us and Skyport Systems to learn how hackers steal your credentials and why Active Directory must be secure to stop them. Thursday, July 13, 2017 10:00 A.M. PDT

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Audit has been really one of the more interesting, most useful, yet difficult to maintain topics in the history of SQL Server. In earlier versions of SQL people had very few options for auditing in SQL Server. It typically meant using SQL Trace …
     When we have to pass multiple rows of data to SQL Server, the developers either have to send one row at a time or come up with other workarounds to meet requirements like using XML to pass data, which is complex and tedious to use. There is a …
Monitoring a network: why having a policy is the best policy? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the enormous benefits of having a policy-based approach when monitoring medium and large networks. Software utilized in this v…
Sometimes it takes a new vantage point, apart from our everyday security practices, to truly see our Active Directory (AD) vulnerabilities. We get used to implementing the same techniques and checking the same areas for a breach. This pattern can re…

630 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question