How would one write a set based query to replace a cursor to handle this situation?

Posted on 2013-12-17
Last Modified: 2013-12-17
Given a flat file that looks like this:

AcctNum Date1 Value1 Date2 Value2 Date3 Value3 ...... through Value 13
This is a rolling data file supplied by a customer for the last 13 months of data. For example: in the August file Date1 would probably be something like 20130811, Date2 would be the previous month, 20130705, etc.

When we would get the September file, Date1 would be 20130915, Date2, 20130811, Date3 20130705. See the pattern?

What I am trying to do is create a small subset of Value records for each account/date. Example : AcctNum: 0001, TransDate 201308, Value 5555 AcctNum: 0001, TransDate 201309, Value 6666

I've been able to do this with a cursor, going through each Date/Value pair from the input file (13 times per record), determining if the AcctNum/TransDate already exists in our subset, if not, insert it.

Needless to say, this is extremely slow and we have at least 750K records per month to process. Is there any way to handle this without using a cursor?

I'm trying to be as concise as possible, but let me know if I can clarify anything.

Thank you!

Question by:ITMikeK
  • 5
  • 4
LVL 59

Expert Comment

by:Kevin Cross
ID: 39724125
Have you tried this using UNPIVOT, or another such approach like UNION SELECT AcctNum, DateX, ValueX?  Both avoid the cursor, so may reduce the overhead of the process.

P.S. How are you bringing in the flat file?  If it is via SSIS, you can use the unpivot transformation, so that the data is normalized in your staging table (before insert into production table).

SSIS transform:
T-SQL operator:

Author Comment

ID: 39724132
In other words I want to transform :

     AccountNum  Date1 Value1 Date2 Value2 Date3 Value3 .......Date13 Value3 into

     New Usage File

     Rec1: AccountNum
               Contents of Date1
               Contents of Value 1

     Rec2: AccountNum
               Contents of Date 2
               Contents of Value2


Remember, every month Date/Value pairs will roll to the right with the newest months data in the Date 1/Value1 columns.  The previous Date1/Value1 pair will now be in Date2/Value2 with the previous information in Date13/Value13 "dropping off" (hence, a 13 month history)

There are 400K+ records in the flat file to per processed every month.  Also, if a new customer was added, we (might) have their entire 13 month history record added to the flat file.

Hope this helps.
LVL 59

Expert Comment

by:Kevin Cross
ID: 39724143

I was editing my post when you were typing, so please re-read above.  I understood the movement of the data in my original post.  The point is it will be computationally more difficult if you are doing 13 operations versus flattening the data and performing one.

Hence, my recommendation is to look at UNPIVOT either in SSIS or T-SQL.
Simplifying Server Workload Migrations

This use case outlines the migration challenges that organizations face and how the Acronis AnyData Engine supports physical-to-physical (P2P), physical-to-virtual (P2V), virtual to physical (V2P), and cross-virtual (V2V) migration scenarios to address these challenges.


Author Comment

ID: 39724253
I do have access to SSIS for SQL Server 2012.  Any suggestions for best examples before I start googling everything?
LVL 59

Accepted Solution

Kevin Cross earned 500 total points
ID: 39724315
How are you currently bringing in the data?

Regardless of the import method, you can do one of the following once you have the data in a staging table.  

(1) Use the UNPIVOT relational operator:
normalize data via UNPIVOT of n date-value pairs.
use two subqueries/unpivots then JOIN to get date-value combo in one row.
SELECT dte.accountnum, dte.the_date, val.the_value

    SELECT accountnum, the_date
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY date_num)
    /* best to use a subquery selecting only accountnum and date columns. */
    FROM #YourImportedData
        FOR date_num
        IN (date1, date2, date3)
    ) upvt

) dte
/* If you can have NULL values, you may want a LEFT JOIN here. */

    SELECT accountnum, the_value
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY value_num)
    /* best to use a subquery selecting only accountnum and value columns. */
    FROM #YourImportedData
        FOR value_num
        IN (value1, value2, value3)
    ) upvt

) val ON val.accountnum = dte.accountnum AND val.rn = dte.rn

Open in new window

(2) Use a UNION:
normalize date via UNION of n queries for separate date-value pairs.
SELECT accountnum, the_date = date1, the_value = value1
FROM #YourImportedData
SELECT accountnum, the_date = date2, the_value = value2
FROM #YourImportedData
SELECT accountnum, the_date = date3, the_value = value3
FROM #YourImportedData

Open in new window

Starting with this sample data for example:
0001 20130811 5555 20130705 7897 20130604 4444 
0003 20130811 5565 20130705 7237 20130604 1114
0007 20130811 5755 20130705 1247 20130604 1235
0009 20130811 5995 20130705 7564 20130604 2356
0012 20130811 8555 20130705 7112 20130604 3344
0301 20130811 5765 20130705 9547 20130604 5566
2005 20130811 4325 20130705 1234 20130604 7788

Open in new window

The above queries will yield:
accountnum           the_date   the_value
-------------------- ---------- -----------
0001                 2013-08-11 5555
0001                 2013-07-05 7897
0001                 2013-06-04 4444
0003                 2013-08-11 5565
0003                 2013-07-05 7237
0003                 2013-06-04 1114
0007                 2013-08-11 5755
0007                 2013-07-05 1247
0007                 2013-06-04 1235
0009                 2013-08-11 5995
0009                 2013-07-05 7564
0009                 2013-06-04 2356
0012                 2013-08-11 8555
0012                 2013-07-05 7112
0012                 2013-06-04 3344
0301                 2013-08-11 5765
0301                 2013-07-05 9547
0301                 2013-06-04 5566
2005                 2013-08-11 4325
2005                 2013-07-05 1234
2005                 2013-06-04 7788

Open in new window

Therefore, you will have data in a format that more easily compares and INSERTs, assuming your production data is in the normalized format and not the more wide imported one.  With SQL 2012, you can use MERGE to handle the UPDATE or INSERT part.

If the target table is in the same format, I would think you only need to compare the account number and the first date.  In other words, when the existing record's first date is not the same as the first date in the file, you have a new record and will be replacing the entire row.  If nothing exists in target for the account number, you INSERT new row.

I hope that helps!

LVL 59

Expert Comment

by:Kevin Cross
ID: 39724341
Correction to my note/code above.

If you can have NULL values, you may want a LEFT JOIN here.

Yes, LEFT JOIN would work, but the flaw is that ROW_NUMBER() will sequence the non-NULL values, making your matching invalid.  Hence, if you do have NULL values, you need to change more than the JOIN type.  Therefore, a better solution to get "rn" to match dates and values is...

rn = REPLACE(date_num, 'date', '') /* REPLACE(value_num, 'value', '') */

Open in new window

This works if the suffix behind 'date' and 'value' are the same by pair and not just numbers based on the order they appear.

Let me know if you have any questions.

Author Comment

ID: 39724551

   The UNION option worked great!  Now what were you saying about MERGE?  We are using 2012, but I haven't tried that yet.  The ultimate goal is to take next months file run this same UNION query and whatever doesn't exist in the first set from the new file, go ahead and insert it based on AccountNum and First_Date, like you mentioned.  Is there a quick way to complete this?

Author Comment

ID: 39724576
Actually, I looked it up and it is very easy to understand.  Thank you for your help!
LVL 59

Expert Comment

by:Kevin Cross
ID: 39724599
You are most welcome!  Yes, I found MERGE easy to adopt.  

Best regards and happy coding,


Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
sql 2008 how to table join 2 31
Are triggers slow? 7 23
Disable TLS1.0 on Win 2012 server 7 57
Substring works but need to tweak it 14 31
Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
SQL Server engine let you use a Windows account or a SQL Server account to connect to a SQL Server instance. This can be configured immediatly during the SQL Server installation or after in the Server Authentication section in the Server properties …
How to Install VMware Tools in Red Hat Enterprise Linux 6.4 (RHEL 6.4) Step-by-Step Tutorial

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question