Solved

How would one write a set based query to replace a cursor to handle this situation?

Posted on 2013-12-17
9
417 Views
Last Modified: 2013-12-17
Given a flat file that looks like this:

AcctNum Date1 Value1 Date2 Value2 Date3 Value3 ...... through Value 13
This is a rolling data file supplied by a customer for the last 13 months of data. For example: in the August file Date1 would probably be something like 20130811, Date2 would be the previous month, 20130705, etc.

When we would get the September file, Date1 would be 20130915, Date2, 20130811, Date3 20130705. See the pattern?

What I am trying to do is create a small subset of Value records for each account/date. Example : AcctNum: 0001, TransDate 201308, Value 5555 AcctNum: 0001, TransDate 201309, Value 6666

I've been able to do this with a cursor, going through each Date/Value pair from the input file (13 times per record), determining if the AcctNum/TransDate already exists in our subset, if not, insert it.

Needless to say, this is extremely slow and we have at least 750K records per month to process. Is there any way to handle this without using a cursor?

I'm trying to be as concise as possible, but let me know if I can clarify anything.

Thank you!

sql-server-2012
0
Comment
Question by:ITMikeK
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 4
9 Comments
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 39724125
Have you tried this using UNPIVOT, or another such approach like UNION SELECT AcctNum, DateX, ValueX?  Both avoid the cursor, so may reduce the overhead of the process.

P.S. How are you bringing in the flat file?  If it is via SSIS, you can use the unpivot transformation, so that the data is normalized in your staging table (before insert into production table).

SSIS transform: http://technet.microsoft.com/en-us/library/ms141723.aspx
T-SQL operator: http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
0
 

Author Comment

by:ITMikeK
ID: 39724132
In other words I want to transform :

     AccountNum  Date1 Value1 Date2 Value2 Date3 Value3 .......Date13 Value3 into

     New Usage File

     Rec1: AccountNum
               Contents of Date1
               Contents of Value 1

     Rec2: AccountNum
               Contents of Date 2
               Contents of Value2

                 .................

Remember, every month Date/Value pairs will roll to the right with the newest months data in the Date 1/Value1 columns.  The previous Date1/Value1 pair will now be in Date2/Value2 with the previous information in Date13/Value13 "dropping off" (hence, a 13 month history)

There are 400K+ records in the flat file to per processed every month.  Also, if a new customer was added, we (might) have their entire 13 month history record added to the flat file.

Hope this helps.
0
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 39724143
@ITMikeK,

I was editing my post when you were typing, so please re-read above.  I understood the movement of the data in my original post.  The point is it will be computationally more difficult if you are doing 13 operations versus flattening the data and performing one.

Hence, my recommendation is to look at UNPIVOT either in SSIS or T-SQL.
0
Edgartown IT Case Study

Learn about Edgartown's quest to ensure the safety and security of the entire town's employee and citizen data. Read the case study!

 

Author Comment

by:ITMikeK
ID: 39724253
I do have access to SSIS for SQL Server 2012.  Any suggestions for best examples before I start googling everything?
0
 
LVL 60

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 39724315
How are you currently bringing in the data?

Regardless of the import method, you can do one of the following once you have the data in a staging table.  

(1) Use the UNPIVOT relational operator:
/* 
normalize data via UNPIVOT of n date-value pairs.
use two subqueries/unpivots then JOIN to get date-value combo in one row.
*/
SELECT dte.accountnum, dte.the_date, val.the_value
FROM (

    SELECT accountnum, the_date
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY date_num)
    /* best to use a subquery selecting only accountnum and date columns. */
    FROM #YourImportedData
    UNPIVOT (
        the_date
        FOR date_num
        IN (date1, date2, date3)
    ) upvt

) dte
/* If you can have NULL values, you may want a LEFT JOIN here. */
JOIN (

    SELECT accountnum, the_value
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY value_num)
    /* best to use a subquery selecting only accountnum and value columns. */
    FROM #YourImportedData
    UNPIVOT (
        the_value
        FOR value_num
        IN (value1, value2, value3)
    ) upvt

) val ON val.accountnum = dte.accountnum AND val.rn = dte.rn
;

Open in new window


(2) Use a UNION:
/*
normalize date via UNION of n queries for separate date-value pairs.
*/
SELECT accountnum, the_date = date1, the_value = value1
FROM #YourImportedData
UNION
SELECT accountnum, the_date = date2, the_value = value2
FROM #YourImportedData
UNION
SELECT accountnum, the_date = date3, the_value = value3
FROM #YourImportedData
;

Open in new window


Starting with this sample data for example:
0001 20130811 5555 20130705 7897 20130604 4444 
0003 20130811 5565 20130705 7237 20130604 1114
0007 20130811 5755 20130705 1247 20130604 1235
0009 20130811 5995 20130705 7564 20130604 2356
0012 20130811 8555 20130705 7112 20130604 3344
0301 20130811 5765 20130705 9547 20130604 5566
2005 20130811 4325 20130705 1234 20130604 7788

Open in new window


The above queries will yield:
accountnum           the_date   the_value
-------------------- ---------- -----------
0001                 2013-08-11 5555
0001                 2013-07-05 7897
0001                 2013-06-04 4444
0003                 2013-08-11 5565
0003                 2013-07-05 7237
0003                 2013-06-04 1114
0007                 2013-08-11 5755
0007                 2013-07-05 1247
0007                 2013-06-04 1235
0009                 2013-08-11 5995
0009                 2013-07-05 7564
0009                 2013-06-04 2356
0012                 2013-08-11 8555
0012                 2013-07-05 7112
0012                 2013-06-04 3344
0301                 2013-08-11 5765
0301                 2013-07-05 9547
0301                 2013-06-04 5566
2005                 2013-08-11 4325
2005                 2013-07-05 1234
2005                 2013-06-04 7788

Open in new window


Therefore, you will have data in a format that more easily compares and INSERTs, assuming your production data is in the normalized format and not the more wide imported one.  With SQL 2012, you can use MERGE to handle the UPDATE or INSERT part.

If the target table is in the same format, I would think you only need to compare the account number and the first date.  In other words, when the existing record's first date is not the same as the first date in the file, you have a new record and will be replacing the entire row.  If nothing exists in target for the account number, you INSERT new row.

I hope that helps!

Kevin
0
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 39724341
Correction to my note/code above.

If you can have NULL values, you may want a LEFT JOIN here.

Yes, LEFT JOIN would work, but the flaw is that ROW_NUMBER() will sequence the non-NULL values, making your matching invalid.  Hence, if you do have NULL values, you need to change more than the JOIN type.  Therefore, a better solution to get "rn" to match dates and values is...

rn = REPLACE(date_num, 'date', '') /* REPLACE(value_num, 'value', '') */

Open in new window


This works if the suffix behind 'date' and 'value' are the same by pair and not just numbers based on the order they appear.

Let me know if you have any questions.
0
 

Author Comment

by:ITMikeK
ID: 39724551
Kevin,

   The UNION option worked great!  Now what were you saying about MERGE?  We are using 2012, but I haven't tried that yet.  The ultimate goal is to take next months file run this same UNION query and whatever doesn't exist in the first set from the new file, go ahead and insert it based on AccountNum and First_Date, like you mentioned.  Is there a quick way to complete this?
0
 

Author Comment

by:ITMikeK
ID: 39724576
Actually, I looked it up and it is very easy to understand.  Thank you for your help!
0
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 39724599
You are most welcome!  Yes, I found MERGE easy to adopt.  

Best regards and happy coding,

Kevin
0

Featured Post

MS Dynamics Made Instantly Simpler

Make Your Microsoft Dynamics Investment Count  & Drastically Decrease Training Time by Providing Intuitive Step-By-Step WalkThru Tutorials.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

SQL Server engine let you use a Windows account or a SQL Server account to connect to a SQL Server instance. This can be configured immediatly during the SQL Server installation or after in the Server Authentication section in the Server properties …
In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
How to Install VMware Tools in Red Hat Enterprise Linux 6.4 (RHEL 6.4) Step-by-Step Tutorial

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question