Solved

How would one write a set based query to replace a cursor to handle this situation?

Posted on 2013-12-17
9
406 Views
Last Modified: 2013-12-17
Given a flat file that looks like this:

AcctNum Date1 Value1 Date2 Value2 Date3 Value3 ...... through Value 13
This is a rolling data file supplied by a customer for the last 13 months of data. For example: in the August file Date1 would probably be something like 20130811, Date2 would be the previous month, 20130705, etc.

When we would get the September file, Date1 would be 20130915, Date2, 20130811, Date3 20130705. See the pattern?

What I am trying to do is create a small subset of Value records for each account/date. Example : AcctNum: 0001, TransDate 201308, Value 5555 AcctNum: 0001, TransDate 201309, Value 6666

I've been able to do this with a cursor, going through each Date/Value pair from the input file (13 times per record), determining if the AcctNum/TransDate already exists in our subset, if not, insert it.

Needless to say, this is extremely slow and we have at least 750K records per month to process. Is there any way to handle this without using a cursor?

I'm trying to be as concise as possible, but let me know if I can clarify anything.

Thank you!

sql-server-2012
0
Comment
Question by:ITMikeK
  • 5
  • 4
9 Comments
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
Have you tried this using UNPIVOT, or another such approach like UNION SELECT AcctNum, DateX, ValueX?  Both avoid the cursor, so may reduce the overhead of the process.

P.S. How are you bringing in the flat file?  If it is via SSIS, you can use the unpivot transformation, so that the data is normalized in your staging table (before insert into production table).

SSIS transform: http://technet.microsoft.com/en-us/library/ms141723.aspx
T-SQL operator: http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
0
 

Author Comment

by:ITMikeK
Comment Utility
In other words I want to transform :

     AccountNum  Date1 Value1 Date2 Value2 Date3 Value3 .......Date13 Value3 into

     New Usage File

     Rec1: AccountNum
               Contents of Date1
               Contents of Value 1

     Rec2: AccountNum
               Contents of Date 2
               Contents of Value2

                 .................

Remember, every month Date/Value pairs will roll to the right with the newest months data in the Date 1/Value1 columns.  The previous Date1/Value1 pair will now be in Date2/Value2 with the previous information in Date13/Value13 "dropping off" (hence, a 13 month history)

There are 400K+ records in the flat file to per processed every month.  Also, if a new customer was added, we (might) have their entire 13 month history record added to the flat file.

Hope this helps.
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
@ITMikeK,

I was editing my post when you were typing, so please re-read above.  I understood the movement of the data in my original post.  The point is it will be computationally more difficult if you are doing 13 operations versus flattening the data and performing one.

Hence, my recommendation is to look at UNPIVOT either in SSIS or T-SQL.
0
 

Author Comment

by:ITMikeK
Comment Utility
I do have access to SSIS for SQL Server 2012.  Any suggestions for best examples before I start googling everything?
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 59

Accepted Solution

by:
Kevin Cross earned 500 total points
Comment Utility
How are you currently bringing in the data?

Regardless of the import method, you can do one of the following once you have the data in a staging table.  

(1) Use the UNPIVOT relational operator:
/* 
normalize data via UNPIVOT of n date-value pairs.
use two subqueries/unpivots then JOIN to get date-value combo in one row.
*/
SELECT dte.accountnum, dte.the_date, val.the_value
FROM (

    SELECT accountnum, the_date
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY date_num)
    /* best to use a subquery selecting only accountnum and date columns. */
    FROM #YourImportedData
    UNPIVOT (
        the_date
        FOR date_num
        IN (date1, date2, date3)
    ) upvt

) dte
/* If you can have NULL values, you may want a LEFT JOIN here. */
JOIN (

    SELECT accountnum, the_value
         , rn = ROW_NUMBER() OVER(PARTITION BY accountnum ORDER BY value_num)
    /* best to use a subquery selecting only accountnum and value columns. */
    FROM #YourImportedData
    UNPIVOT (
        the_value
        FOR value_num
        IN (value1, value2, value3)
    ) upvt

) val ON val.accountnum = dte.accountnum AND val.rn = dte.rn
;

Open in new window


(2) Use a UNION:
/*
normalize date via UNION of n queries for separate date-value pairs.
*/
SELECT accountnum, the_date = date1, the_value = value1
FROM #YourImportedData
UNION
SELECT accountnum, the_date = date2, the_value = value2
FROM #YourImportedData
UNION
SELECT accountnum, the_date = date3, the_value = value3
FROM #YourImportedData
;

Open in new window


Starting with this sample data for example:
0001 20130811 5555 20130705 7897 20130604 4444 
0003 20130811 5565 20130705 7237 20130604 1114
0007 20130811 5755 20130705 1247 20130604 1235
0009 20130811 5995 20130705 7564 20130604 2356
0012 20130811 8555 20130705 7112 20130604 3344
0301 20130811 5765 20130705 9547 20130604 5566
2005 20130811 4325 20130705 1234 20130604 7788

Open in new window


The above queries will yield:
accountnum           the_date   the_value
-------------------- ---------- -----------
0001                 2013-08-11 5555
0001                 2013-07-05 7897
0001                 2013-06-04 4444
0003                 2013-08-11 5565
0003                 2013-07-05 7237
0003                 2013-06-04 1114
0007                 2013-08-11 5755
0007                 2013-07-05 1247
0007                 2013-06-04 1235
0009                 2013-08-11 5995
0009                 2013-07-05 7564
0009                 2013-06-04 2356
0012                 2013-08-11 8555
0012                 2013-07-05 7112
0012                 2013-06-04 3344
0301                 2013-08-11 5765
0301                 2013-07-05 9547
0301                 2013-06-04 5566
2005                 2013-08-11 4325
2005                 2013-07-05 1234
2005                 2013-06-04 7788

Open in new window


Therefore, you will have data in a format that more easily compares and INSERTs, assuming your production data is in the normalized format and not the more wide imported one.  With SQL 2012, you can use MERGE to handle the UPDATE or INSERT part.

If the target table is in the same format, I would think you only need to compare the account number and the first date.  In other words, when the existing record's first date is not the same as the first date in the file, you have a new record and will be replacing the entire row.  If nothing exists in target for the account number, you INSERT new row.

I hope that helps!

Kevin
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
Correction to my note/code above.

If you can have NULL values, you may want a LEFT JOIN here.

Yes, LEFT JOIN would work, but the flaw is that ROW_NUMBER() will sequence the non-NULL values, making your matching invalid.  Hence, if you do have NULL values, you need to change more than the JOIN type.  Therefore, a better solution to get "rn" to match dates and values is...

rn = REPLACE(date_num, 'date', '') /* REPLACE(value_num, 'value', '') */

Open in new window


This works if the suffix behind 'date' and 'value' are the same by pair and not just numbers based on the order they appear.

Let me know if you have any questions.
0
 

Author Comment

by:ITMikeK
Comment Utility
Kevin,

   The UNION option worked great!  Now what were you saying about MERGE?  We are using 2012, but I haven't tried that yet.  The ultimate goal is to take next months file run this same UNION query and whatever doesn't exist in the first set from the new file, go ahead and insert it based on AccountNum and First_Date, like you mentioned.  Is there a quick way to complete this?
0
 

Author Comment

by:ITMikeK
Comment Utility
Actually, I looked it up and it is very easy to understand.  Thank you for your help!
0
 
LVL 59

Expert Comment

by:Kevin Cross
Comment Utility
You are most welcome!  Yes, I found MERGE easy to adopt.  

Best regards and happy coding,

Kevin
0

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Join & Write a Comment

There have been several questions about Large Transaction Log Files in SQL Server 2008, and how to get rid of them when disk space has become critical. This article will explain how to disable full recovery and implement simple recovery that carries…
Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
It is a freely distributed piece of software for such tasks as photo retouching, image composition and image authoring. It works on many operating systems, in many languages.
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now