Solved

Improve speed of cursor

Posted on 2016-07-21
13
42 Views
Last Modified: 2016-07-21
I have an SSIS ETL process in which i use the multiple flat file connection to create 2 main tables:
  • DE_MASTER
  • DE_MASTER_ARCHIVE

I then have a cursor to union these 2 tables in SELECT Statement.  In the SELECT statement i am selecting most (almost all) of the columns.  This cursor populates other normalized tables, so the source tables are essentially read only after the ETL process is done.

My question is:

How can i improve the performance of this SELECT?  The union of the 2 tables produces about 3m rows and takes about 15 mins
There are no indexes on each of the source tables in the union.
0
Comment
Question by:gdinunzio
  • 5
  • 4
  • 3
  • +1
13 Comments
 
LVL 20

Accepted Solution

by:
Russ Suter earned 250 total points
ID: 41723263
Adding indexes where they would be useful could improve performance but your best bet by far is to retool the process to use a set-based approach instead of a cursor. With VERY few exceptions, cursor-based actions can be remade into set-based ones and the performance gains are significant to say the least.

Without you providing the code there's not much specific advice I can give you on how you might convert this into a set-based approach.
0
 
LVL 142

Assisted Solution

by:Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3] earned 250 total points
ID: 41723277
i fully agree with above: avoid cursors at all costs.
insert into  ... from ... join ....
and updates with joins usually does the job.
https://www.experts-exchange.com/articles/1517/UPDATES-with-JOIN-for-everybody.html

and if really you need a row by row handling
https://www.experts-exchange.com/articles/13640/processing-cursor-vs-temp-table-syntax.html
1
 
LVL 65

Expert Comment

by:Jim Horn
ID: 41723283
Would help if you could copy-paste the T-SQL into this question.
0
Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

 
LVL 20

Expert Comment

by:Russ Suter
ID: 41723321
and if really you need a row by row handling
https://www.experts-exchange.com/articles/13640/processing-cursor-vs-temp-table-syntax.html
I like your thinking here. Even temp tables are far more efficient than cursors. Put simply, ask most database experts and they'll tell you that cursors are pretty much the worst thing you can do in a database. Even the oft vilified trigger does not receive so much disdain.
0
 

Author Comment

by:gdinunzio
ID: 41723395
Apologies but i had to spend a minute obfuscating the sensitive data names...

Here is the basic union statement:

SELECT             
            m.N_ID, RTRIM(m.N_Name), TRY_PARSE(m.N_DOB AS DATE) AS DoB, m.N_SEX, m.N_INT,
            TRY_PARSE(m.N_DATE AS DATE),             
            TRY_PARSE(m.N_TIME AS TIME),
            TRY_PARSE(m.N_DATE_PR AS DATE),
            TRY_PARSE(m.N_TIME_PR AS TIME), m.N_INT_PR,
                        
            NULLIF(m.C_FIRA, ''), m.C_PREL, m.C_PMIS, m.C_OPEN,      

            m.A_ID,      TRY_PARSE(m.A_DT AS DATE) AS ADt, TRY_PARSE(SUBSTRING (m.A_TM, 1, 2) + ':' + SUBSTRING (m.A_TM, 3, 2) AS TIME) AS ATm      
      FROM DE_MASTER AS m      
      UNION
      SELECT             
            ma.N_ID, RTRIM(ma.N_Name), TRY_PARSE(ma.N_DOB AS DATE) AS DoB, ma.N_SEX, ma.N_INT,
            TRY_PARSE(ma.N_DATE AS DATE),             
            TRY_PARSE(ma.N_TIME AS TIME),
            TRY_PARSE(ma.N_DATE_PR AS DATE),
            TRY_PARSE(ma.N_TIME_PR AS TIME), ma.N_INT_PR,
                        
            NULLIF(ma.C_FIRA, ''), ma.C_PREL, ma.C_PMIS, ma.C_OPEN,      

            ma.A_ID,      TRY_PARSE(ma.A_DT AS DATE) AS ADt, TRY_PARSE(SUBSTRING (ma.A_TM, 1, 2) + ':' + SUBSTRING (ma.A_TM, 3, 2) AS TIME) AS ATm      
      FROM DE_MASTER_ARCHIVE AS ma      

The cursor presently creates name records in table N then needs the "N" PK from that insert to create records in table c then needs the "C" PK to create records in table "A"
0
 
LVL 20

Expert Comment

by:Russ Suter
ID: 41723404
I don't see a cursor here. Is the code you provided extracted from the inside of a cursor block?

That aside I do see a performance issue. Just about every column you're selecting is being run through a function. That makes indexing next to impossible. Certainly the built-in SQL optimizer won't have a clue what to do with this. You really should try to avoid using functions like this. I'm not sure why you're doing a TRY_PARSE() on a field called N_TIME. Shouldn't that field already be the correct data type?
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 41723409
try also union all instead of union
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 41723410
try also union all instead of union
0
 

Author Comment

by:gdinunzio
ID: 41723420
Unfortunately no...the source data was all text...even the dates were text.
0
 
LVL 20

Expert Comment

by:Russ Suter
ID: 41723475
UNION ALL won't give you any performance advantage. All that will do is show duplicates between union statements if there are any.

This is starting to look like what you really need to do is parse the text outside of SQL (perhaps using a .NET project) then transfer the data into your database. While SQL Server is capable of data manipulation that isn't what it's good at. Let the .NET framework do the heavy lifting and type coercion and then SQL can do its job more efficiently.
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 41723479
union does an implicit distinct... hence i do see a performance impact...
0
 

Author Closing Comment

by:gdinunzio
ID: 41723535
Thanks guys...excellent info!
0
 
LVL 20

Expert Comment

by:Russ Suter
ID: 41723551
union does an implicit distinct... hence i do see a performance impact...
Ah, yes you're right. I hadn't thought of it that way for some reason. It could improve performance but in this case any improvement would be completely overshadowed by the rest of the slowdown.

Also, I tend to use good indexes when I do this stuff so I've never noticed any practical benefit even though it's theoretically there. Still, if you have no practical reason not to use UNION ALL then why not squeeze every nanosecond of performance out of it?
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SqlDataBase 7 48
Looking for a solution or even half a solution for custom web portal 5 47
Linked Server Issue with SQL2012 3 25
Run SQL Server Proc from Access 11 30
Having an SQL database can be a big investment for a small company. Hardware, setup and of course, the price of software all add up to a big bill that some companies may not be able to absorb.  Luckily, there is a free version SQL Express, but does …
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question