Avatar of gdinunzio
gdinunzio
 asked on

Improve speed of cursor

I have an SSIS ETL process in which i use the multiple flat file connection to create 2 main tables:
  • DE_MASTER
  • DE_MASTER_ARCHIVE

I then have a cursor to union these 2 tables in SELECT Statement.  In the SELECT statement i am selecting most (almost all) of the columns.  This cursor populates other normalized tables, so the source tables are essentially read only after the ETL process is done.

My question is:

How can i improve the performance of this SELECT?  The union of the 2 tables produces about 3m rows and takes about 15 mins
There are no indexes on each of the source tables in the union.
Microsoft SQL Server

Avatar of undefined
Last Comment
Russ Suter

8/22/2022 - Mon
ASKER CERTIFIED SOLUTION
Russ Suter

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
SOLUTION
Guy Hengel [angelIII / a3]

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
Jim Horn

Would help if you could copy-paste the T-SQL into this question.
Russ Suter

and if really you need a row by row handling
https://www.experts-exchange.com/articles/13640/processing-cursor-vs-temp-table-syntax.html
I like your thinking here. Even temp tables are far more efficient than cursors. Put simply, ask most database experts and they'll tell you that cursors are pretty much the worst thing you can do in a database. Even the oft vilified trigger does not receive so much disdain.
gdinunzio

ASKER
Apologies but i had to spend a minute obfuscating the sensitive data names...

Here is the basic union statement:

SELECT             
            m.N_ID, RTRIM(m.N_Name), TRY_PARSE(m.N_DOB AS DATE) AS DoB, m.N_SEX, m.N_INT,
            TRY_PARSE(m.N_DATE AS DATE),             
            TRY_PARSE(m.N_TIME AS TIME),
            TRY_PARSE(m.N_DATE_PR AS DATE),
            TRY_PARSE(m.N_TIME_PR AS TIME), m.N_INT_PR,
                        
            NULLIF(m.C_FIRA, ''), m.C_PREL, m.C_PMIS, m.C_OPEN,      

            m.A_ID,      TRY_PARSE(m.A_DT AS DATE) AS ADt, TRY_PARSE(SUBSTRING (m.A_TM, 1, 2) + ':' + SUBSTRING (m.A_TM, 3, 2) AS TIME) AS ATm      
      FROM DE_MASTER AS m      
      UNION
      SELECT             
            ma.N_ID, RTRIM(ma.N_Name), TRY_PARSE(ma.N_DOB AS DATE) AS DoB, ma.N_SEX, ma.N_INT,
            TRY_PARSE(ma.N_DATE AS DATE),             
            TRY_PARSE(ma.N_TIME AS TIME),
            TRY_PARSE(ma.N_DATE_PR AS DATE),
            TRY_PARSE(ma.N_TIME_PR AS TIME), ma.N_INT_PR,
                        
            NULLIF(ma.C_FIRA, ''), ma.C_PREL, ma.C_PMIS, ma.C_OPEN,      

            ma.A_ID,      TRY_PARSE(ma.A_DT AS DATE) AS ADt, TRY_PARSE(SUBSTRING (ma.A_TM, 1, 2) + ':' + SUBSTRING (ma.A_TM, 3, 2) AS TIME) AS ATm      
      FROM DE_MASTER_ARCHIVE AS ma      

The cursor presently creates name records in table N then needs the "N" PK from that insert to create records in table c then needs the "C" PK to create records in table "A"
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
Russ Suter

I don't see a cursor here. Is the code you provided extracted from the inside of a cursor block?

That aside I do see a performance issue. Just about every column you're selecting is being run through a function. That makes indexing next to impossible. Certainly the built-in SQL optimizer won't have a clue what to do with this. You really should try to avoid using functions like this. I'm not sure why you're doing a TRY_PARSE() on a field called N_TIME. Shouldn't that field already be the correct data type?
Guy Hengel [angelIII / a3]

try also union all instead of union
Guy Hengel [angelIII / a3]

try also union all instead of union
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
gdinunzio

ASKER
Unfortunately no...the source data was all text...even the dates were text.
Russ Suter

UNION ALL won't give you any performance advantage. All that will do is show duplicates between union statements if there are any.

This is starting to look like what you really need to do is parse the text outside of SQL (perhaps using a .NET project) then transfer the data into your database. While SQL Server is capable of data manipulation that isn't what it's good at. Let the .NET framework do the heavy lifting and type coercion and then SQL can do its job more efficiently.
Guy Hengel [angelIII / a3]

union does an implicit distinct... hence i do see a performance impact...
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
gdinunzio

ASKER
Thanks guys...excellent info!
Russ Suter

union does an implicit distinct... hence i do see a performance impact...
Ah, yes you're right. I hadn't thought of it that way for some reason. It could improve performance but in this case any improvement would be completely overshadowed by the rest of the slowdown.

Also, I tend to use good indexes when I do this stuff so I've never noticed any practical benefit even though it's theoretically there. Still, if you have no practical reason not to use UNION ALL then why not squeeze every nanosecond of performance out of it?