• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 169

SQL Server 2005

Currently this query takes 12 minutes to append 44,770,498 records to a table:

INSERT INTO TRP.Cap_Projection_Pln (CapID, ContractsID, Cap_Date
                        ,CalcMeth
                        ,Rate
                        ,CapYield
                        ,TargetTrendProjection)
SELECT  base.Cap_Id, Contr.PPGContr_ID, m.Mn
            ,Contr.Meth
            ,Contr.Rate
            ,TRP.udf_GetCapYield((Contr.Meth), base.PlnMbrs, (base.PlnCap * Budget), (Contr.Rate))
            ,(base.PlnCap * Budget)
FROM      TRP.Cap_Contr_Temp2 AS Contr INNER JOIN TRP.Cap_Data_ProjBase base ON
                        Contr.GeoId = base.Geo_id AND
                        Contr.TypeOfContract = base.Product AND
                        Contr.Silver = base.Silver
            CROSS JOIN TRP.Cap_Mnth m
WHERE m.Mn >= Contr.PlanEffDt AND m.Mn < Contr.PlanExpDt

I'm wondering if there's any way this query can be modified so that it runs faster or takes less time to append that many records to the table?

Any input is appreciated.
Asked by: HNA071252
3 Solutions
 
momi_sabag commented:
You can try two things. First, run this query:
SELECT  base.Cap_Id, Contr.PPGContr_ID, m.Mn
            ,Contr.Meth
            ,Contr.Rate
            ,TRP.udf_GetCapYield((Contr.Meth), base.PlnMbrs, (base.PlnCap * Budget), (Contr.Rate))
            ,(base.PlnCap * Budget)
into just_a_test_table
FROM      TRP.Cap_Contr_Temp2 AS Contr INNER JOIN TRP.Cap_Data_ProjBase base ON
                        Contr.GeoId = base.Geo_id AND
                        Contr.TypeOfContract = base.Product AND
                        Contr.Silver = base.Silver
            CROSS JOIN TRP.Cap_Mnth m
WHERE m.Mn >= Contr.PlanEffDt AND m.Mn < Contr.PlanExpDt

This will show you how fast the statement can be when you rule out logging and indexes on your target table.
If it is still slow, it means you need to improve the query. The first thing that comes to mind is to get rid of the CROSS JOIN and use an INNER JOIN. Also make sure you have indexes in place to support the join operation; see the sketch below.
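For example, something like this (a sketch only; the index names are made up, and the best key order depends on your data distribution):

CREATE INDEX IX_Cap_Data_ProjBase_join
    ON TRP.Cap_Data_ProjBase (Geo_id, Product, Silver)  -- supports the join from Cap_Contr_Temp2

CREATE INDEX IX_Cap_Contr_Temp2_join
    ON TRP.Cap_Contr_Temp2 (GeoId, TypeOfContract, Silver)
    INCLUDE (PPGContr_ID, Meth, Rate, PlanEffDt, PlanExpDt)  -- covers the other columns the query reads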
 
Scott Pletcher (Senior DBA) commented:
50M rows could require a lot of disk space.  Make sure the *log* file on the db is pre-allocated large enough to contain that much new data.

Make sure the data file has a fairly large fixed growth amount (for example, 50M rather than 5M) and that IFI is enabled OR, if you don't want to enable IFI, pre-allocate data space as well.
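A sketch of what that might look like (the logical file names and sizes below are placeholders; EXEC sp_helpfile shows the real names, and IFI is enabled by granting the SQL Server service account the "Perform volume maintenance tasks" right in Local Security Policy):

USE YourDb
EXEC sp_helpfile  -- lists each file's logical name, current size, and growth setting

-- pre-allocate space up front with fixed growth amounts;
-- note SIZE must be larger than the file's current size
ALTER DATABASE YourDb
MODIFY FILE ( NAME = YourDb_Data, SIZE = 80GB, FILEGROWTH = 1GB )

ALTER DATABASE YourDb
MODIFY FILE ( NAME = YourDb_Log, SIZE = 20GB, FILEGROWTH = 1GB )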
 
HNA071252 (Author) commented:
Per momi_sabag's suggestion, I tried to run this query, but got this error:

An object or column name is missing or empty. For SELECT INTO statements, verify each column has a name. For other statements, look for empty alias names. Aliases defined as "" or [] are not allowed. Add a name or single space as the alias name.

SELECT  base.Cap_Id, Contr.PPGContr_ID, m.Mn
            ,Contr.Meth
            ,Contr.Rate
            ,TRP.udf_GetCapYield((Contr.Meth), base.PlnMbrs, (base.PlnCap * Budget), (Contr.Rate))
            ,(base.PlnCap * Budget)
into TRP.Cap_Projection_Act_test
FROM      TRP.Cap_Contr_Temp2 AS Contr INNER JOIN TRP.Cap_Data_ProjBase base ON
                        Contr.GeoId = base.Geo_id AND
                        Contr.TypeOfContract = base.Product AND
                        Contr.Silver = base.Silver
            CROSS JOIN TRP.Cap_Mnth m
WHERE m.Mn >= Contr.PlanEffDt AND m.Mn < Contr.PlanExpDt

Please let me know how to fix it.

 
HNA071252 (Author) commented:
I'm pretty new to SQL Server and didn't quite understand Scott's comment; I'm not sure how to see the log file, enable IFI, pre-allocate data space, etc. Can you please elaborate?
 
Scott Pletcher (Senior DBA) commented:
I'd suggest not using a SELECT ... INTO, since that will cause roughly the same delays as inserting into an existing table.

But running the SELECT stand-alone is a good idea.  Also, I changed the CROSS JOIN to an INNER JOIN to gain some efficiency there.

The other factor is the scalar function (udf_GetCapYield), which could be harming performance.  Would need to see the code for the function to be sure, though.  Presumably SQL will create a spool to cache the results to avoid recomputations, but would need to see the query plan to confirm or refute that.


Try just this statement:


SELECT COUNT(*)
FROM (
SELECT  base.Cap_Id, Contr.PPGContr_ID, m.Mn
            ,Contr.Meth
            ,Contr.Rate
            ,TRP.udf_GetCapYield((Contr.Meth), base.PlnMbrs, (base.PlnCap * Budget), (Contr.Rate)) AS CapYield
            ,(base.PlnCap * Budget) AS CapBudget
FROM      TRP.Cap_Contr_Temp2 AS Contr INNER JOIN TRP.Cap_Data_ProjBase base ON
                        Contr.GeoId = base.Geo_id AND
                        Contr.TypeOfContract = base.Product AND
                        Contr.Silver = base.Silver
INNER JOIN TRP.Cap_Mnth m ON
    m.Mn >= Contr.PlanEffDt AND m.Mn < Contr.PlanExpDt
) AS derived
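To clock it and check for a spool, you can run the SELECT with timing enabled and the actual plan captured (a sketch; in SSMS you can also just toggle "Include Actual Execution Plan"):

SET STATISTICS TIME ON  -- reports CPU and elapsed time per statement
SET STATISTICS IO ON    -- reports logical reads per table
-- run the SELECT COUNT(*) above, then look for a Table Spool operator in the actual plan
SET STATISTICS TIME OFF
SET STATISTICS IO OFF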
 
HNA071252 (Author) commented:
The SELECT ... INTO took a lot longer, at 20 minutes, so that option is out.

Also, the INNER JOIN (see below) took 13 minutes 19 seconds, so it's still longer than the CROSS JOIN:

INSERT INTO TRP.Cap_Projection_Pln (CapID, ContractsID, Cap_Date
                        ,CalcMeth
                        ,Rate
                        ,CapYield
                        ,TargetTrendProjection)
SELECT  base.Cap_Id, Contr.PPGContr_ID, m.Mn
            ,Contr.Meth
            ,Contr.Rate
            ,TRP.udf_GetCapYield((Contr.Meth), base.PlnMbrs, (base.PlnCap * Budget), (Contr.Rate))
            ,(base.PlnCap * Budget)
FROM      TRP.Cap_Contr_Temp2 AS Contr INNER JOIN TRP.Cap_Data_ProjBase base ON
                        Contr.GeoId = base.Geo_id AND
                        Contr.TypeOfContract = base.Product AND
                        Contr.Silver = base.Silver
            INNER JOIN TRP.Cap_Mnth m ON
                        m.Mn >= Contr.PlanEffDt AND m.Mn < Contr.PlanExpDt

Please let me know if there's any other way to improve this query.
 
HNA071252 (Author) commented:
Also, here's udf_GetCapYield for your review:

set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go


ALTER FUNCTION [TRP].[udf_GetCapYield]
(@CalcMeth nvarchar(3), @MbrMn int, @CapAmt float, @Rate float)
RETURNS float
AS
BEGIN
      DECLARE @Yield float
      DECLARE @PlanDemo float
      DECLARE @MDCRRate float
      SET @MdcrRate = 1
      SET @Yield = 0

      IF @MbrMn <> 0
            SET @Yield = @CapAmt/@MbrMn
      IF @Rate = 0
            SET @PlanDemo = 1
      ELSE
            SET @PlanDemo = @Yield/@Rate

      IF @CalcMeth = 'APM' OR @CalcMeth = 'BGR'
            SET @Yield = @Rate * @PlanDemo
      IF @CalcMeth = 'PPM'
            SET @Yield = @Rate
      IF @CalcMeth = 'NCB' OR @CalcMeth = 'CAB'
            SET @Yield = @Rate * @PlanDemo * @MdcrRate

RETURN @Yield

END
 
Scott Pletcher (Senior DBA) commented:
1) Please run the SELECT that just returns a COUNT(*), as I posted earlier, and clock it.

2) Post the results of these on your db:

USE <your_db_name>

EXEC sp_helpfile

DBCC SHOWFILESTATS -- for the filegroup containing the table being INSERTed into

DBCC LOGINFO -- just note the count of rows returned

DBCC SQLPERF(LOGSPACE) -- for just the db in question
 
Scott Pletcher (Senior DBA) commented:
The function could be tightened quite a bit, even coded as a scalar function, but I don't think it's the main issue here.

An inline table function or a computed column would be more efficient, however.
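For example, an inline table function version might look like this (a sketch only, untested; the function name is made up, and since @MdcrRate is hard-coded to 1 in the original it is folded out here):

CREATE FUNCTION TRP.udf_GetCapYield_itvf
(@CalcMeth nvarchar(3), @MbrMn int, @CapAmt float, @Rate float)
RETURNS TABLE
AS
RETURN
(
SELECT CapYield =
    CASE
        WHEN @CalcMeth IN (N'APM', N'BGR', N'NCB', N'CAB') THEN @Rate * d.PlanDemo
        WHEN @CalcMeth = N'PPM' THEN @Rate
        ELSE y.RawYield  -- the original falls through to CapAmt/MbrMn for any other method
    END
FROM (SELECT RawYield = CASE WHEN @MbrMn <> 0 THEN @CapAmt / @MbrMn ELSE 0.0 END) AS y
CROSS APPLY (SELECT PlanDemo = CASE WHEN @Rate = 0 THEN 1 ELSE y.RawYield / @Rate END) AS d
)

The main query would then replace the scalar call with CROSS APPLY TRP.udf_GetCapYield_itvf(Contr.Meth, base.PlnMbrs, base.PlnCap * Budget, Contr.Rate) AS cy and select cy.CapYield.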
 
HNA071252 (Author) commented:
The SELECT that just returns a COUNT(*) ran in less than a second. See attachment.
EE-Question.doc
 
Scott Pletcher (Senior DBA) commented:
>> The SELECT that just returns a COUNT(*) ran in less than a second <<

I think that will end up confirming my original suspicion -- the SELECT is not the performance issue, it's the INSERT.

The database is ~61GB.  10% growth is ~6GB -- if IFI is not on and that space needs to be preformatted, that will take a LONG time.

The log is ~18.3GB.  Log space must ALWAYS be preformatted, and each 10% growth (~1.8GB) will take a while to preformat as well.


Sorry, but I also need to see the last two pieces of info to really provide the best help.

How many rows are returned by this command?: DBCC LOGINFO
[Just note the total number of rows -- a single int value -- returned from SSMS and post it here -- I do NOT need to see the rows themselves.]


The % used shown for the db in question by the command: DBCC sqlperf(logspace)
 
Scott Pletcher (Senior DBA) commented:
Will also need to see the recovery model for the db:

SELECT
    recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE
    name = N'<your_db_name_here>'
 
HNA071252 (Author) commented:
Here it is; see attachment.
EE-Question-2.doc
 
Scott Pletcher (Senior DBA) commented:
OK, I misread the numbers before.

It's a ~594GB data file with ~17GB free, and a ~1.8GB log file with ~1.5GB free.  That's more typical.  Growth for both files is 10%, which is not a good choice as files get larger.


Run these commands before running the INSERT.  They may take a while to run, but they are much faster stand-alone than if they occur in reaction to an INSERT:

ALTER DATABASE PSF_GEN
MODIFY FILE ( NAME = PSF_GEN_DATA, FILEGROWTH = 16GB )

ALTER DATABASE PSF_GEN
-- preallocate more log space, but reduce the growth amount
MODIFY FILE ( NAME = PSF_GEN_LOG, SIZE = 6GB, FILEGROWTH = 1GB )


Then try the INSERT again and see what times you get.
 
HNA071252 (Author) commented:
I ran the first command, but I couldn't run the second one; it gave me this error:

MODIFY FILE failed. Specified size is less than current size.

I ran the query again and it took a couple of minutes longer.

Can you please tell me how to reverse the earlier ALTER DATABASE command back to the 10% growth setting?
 
Scott Pletcher (Senior DBA) commented:
ALTER DATABASE PSF_GEN
MODIFY FILE ( NAME = PSF_GEN_DATA, FILEGROWTH = 10% )


But that's a big mistake: with a ~594GB data file, 10% growth means growing ~60GB (!) at one time, which is way too much.
 
HNA071252 (Author) commented:
I don't know why, but some processes I'm running are taking a lot longer than they usually do, so I figured I would change it back to where it was. Here's how it looks now; see attachment 3.
EE-Question-3.doc
 
Scott Pletcher (Senior DBA) commented:
It's the log that has gotten huge.  To do these HUGE inserts, you need to preallocate the log space, not let it grow automatically.

If it's not permanent data, you should be using tempdb for this, assuming that tempdb is properly tuned.
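For example (a sketch only; the staging table name is hypothetical, tempdb always uses the simple recovery model, and anything in tempdb is lost on restart):

SELECT  base.Cap_Id, Contr.PPGContr_ID, m.Mn
            ,Contr.Meth
            ,Contr.Rate
            ,TRP.udf_GetCapYield((Contr.Meth), base.PlnMbrs, (base.PlnCap * Budget), (Contr.Rate)) AS CapYield
            ,(base.PlnCap * Budget) AS CapBudget
INTO tempdb.dbo.Cap_Projection_stage  -- hypothetical staging table in tempdb
FROM      TRP.Cap_Contr_Temp2 AS Contr INNER JOIN TRP.Cap_Data_ProjBase base ON
                        Contr.GeoId = base.Geo_id AND
                        Contr.TypeOfContract = base.Product AND
                        Contr.Silver = base.Silver
            INNER JOIN TRP.Cap_Mnth m ON
                        m.Mn >= Contr.PlanEffDt AND m.Mn < Contr.PlanExpDt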
 
HNA071252 (Author) commented:
But I cannot preallocate log space (with the command below), because the specified size is less than the current size:


ALTER DATABASE PSF_GEN
-- preallocate more log space, but reduce the growth amount
MODIFY FILE ( NAME = PSF_GEN_LOG, SIZE = 1GB, FILEGROWTH = 1GB )
 
Scott Pletcher (Senior DBA) commented:
If I'm reading it right, the log's up to 70GB (!) now, so you should have no problem with log space this time.

It's still a HUGE amount of data to write to the log, though.

You might want to seriously consider BULK_LOGGED recovery model for that db.
 
HNA071252 (Author) commented:
I don't understand BULK_LOGGED. What do I do to improve the speed of my processes?
 
HNA071252 (Author) commented:
"You might want to seriously consider BULK_LOGGED recovery model for that db."

What does this mean? Can you please elaborate?
 
Scott Pletcher (Senior DBA) commented:
There are 3 recovery models for a db: simple, bulk-logged and full.

Both simple and bulk-logged have to do less internal work, and write less data to the log, than full model.

Since you are inserting a VERY LARGE number of rows at once, specifying bulk-logged would significantly reduce the overhead of those inserts.

The differences are technical, but mostly come down to recovery capability:

With full recovery, you can always use log backups to recover to any point in time.
With bulk-logged, you cannot; you can only recover to a differential backup, or to the point of the first bulk load after the last full backup.
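A sketch of the switch (using PSF_GEN from the earlier commands; the backup path is hypothetical). One caveat: in SQL Server 2005 a plain INSERT ... SELECT is still fully logged even under bulk-logged; minimal logging applies to bulk operations such as SELECT INTO and BULK INSERT. Take a log backup right after switching back, since point-in-time recovery is lost across the bulk-logged window:

ALTER DATABASE PSF_GEN SET RECOVERY BULK_LOGGED
-- run the big load here
ALTER DATABASE PSF_GEN SET RECOVERY FULL
BACKUP LOG PSF_GEN TO DISK = N'X:\Backup\PSF_GEN_log.trn'  -- hypothetical path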
