Must optimize this procedure, need an assist

This is SQL Server 2008 R2. A customer asked me to optimize a script. Typically that's not a problem, but I've never seen anything structured like this before. It is an insert into a very poorly designed table that, unfortunately, cannot be altered at this time. The table has 219 NVARCHAR columns, all drastically oversized when the declared column length is compared to the actual DATALENGTH of the data. But again, I can't alter the table right now.

I am not certain, but I believe the NVARCHAR(MAX) columns may be one of the bigger problems. There are 11 of them. Here is one, for example:

Attributes.value('(/cpCollection/group/property[@id="RelatedAssets"]/value)[1]', 'NVarChar(MAX)'),

These are the relevant data stats:

select count(*) from Products                                  -- 37736
select max(len(relatedassets)) from Products                   -- 0
select count(*) from Products where relatedAssets IS NOT NULL  -- 37736
select count(*) from Products where relatedAssets = ''         -- 37736

I've checked all 11 NVARCHAR(MAX) columns. They are all similar in that most of the values in the columns are blank or empty strings. But again, I can't change the table definition right now. I've let them know it is very questionable, but I've got to get the script optimized before we can look at revising the table, if in fact they're going to do it. Side question: how much storage is reserved, if any, for an NVARCHAR(MAX) column? Say it holds a single blank character. What is the actual overhead of that in an NVARCHAR(MAX) column?
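
One caveat about the stats above: LEN ignores trailing spaces, and the = '' comparison pads the shorter string with spaces, so neither can tell an empty string from one or more blanks. A minimal sketch, assuming the Products table and relatedAssets column named above, that measures the stored value in bytes with DATALENGTH instead. It shows only the size of the value itself, not any per-row or per-column storage overhead:

```sql
-- LEN ignores trailing spaces; DATALENGTH counts bytes (2 per NVARCHAR character).
SELECT  LEN(N' ')                               AS LenOfBlank ,    -- 0
        DATALENGTH(CAST(N' ' AS NVARCHAR(MAX))) AS BytesOfBlank ,  -- 2
        DATALENGTH(CAST(N''  AS NVARCHAR(MAX))) AS BytesOfEmpty;   -- 0

-- Largest stored value, in bytes, for one of the NVARCHAR(MAX) columns.
SELECT  MAX(DATALENGTH(relatedAssets)) AS MaxBytes
FROM    Products;
```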

Regardless, one would think I could still improve the DML that gets the data into the table. These are very small datasets: it's an insert of about 37K records, and it runs upwards of 5 hours. I have the MAX datalength for every column. With that information, what is the most effective approach for improving the performance of this insert?

The attached is just a draft. I'm not worried about the TRUNCATE. It is the INSERT that I'm looking for tips on improving.
ProcedureName.sql

Olaf Doschke

What takes so long here is parsing the XML in the XML field (named Attributes) of the other table that the insert selects from. I don't see a way to optimize it at this step. It should be possible to optimize the import of this XML data a few steps earlier, when it arrives in the other table. Parsing single values out of a whole XML document, and restarting from scratch for each value, is obviously not very performant. So the code that stores the XML in the Attributes field should already transform it into the single attribute values you really need, in one sequential pass over the XML, instead of doing it this way.

Bye, Olaf.

ste5an

Instead of parsing the column over and over, this should be faster:

Extract the values first into a temp table:

DECLARE @OtherTable TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      Attributes XML
    );
                                                                    
INSERT  INTO @OtherTable
        ( DataId ,
          AutoNumber ,
          Attributes
        )
VALUES  ( 1 ,
          1 ,
          N'<cpCollection><group><property id="RelatedAssets">a</property><property id="PleaseNote">test</property></group></cpCollection>'
        );

SELECT  DataId ,
        AutoNumber ,
        Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
        Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
FROM    @OtherTable
        OUTER APPLY Attributes.nodes('/cpCollection/group/property') A ( Property );

Then do a pivot on it for your insert operation.
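
A minimal sketch of how the shred-then-pivot idea might look end to end. It reuses the sample XML shape from the query above; the temp table name #Shredded and the two property names in the IN list are illustrative assumptions, and the real statement would list every property the target table needs and feed an INSERT INTO Products. If the real documents wrap the text in a child value element, as the XPath in the question suggests, the value expression may need adjusting.

```sql
-- Hypothetical sample source; stands in for the real table holding the XML.
DECLARE @OtherTable TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      Attributes XML
    );

INSERT  INTO @OtherTable ( DataId, AutoNumber, Attributes )
VALUES  ( 1 ,
          1 ,
          N'<cpCollection><group><property id="RelatedAssets">a</property><property id="PleaseNote">test</property></group></cpCollection>'
        );

-- Step 1: shred each XML document once into one row per property.
SELECT  o.DataId ,
        o.AutoNumber ,
        A.Property.value('@id', 'NVARCHAR(128)') AS PropertyID ,
        A.Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
INTO    #Shredded
FROM    @OtherTable AS o
        OUTER APPLY o.Attributes.nodes('/cpCollection/group/property') AS A ( Property );

-- Step 2: pivot the name/value rows back into one row per source record.
-- The IN list must name every property id that should become a column.
SELECT  p.DataId ,
        p.AutoNumber ,
        p.RelatedAssets ,
        p.PleaseNote
FROM    #Shredded
        PIVOT ( MAX(PropertyValue)
                FOR PropertyID IN ( [RelatedAssets], [PleaseNote] ) ) AS p;

DROP TABLE #Shredded;
```

Materializing the shredded rows first means each XML document is walked once by nodes(), rather than once per target column, and the pivot then works on plain relational data.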