Link to home
Start Free TrialLog in
Avatar of dbaSQL
dbaSQLFlag for United States of America

asked on

Must optimize this procedure, need an assist

This is v2008 R2.  A customer asked me to optimize a script.  Typically not a problem, but I've actually never seen anything structured like this before.  It is an insert into a very poorly designed table that unfortunately, cannot be altered at this time.  The table has 219 NVARCHAR columns, all drastically oversized when the column length is compared to the actual datalength.  But again, I can't alter the table right now.  

I am not certain, but I believe the NVARCHAR(MAX) columns may be one of the bigger problems.   There are 11 of them.   This one, for example --

      Attributes.value('(/cpCollection/group/property[@id="RelatedAssets"]/value)[1]', 'NVarChar(MAX)'),

These are the relevant data stats:

select count(*) from Products                                                                    37736
select max(len(relatedassets)) from Products                                           0
select count(*) from Products where relatedAssets IS NOT NULL       37736
select count(*) from Products where relatedAssets = ''                         37736

I've checked all 11 NVARCHAR(MAX) columns.  They are all similar in that most of the data values in the columns are blank/empty strings.  But again, I can't change the table def right now.  I've let them know it is very questionable, but I've got to get the script optimized before we can look at revising the table, if in fact, they're going to do it.  Side question;  how much storage is reserved, if any, on a NVARCHAR(MAX) column?  Say it's a single blank character/empty space.  What is the actual overhead on that in a NVARCHAR(MAX) column?

Regardless, one would think I could still improve the DML to get the data into the table.  These are very small datasets.  It's an insert of about 37K records, which is running upwards of 5hrs.  I have the MAX datalength for every column.  With that information, what is the most effective approach for improving the performance of this insert?  

The attached is just a draft.  I'm not worried about the TRUNCATE.  It is the INSERT that I'm looking for tips on improving.
ProcedureName.sql
Avatar of Olaf Doschke
Olaf Doschke
Flag of Germany image

What is taking long here is parsing the XML in the XML field (named Attribute) of the othertable the insert selects from. I don't see a way to optimiue it at this step. It should be possible to optimize the process of importing this XML data a few steps ahead, when they arrive in the othertable. Parsing out simple values of a whole XML and doing that for each value restarrting from scratch obviously is not very performant. So the code adding the XML into the Attribute should already transform the XML to the sigle attribute values you really need in one pass of parsing the XML sequentially instead of this way.

Bye, Olaf.
Instead of parsing the column over and over, This should be faster:

Extract the values first into a temp table:

DECLARE @OtherTable TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      Attributes XML
    );
                                                                    
INSERT  INTO @OtherTable
        ( DataId ,
          AutoNumber ,
          Attributes
        )
VALUES  ( 1 ,
          1 ,
          N'<cpCollection><group><property id="RelatedAssets">a</property><property id="PleaseNote">test</property></group></cpCollection>'
        );

SELECT  DataId ,
        AutoNumber ,
        Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
        Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
FROM    @OtherTable
        OUTER APPLY Attributes.nodes('/cpCollection/group/property') A ( Property );

Open in new window


Then do a pivot on it for you insert operation.
Avatar of dbaSQL

ASKER

A pivot for the insert ?
Yup.
DECLARE @OtherTable TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      Attributes XML
    );
                                                                    
INSERT  INTO @OtherTable
        ( DataId ,
          AutoNumber ,
          Attributes
        )
VALUES  ( 1 ,
          1 ,
          N'<cpCollection><group><property id="RelatedAssets">a</property><property id="PleaseNote">test</property></group></cpCollection>'
        );

DECLARE @Temp TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      PropertyID NVARCHAR(MAX) ,
      PropertyValue NVARCHAR(MAX)
    );

INSERT  INTO @Temp
        SELECT  DataId ,
                AutoNumber ,
                Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
                Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
        FROM    @OtherTable
                OUTER APPLY Attributes.nodes('/cpCollection/group/property') A ( Property );

SELECT  P.DataId ,
        P.AutoNumber ,
        P.RelatedAssets ,
        P.PleaseNote
FROM    @Temp T PIVOT ( MIN(PropertyValue) FOR PropertyID IN ( [RelatedAssets], [PleaseNote] ) ) P;

Open in new window

Avatar of dbaSQL

ASKER

Olaf, there are no XML columns in the table.  We do not have to use this as the method for insertion, but again, because I have not seen this structured INSERT before, I'm not entirely sure of the correct translation.  I could just remove all of the ATTRIBUTE paths in the insert, and just use the column names as per a normal insert, no?

ste5an, because there are no columns in the targeted table of XML datatype, would you still suggest the method you've posted?
Avatar of dbaSQL

ASKER

Nope.  While there are no  XML columns in the table that is inserted to, the data is coming from an XML column in the source instance.  I am not an XML parsing guru by any means.  I will attempt your suggestion now, ste5an.
Avatar of dbaSQL

ASKER

as I mentioned, ste5an, I am not an XML guru.  The statement you've suggested ; Presumably I am adding all of the other values in to the Attributes string for the inser to @OtherTable.  Where then am I joining the @OtherTable to the actual OtherTable before the Products table insert, to get the values that are NOT in the OtherTable..XML column?
>the data is coming from an XML column in the source instance

Well, that was, what I told you initially:

in the XML field (named Attribute) of the othertable

Code like Attribute.value(...) makes no sense, if Attribute would not be an XML string or field.

While ste5ans solution might work, I'd try to go up that rain of data merging and see, if you could not even change how the data arrives as XML, there are better tools than SQL Server to handle XML and convert it to the single attributes you need, so this could already be addressed outside of SQL Server.

Bye, Olaf.
In regard of the part of the problem not having similar schema, you have to use default values for the non existant columns, eg via COALESCE or ISNULL or simply via specifying literal values.

Along the lines of:
INSERT INTO TABLE_B (col1, col2, newcol, col3) SELECT col1, col2, 'text' as newcol, col3 FROM TABLE_A;

Open in new window


or
INSERT INTO TABLE_B (col1, col2, newcol, col3) SELECT col1, col2, COALESCE(expression1,'text') as newcol, col3 FROM TABLE_A;

Open in new window


In this version with COALESCE the expression1 can be anything, that eventually has a value for the newcol column, but if it's NULL the value is set to 'text' (or a number, date or whatever type you need.

Bye, Olaf.
Avatar of dbaSQL

ASKER

>>there are better tools than SQL Server to handle XML and convert it to the single attributes you need, so this could already be addressed outside of SQL Server.

As I mentioned, Olaf, I am not in a position to do anything other than optimize this INSERT the absolute best that I can right now.  There is also no reason to introduce a new, non-SQL tool to perform this manipulation, because this database and application is on it's way out.  My goal is simple. I need to improve this statement as best as possible.  A 5 hour runtime is simply not acceptable for inserting 37K records from a different table in the same database -- even if the XML parsing is a part of the process.
If you don't even look at the previous step you have no chance to see, whether there is a much simpler solution than to use a pivot here at this step.

If you get the task of parting peas and lentils and you find a few steps earlier in the process peas and lentils are mixed together, then this is the step of the process to change.

I'm hinting: There is some place the XML is generated. It might come in as the import files generated from data, then export/import step could be changed much easier and with much less effort.

Bye, Olaf.
I could not help further, when you're not posting your actual code and a concise and complete example. This includes table DDL and sample data INSERT statements as runnable T-SQL script.
Avatar of dbaSQL

ASKER

>>If you don't even look at the previous step you have no chance to see, whether there is a much simpler solution than to use a pivot here at this step.  
Olaf, I have already discussed this with the customer. It is not an option.  
I love lentils.

ste5an, the sample was good.  I appreciate it.  I am not 100% clear, but I will try to start modeling the actual construct with the other properties pulled from the XML.  If you could help me with the join between @OtherTable and OtherTable, I would be very grateful.  

Also, the script in the procedure is the statement entirely.  The actual code, I mean.  Aside from the genericized object names:  ProcedureName, Products, OtherTable.  Other than these names, the code is actual.

I have attached the table definition for both the Products and OtherTable tables.
ProductsAndOtherTableDef.sql
Avatar of dbaSQL

ASKER

ste5an, Your example is good, but it is passing in the test values explicitly.  it is not clear to me how to model the attributes into @OtherTable for the actual select from the real table.
ASKER CERTIFIED SOLUTION
Avatar of ste5an
ste5an
Flag of Germany image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of dbaSQL

ASKER

>>@OtherTable is just a placeholder for your table named otherTable.
Now that's embarrassing.   Let me see if I can do this.  Back soon.
Avatar of dbaSQL

ASKER

>>       Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
       Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue,

This part of the insert into @Temp puzzles me, ste5an.  Would you mind adding one or even two more of the properties from the XML, so that I can see more clearly how to structure this?
Just run this against your table:

SELECT  DataId ,
        AutoNumber ,
        Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
        Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
FROM    dbo.OtherTable
        OUTER APPLY Attributes.nodes('/cpCollection/group/property') A ( Property );

Open in new window

Avatar of dbaSQL

ASKER

very helpful.  thank you.  back soon, hopefully
Avatar of dbaSQL

ASKER

this is becoming huge... I have to step out soon, ste5an.  hopefully you will be around later.  I'll get back on this tonight.  I am finally at this point in the logic.  just now beginning it.


INSERT  dbo.Products
        ( DataId ,
          AutoNumber ,
          RelatedAssets ,
          InsertionTime
             )
        SELECT  P.DataId ,
                P.AutoNumber ,
                P.RelatedAssets ,
                CASE WHEN P.InsertionTime = '' THEN NULL
                     ELSE CAST(P.InsertionTime AS DATETIME)
                END
        FROM    @Temp T PIVOT ( MIN(PropertyValue) FOR PropertyID IN ( RelatedAssets, InsertionTime ) ) P;
Avatar of dbaSQL

ASKER

Well, that SELECT you suggested here ID: 41510140 worked fine.  I even added several other columns to the output in addition to yours, it looked good.  From there I spend a ton of time getting the other values into the process -- but when I attempt to run it, I just get a whole lot of 'invalid column names..'

Like this:


Msg 207, Level 16, State 1, Line 11
Invalid column name 'RelatedAssets'.
Msg 207, Level 16, State 1, Line 12
Invalid column name 'InsertionTime'.
Msg 207, Level 16, State 1, Line 13
Invalid column name 'AvailableInKitSetAsst'.
Msg 207, Level 16, State 1, Line 14
Invalid column name 'KitSetAsst'.
Msg 207, Level 16, State 1, Line 15
Invalid column name 'NewItemMeetingDate'.

This is my sequence of events:

DECLARE @Temp table
Load it from the OtherTable with OUTER APPLY on my Attributes.nodes
Load the final table with pivoted select on @Temp

Are you available, ste5an?
Avatar of dbaSQL

ASKER

I just realized 'it worked fine' because when I ran it before, it was just all of the columns on the table that were not the XML properties.  Now that I've added in all of the XML nodes (if that's what they're called), the columns are not being found.  

I will keep working it.
Avatar of dbaSQL

ASKER

Msg 207, Level 16, State 1, Line 12
Invalid column name 'InsertionTime'.

For the above error, I've added this to the statement:
Property.value('@insertiontime','datetime') InsertionTime,

No longer errors as invalid, but the returned data is just 'NULL'
No, because the Property table, which ste5ans query generates, is not having XML in it, it has tow columns PropertyID and PropertyValue, so looking up @insertiontype in Property, means looking for the record with PropertyID="InsertionTime" and reading its PropertyValue, maybe casting to the right result type.

Ste5ans query takes the XML and creates a classic table from it, but it can't go as far as having InsertionTime etc columns, all the XML nodes are transferred to a record each, consisting of a pair of PropertyID (or name) and PropertyValue.

Bye, Olaf.
The only thing you need to do is to add the columns here. For each attribute parsing call in you original SQL like

Attributes.value('(/cpCollection/group/property[@id="RelatedAssets"]/value)[1]', 'CHAR(1)')

You need to add the id as column in the PIVOT clause
               
PIVOT ( MIN(PropertyValue) FOR PropertyID IN ( RelatedAssets ) ) P;

Then you can use this name in the SELECT for the INSERT as:

SELECT  P.DataId ,
                P.AutoNumber ,
                P.RelatedAssets
FROM    @Temp T PIVOT [..]

You just need to apply the type cast as Olaf said.
Avatar of dbaSQL

ASKER

I have done this for all of  the values:
SELECT  P.DataId ,
                P.AutoNumber ,
                P.RelatedAssets
FROM    @Temp T PIVOT [..]

but this part is still unclear to me.  can you add one or two additional columns into the shortened example you gave me, ste5an?

PIVOT ( MIN(PropertyValue) FOR PropertyID IN ( RelatedAssets ) ) P;
HHCIB?  Just copy every id value from the XPaths of your original query into the column clause:

DECLARE @Temp TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      PropertyID NVARCHAR(MAX) ,
      PropertyValue NVARCHAR(MAX)
    );

INSERT  INTO @Temp
        SELECT  DataId ,
                AutoNumber ,
                Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
                Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
        FROM    dbo.OtherTable
                OUTER APPLY Attributes.nodes('/cpCollection/group/property') A ( Property )
        WHERE   ( ModuleId = '3c3cfc3d-4c00-4f0b-bf68-ca2d9f42b352' )
             
    
SELECT  *
FROM    @Temp T PIVOT ( MIN(PropertyValue) FOR PropertyID IN ( RelatedAssets, TextBox1, ProjSalesPerItem, ProdCatSection ) ) P;

Open in new window

Avatar of dbaSQL

ASKER

I wasn't sure what 'HHCIB' meant.... Google defined it for me right away.  I must not be expressing or conveying myself well at all.  I will keep at it and provide status when complete.  Thank you, ste5an.
Well, sometimes patience is not a friend of mine.. sry.
Indeed pivot is not in everybodies toolset - mentally. Every functionality SQL Server offers is there, but you also have to comprehend it.

In the end the IN clause has to have all the property names in quiotes, PropertyID is a char value and can only be IN a list of char values.

...FOR PropertyID IN ( 'RelatedAssets', 'TextBox1', 'ProjSalesP'...)

It was no wonder you got the series of invalid column name errors. Those column names only result in the pivot, but not beforehand. So @Temp does not have these columns, P, the alias of the pivot result - has the columns P.RelatedAssets, P.TextBox1, ...

...in the end it makes me think, if you really need to name all the properties, couldn't you simply do FOR 1=1 or FOR PropertyID LIKE '%' as short for any propertyID, which would put all the original IDs as column names of P?

It's really hard to do without any sample data available.

Bye, Olaf.
Avatar of dbaSQL

ASKER

Are you able to point out the flaws in the attached? This has become terribly critical.
working.sql
The part

FOR PropertyID IN(
		  DataId,
		  AutoNumber,
		  ModifyDate,
		  StartDate,
		  EndDate,
  		  IsEnabled,
...

Open in new window


Only has to have the "column" names coming from the XML values, not all the otehr fields. And all those names have to be in quotes. They are just the literal name values of PropertID and the pivoting makes those name values columns.

All other normal columns don't come out of the pivoting of @Temp data.

Bye, Olaf.
I'm wrong about putting the PropertyID values in quotes. This sample put together from already posted samples results in a final record having the XML nodes as columns:

DECLARE @Temp TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      PropertyID NVARCHAR(MAX) ,
      PropertyValue NVARCHAR(MAX)
    );

DECLARE @OtherTable TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      Attributes XML
    );
                                                                    
INSERT  INTO @OtherTable
        ( DataId ,
          AutoNumber ,
          Attributes
        )
VALUES  ( 1 ,
          1 ,
          N'<cpCollection><group><property id="RelatedAssets">a</property><property id="PleaseNote">test</property></group></cpCollection>'
        );

Insert Into @Temp
SELECT  DataId ,
        AutoNumber ,
        Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
        Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
FROM    @OtherTable
        OUTER APPLY Attributes.nodes('/cpCollection/group/property') A ( Property );

Select * From @Temp

SELECT  P.DataId ,
                P.AutoNumber ,
                P.RelatedAssets ,
                CASE WHEN P.PleaseNote = '' THEN NULL ELSE CAST(P.PleaseNote AS char(4)) END as PleaseNote
        FROM    @Temp T PIVOT ( MIN(PropertyValue) FOR PropertyID IN ( [b]RelatedAssets, PleaseNote [/b]) ) P;

Open in new window


Not PropertyID only has the names coming from the XML snippet, not the DataId and AutoNumber columns already being single columns of @Temp before pivoting.

Do you see now how this works? The two records of @Temp are transposed to the two fields of the final result.

The PropertyValue column is NVARCHAR(MAX), for this can hold any textual part of the XML, the conversion to whatever end result type has to be done, where NVARCHAR(MAX) does not convert implicitly to the dbo.Products destination table column type.

We can't do this for you without knowing table definitions and XML snippets.

Bye, Olaf.
Avatar of dbaSQL

ASKER

>>We can't do this for you without knowing table definitions and XML snippets.
I have already sent the table definition in this post: ID: 41509902
I don't have any snippets, but the script that i provided is exactly what is being run.  

my attempt to run the piece you just posted fails w/this:
Msg 102, Level 15, State 1, Line 40
Incorrect syntax near 'RelatedAssets'.

if  i take the b's out from around the values in the pivot, it works fine.  I think one of the problems here is that what works is always the samples with test data values.  When I am selecting the data from the XML column in the OtherTable, I cannot bring this together. Yet.
Avatar of dbaSQL

ASKER

Aside from removing the non-XML columns from the PropertyID IN list, where are the columns supposed to be?  I've got them all in there, yet the logic still won't even compile.  Seemingly the columns are referenced too many times.

I've got a ton of these:
Msg 207, Level 16, State 1, Procedure procedurename, Line 282
Invalid column name 'RelatedAssets'.
Msg 207, Level 16, State 1, Procedure procedurename, Line 283
Invalid column name 'InsertionTime'.

and a whole bunch of these;
Msg 265, Level 16, State 1, Procedure procedurename, Line 999
The column name "RelatedAssets" specified in the PIVOT operator conflicts with the existing column name in the PIVOT argument.
Msg 265, Level 16, State 1, Procedure procedurename, Line 1000
The column name "InsertionTime" specified in the PIVOT operator conflicts with the existing column name in the PIVOT argument.


and just one of these:
Msg 8156, Level 16, State 1, Procedure procedurename, Line 1228
The column 'RelatedAssets' was specified multiple times for 'p'.


Are the xml column names supposed to be in the select from @Temp or just the PropertyID IN list?
Well, sorry, the part between [ b ] and [ /b ] should have been rendered bold, that wasn't intended part of the code that was just code to render text bold, it doesn't work inside a code section.

I see the table definitions, but I don't see othertable data in post  41509902, the dbo.Othertable contains the XML that is parsed ... AS PropertyID and ... AS PropertyValue. I'd be interested in values of the Attribute field, we already knew it has to be an XML field, but only knowing the XML values in it could make the whole process testable.

Bye, Olaf.
Avatar of dbaSQL

ASKER

>> I don't see othertable data in post  41509902, the dbo.Othertable contains the XML that is parsed
It's at the bottom of the script.  Both tables, same script, i just checked.

<<but only knowing the XML values in it could make the whole process testable.
is that not in the original script that I attached?  All of this --

Attributes.value('(/cpCollection/group/property[@id="RelatedAssets"]/value)[1]', 'NVarChar(MAX)'),
CASE Attributes.value('(/cpCollection/group/property[@id="InsertionTime"]/value)[1]', 'VarChar(36)') WHEN '' THEN NULL ELSE Attributes.value('(/cpCollection/group/property[@id="InsertionTime"]/value)[1]', 'DateTime') END,             Attributes.value('(/cpCollection/group/property[@id="AvailableInKitSetAsst"]/value)[1]', 'NVarChar(256)'),
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
No,

Attributes.value('(/cpCollection/group/property[@id="RelatedAssets"]/value)[1]', 'NVarChar(MAX)'),

This is not XML, it is a expression parsing some single node from the XML in the Atttributes field.

All I/we can see from that is, that the XML contains a strucutre with cpCollection elements containing group elements containing property elements.

Bye, Olaf.
Avatar of dbaSQL

ASKER

>>All the column, which come from the XML can't be part of @Temp.
Yes!  I was just typing that, thanking you both for all of your help, and saying that the insert into @temp is the problem, because the columns are not known yet.   let me look at your examples more closely.
Simply do

SELECT Top 2 Attributes FROM dbo.Othertable

Open in new window


Bye, Olaf.
Avatar of dbaSQL

ASKER

Attached.
top2.txt
Avatar of dbaSQL

ASKER

i think i see the light....  joke.  sort of.  
below is the new @temp declaration and insert.  I believe I finally see where my mistake was.  unfortunately, it is not clear to me yet whether my approach is going to be the solution that I need (tomorrow at 4AM).  

 I say that only because it is still running....  i'm not trying to do anything other than this piece right now.  i suppose i could TOP it to see if it functions properly. Assuming this is the correct construct, I would immediately insert into the OtherTable the the pivot.

The logic we are trying to remedy ran more than 5 hours.  I simply must do better than that.  
what would you advise?


DECLARE @Temp TABLE (
              DataId UNIQUEIDENTIFIER,
              AutoNumber INT,
              ModifyDate DATETIME,
              StartDate DATETIME,
              EndDate DATETIME,
                IsEnabled BIT,
              Title NVARCHAR(256),
              [Target] NVARCHAR(256),
              Summary NVARCHAR(512),
              PropertyID NVARCHAR(MAX),
              PropertyValue NVARCHAR(MAX)
              );

      INSERT @Temp
      SELECT
              DataId,
              AutoNumber,
              ModifyDate,
              StartDate,
              EndDate,
                IsEnabled,
              Title,
              [Target],
              Summary,
              Property.value('./@id', 'NVARCHAR(MAX)') PropertyID,
              Property.value('.', 'NVARCHAR(MAX)') PropertyValue
      FROM
            dbo.OtherTable OUTER APPLY Attributes.nodes('/cpCollection/group/property') a (Property)
      WHERE
            (ModuleId = '3c3cfc3d-4c00-4f0b-bf68-ca2d9f42b352');
Avatar of dbaSQL

ASKER

definitely making headway!  wow.  I cannot believe how I did not see this last night. it is right there in your code samples.  i really am so sorry for the wasted time.  truly.  
i'm almost there.  back  shortly with status.
Execute this as another sample:

DECLARE @Temp TABLE
    (
	 [DataId] [uniqueidentifier] NULL,
	 [ModuleId] [uniqueidentifier] NULL,
	 [BatchId] [uniqueidentifier] NULL,
	 [AutoNumber] [int] NULL,
	 [IntegrationId] [nvarchar](256) NULL,
	 [StartDate] [datetime]  NULL,
	 [EndDate] [datetime] NULL,
	 [Title] [nvarchar](256)  NULL,
	 [Summary] [nvarchar](512) NULL,
	 [IsEnabled] [bit]  NULL,
	 [IsFullTextIndexed] [bit]  NULL,
	 [IsEcommerce] [bit]  NULL,
	 [AllowDataSync] [bit]  NULL,
	 [CreateDate] [datetime]  NULL,
	 [ModifyDate] [datetime] NULL,
	 [Target] [nvarchar](256)  NULL,
     PropertyID NVARCHAR(MAX) ,
     PropertyValue NVARCHAR(MAX)
    );

DECLARE @OtherTable TABLE
    (
	[DataId] [uniqueidentifier] NULL,
	[ModuleId] [uniqueidentifier] NULL,
	[BatchId] [uniqueidentifier] NULL,
	[AutoNumber] [int] NULL,
	[IntegrationId] [nvarchar](256) NULL,
	[StartDate] [datetime] NULL,
	[EndDate] [datetime] NULL,
	[Title] [nvarchar](256) NULL,
	[Summary] [nvarchar](512) NULL,
	[IsEnabled] [bit] NULL,
	[IsFullTextIndexed] [bit] NULL,
	[IsEcommerce] [bit]  NULL,
	[AllowDataSync] [bit] NULL,
	[Attributes] [xml] NULL,
	[CreateDate] [datetime] NULL,
	[ModifyDate] [datetime] NULL,
	[Target] [nvarchar](256) NULL
    );
                                                                    
INSERT  INTO @OtherTable
        (
          Attributes
        )
VALUES  ( N'<cpCollection moduleId="38460957-9aea-4165-8f68-4d7391afe1de" dataId="119108ac-01ab-4126-9bd3-0000acda2844">
  <group id="Serialize" name="Serialize">
    <property id="RelatedProducts">
      <value />
    </property>
    <property id="LegacyData">
      <value />
    </property>
    <property id="Instructor">
      <value />
    </property>
    <property id="Taxonomy">
      <value>bf83a5f7-52d3-47b2-a82f-4f4fd3f184dd</value>
    </property>
    <property id="Keywords">
      <value>test, test2</value>
    </property>
  </group>
</cpCollection>'
        );

Insert Into @Temp
SELECT [DataId],
	   [ModuleId],
	   [BatchId],
	   [AutoNumber],
	   [IntegrationId],
	   [StartDate],
	   [EndDate],
	   [Title],
	   [Summary],
	   [IsEnabled],
	   [IsFullTextIndexed],
	   [IsEcommerce],
	   [AllowDataSync],
	   [CreateDate],
	   [ModifyDate],
	   [Target],
       Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
       Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
FROM   @OtherTable
       OUTER APPLY Attributes.nodes('/cpCollection/group/property') A ( Property );

Select * From @Temp

SELECT  P.*
        FROM    @Temp T PIVOT ( MIN(PropertyValue) FOR PropertyID IN (RelatedProducts,LegacyData,Instructor,Taxonomy,Keywords)) P;

Open in new window


I took part of the XML and the IN cluase only has those names coming from that XML.
Ignore all the nulls, that's just because I don't have othertable sample data aside of the XML, but it's sufficient to show how the XML is split into several records and then pivoted to create one row with all the needed columns.

If you still don't see how it works, I can't help you further, also because my time now runs out.

I wish you good luck.

Bye, Olaf.
E.g.

-- Use the correct data types instead of INT:
DECLARE @Temp TABLE
    (
      DataId INT ,
      AutoNumber INT ,
      ModifyDate INT ,
      CreateDate INT ,
      StartDate INT ,
      EndDate INT ,
      IsEnabled INT ,
      Title INT ,
      [Target] INT ,
      Summary INT ,
      PropertyID NVARCHAR(MAX) ,
      PropertyValue NVARCHAR(MAX)
    );

INSERT  INTO @Temp
        SELECT  DataId ,
                AutoNumber ,
                ModifyDate, 
				CreateDate ,
                StartDate ,
                EndDate ,
                IsEnabled ,
                Title ,
                [Target] ,
                Summary ,
                Property.value('./@id', 'NVARCHAR(MAX)') AS PropertyID ,
                Property.value('.', 'NVARCHAR(MAX)') AS PropertyValue
        FROM    dbo.OtherTable
                OUTER APPLY Attributes.nodes('/cpCollection/group/property') A ( Property );

-- This is how to get the result set:
SELECT  P.*
FROM    @Temp T PIVOT ( MIN(PropertyValue) FOR PropertyID IN ( 
						[RelatedAssets],
                        [InsertionTime],
                        [AvailableInKitSetAsst],
                        [KitSetAsst],
                        [NewItemMeetingDate],
                        [PleaseNote],
                        [RelatedProductsTitle],
                        [RelatedContent],
                        [RelatedProducts1],
                        [RelatedProducts2],
                        [RelatedProducts3],
                        [AdditionalInfo],
                        [CopyOK],
                        [CountryOfOrigin],
                        [YesNo1],
                        [YesNo2],
                        [GroupHeader],
                        [GroupingAttribute],
                        [GroupID],
                        [InfoTemplate],
                        [Abrasive_Mesh],
                        [Accuracy],
                        [ArticleNo],
                        [Author_Host],
                        [BailOpening],
                        [BandShankWidth],
                        [BindingStyle],
                        [Brand],
                        [Capacity],
                        [CapInsideDimension],
                        [CaratWeight],
                        [CastTemperature],
                        [ChannelWidth],
                        [Clarity],
                        [ClaspLength],
                        [ClaspType],
                        [COE], [Color],
                        [ColorFamily],
                        [ColorNo],
                        [CupSize],
                        [CureTemperature],
                        [CureTime],
                        [CustomService],
                        [CuttingSurfaceDimension],
                        [Density],
                        [Diameter],
                        [Dimensions],
                        [DrillStyle],
                        [Durometer],
                        [FabricationMethod],
                        [FaceShape],
                        [FaceSize],
                        [Finished],
                        [FitsCordChainSize],
                        [FocalLength],
                        [Form_CastingMetal],
                        [Form_Clay],
                        [Form_Fabrication],
                        [Form_Modeling],
                        [Gauge],
                        [GermanCutNo],
                        [Grade_Bead_Stone_Cab],
                        [Grade_Clay],
                        [HandleLength],
                        [Hardness_Clay],
                        [Hardness_Fabrication],
                        [HeadLength],
                        [HeadShape],
                        [HeadSize],
                        [HeadType],
                        [HeadWeight],
                        [HoleSize],
                        [HotFoilStampingType],
                        [ImpressionPlacement],
                        [InsideDimension],
                        [InteriorDimensions],
                        [JawLength],
                        [JawStyle],
                        [JewelryType],
                        [KaratPurity],
                        [LargeSetName],
                        [LengthPerWeight],
                        [LengthRange],
                        [LensDiameter],
                        [Letter],
                        [LinkGaThickness],
                        [LinkID],
                        [LinkLength],
                        [LiquidusTemperature],
                        [MagnificationPower],
                        [Material_Chain],
                        [Material_General_Kits_Sets],
                        [Material_Bead],
                        [Material_GiftBox_Display],
                        [Material_Modeling_Carving],
                        [Material_Pendant_Bail_Component],
                        [Material_Ring],
                        [Material_Rotary_Cutting_Polishing],
                        [Material_Stone_Cab_Cameo],
                        [Material_Stringing],
                        [MaximumCuttingCapacity],
                        [MaximumLength],
                        [Media],
                        [MeltTemperature],
                        [MetalColor],
                        [MetalColor_Fabrication],
                        [MetalType],
                        [MetalType_Fabrication],
                        [ModelNo],
                        [Modes],
                        [MohsHardness],
                        [MountedStoneSizes],
                        [MountingBezelSizes],
                        [NetWeight],
                        [NoOfStrands],
                        [OutsideDimension],
                        [OverallLength],
                        [PackSize],
                        [PadSize],
                        [PageCount],
                        [PegSize],
                        [PinSize],
                        [PlatformDimensions],
                        [PostSize],
                        [Power],
                        [ProngHeight],
                        [Range],
                        [RefractiveIndex],
                        [RingID],
                        [RingSizeUS],
                        [RunningTime],
                        [SectionDimensions],
                        [SettingType],
                        [ShankSize],
                        [Shape_Bead],
                        [Shape_Disc_Stamping_Tag],
                        [Shape_Fabrication_Metal],
                        [Shape_File],
                        [Shape_General_Kits_Sets],
                        [Shape_Jumpring_Springring_Headpin],
                        [Shape_Pendant_Bail_Component],
                        [Shape_Ring],
                        [Shape_Stone_Cab_Cameo],
                        [ShipWeight],
                        [Shrinkage],
                        [SignetPlateSize],
                        [Size], 
						[SizeNo],
                        [SolidusTemperature],
                        [SpecificGravity],
                        [SpoolPkgQuantity],
                        [StepDimension],
                        [StoneDepth],
                        [StoneShape],
                        [StoneSize],
                        [StoneType_Bead],
                        [StoneType_Stone_Cab_Cameo],
                        [Style_Chain_Necklace_Braclet],
                        [Style_Clasp],
                        [Style_Disc_Stamping_Tag],
                        [Style_Earring_Nut],
                        [Style_General_Kits_Sets],
                        [Style_Giftbox_Display],
                        [Style_Incidentals],
                        [Style_Jumpring_Springring_Headpin],
                        [Style_Pendant_Bail_Component],
                        [Style_Ring],
                        [Style_Setting],
                        [Subject],
                        [SubSetName],
                        [Sustainable],
                        [SwissCut],
                        [TensileStrength],
                        [TestStrength],
                        [TipWidth],
                        [Treatment],
                        [TubeInsideDimension],
                        [TweezerStyle],
                        [Type_Clay_Enamel_Glass],
                        [Type_File],
                        [Type_Giftbox_Display],
                        [Type_Optical],
                        [VulcanizationTemperature],
                        [WallHeight],
                        [WallThickness],
                        [WeightPerLength],
                        [WheelDiameter],
                        [Width],
                        [WidthRange],
                        [KitSetIncludes],
                        [MarketingKeywords],
                        [Keywords], 
						[OK],
                        [OrigPrice],
                        [PhotoOK], 
						[PM],
                        [ProdCatSection],
                        [ProdPageNmbr],
                        [RTFEditor1],
                        [RTFEditor2],
                        [ProductName],
                        [ProductionStage],
                        [ProductNamePrint],
                        [RTFEditor3],
                        [SM],
                        [IntegrationId],
                        [UnitCost],
                        [WebEAV],
                        [CatalogEAV],
                        [CatalogPage],
                        [CatalogDate],
                        [InitOrder],
                        [ProjSalesPerItem],
                        [TtlProjSales],
                        [Has360],
                        [Colorspace],
                        [Dimensionspixels],
                        [FileName],
                        [ICCProfileIdentifier],
                        [ModificationUser],
                        [ModifiedTime],
                        [OriginalFileSize],
                        [YesNo3],
                        [TextBox1],
                        [Resolutionppi],
                        [CheckedOut],
                        [Taxonomy],
                        [QualityMarked]
				) ) P;

Open in new window



Use the result set for your INSERT statement. Just add the necessary CASTs and CASE statements as in your original INSERT.
Avatar of dbaSQL

ASKER

New script is attached.  I did TOP it, just to see it work;  the @Temp was loaded fine, but it failed to insert due to NULL values not allowed in RelatedAssets.  However, that column does have a default:

    ALTER TABLE dbo.Products ADD  DEFAULT (N'') FOR [RelatedAssets]

This is also the column that i mentioned in the very first post -- RelatedAssets.  There is no data in the column, in any of the production records.  it is a blank space that the very wise table designer is inserting with a default constraint.  but that's another story...

anyway, @Temp is being loaded, I've done a select just on that data, and it looks good.  Many of the propertyvalue's are blank, but that is the way they designed this thing.  given the default constraint, I am not sure why the process is failing.  

do you see my flaw?





 This one, for example --

      Attributes.value('(/cpCollection/group/property[@id="RelatedAssets"]/value)[1]', 'NVarChar(MAX)'),

These are the relevant data stats:

select count(*) from Products                                                                    37736
select max(len(relatedassets)) from Products                                           0
select count(*) from Products where relatedAssets IS NOT NULL       37736
select count(*) from Products where relatedAssets = ''                         37736
newScript.sql
Avatar of dbaSQL

ASKER

I changed the TOP to 500, and ran it again.  It loaded @Temp immediatly, but failed with the no NULLs in RelatedAssets error.  I think it's just that the default will only apply if we DON'T insert into it explicitly.  yet we are attempting to do so, because i am trying to optimize this script and building it out properly... maybe that's it.


this is the only reference to RelatedAssets in the raw script that they gave me:
Attributes.value('(/cpCollection/group/property[@id="RelatedAssets"]/value)[1]', 'NVarChar(MAX)'),

i don't know XML manipulation well enough to understand why that, too, is not producing the same error.
Take a look at the XML colum for the failing row. Does the property with ID RelatedAssets exists and has a value?

When not, then this maybe another reason for the long running process, cause a rollback could happen.

When it has a value, which does not match the data type of your Products table RelatedAssets column try to cast it, maybe you even need a CASE to do it properly.

p.s. can you post at least some data of the Attribute column?
Avatar of dbaSQL

ASKER

Is there any way to use a default parm to input the XML column if the propertyvalue is a blank space?
Avatar of dbaSQL

ASKER

>>Take a look at the XML colum for the failing row. Does the property with ID RelatedAssets exists and has a value?
That's what I was saying earlier, ste5an.  RelatedAssets has no data in any of the records.  The default constraint is a blank space.  They are writing it in that way.  I've worked around that by declaring a parm in the logic, and using that in the insert, but this is only possible because again, that column has no data anywhere, and they load it with a blank space via default.

DECLARE @RelatedAssets CHAR(2) = ''

A similar problem occurred with another column - Pleasenote.  This value is populated in many cases, but in just as many, it is a blank space.  I cannot work around it like I did with RelatedAssets, because in  many cases the data is there.  This column also has the default constraint of a blank space -- (N'') -- but I can't work around it like I did RelatedAssets, because that will overwrite all of the valid data.

I hate to even consider this, as I am sure there will be overhead, but can I check the p.Pleasenote and pass in the blank space only when it is NULL ?  

<<p.s. can you post at least some data of the Attribute column?
I did up there in ID: 41512522.  I can post more if you need.
Use ISNULL(columnName, ' ') to get a blank.
Avatar of dbaSQL

ASKER

If not for the blank spaces, I could do this, and coalesce it with the p.Pleasenote

DECLARE @PleaseNote CHAR(2) = ''
Avatar of dbaSQL

ASKER

a lot of them, same thing.  i'm going through it one at a time with the ISNULLs.  back soon.
Avatar of dbaSQL

ASKER

I really need to become much  more versed with XML manipulation.  this is nuts.  
i finally got it functional (see attached), and it only insert three records into the Products table.
I know this system is questionable.  It could be just that, but I don't know the XML back to front enough to know whether that's the case.

I topped it at 500.   Would you say the attached is correct?
maybe.sql
Avatar of dbaSQL

ASKER

I also wonder how you might approach it to minimize logging during the execution?  It's been running about 17 minutes.  The log isn't too bad, but it is growing.  Both the tempdb and the database.
Avatar of dbaSQL

ASKER

@#*$''


(9189814 row(s) affected)
Msg 8152, Level 16, State 10, Line 48
String or binary data would be truncated.
The statement has been terminated.
Avatar of dbaSQL

ASKER

lovely error output.  would be great if it could tell me where the problem is...
I am selecting the whole construct into a new table.  Assuming it completes successfully, I will compare the column lengths and figure out where the problem is.    I welcome other all suggestions.  :-)
Avatar of dbaSQL

ASKER

It ran for 00:28:52 and completed successfully with this output:

(9189814 row(s) affected)
(37736 row(s) affected)

Presumably, that is the 1st insert into @Temp, followed by the final insert into Products.  But - it wasn't Products, it was my test table, so I can try to find the source of the 8152 error.

If I can find that, I feel very, very good about this.  28 minutes vs 5 hours is a godsend.
Avatar of dbaSQL

ASKER

Hopefully one of you is still awake.... I figured out the 8152 error.  All of the columns that we collected from the XML were defined as NVARCHAR(MAX).  This is because we set it up that way here:

              PropertyID NVARCHAR(MAX),
              PropertyValue NVARCHAR(MAX)

In all but 11 cases, this will exceed the column datalength in the table that is targeted for the final dataset.

What is the correct way to address this?  Is there anyway to defined the output column dynamically, based on that which is retrieved from the XML ?
Avatar of dbaSQL

ASKER

ste5an or Olaf, are either of you still available?
Avatar of dbaSQL

ASKER

I left the PropertyID and PropertyValue at NVARCHAR(MAX), but I LTRIM(RTRIM med every one of the columns on the final insert.

It ran this time for 00:20:01, and failed with this:

(9189814 row(s) affected)
Msg 8152, Level 16, State 10, Line 48
String or binary data would be truncated.
The statement has been terminated.

Still working it.  All suggestions are appreciated.  This must be functional before morning.
You're almost there, you just need the final conversions of the Nvarchar values to the dbo.product datatypes now.

You have the needed Code in your original attachment.

This can't be automatically done by neither the pivot step and even less by the step extracting all the. XML properties into the PropertyValue column. A column can't have different types for different rows. And the pivot step putting all the Nvarchar values into separate columns can't know the destination type you need.

Your original Code must have had CASTs or CONVERTs of all the single parsed values. Look back at it.

Bye, Olaf.
Avatar of dbaSQL

ASKER

I am so glad that you are still awake, Olaf.  Thank you!  You say the original code must have had CASTS or CONVERTS, but no.  It doesn't.  I've pasted a few of the values below.  You'll see there are no explicit CASTS/CONVERTS, but at the end of each Attribues.value, the correct datatype and length is there. InsertionTime, for example, is varchar(36), then AvailableKitSetAsst is NVARCHAR(256), NewItemMeetingDate is VARCHAR(36), etc...

How would you handle it, Olaf?  Please.


CASE Attributes.value('(/cpCollection/group/property[@id="InsertionTime"]/value)[1]', 'VarChar(36)') WHEN '' THEN NULL ELSEAttributes.value('(/cpCollection/group/property[@id="InsertionTime"]/value)[1]', 'DateTime') END AS [InsertionTime],

Attributes.value('(/cpCollection/group/property[@id="AvailableInKitSetAsst"]/value)[1]', 'NVarChar(256)') AS [AvailableInKitSetAsst],
 
CASE Attributes.value('(/cpCollection/group/property[@id="NewItemMeetingDate"]/value)[1]', 'VarChar(36)') WHEN '' THEN NULL ELSE Attributes.value('(/cpCollection/group/property[@id= "NewItemMeetingDate"]/value)[1]',  'DateTime') END AS [NewItemMeetingDate],
   
Attributes.value('(/cpCollection/group/property[@id="PleaseNote"]/value)[1]', 'NVarChar(MAX)') AS [PleaseNote],

Attributes.value('(/cpCollection/group/property[@id="RelatedProductsTitle"]/value)[1]', 'NVarChar(256)') AS [RelatedProductsTitle],
Avatar of dbaSQL

ASKER

I suppose I could just take this information, and do the CAST/CONVERT myself.  but not without some decent overhead, i'm guessing.
Overhead? No. Diligence? Yes.

The overhead that took the time was doing all these separate Attributes.value() expressions. You're already past any need of XML knowledge, you now just have all the columns you need, just as nvarchar(max) values and need to convert them as needed.

We've already pointed out this in many posts pf me and ste5an:

41511226 "casting to the right result type."
41511335 "You just need to apply the type cast as Olaf said."
41512352 "...the conversion to whatever end result type has to be done, where NVARCHAR(MAX) does not convert implicitly to the dbo.Products destination table column type."
...
41512562 "Use the result set for your INSERT statement. Just add the necessary CASTs and CASE statements as in your original INSERT."

Bye, Olaf.
Ste5an also gave an example for the cast of a XML datetime string snippet put into p.InsertionTime to the SQL Server datetimetype:

              CASE WHEN P.InsertionTime = '' THEN NULL
                     ELSE CAST(P.InsertionTime AS DATETIME)
                END As InsertionTime

Open in new window


Bye, Olaf.
A final note: You should keep in mind XML is nothing but a string.

nvarchar(max) was choosen as type of the PropertyValue, because it can hold any string part of the xml node in a property element. It may be oversized in many cases, but it's autotrimmed to the value length, many things can automatically convert implicitly without a CAST or CONVERT, eg values being a uniqueidentifier should go over as they are.

PropertyValue can't have the right datatype, which you should know as dba, it is one column and a column has one type - for all rows, you can't change type per row. And the pivot can create the columns from the records, but it can only populate them with what it has in PropertyValue, so all columns are nvarchar in that P result of the pivot.

If you had a xml conversion tool or the Attributes column would already be expanded in the othertable this could be done much more elegant for sure. SQL Server is not the tool for this.

Bye, Olaf.
Avatar of dbaSQL

ASKER

Hoping either of you are still up.   We have success.  The attached completed in 00:16:25.  I've looked at the data in my test table, comparison to to the production data -- it looks good.  

I'm going to wrap it into a procedure and do it once more.  i welcome all suggestions.

16 minutes vs 5 hours is a win.
withCASTS.sql
Caveat: Use NVARCHAR() for the property value column. XML files use in almost every case Unicode. Using VARCHAR as you did will result in loss of characters not matching that charset.

And belief me, sooner or later on of those columns will contain such content.
Avatar of dbaSQL

ASKER

be that the case, the casts i perform later in the construct should be all that's necessary to come back from the NVARCHAR(MAX).  no?
No, cause you already stored the information in @Temp using the wrong data type. Those casts cannot restore lost information.

Just change the type in @Temp to NVARCHAR(MAX).
First of all congratulations.

I only have the same critizism as Ste5an: Why Change from NVARCHAR(MAX)? This type always will be filled with whatever length the XML property element has, not more, not less.

Any errors you had because of truncated data were from the step inserting in dbo.Products, not because of the MAX size, so there is no reason to limit this to 6035, your target fields are often much shorter anyway.

I also think you'll only ever have NULL values in P when a property is absent in some Attributes XML. You can't bee too cautious, when target fields are not nullable.

While we're at that topic: At one step you thought a default value of dbo.Products would be taken instead of NULLs. Well, No. The Default values come in effect, if a column isn't targetted in an INSERT or if you insert the keyword DEFAULT, but not, when you insert a NULL literally or from a source column, which is NULL, therefore you indeed need all the ISNULL or COALESCE expressions. That also has nothing to do with XML or pivoting at all.

Bye, Olaf.
Avatar of dbaSQL

ASKER

Will do, ste5an.  I ran it 6 different times myself since my last post, and they just ran it in production.  It completed in 00:10:22, which is a tremendous gain over the 5+ hours it was running before, and the've confirmed the data is good.  I cannot thank you both enough.  Lot learned on this.  Definitely appreciated.
Avatar of dbaSQL

ASKER

Understood about the default, Olaf.  As I mentioned in this one: ID: 41512589
Regardless, I still can't thank you both enough.  I learned a great deal today.  If you don't mind, I will split the points.
Ste5an deserves most points for the main ideas of the solution.
hmm, where are the good ol' times of beerware?
Avatar of dbaSQL

ASKER

All of your input and your patience is greatly appreciated.  Thank you both for the assist.
Avatar of dbaSQL

ASKER

ste5an, you are correct. I did strip off the tailend of a few values.  I put your suggested correction in, but not until afterwards.  Thinking surely this will be fine.... but no, it wasn't.    I'm running a skinny version of what you both helped me to model the other day, and am trying to dump the truncated and identifiers into a temp table.  it's running forever, in comparison to the final product that I used the other day.  May I show you?  I don't mind opening another ticket, but I am here first, in hopes of getting your attention.
Avatar of dbaSQL

ASKER

Disregard.  I got it.  But thank you anyway.