Loading Complex XML Using SSIS

AID: 9158
  • Status: Published

7070 points

  • ByValentinoV
  • TypeTutorial
  • Posted on2012-01-10 at 13:16:39
Awards
  • Experts Exchange Approved

Introduction


In my previous article I showed you how the XML Source component can be used to load XML files into a SQL Server database, using fairly simple XML structures.  In this follow-up article I will demonstrate how to tackle the complex XML issue.

The Complex XML Example


You probably know that SSRS reports, RDLs, are actually XML files.  And they’re not the easiest types of XML files around.  To humans they are still readable but the structure can be quite complex.  So there we’ve got our example: an RDL.  More specifically I’ll be using the RDL that’s available for download in one of my earlier articles.

The Goal


Every good example has got a goal.  Our goal today is to retrieve a list of datasets and fields as defined in the RDL.  Shouldn’t be too difficult, right?

Using The XML Source Component


Let’s try to get this done through the XML Source component with which we’re very familiar by now.  You know the drill: drag an XML Source into your Data Flow, open it up and configure the XML and XSD locations.

Note: to be able to do this I cheated a bit by manually manipulating the RDL a little.  More precisely I removed all the namespace references from the <report> tag and further down the XML (removed “rd:”).

With both files configured, let’s have a look at the Columns page:

 

image01.png
  • 84 KB
  • The XML Source component handling a really complex XML file
The XML Source component handling a really complex XML file



Look at that massive list of output flows!  In total I’ve gotten 45 of them, all for free!  Even if you’re up to the task of creating 45 output tables, do you really want to find out how to get these joined together?  To prevent creating that bunch of tables you may consider using the Merge Join component… 45 times in your data flow. Didn’t think so!

Sure, it would run fine if you manage to get it all constructed.  But in my opinion this is just too silly to try out because there’s an interesting alternative.

And that alternative is XSLT – eXtensible Stylesheet Language Transformations.

Using XSLT


With XSLT you describe what you want to retrieve from the XML document and what it should look like.  In this example we’ll be retrieving the list of datasets and their fields, in CSV format.  CSV stands for Comma-Separated Values, although I prefer the term “Character-Separated Values” as the separator is not always a comma.

To be able to write correct XSLT, you need to know what the XML structure looks like.  Here are the first 31 lines of the sample RDL file mentioned earlier.

<?xml version="1.0" encoding="utf-8"?>
<Report>
  <AutoRefresh>0</AutoRefresh>
  <InitialPageName>A Very Unique Name</InitialPageName>
  <DataSources>
    <DataSource Name="srcContosoDW">
      <DataSourceReference>ContosoDW</DataSourceReference>
      <SecurityType>None</SecurityType>
      <DataSourceID>b7a3d32c-e95d-4acf-bb99-9d60755303ea</DataSourceID>
    </DataSource>
  </DataSources>
  <DataSets>
    <DataSet Name="dsProductList">
      <Query>
        <DataSourceName>srcContosoDW</DataSourceName>
        <CommandText>select DPC.ProductCategoryName, DPS.ProductSubcategoryName, DP.ProductName
from dbo.DimProduct DP
inner join dbo.DimProductSubcategory DPS
    on DPS.ProductSubcategoryKey = DP.ProductSubcategoryKey
inner join dbo.DimProductCategory DPC
    on DPC.ProductCategoryKey = DPS.ProductCategoryKey;</CommandText>
      </Query>
      <Fields>
        <Field Name="ProductCategoryName">
          <DataField>ProductCategoryName</DataField>
          <TypeName>System.String</TypeName>
        </Field>
        <Field Name="ProductSubcategoryName">
          <DataField>ProductSubcategoryName</DataField>
          <TypeName>System.String</TypeName>
        </Field>
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:

Select allOpen in new window



As you can see, the main node is called Report.  Nested under Report we’ve got DataSets, which can have several DataSet elements.  Each DataSet has a set of Fields with one or more Field elements.  Using that information we come to the following XSLT.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>
  <xsl:template match="/">
    <xsl:text>DataSource;DataSet;Field</xsl:text>
    <xsl:text>&#13;&#10;</xsl:text>

    <xsl:for-each select="Report/DataSets/DataSet/Fields/Field">
      <xsl:text>"</xsl:text>
      <xsl:value-of select="../../Query/DataSourceName"/>
      <xsl:text>";"</xsl:text>
      <xsl:value-of select="../../@Name"/>
      <xsl:text>";"</xsl:text>
      <xsl:value-of select="@Name"/>
      <xsl:text>"</xsl:text>
      <xsl:text>&#13;&#10;</xsl:text>
    </xsl:for-each>

  </xsl:template>
</xsl:stylesheet>
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:

Select allOpen in new window



So, what is the XSLT describing?  On line three, we say that the output should be text in UTF-8 encoding.  The “template match” on the fourth line takes the whole XML document into consideration, hence the forward slash.  Then on line five we start writing output through the xsl:text tag.  This is our header line.  As you can see we’re using the semi-colon as column separator in the CSV output.  Line six adds a CRLF (carriage-return + line feed) to the output.

Then the fun part starts.  If you have experience with XPath, the way XSLT walks through the XML document should look familiar to you.

The xsl:for-each tag loops over all the Fields in all the DataSets in the document.

Using the xsl:value-of tag, we can fetch values out of the XML.  The first value being retrieved is the name of the data source that dataset is using.  (I’ve added the retrieval of the data source to demonstrate how element values are retrieved.)  The path to the DataSourceName element is Report/DataSets/DataSet/Query/ so we use the double-dot syntax to navigate two levels up in the XML tree.  The value of the element itself is retrieved by just using its name, as demonstrated in the XSLT above.

The next value-of tag retrieves the Name attribute of the DataSet, hence the two levels up, and the final value-of fetches the Name attribute of the Field element.

Now that the XSLT is clear for everyone, how do we apply it to our XML document?  Here comes the time for SSIS once more!

Open up the BIDS with the Control Flow of an SSIS package active and throw in an XML Task component.

 

image02.png
  • 41 KB
  • The XML Task, one of the Control Flow Items in Integration Services
The XML Task, one of the Control Flow Items in Integration Services



Double-click the component to open up the XML Task Editor.  This is what it looks like by default:

 

image03.png
  • 57 KB
  • XML Task Editor: default settings
XML Task Editor: default settings



As this is an all-round XML task that can handle several XML-related tasks, the first setting that we need to modify is called OperationType.  That’s not too complicated because it comes with a dropdown and XSLT is one of the possible values.

 

image04.png
  • 8 KB
  • The different operation types supported by the XML Task
The different operation types supported by the XML Task



With XSLT selected, the editor transforms into the following:

 

image05.png
  • 51 KB
  • The XML Task Editor with XSLT as OperationType
The XML Task Editor with XSLT as OperationType



Now we need to configure where the task can find our XML file, through the Source property.  Click the Source textbox to make the dropdown appear and select <New File connection…>.

 

image06.png
  • 5 KB
  • You can create a new File Connection through the XML Task Editor
You can create a new File Connection through the XML Task Editor



In the File Connection Manager Editor, leave the Usage type at Existing file and select the RDL.

Next up we’re going to specify where the task can find the XSLT that needs to be applied to the XML.  That can be done through the Second Operand settings.  As SecondOperandType, select File Connection.  Use the dropdown of the SecondOperand property to create a second new file connection that points to your XSLT file.

With that set up as well, only one step remains.  The task still doesn’t know where the output should be saved.  Or that it actually should get saved.  So first switch the SaveOperationResult property to True.  As you can see, DestinationType is already set to File Connection, that’s what we need.  Use the dropdown of the Destination property to create a third new file connection.  This time however, Usage Type should be set to Create File.  Specify path and filename for the output file and click OK to close the File Connection Manager Editor.

This is what our XML Task now looks like in the editor:

 

image07.png
  • 55 KB
  • The XML Task Editor with all input and output files specified, as expected for our XSLT experiment
The XML Task Editor with all input and output files specified, as expected for our XSLT experiment



As shown above, I’ve called the output file DatasetInfo.csv.

One more property that can be interesting is the OverwriteDestination property.  Setting it to True can ease the testing of your package if you need to execute it multiple times, which you’ll probably want when your XSLT is not giving the expected output.  Don’t forget to set it to False afterwards (depending on what behavior you actually expect from your package).

Okay, now close the XML Task Editor and execute the package.  If you haven’t made any mistakes, the task should color green and you should have an extra file on your hard drive somewhere.  Here’s what the content of my DatasetInfo.csv looks like:


DataSource;DataSet;Field
"srcContosoDW";"dsProductList";"ProductCategoryName"
"srcContosoDW";"dsProductList";"ProductSubcategoryName"
"srcContosoDW";"dsProductList";"ProductName"
"srcContosoDW";"dsProductList";"ProductCategoryColor"
"srcContosoDW";"dsProductList";"EasterEgg"


Look at that, a list of fields, all part of the dsProductList dataset.


“Hang on, wasn’t this article going to demonstrate how to get complex XML files imported into our database?  And now you’re writing the data to a file?!”


Well yeah, you’re right.  Unfortunately the XML Task does not offer the possibility to write to a table in a database.  So to get the data imported into your database you’ll need to set up a Data Flow that imports the CSV files.  But that shouldn’t be too difficult to achieve, right?

Mission accomplished!

Conclusion


With this article I have shown how Integration Services can be used to retrieve data out of complex XML files, without actually using the XML Source component.  I hope you’ve enjoyed reading it as much as I had while writing.  Or maybe you know another interesting method to get complex XML imported.  Feel free to post comments!

Have fun!

PS: if you liked this article, may I kindly ask you to click that little Yes-button?  Thanks!

Valentino.

References
XSLT (Wikipedia)
CSV (Wikipedia)
XML Task (MDSN)

Originally appeared at my blog: http://blog.hoegaerden.be/2011/04/20/loading-complex-xml-using-ssis

    Asked On
    2012-01-10 at 13:16:39ID9158
    Tags

    SSIS

    ,

    SQL Server Integration Services

    ,

    Integration Services

    ,

    XML

    ,

    XSLT

    Topic

    SSIS

    Views
    2140

    Comments

    Expert Comment

    by: x-men on 2012-01-16 at 04:18:33ID: 34558

    nice explanation, I've been "playing" with XML imports to SQL databases for some time, and ended up "dissecting" the XML files with PowerShell, and then inserting the rows to SQL.

    The "we come to the following XSLT" part is my Achilles' heel, even with the help of Visual Studio ability to generate schemas…

    Maybe your next tutorial will be on generating XSLT for complex XML. (the help I've found, until now, is based on very simple XML structures)

    Author Comment

    by: ValentinoV on 2012-01-16 at 05:47:37ID: 34560

    Hi x-men,

    Would you mind explaining what you mean with "generating XSLT"?  You're not confusing it with an XSD, right?
    Or could you give an example of what you'd like to achieve?

    Best regards,
    Valentino.

    Expert Comment

    by: x-men on 2012-01-16 at 06:48:10ID: 34564

    I believe that the XSL is a template with data configuration, so that the engine will know how to deliver / arrange the XML data.

    The case I have in mind is about a log, dumped by some application, that is similar to a XML file. Maybe the fact that is similar and not a actual XML, is the reason for my troubles, however, I'd like to know more about how to create these templates.

    Author Comment

    by: ValentinoV on 2012-01-16 at 23:24:12ID: 34575

    So what you're looking for is more like a "getting started with XSLT", or am I wrong?

    Here's an interesting site that may help you further: http://www.w3schools.com/xsl/default.asp

    I'll see what I can do about writing something up myself as well, in the context of getting data out of XML for easier import into a database.

    Add your Comment

    Please Sign up or Log in to comment on this article.

    Loading Advertisement...

    Top SSIS_SSAS Experts

    1. huslayer

      21,710

      10 points yesterday

      Profile
      Rank: Sage
    2. ValentinoV

      14,230

      20 points yesterday

      Profile
      Rank: Sage
    3. TimHumphries

      7,400

      0 points yesterday

      Profile
      Rank: Wizard
    4. jimhorn

      4,100

      0 points yesterday

      Profile
      Rank: Genius
    5. 8080_Diver

      4,100

      0 points yesterday

      Profile
      Rank: Genius
    6. BCUNNEY

      4,000

      0 points yesterday

      Profile
      Rank: Guru
    7. vdr1620

      3,800

      0 points yesterday

      Profile
      Rank: Wizard
    8. TempDBA

      3,800

      0 points yesterday

      Profile
      Rank: Sage
    9. PedroCGD

      3,010

      0 points yesterday

      Profile
      Rank: Sage
    10. planocz

      2,800

      0 points yesterday

      Profile
      Rank: Genius
    11. jogos

      2,800

      0 points yesterday

      Profile
      Rank: Sage
    12. itcouple

      2,800

      0 points yesterday

      Profile
      Rank: Guru
    13. reza_rad

      2,470

      30 points yesterday

      Profile
      Rank: Genius
    14. PatelAlpesh

      2,110

      0 points yesterday

      Profile
      Rank: Wizard
    15. mwvisa1

      2,000

      0 points yesterday

      Profile
      Rank: Genius
    16. lundnak

      2,000

      0 points yesterday

      Profile
      Rank: Master
    17. Crashman

      2,000

      0 points yesterday

      Profile
      Rank: Master
    18. mjfagan

      2,000

      0 points yesterday

      Profile
      Rank: Master
    19. EugeneZ

      1,800

      0 points yesterday

      Profile
      Rank: Genius
    20. fomand

      1,400

      0 points yesterday

      Profile
      Rank: Master
    21. trofimoval

      1,002

      0 points yesterday

      Profile
      Rank: Guru
    22. acperkins

      1,000

      0 points yesterday

      Profile
      Rank: Genius
    23. ScottPletcher

      1,000

      0 points yesterday

      Profile
      Rank: Genius
    24. MohammedU

      600

      0 points yesterday

      Profile
      Rank: Wizard
    25. p_nuts

      498

      0 points yesterday

      Profile
      Rank: Wizard

    Hall Of Fame