Solved

Import XML Spreadsheet to SQL Table

Posted on 2011-02-14
7
1,801 Views
Last Modified: 2012-05-11
I need to import an XML Spreadsheet into a SQL table using components of SSIS.  From what I've been reading, it looks like that might not be an easy task since the schema produced by Excel is quite complicated.  I have certainly not found an easy answer so far!

I've been looking online for solutions to this and it seems like I might need to first use either an XSLT or Microsoft.XMLDOM to transform the XML Spreadsheet to a simpler form of XML or some other file type.  And then use that simpler file as the source for the SSIS import.  

I do see lots of submissions on this general topic but I haven't been able to get any of them to work for me.  I don't have much experience with XSLT.  

I need some help in moving forward with this task and I have the following questions.

1. The Excel file that produced the XML file has 6 columns and some 'header' rows above the columns that are merged together.  I started down the path of merging the XML Source outputs together but I didn't have any luck.  Am I right to look for another direction or do I need to work harder to get that part to work?

2. I have found some XSLT files that do indeed transform the XML Spreadsheet to XML or text, but my biggest problem there is that the outputs don't have either field or line separators -- it just comes out as one long string.  Can you provide help with 'fixing' the XSLT?  The goal would be to convert the XML Spreadsheet to a format that I could use as a source for the SSIS.

3. It looks like in order to use Microsoft.XMLDOM I would also need to have a good XSLT in order for it to write the XML Spreadsheet to another format?

4. Is there another way to accomplish this using the XML Spreadsheet and some component(s) of SSIS?

Simplified XML (more complete file attached)
      <Row>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Name</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Depth</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Quantity</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Number</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Units</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Version</Data>
        </Cell>
      </Row>
      <Row ss:Height="15">
        <Cell ss:StyleID="s23" ss:Formula="" ss:HRef="">
          <Data ss:Type="String">A1-100.SL</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">A1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">
          </Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">100</Data>
        </Cell>
      </Row>

XSLT that gives me data with no separators.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text" encoding="iso-8859-1"/>
   <xsl:param name="separator" select="';'"/>
  <xsl:param name="line-separator" select="'&#13;&#10;'"/>
<xsl:template match="/Table">
    <xsl:variable name="WSName" select="Row/Cell/Data/@ss:Type" />
    <xsl:value-of select="$separator"/>
    <xsl:value-of select="$line-separator"/>
    <Row>
    <xsl:for-each select="Row/Cell">
            <Cell>
                  <Data><xsl:value-of select="@ss:Type"/></Data>
            </Cell>
    </xsl:for-each>
    </Row>
</xsl:template>
</xsl:stylesheet>
 SimpleTest.xml
0
Comment
Question by:sanw2020
  • 4
  • 2
7 Comments
 
LVL 10

Accepted Solution

by:
Asim Nazir earned 500 total points
ID: 34894353
0
 
LVL 6

Expert Comment

by:dan_mason
ID: 34895145
When you say an XML spreadsheet generated by Excel, do you mean an xlsx document? In which case you can use OPENROWSET as shown below.

In the code, HDR defines whether you want to import row headers or not. Make sure in the query at the end you include the dollar-sign after the worksheet name.

To use this method you need either to have Office 2007 (or greater) installed on the server, or to install the ACE provider itself, which can be downloaded from:

http://www.microsoft.com/downloads/details.aspx?FamilyID=7554F536-8C28-4598-9B72-EF94E038C891&displaylang=en

SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0','EXCEL 12.0;DATABASE=D:\files\test.xlsx;HDR=NO;IMEX=1','select * from [Sheet1$]')

Open in new window

0
 

Author Comment

by:sanw2020
ID: 34895980
Thanks for your responses.  I'll try the T-SQL method suggested by asimnazir-- I hadn't seen any information on doing it this way.

Dan, my statement of "XML document generated by Excel" was misleading or incorrect.  What I meant was that this is a .XML file, but when you double-click on it, it opens in Excel.  It has the following namespace information.  

<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">

I cannot guarantee that the sytem where the final solution will be run on will have Office installed.
0
6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

 
LVL 6

Expert Comment

by:dan_mason
ID: 34896291
Yes that definitely looks like it started life as an Excel spreadsheet - if you have an .xlsx file available on your computer it's interesting to open it using a ZIP program, as they are actually ZIP files that contain a number of XML documents just like yours, each XML document being a worksheet.

Just to add, if there's a way of ensuring that the system where the solution will run has the ACE provider installed, the OPENROWSET method should be much quicker to get up-and-running; have done these jobs both ways myself and the XML route is quite cumbersome when the schema is so complicated as that - easy to pull in, less easy to manipulate.

0
 

Author Comment

by:sanw2020
ID: 34896545
Dan, the solution I'm working on will be installed at multiple client sites and will probably run on the SQL server where Office is not typically installed.  The XML file is being generated by another system and based on one of the namespace references, it looks like they may be exporting from their system to this HTML 4.01 format.  

I now have the XML file stored in a SQL table as datatype XML.  And I followed the link above to look at parsing the XML data in SQL. The XML file used in the article has nicely named tags so that when you query the data you can refer to the elements by name.  

SELECT tab.col.value(‘./id[1]‘,‘varchar(50)’) AS ‘id’,
tab.col.value(‘./city[1]‘,‘varchar(100)’) AS ‘city’,
tab.col.value(‘./state[1]‘,‘varchar(50)’) AS ’state’,
FROM [XmlImportTest]
CROSS APPLY
xml_data.nodes(‘//Customer’) AS tab(col)

Each data cell in my XML spreadsheet is referred to in the same way --
   <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
   </Cell>

Can you offer any guidance on how I might turn the SELECT query with nicely named tags in to a SELECT query that works with my XML spreadsheet format?  
0
 

Author Comment

by:sanw2020
ID: 34899791
Ok, with the tips you guys have given me, I'm making progress.  I can now load my XML spreadsheet into an XML field in a SQL table.  From there, I can parse the data into a SQL table with columns and rows!  It's getting very close...

 Progress
The last thing I would like to be able to do is get each of the top 5 merged rows into their own columns -- this is essentially 'header' data.  Right now they're being treated like the other data rows.  I've looked at some XQuery samples to see how I might filter out or separately process these headers rows, but I'm not getting the statement to work.  

The type of statement that worked to return all data rows was:
SELECT x.xl.value('(Cell/Data)[1]','VARCHAR(100)') as CompLink
FROM @Data1.nodes('/Workbook/Worksheet/Table/Row') x(xl)

Now I'm trying to filter it with something like this, but it's returning an empty value.  I would guess that means my filter syntax is incorrect.  
SELECT x.xl.value('(Cell[MergeAcross="5"]/Data)[1]','VARCHAR(100)') as BOMInfo
FROM @Data1.nodes('/Workbook/Worksheet/Table/Row') x(xl)

The row I'm trying to filter looks like this:
<Row><Cell ss:StyleID="s23" ss:Formula="" ss:HRef="" ss:MergeAcross="5"><Data ss:Type="String">3908-100</Data></Cell></Row>

Any help would be appreciated.
0
 

Author Comment

by:sanw2020
ID: 34916992
Can anyone offer any help with my XQuery question?
0

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Join & Write a Comment

Introduction This Article briefly covers methods of calculating the NPV and IRR variants in Excel as well as the limitations in calculating and interpreting IRR results. Paraphrasing Richard Shockley, author of my favourite finance reference tex…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
The viewer will learn how to simulate a series of coin tosses with the rand() function and learn how to make these “tosses” depend on a predetermined probability. Flipping Coins in Excel: Enter =RAND() into cell A2: Recalculate the random variable…
This Micro Tutorial will demonstrate the scrolling table in Microsoft Excel using the INDEX function.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now