Solved

Import XML Spreadsheet to SQL Table

Posted on 2011-02-14
7
1,854 Views
Last Modified: 2012-05-11
I need to import an XML Spreadsheet into a SQL table using components of SSIS.  From what I've been reading, it looks like that might not be an easy task since the schema produced by Excel is quite complicated.  I have certainly not found an easy answer so far!

I've been looking online for solutions to this and it seems like I might need to first use either an XSLT or Microsoft.XMLDOM to transform the XML Spreadsheet to a simpler form of XML or some other file type.  And then use that simpler file as the source for the SSIS import.  

I do see lots of submissions on this general topic but I haven't been able to get any of them to work for me.  I don't have much experience with XSLT.  

I need some help in moving forward with this task and I have the following questions.

1. The Excel file that produced the XML file has 6 columns and some 'header' rows above the columns that are merged together.  I started down the path of merging the XML Source outputs together but I didn't have any luck.  Am I right to look for another direction or do I need to work harder to get that part to work?

2. I have found some XSLT files that do indeed transform the XML Spreadsheet to XML or text, but my biggest problem there is that the outputs don't have either field or line separators -- it just comes out as one long string.  Can you provide help with 'fixing' the XSLT?  The goal would be to convert the XML Spreadsheet to a format that I could use as a source for the SSIS.

3. It looks like in order to use Microsoft.XMLDOM I would also need to have a good XSLT in order for it to write the XML Spreadsheet to another format?

4. Is there another way to accomplish this using the XML Spreadsheet and some component(s) of SSIS?

Simplified XML (more complete file attached)
      <Row>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Name</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Depth</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Quantity</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Number</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Units</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Version</Data>
        </Cell>
      </Row>
      <Row ss:Height="15">
        <Cell ss:StyleID="s23" ss:Formula="" ss:HRef="">
          <Data ss:Type="String">A1-100.SL</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">A1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">
          </Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">100</Data>
        </Cell>
      </Row>

XSLT that gives me data with no separators.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text" encoding="iso-8859-1"/>
   <xsl:param name="separator" select="';'"/>
  <xsl:param name="line-separator" select="'&#13;&#10;'"/>
<xsl:template match="/Table">
    <xsl:variable name="WSName" select="Row/Cell/Data/@ss:Type" />
    <xsl:value-of select="$separator"/>
    <xsl:value-of select="$line-separator"/>
    <Row>
    <xsl:for-each select="Row/Cell">
            <Cell>
                  <Data><xsl:value-of select="@ss:Type"/></Data>
            </Cell>
    </xsl:for-each>
    </Row>
</xsl:template>
</xsl:stylesheet>
 SimpleTest.xml
0
Comment
Question by:sanw2020
  • 4
  • 2
7 Comments
 
LVL 10

Accepted Solution

by:
Asim Nazir earned 500 total points
ID: 34894353
0
 
LVL 6

Expert Comment

by:dan_mason
ID: 34895145
When you say an XML spreadsheet generated by Excel, do you mean an xlsx document? In which case you can use OPENROWSET as shown below.

In the code, HDR defines whether you want to import row headers or not. Make sure in the query at the end you include the dollar-sign after the worksheet name.

To use this method you need either to have Office 2007 (or greater) installed on the server, or to install the ACE provider itself, which can be downloaded from:

http://www.microsoft.com/downloads/details.aspx?FamilyID=7554F536-8C28-4598-9B72-EF94E038C891&displaylang=en

SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0','EXCEL 12.0;DATABASE=D:\files\test.xlsx;HDR=NO;IMEX=1','select * from [Sheet1$]')

Open in new window

0
 

Author Comment

by:sanw2020
ID: 34895980
Thanks for your responses.  I'll try the T-SQL method suggested by asimnazir-- I hadn't seen any information on doing it this way.

Dan, my statement of "XML document generated by Excel" was misleading or incorrect.  What I meant was that this is a .XML file, but when you double-click on it, it opens in Excel.  It has the following namespace information.  

<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">

I cannot guarantee that the sytem where the final solution will be run on will have Office installed.
0
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

 
LVL 6

Expert Comment

by:dan_mason
ID: 34896291
Yes that definitely looks like it started life as an Excel spreadsheet - if you have an .xlsx file available on your computer it's interesting to open it using a ZIP program, as they are actually ZIP files that contain a number of XML documents just like yours, each XML document being a worksheet.

Just to add, if there's a way of ensuring that the system where the solution will run has the ACE provider installed, the OPENROWSET method should be much quicker to get up-and-running; have done these jobs both ways myself and the XML route is quite cumbersome when the schema is so complicated as that - easy to pull in, less easy to manipulate.

0
 

Author Comment

by:sanw2020
ID: 34896545
Dan, the solution I'm working on will be installed at multiple client sites and will probably run on the SQL server where Office is not typically installed.  The XML file is being generated by another system and based on one of the namespace references, it looks like they may be exporting from their system to this HTML 4.01 format.  

I now have the XML file stored in a SQL table as datatype XML.  And I followed the link above to look at parsing the XML data in SQL. The XML file used in the article has nicely named tags so that when you query the data you can refer to the elements by name.  

SELECT tab.col.value(‘./id[1]‘,‘varchar(50)’) AS ‘id’,
tab.col.value(‘./city[1]‘,‘varchar(100)’) AS ‘city’,
tab.col.value(‘./state[1]‘,‘varchar(50)’) AS ’state’,
FROM [XmlImportTest]
CROSS APPLY
xml_data.nodes(‘//Customer’) AS tab(col)

Each data cell in my XML spreadsheet is referred to in the same way --
   <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
   </Cell>

Can you offer any guidance on how I might turn the SELECT query with nicely named tags in to a SELECT query that works with my XML spreadsheet format?  
0
 

Author Comment

by:sanw2020
ID: 34899791
Ok, with the tips you guys have given me, I'm making progress.  I can now load my XML spreadsheet into an XML field in a SQL table.  From there, I can parse the data into a SQL table with columns and rows!  It's getting very close...

 Progress
The last thing I would like to be able to do is get each of the top 5 merged rows into their own columns -- this is essentially 'header' data.  Right now they're being treated like the other data rows.  I've looked at some XQuery samples to see how I might filter out or separately process these headers rows, but I'm not getting the statement to work.  

The type of statement that worked to return all data rows was:
SELECT x.xl.value('(Cell/Data)[1]','VARCHAR(100)') as CompLink
FROM @Data1.nodes('/Workbook/Worksheet/Table/Row') x(xl)

Now I'm trying to filter it with something like this, but it's returning an empty value.  I would guess that means my filter syntax is incorrect.  
SELECT x.xl.value('(Cell[MergeAcross="5"]/Data)[1]','VARCHAR(100)') as BOMInfo
FROM @Data1.nodes('/Workbook/Worksheet/Table/Row') x(xl)

The row I'm trying to filter looks like this:
<Row><Cell ss:StyleID="s23" ss:Formula="" ss:HRef="" ss:MergeAcross="5"><Data ss:Type="String">3908-100</Data></Cell></Row>

Any help would be appreciated.
0
 

Author Comment

by:sanw2020
ID: 34916992
Can anyone offer any help with my XQuery question?
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
Some code to ensure data integrity when using macros within Excel. Also included code that helps secure your data within an Excel workbook.
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.

810 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question