Solved

Import XML Spreadsheet to SQL Table

Posted on 2011-02-14
7
1,902 Views
Last Modified: 2012-05-11
I need to import an XML Spreadsheet into a SQL table using components of SSIS.  From what I've been reading, it looks like that might not be an easy task since the schema produced by Excel is quite complicated.  I have certainly not found an easy answer so far!

I've been looking online for solutions to this and it seems like I might need to first use either an XSLT or Microsoft.XMLDOM to transform the XML Spreadsheet to a simpler form of XML or some other file type.  And then use that simpler file as the source for the SSIS import.  

I do see lots of submissions on this general topic but I haven't been able to get any of them to work for me.  I don't have much experience with XSLT.  

I need some help in moving forward with this task and I have the following questions.

1. The Excel file that produced the XML file has 6 columns and some 'header' rows above the columns that are merged together.  I started down the path of merging the XML Source outputs together but I didn't have any luck.  Am I right to look for another direction or do I need to work harder to get that part to work?

2. I have found some XSLT files that do indeed transform the XML Spreadsheet to XML or text, but my biggest problem there is that the outputs don't have either field or line separators -- it just comes out as one long string.  Can you provide help with 'fixing' the XSLT?  The goal would be to convert the XML Spreadsheet to a format that I could use as a source for the SSIS.

3. It looks like in order to use Microsoft.XMLDOM I would also need to have a good XSLT in order for it to write the XML Spreadsheet to another format?

4. Is there another way to accomplish this using the XML Spreadsheet and some component(s) of SSIS?

Simplified XML (more complete file attached)
      <Row>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Name</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Depth</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Quantity</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Number</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Units</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Version</Data>
        </Cell>
      </Row>
      <Row ss:Height="15">
        <Cell ss:StyleID="s23" ss:Formula="" ss:HRef="">
          <Data ss:Type="String">A1-100.SL</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">A1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">
          </Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">100</Data>
        </Cell>
      </Row>

XSLT that gives me data with no separators.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text" encoding="iso-8859-1"/>
   <xsl:param name="separator" select="';'"/>
  <xsl:param name="line-separator" select="'&#13;&#10;'"/>
<xsl:template match="/Table">
    <xsl:variable name="WSName" select="Row/Cell/Data/@ss:Type" />
    <xsl:value-of select="$separator"/>
    <xsl:value-of select="$line-separator"/>
    <Row>
    <xsl:for-each select="Row/Cell">
            <Cell>
                  <Data><xsl:value-of select="@ss:Type"/></Data>
            </Cell>
    </xsl:for-each>
    </Row>
</xsl:template>
</xsl:stylesheet>
 SimpleTest.xml
0
Comment
Question by:sanw2020
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
7 Comments
 
LVL 10

Accepted Solution

by:
Asim Nazir earned 500 total points
ID: 34894353
0
 
LVL 6

Expert Comment

by:dan_mason
ID: 34895145
When you say an XML spreadsheet generated by Excel, do you mean an xlsx document? In which case you can use OPENROWSET as shown below.

In the code, HDR defines whether you want to import row headers or not. Make sure in the query at the end you include the dollar-sign after the worksheet name.

To use this method you need either to have Office 2007 (or greater) installed on the server, or to install the ACE provider itself, which can be downloaded from:

http://www.microsoft.com/downloads/details.aspx?FamilyID=7554F536-8C28-4598-9B72-EF94E038C891&displaylang=en

SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0','EXCEL 12.0;DATABASE=D:\files\test.xlsx;HDR=NO;IMEX=1','select * from [Sheet1$]')

Open in new window

0
 

Author Comment

by:sanw2020
ID: 34895980
Thanks for your responses.  I'll try the T-SQL method suggested by asimnazir-- I hadn't seen any information on doing it this way.

Dan, my statement of "XML document generated by Excel" was misleading or incorrect.  What I meant was that this is a .XML file, but when you double-click on it, it opens in Excel.  It has the following namespace information.  

<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">

I cannot guarantee that the sytem where the final solution will be run on will have Office installed.
0
NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

 
LVL 6

Expert Comment

by:dan_mason
ID: 34896291
Yes that definitely looks like it started life as an Excel spreadsheet - if you have an .xlsx file available on your computer it's interesting to open it using a ZIP program, as they are actually ZIP files that contain a number of XML documents just like yours, each XML document being a worksheet.

Just to add, if there's a way of ensuring that the system where the solution will run has the ACE provider installed, the OPENROWSET method should be much quicker to get up-and-running; have done these jobs both ways myself and the XML route is quite cumbersome when the schema is so complicated as that - easy to pull in, less easy to manipulate.

0
 

Author Comment

by:sanw2020
ID: 34896545
Dan, the solution I'm working on will be installed at multiple client sites and will probably run on the SQL server where Office is not typically installed.  The XML file is being generated by another system and based on one of the namespace references, it looks like they may be exporting from their system to this HTML 4.01 format.  

I now have the XML file stored in a SQL table as datatype XML.  And I followed the link above to look at parsing the XML data in SQL. The XML file used in the article has nicely named tags so that when you query the data you can refer to the elements by name.  

SELECT tab.col.value(‘./id[1]‘,‘varchar(50)’) AS ‘id’,
tab.col.value(‘./city[1]‘,‘varchar(100)’) AS ‘city’,
tab.col.value(‘./state[1]‘,‘varchar(50)’) AS ’state’,
FROM [XmlImportTest]
CROSS APPLY
xml_data.nodes(‘//Customer’) AS tab(col)

Each data cell in my XML spreadsheet is referred to in the same way --
   <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
   </Cell>

Can you offer any guidance on how I might turn the SELECT query with nicely named tags in to a SELECT query that works with my XML spreadsheet format?  
0
 

Author Comment

by:sanw2020
ID: 34899791
Ok, with the tips you guys have given me, I'm making progress.  I can now load my XML spreadsheet into an XML field in a SQL table.  From there, I can parse the data into a SQL table with columns and rows!  It's getting very close...

 Progress
The last thing I would like to be able to do is get each of the top 5 merged rows into their own columns -- this is essentially 'header' data.  Right now they're being treated like the other data rows.  I've looked at some XQuery samples to see how I might filter out or separately process these headers rows, but I'm not getting the statement to work.  

The type of statement that worked to return all data rows was:
SELECT x.xl.value('(Cell/Data)[1]','VARCHAR(100)') as CompLink
FROM @Data1.nodes('/Workbook/Worksheet/Table/Row') x(xl)

Now I'm trying to filter it with something like this, but it's returning an empty value.  I would guess that means my filter syntax is incorrect.  
SELECT x.xl.value('(Cell[MergeAcross="5"]/Data)[1]','VARCHAR(100)') as BOMInfo
FROM @Data1.nodes('/Workbook/Worksheet/Table/Row') x(xl)

The row I'm trying to filter looks like this:
<Row><Cell ss:StyleID="s23" ss:Formula="" ss:HRef="" ss:MergeAcross="5"><Data ss:Type="String">3908-100</Data></Cell></Row>

Any help would be appreciated.
0
 

Author Comment

by:sanw2020
ID: 34916992
Can anyone offer any help with my XQuery question?
0

Featured Post

Optimizing Cloud Backup for Low Bandwidth

With cloud storage prices going down a growing number of SMBs start to use it for backup storage. Unfortunately, business data volume rarely fits the average Internet speed. This article provides an overview of main Internet speed challenges and reveals backup best practices.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When you see single cell contains number and text, and you have to get any date out of it seems like cracking our heads.
How to get Spreadsheet Compare 2016 working with the 64 bit version of Office 2016
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
This Micro Tutorial will demonstrate how to create pivot charts out of a data set. I also added a drop-down menu which allows to choose from different categories in the data set and the chart will automatically update.

730 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question