Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Import XML Spreadsheet to SQL Table

Posted on 2011-02-14
7
Medium Priority
?
2,036 Views
Last Modified: 2012-05-11
I need to import an XML Spreadsheet into a SQL table using components of SSIS.  From what I've been reading, it looks like that might not be an easy task since the schema produced by Excel is quite complicated.  I have certainly not found an easy answer so far!

I've been looking online for solutions to this and it seems like I might need to first use either an XSLT or Microsoft.XMLDOM to transform the XML Spreadsheet to a simpler form of XML or some other file type.  And then use that simpler file as the source for the SSIS import.  

I do see lots of submissions on this general topic but I haven't been able to get any of them to work for me.  I don't have much experience with XSLT.  

I need some help in moving forward with this task and I have the following questions.

1. The Excel file that produced the XML file has 6 columns and some 'header' rows above the columns that are merged together.  I started down the path of merging the XML Source outputs together but I didn't have any luck.  Am I right to look for another direction or do I need to work harder to get that part to work?

2. I have found some XSLT files that do indeed transform the XML Spreadsheet to XML or text, but my biggest problem there is that the outputs don't have either field or line separators -- it just comes out as one long string.  Can you provide help with 'fixing' the XSLT?  The goal would be to convert the XML Spreadsheet to a format that I could use as a source for the SSIS.

3. It looks like in order to use Microsoft.XMLDOM I would also need to have a good XSLT in order for it to write the XML Spreadsheet to another format?

4. Is there another way to accomplish this using the XML Spreadsheet and some component(s) of SSIS?

Simplified XML (more complete file attached)
      <Row>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Name</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Depth</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Quantity</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Number</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Units</Data>
        </Cell>
        <Cell ss:StyleID="heading">
          <Data ss:Type="String">Version</Data>
        </Cell>
      </Row>
      <Row ss:Height="15">
        <Cell ss:StyleID="s23" ss:Formula="" ss:HRef="">
          <Data ss:Type="String">A1-100.SL</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">A1</Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">
          </Data>
        </Cell>
        <Cell ss:StyleID="data">
          <Data ss:Type="String">100</Data>
        </Cell>
      </Row>

XSLT that gives me data with no separators.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text" encoding="iso-8859-1"/>
   <xsl:param name="separator" select="';'"/>
  <xsl:param name="line-separator" select="'&#13;&#10;'"/>
<xsl:template match="/Table">
    <xsl:variable name="WSName" select="Row/Cell/Data/@ss:Type" />
    <xsl:value-of select="$separator"/>
    <xsl:value-of select="$line-separator"/>
    <Row>
    <xsl:for-each select="Row/Cell">
            <Cell>
                  <Data><xsl:value-of select="@ss:Type"/></Data>
            </Cell>
    </xsl:for-each>
    </Row>
</xsl:template>
</xsl:stylesheet>
 SimpleTest.xml
0
Comment
Question by:sanw2020
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
7 Comments
 
LVL 10

Accepted Solution

by:
Asim Nazir earned 2000 total points
ID: 34894353
0
 
LVL 6

Expert Comment

by:dan_mason
ID: 34895145
When you say an XML spreadsheet generated by Excel, do you mean an xlsx document? In which case you can use OPENROWSET as shown below.

In the code, HDR defines whether you want to import row headers or not. Make sure in the query at the end you include the dollar-sign after the worksheet name.

To use this method you need either to have Office 2007 (or greater) installed on the server, or to install the ACE provider itself, which can be downloaded from:

http://www.microsoft.com/downloads/details.aspx?FamilyID=7554F536-8C28-4598-9B72-EF94E038C891&displaylang=en

SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0','EXCEL 12.0;DATABASE=D:\files\test.xlsx;HDR=NO;IMEX=1','select * from [Sheet1$]')

Open in new window

0
 

Author Comment

by:sanw2020
ID: 34895980
Thanks for your responses.  I'll try the T-SQL method suggested by asimnazir-- I hadn't seen any information on doing it this way.

Dan, my statement of "XML document generated by Excel" was misleading or incorrect.  What I meant was that this is a .XML file, but when you double-click on it, it opens in Excel.  It has the following namespace information.  

<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">

I cannot guarantee that the sytem where the final solution will be run on will have Office installed.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 6

Expert Comment

by:dan_mason
ID: 34896291
Yes that definitely looks like it started life as an Excel spreadsheet - if you have an .xlsx file available on your computer it's interesting to open it using a ZIP program, as they are actually ZIP files that contain a number of XML documents just like yours, each XML document being a worksheet.

Just to add, if there's a way of ensuring that the system where the solution will run has the ACE provider installed, the OPENROWSET method should be much quicker to get up-and-running; have done these jobs both ways myself and the XML route is quite cumbersome when the schema is so complicated as that - easy to pull in, less easy to manipulate.

0
 

Author Comment

by:sanw2020
ID: 34896545
Dan, the solution I'm working on will be installed at multiple client sites and will probably run on the SQL server where Office is not typically installed.  The XML file is being generated by another system and based on one of the namespace references, it looks like they may be exporting from their system to this HTML 4.01 format.  

I now have the XML file stored in a SQL table as datatype XML.  And I followed the link above to look at parsing the XML data in SQL. The XML file used in the article has nicely named tags so that when you query the data you can refer to the elements by name.  

SELECT tab.col.value(‘./id[1]‘,‘varchar(50)’) AS ‘id’,
tab.col.value(‘./city[1]‘,‘varchar(100)’) AS ‘city’,
tab.col.value(‘./state[1]‘,‘varchar(50)’) AS ’state’,
FROM [XmlImportTest]
CROSS APPLY
xml_data.nodes(‘//Customer’) AS tab(col)

Each data cell in my XML spreadsheet is referred to in the same way --
   <Cell ss:StyleID="data">
          <Data ss:Type="String">1</Data>
   </Cell>

Can you offer any guidance on how I might turn the SELECT query with nicely named tags in to a SELECT query that works with my XML spreadsheet format?  
0
 

Author Comment

by:sanw2020
ID: 34899791
Ok, with the tips you guys have given me, I'm making progress.  I can now load my XML spreadsheet into an XML field in a SQL table.  From there, I can parse the data into a SQL table with columns and rows!  It's getting very close...

 Progress
The last thing I would like to be able to do is get each of the top 5 merged rows into their own columns -- this is essentially 'header' data.  Right now they're being treated like the other data rows.  I've looked at some XQuery samples to see how I might filter out or separately process these headers rows, but I'm not getting the statement to work.  

The type of statement that worked to return all data rows was:
SELECT x.xl.value('(Cell/Data)[1]','VARCHAR(100)') as CompLink
FROM @Data1.nodes('/Workbook/Worksheet/Table/Row') x(xl)

Now I'm trying to filter it with something like this, but it's returning an empty value.  I would guess that means my filter syntax is incorrect.  
SELECT x.xl.value('(Cell[MergeAcross="5"]/Data)[1]','VARCHAR(100)') as BOMInfo
FROM @Data1.nodes('/Workbook/Worksheet/Table/Row') x(xl)

The row I'm trying to filter looks like this:
<Row><Cell ss:StyleID="s23" ss:Formula="" ss:HRef="" ss:MergeAcross="5"><Data ss:Type="String">3908-100</Data></Cell></Row>

Any help would be appreciated.
0
 

Author Comment

by:sanw2020
ID: 34916992
Can anyone offer any help with my XQuery question?
0

Featured Post

Learn Veeam advantages over legacy backup

Every day, more and more legacy backup customers switch to Veeam. Technologies designed for the client-server era cannot restore any IT service running in the hybrid cloud within seconds. Learn top Veeam advantages over legacy backup and get Veeam for the price of your renewal

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

610 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question