Solved

Handling large XML files (>50MB) in ASP...

Posted on 2004-09-18
Last Modified: 2010-05-18
Hi all,

I have a problem that is beginning to annoy me beyond reason, and besides, I am running out of time and starting to get desperate :-)

For my website (www.kjaerland.dk/DVD) I download and decode several CSV and XML files daily for insertion into my MySQL database. However, some of the XML files I need to get hold of are starting to get very large, in the area of 45MB and above, which means that I am starting to have problems loading them.

One of my limitations is that my site is hosted externally (Microsoft server and MySQL 4.0), so I am not able to alter server settings to fit my needs.

I have this bit of sample code that I use for simple testing:

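    ' Load the remote XML file synchronously into an MSXML DOM document
    Set objXML = Server.CreateObject("Msxml2.DomDocument")
    objXML.async = False
    ' Required so that load() can fetch a URL from server-side code
    objXML.setProperty "ServerHTTPRequest", True
    objXML.load("http://www. .... music_DK.xml")

    ' Show the parser error text (empty if the load succeeded)
    Response.Write "<br><br><strong>" & objXML.parseError.reason & "</strong><br><br>"

    ' Count the product nodes
    Set NodeList = objXML.selectNodes("/cdon_products/countries/country/product")
    Response.Write NodeList.length & "<br>"

    ' Release the objects
    Set NodeList = Nothing
    Set objXML = Nothing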

It works fine on small XML files, but when the size gets large I get a "Not enough storage is available to complete this operation." error and nothing gets loaded. I am not at all an XML expert, so I might be overlooking something.

The only processing I need to do on the XML file is to get a few fields per product in the list and then insert it into my MySQL database. Nothing needs to be displayed.

Does anyone have a good idea of how to get around this "memory" limitation and get to the XML data that I so badly need inserted into my database? I have played a little with loading the XML with async set to true, but I didn't get close to making that work, and XSL immediately got mixed into it (at which point it went beyond my limited knowledge of the XML world)...

Regards,

Thomas Kjaer
Question by:kjaer
10 Comments
 
LVL 15

Expert Comment

by:dualsoul
ID: 12094681
It's strange that you get such errors; 45MB is not very large for XML processing.

If you only need to get data out of the XML, you can use the SAX API to do that. It does not build a DOM tree in memory, so memory consumption stays low regardless of the file size.
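
To illustrate the event-driven idea, here is a minimal sketch. Note that it is not ASP or MSXML code: it uses Python's standard `xml.sax` module purely to show how a SAX handler sees one element at a time and keeps only the fields it wants. The element name `product` comes from the XPath in the question; the field names `album`, `artist` and `price`, the helper `read_products` and the sample data are assumptions for illustration only.

```python
import io
import xml.sax


class ProductHandler(xml.sax.ContentHandler):
    """Collects a few child fields of each <product> element as it streams by."""

    def __init__(self, fields, on_product):
        super().__init__()
        self.fields = set(fields)      # child element names to keep (assumed names)
        self.on_product = on_product   # callback invoked once per finished product
        self.current = None            # dict for the product being read, or None
        self.field = None              # name of the field currently being read
        self.text = []                 # text chunks of the current field

    def startElement(self, name, attrs):
        if name == "product":
            self.current = {}
        elif self.current is not None and name in self.fields:
            self.field = name
            self.text = []

    def characters(self, content):
        # The parser may deliver the text of one element in several chunks.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "product" and self.current is not None:
            self.on_product(self.current)   # hand the row over, then forget it
            self.current = None
        elif name == self.field:
            self.current[self.field] = "".join(self.text).strip()
            self.field = None


def read_products(source, fields, on_product):
    """Stream-parse `source` (a file name or file-like object)."""
    xml.sax.parse(source, ProductHandler(fields, on_product))


# Tiny in-memory sample standing in for the real feed (hypothetical data).
SAMPLE = b"""<cdon_products><countries><country>
<product><album>A1</album><artist>X</artist><price>99</price></product>
<product><album>A2</album><artist>Y</artist><price>79</price></product>
</country></countries></cdon_products>"""

rows = []
read_products(io.BytesIO(SAMPLE), ["album", "artist", "price"], rows.append)
```

Here `rows.append` stands in for the database insert. In a real import the callback would write each product to the database and discard it, so only one product is held in memory at a time.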
 
LVL 15

Expert Comment

by:dualsoul
ID: 12094685
Refer to the MSDN documentation of the SAX2 implementation in MSXML.
 

Author Comment

by:kjaer
ID: 12096544
SAX sounds more right, and a bit more complex, even though I have not yet found a useful example. I did, however, find this comment:

"However, the MS XML SAX implementation requires the installation and registration of user-supplied COM objects on the web server. This is because the SAX parsing engine uses call-backs into the user-supplied COM objects."

Doesn't that mean I will run into new problems since my site is hosted externally, or isn't "installation and registration of user-supplied COM objects" as physical as it sounds?

Anyhow, I will chase this a bit more... (and I'll write to my web host to figure out why the memory limit is so low. I get a similar error if I try to get the file using an aspHTTP component)

- Thomas Kjaer
 
LVL 21

Accepted Solution

by:
MogalManic earned 250 total points
ID: 12098384
Part of the problem is that with your algorithm the 45MB document might take up almost double the storage in memory, because you are loading the data into memory twice. The XML document is loaded into 'objXML' and then the data is loaded into the NodeList variable.

If you don't want to get into SAX, then XSL might also work. This is a simple XSL stylesheet that turns the data into an HTML table:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name='sortColumn'>CONSOLEID</xsl:param>
  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <table>
          <tr>
            <th>Album</th>
            <th>Artist</th>
            <th>Record Label</th>
            <th>Price</th>
          </tr>
          <xsl:for-each select="/cdon_products/countries/country/product">
            <tr>
              <td><xsl:value-of select="album"/></td>
              <td><xsl:value-of select="artist"/></td>
              <td><xsl:value-of select="record_label"/></td>
              <td><xsl:value-of select="price"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

Author Comment

by:kjaer
ID: 12099399
Point taken, MogalManic. However, it's the first load that's failing, so until I get that working I can't do anything :-)
 
LVL 15

Expert Comment

by:dualsoul
ID: 12099695
Hi MogalManic.
> then XSL might also work.

That's not right. If there really is a memory problem, and processing a 45MB file using DOM causes an out-of-memory error, then XSLT won't work either, because an XSLT engine builds an in-memory, DOM-like structure of the input XML, so memory consumption is at least the same.


 
LVL 15

Assisted Solution

by:dualsoul
dualsoul earned 250 total points
ID: 12099700
>installation and registration of user-supplied COM objects

With the SAX approach you need to implement your own COM objects that listen to the SAX events, so you need to register them on the target system as usual (it's COM, you know :)). Nothing more, just ordinary registration in the registry.
 
LVL 21

Expert Comment

by:MogalManic
ID: 12118657
You might try XQuery. This is an XML query language (it has some of the same functionality as XSLT but is not as popular yet). I don't know how well it works, but it might not load the whole XML document into memory if you are only processing it a little at a time?

http://aspnet.4guysfromrolla.com/articles/071603-1.aspx