Solved

Handling large XML files (>50MB) in ASP...

Posted on 2004-09-18
Last Modified: 2010-05-18
Hi all,

I have a problem that is beginning to annoy me beyond reason, and besides, I am running out of time and starting to get desperate :-)

For my website (www.kjaerland.dk/DVD) I download and decode several CSV and XML files daily for insertion into my MySQL database. However, some of the XML files I need to get hold of are getting very large, in the area of 45MB and above, which means that I am starting to have problems loading them.

One of my limitations is that my site is hosted externally (Microsoft server and MySQL 4.0), so I am not able to alter server settings to fit my needs.

I have this bit of sample code that I use for simple testing:

    ' Load the remote XML file synchronously into a DOM document
    Set objXML = Server.CreateObject("Msxml2.DOMDocument")
    objXML.async = False
    objXML.setProperty "ServerHTTPRequest", True
    objXML.load("http://www. .... music_DK.xml")

    ' parseError.reason is empty when the load succeeded
    Response.Write "<br><br><strong>" & objXML.parseError.reason & "</strong><br><br>"

    ' Count the product nodes
    Set NodeList = objXML.selectNodes("/cdon_products/countries/country/product")
    Response.Write NodeList.length & "<br>"

    ' Release both objects
    Set NodeList = Nothing
    Set objXML = Nothing

It works fine on small XML files, but when the size gets large I get a "Not enough storage is available to complete this operation." error and nothing gets loaded. I am not at all an XML expert, so I might be overlooking something.

The only processing I need to do on the XML file is to get a few fields per product in the list and then insert it into my MySQL database. Nothing needs to be displayed.

Does anyone have a good idea of how to get around this "memory" limitation and get to the XML data that I so badly need inserted into my database? I have experimented a little with loading the XML with async set to true, but I didn't get close to getting that working and immediately got XSL mixed into it (and then it started to go beyond my limited knowledge of the XML world)...

Regards,

Thomas Kjaer
Question by:kjaer
10 Comments
 
LVL 15

Expert Comment

by:dualsoul
It's strange that you get such errors; 45MB is not very large for XML processing.

If you only need to get data out of the XML, you can use the SAX API to do that. It does not build a DOM tree in memory, so memory consumption stays small regardless of the file size.
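
To make the SAX idea concrete, here is a minimal sketch of the event-driven pattern. It is written in Python rather than ASP, purely as an illustration and not as MSXML code. The `product` element name comes from the XPath in the question; the field names (`album`, `artist`, `record_label`, `price`) are assumptions borrowed from the XSL sample posted in this thread, and `ProductHandler`, `parse_products` and the sample document are hypothetical names and data made up for the example.

```python
import io
import xml.sax


class ProductHandler(xml.sax.ContentHandler):
    """Collects a few child fields of each <product> and passes them to a callback."""

    # Assumed field names (taken from the XSL sample in the thread, not verified).
    FIELDS = ("album", "artist", "record_label", "price")

    def __init__(self, on_product):
        super().__init__()
        self.on_product = on_product  # called once per finished <product>
        self.row = None               # dict for the product being read, or None
        self.field = None             # name of the field being read, or None
        self.text = []                # text chunks of the current field

    def startElement(self, name, attrs):
        if name == "product":
            self.row = {}
        elif self.row is not None and name in self.FIELDS:
            self.field = name
            self.text = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces, so accumulate.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.field:
            self.row[name] = "".join(self.text).strip()
            self.field = None
        elif name == "product" and self.row is not None:
            self.on_product(self.row)  # a database INSERT could go here
            self.row = None


def parse_products(source, on_product):
    """Stream-parse `source` (a file name or file-like object) without building a tree."""
    xml.sax.parse(source, ProductHandler(on_product))


# Hypothetical sample data shaped like the path used in the question.
sample = b"""<cdon_products><countries><country>
  <product><album>A1</album><artist>X</artist><price>99</price></product>
  <product><album>A2</album><artist>Y</artist><price>79</price></product>
</country></countries></cdon_products>"""

rows = []
parse_products(io.BytesIO(sample), rows.append)
print(len(rows), rows[0]["album"])
```

Only one product's fields are held at a time; each row is handed to the callback and then dropped, which is why memory use does not grow with the size of the file.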
 
LVL 15

Expert Comment

by:dualsoul
Refer to the MSDN documentation for the SAX2 implementation in MSXML.
 

Author Comment

by:kjaer
SAX sounds more right, and a bit more complex, even though I have not yet found a useful example. I did, however, find this comment:

"However, the MS XML SAX implementation requires the installation and registration of user-supplied COM objects on the web server. This is because the SAX parsing engine uses call-backs into the user-supplied COM objects."

Doesn't that mean I will run into new problems, since I host my site externally? Or isn't "installation and registration of user-supplied COM objects" as physical as it sounds?

Anyhow, I will chase this a bit more... (and I'll write to my web host to figure out why the memory limit is so low. I get a similar error if I try to get the file using an aspHTTP component)

- Thomas Kjaer
 
LVL 21

Accepted Solution

by:
MogalManic earned 250 total points
Part of the problem is that, with your algorithm, your 45MB document might take almost double the storage in memory because you are loading the data into memory twice. The XML document is loaded into 'objXML', and then the data is loaded into the NodeList variable.

If you don't want to get into SAX, then XSL might also work. This is a simple XSL that turns the data into an HTML table:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name='sortColumn'>CONSOLEID</xsl:param>
  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <table>
          <tr>
            <th>Album</th>
            <th>Artist</th>
            <th>Record Label</th>
            <th>Price</th>
          </tr>
          <xsl:for-each select="/cdon_products/countries/country/product">
            <tr>
              <td><xsl:value-of select="album"/></td>
              <td><xsl:value-of select="artist"/></td>
              <td><xsl:value-of select="record_label"/></td>
              <td><xsl:value-of select="price"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

Author Comment

by:kjaer
Point taken, MogalManic. However, it's the first load that's failing, so until I get that working I can't do anything :-)
 
LVL 15

Expert Comment

by:dualsoul
Hi MogalManic.
> then XSL might also work.

That's not right. If there really is a memory problem, and processing a 45MB file using DOM causes an out-of-memory error, then XSLT won't work either, because XSLT engines build an in-memory, DOM-like structure of the input XML. So memory consumption is at least the same.
 
LVL 15

Assisted Solution

by:dualsoul
dualsoul earned 250 total points
>installation and registration of user-supplied COM objects

With the SAX approach you need to implement your own COM objects that listen to SAX events, so you need to register them on the target system as usual (it's COM, you know :)). Nothing more, just ordinary registration in the registry.
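
The "object that listens to SAX events" idea can be sketched outside COM as well. The following is a minimal Python illustration, not MSXML code: the listener is a plain Python object attached with `setContentHandler` instead of a registered COM component, and the document is fed to the parser in small chunks. `ProductCounter`, `count_products`, the 16-byte chunk size and the sample document are all hypothetical choices made for the example.

```python
import xml.sax


class ProductCounter(xml.sax.ContentHandler):
    """Listener object: the parser calls startElement for every opening tag."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "product":
            self.count += 1


def count_products(chunks):
    """Feed an iterable of byte chunks to an incremental SAX parser."""
    parser = xml.sax.make_parser()
    counter = ProductCounter()
    parser.setContentHandler(counter)  # attach the listener to the parser
    for chunk in chunks:
        parser.feed(chunk)             # only this chunk needs to be in memory
    parser.close()
    return counter.count


# Hypothetical document, split into 16-byte pieces to mimic a chunked download.
doc = (b"<cdon_products><countries><country>"
       b"<product/><product/><product/>"
       b"</country></countries></cdon_products>")
pieces = [doc[i:i + 16] for i in range(0, len(doc), 16)]
total = count_products(pieces)
print(total)
```

The count here plays the role of `NodeList.length` in the original test script, but it is obtained without ever holding the whole document.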
 
LVL 21

Expert Comment

by:MogalManic
You might try XQuery. This is an XML query language (with some of the same functionality as XSLT, but not as popular yet). I don't know how well it works, but it might not load the whole XML document into memory if you are only processing it a little at a time.

http://aspnet.4guysfromrolla.com/articles/071603-1.aspx
