Solved

Handling large XML files (>50MB) in ASP...

Posted on 2004-09-18
Last Modified: 2010-05-18
Hi all,

I have a problem that is beginning to annoy me beyond reason, and besides, I am running out of time and starting to get desperate :-)

For my website (www.kjaerland.dk/DVD) I download and decode several CSV and XML files daily for insertion into my MySQL database. However, some of the XML files I need to get hold of are getting very large, in the area of 45MB and above, which means that I am starting to have problems loading them.

One of my limitations is that my site is hosted externally (Microsoft server and MySQL 4.0), so I am not able to alter server settings to fit my needs.

I have this bit of sample code that I use for simple testing:

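    ' Load the remote XML feed synchronously into an in-memory DOM
    Set objXML = Server.CreateObject("Msxml2.DomDocument")
    objXML.async = False
    objXML.setProperty "ServerHTTPRequest", True
    objXML.load("http://www. .... music_DK.xml")

    ' Print the parser's error message (empty if the load succeeded)
    Response.Write "<br><br><strong>" & objXML.parseError.reason & "</strong><br><br>"

    ' Count the product nodes found in the document
    Set NodeList = objXML.selectNodes("/cdon_products/countries/country/product")
    Response.Write NodeList.length & "<br>"

    ' Release the COM objects
    Set NodeList = Nothing
    Set objXML = Nothing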

It works fine on small XML files, but when the size gets large I get a "Not enough storage is available to complete this operation." error and nothing gets loaded. I am not at all an XML expert, so I might be overlooking something.

The only processing I need to do on the XML file is to get a few fields per product in the list and then insert them into my MySQL database. Nothing needs to be displayed.

Does anyone have a good idea of how to get around this "memory" limitation and get to the XML data that I so badly need inserted into my database? I have played a little with loading the XML with async set to true, but I didn't get close to getting that working and immediately got XSL mixed into it (and then it started to go beyond my limited knowledge of the XML world)...

Regards,

Thomas Kjaer
Question by:kjaer
10 Comments
 
LVL 15

Expert Comment

by:dualsoul
ID: 12094681
It's strange that you get such errors; 45MB is not very large for XML processing.

If you only need to get data out of the XML, you can use the SAX API to do that. It does not build a DOM tree in memory, so memory consumption stays small regardless of the file size.
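
To illustrate the event-driven idea behind SAX, here is a minimal sketch. It is written in Python with the standard `xml.sax` module, not in ASP or against MSXML, so it only shows the shape of the technique. The field names (`album`, `artist`, `price`) are borrowed from the XSL posted later in this thread, and the sample document layout and the `on_product` callback are assumptions made for the example.

```python
import io
import xml.sax


class ProductHandler(xml.sax.ContentHandler):
    """Collects a few fields per <product> without building a tree."""

    # Assumed field names; adjust to the real feed.
    FIELDS = ("album", "artist", "price")

    def __init__(self, on_product):
        super().__init__()
        self.on_product = on_product  # hypothetical callback, e.g. a database insert
        self.current = None           # dict for the product being read, or None
        self.field = None             # name of the field being read, or None
        self.text = []                # text chunks of the current field

    def startElement(self, name, attrs):
        if name == "product":
            self.current = {}
        elif self.current is not None and name in self.FIELDS:
            self.field = name
            self.text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.field:
            self.current[name] = "".join(self.text).strip()
            self.field = None
        elif name == "product" and self.current is not None:
            self.on_product(self.current)  # hand off one finished product
            self.current = None


# Tiny stand-in for the real feed (assumed layout).
SAMPLE = b"""<cdon_products><countries><country>
  <product><album>A1</album><artist>X</artist><price>99</price></product>
  <product><album>A2</album><artist>Y</artist><price>79</price></product>
</country></countries></cdon_products>"""

rows = []
xml.sax.parse(io.BytesIO(SAMPLE), ProductHandler(rows.append))
print(rows)
```

Only one product's fields are held at a time; in a real import the callback would write each row to the database instead of appending to a list.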
 
LVL 15

Expert Comment

by:dualsoul
ID: 12094685
Refer to the MSDN documentation of the SAX2 implementation in MSXML.
 

Author Comment

by:kjaer
ID: 12096544
SAX sounds more right, and a bit more complex, even though I have not yet found a useful example. I did however find this comment:

"However, the MS XML SAX implementation requires the installation and registration of user-supplied COM objects on the web server. This is because the SAX parsing engine uses call-backs into the user-supplied COM objects."

Doesn't that mean I will run into new problems, since my site is hosted externally? Or isn't "installation and registration of user-supplied COM objects" as physical as it sounds?

Anyhow, I will chase this a bit more... (and I'll write to my web host to figure out why the memory limit is so low. I get a similar error if I try to get the file using an aspHTTP component)

- Thomas Kjaer

 
LVL 21

Accepted Solution

by:
MogalManic earned 250 total points
ID: 12098384
Part of the problem is that, with your algorithm, your 45MB document might take almost double the storage in memory because you are loading the data into memory twice: the XML document is loaded into 'objXML' and then the data is loaded into the NodeList variable.

If you don't want to get into SAX, then XSL might also work. This is a simple XSL that turns the data into an HTML table:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name='sortColumn'>CONSOLEID</xsl:param>
  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <table>
          <!-- Header row -->
          <tr>
            <th>Album</th>
            <th>Artist</th>
            <th>Record Label</th>
            <th>Price</th>
          </tr>
          <!-- One row per product -->
          <xsl:for-each select="/cdon_products/countries/country/product">
            <tr>
              <td><xsl:value-of select="album"/></td>
              <td><xsl:value-of select="artist"/></td>
              <td><xsl:value-of select="record_label"/></td>
              <td><xsl:value-of select="price"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
 

Author Comment

by:kjaer
ID: 12099399
Point taken MogalManic. However it's the first load that's failing, so before I get that working I can't do anything :-)
 
LVL 15

Expert Comment

by:dualsoul
ID: 12099695
Hi MogalManic.
> then XSL might also work.

That's not right. If there really is a memory problem, and processing a 45MB file with DOM causes out-of-memory, then XSLT won't work either, because XSLT engines build an in-memory, DOM-like structure of the input XML, so memory consumption is at least the same.
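
For contrast with tree-building processors, an incremental pass can discard each product as soon as it has been read. The following is a rough sketch in Python (not ASP) using the standard library's `xml.etree.ElementTree.iterparse`; the field names come from the XSL above, and the sample document layout and the `iter_products` helper are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_products(stream, fields=("album", "artist", "price")):
    """Yield one dict per <product>, freeing each element after use."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "product":
            yield {f: elem.findtext(f, default="").strip() for f in fields}
            # Drop the product's children and text. An empty <product>
            # shell stays attached to its parent, which costs far less
            # than keeping the full subtree.
            elem.clear()


# Tiny stand-in for the real feed (assumed layout).
SAMPLE = b"""<cdon_products><countries><country>
  <product><album>A1</album><artist>X</artist><price>99</price></product>
  <product><album>A2</album><artist>Y</artist><price>79</price></product>
</country></countries></cdon_products>"""

products = list(iter_products(io.BytesIO(SAMPLE)))
print(products)
```

In a real import the generator would be consumed row by row rather than collected into a list, so that only one product is fully materialized at a time.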


 
LVL 15

Assisted Solution

by:dualsoul
dualsoul earned 250 total points
ID: 12099700
>installation and registration of user-supplied COM objects

With the SAX approach you need to implement your own COM objects that listen to SAX events, so you need to register them on the target system as usual (it's COM, you know :)). Nothing more, just ordinary registration in the registry.
 
LVL 21

Expert Comment

by:MogalManic
ID: 12118657
You might try XQuery. This is an XML query language (it has some of the same functionality as XSLT, but is not as popular yet). I don't know how well it works, but it might not load the whole XML document into memory if you are only processing it a little at a time???

http://aspnet.4guysfromrolla.com/articles/071603-1.aspx