Solved

Split XML file up

Posted on 2016-08-24
Last Modified: 2016-09-12
I have a large XML file that I'm trying to read into MySQL. The import works, but once I get to a few thousand records I get a FastCGI error. According to my host, I cannot change the timeout value, so I have to import the XML document in under 60 seconds.

So I'm wondering if it's possible to split an XML file up into batches of 100 records, and then call each 100-record batch independently until the end. My problem is how to split the XML file up easily.

My current idea is to read the XML file line by line until I find the </record> tag, count up to 100 of them, save those records into a file, then carry on for the next 100 </record> tags. I'm thinking that reading the file one line at a time might also reduce memory usage, as some of the XML files are massive.

Can anyone suggest another way of doing this, or is this going to be the best way?
Question by:tonelm54

Accepted Solution

by:
Ray Paseur earned 500 total points (awarded by participants)
Yes, you can split an XML file, but whether such a process will work well, or easily, is a data-dependent question.  Please post an example of one of the XML files, or a link to one of the XML files, and we can try to show you how to parse the file.  XML files do not really come in "lines" because whitespace (EOL characters, tabs, blanks, etc.) between the tags is not required by the standard.  It's often omitted to make the XML document smaller.  A multi-line XML document is easier for humans to read, but we can't depend on that sort of structure when we're writing code to handle the XML.
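Since the line structure can't be relied on, one option is to let a streaming parser find the record boundaries instead of scanning for </record> by hand. Here is a minimal sketch in Python, assuming the repeating element is named `record` (taken from the question) and a batch size of 100; the function name `iter_batches` is made up for illustration. PHP's XMLReader class offers the same kind of streaming read if you want to stay in PHP.

```python
import io
import xml.etree.ElementTree as ET


def iter_batches(source, record_tag="record", batch_size=100):
    """Yield lists of serialized record elements, batch_size at a time.

    `source` is a filename or a binary file object. The tag name and the
    batch size are assumptions based on the question, not on real data.
    """
    batch = []
    # iterparse streams the document instead of loading it all at once.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            batch.append(ET.tostring(elem, encoding="unicode").strip())
            root.clear()  # discard records already handled to limit memory use
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final, possibly short, batch
        yield batch


if __name__ == "__main__":
    # A single-line document: no EOL characters to depend on.
    doc = "<records>" + "".join(
        f"<record><id>{i}</id></record>" for i in range(250)
    ) + "</records>"
    for number, batch in enumerate(iter_batches(io.BytesIO(doc.encode("utf-8")))):
        print(number, len(batch))
```

Each batch could then be wrapped in a root element and saved to its own file, or inserted into the database directly. The memory saving assumes the records sit directly under the root element; with deeper nesting the output is the same but less memory is released.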

If your data provider offers the option of JSON, the file will be somewhat smaller.

In any case, processing large files is not something HTTP requests and PHP scripts were made for, so the best solution may lie in the direction of requesting several smaller files, instead of one large file.

If this is a file that comes from one of your own applications, consider building a file chain -- a collection of files with a signal tag that says whether the end-of-file has been reached.  The signal can say "end-of-file" or it can say the URL of the next XML document in the chain.  Then the PHP script can request each file in succession and process them one-at-a-time, until the end of file has been reached.
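The file chain might look something like the following. This is only a sketch in Python under assumptions of my own: the signal tag is called `next`, the end marker is the literal text "end-of-file", and `fetch` is a placeholder for whatever retrieves a document by URL. None of those names come from a real feed.

```python
import xml.etree.ElementTree as ET

END_MARKER = "end-of-file"  # hypothetical signal value


def walk_chain(first_url, fetch, handle_record, record_tag="record"):
    """Follow a chain of XML documents and return the number of records seen.

    `fetch(url)` must return the XML text of one document in the chain.
    Each document is assumed to carry a <next> element holding either the
    URL of the next document or the END_MARKER text.
    """
    url = first_url
    seen = set()
    count = 0
    while url != END_MARKER:
        if url in seen:
            raise ValueError("chain loops back to " + url)
        seen.add(url)
        root = ET.fromstring(fetch(url))
        for record in root.iter(record_tag):
            handle_record(record)
            count += 1
        signal = root.find("next")
        if signal is None or not (signal.text or "").strip():
            raise ValueError("no <next> signal in " + url)
        url = signal.text.strip()
    return count


if __name__ == "__main__":
    # Two in-memory documents stand in for files on a server.
    files = {
        "part1.xml": "<records><record>a</record><next>part2.xml</next></records>",
        "part2.xml": "<records><record>b</record><next>end-of-file</next></records>",
    }
    print(walk_chain("part1.xml", files.__getitem__, lambda r: print(r.text)))
```

With a 60-second limit per request, each document in the chain would more likely be handled by its own request, with the URL from the signal tag passed along to the next request, rather than by one loop as shown here.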

Expert Comment

by:Ray Paseur
No response to the request for test data, but the theory and practice of a solution are explained fully.