Split XML file up

I have a large XML file that I'm trying to read into MySQL. The import works, but once I get to a few thousand records I get a FastCGI error. According to my host I cannot change the timeout value, so I have to import the XML document in under 60 seconds.

So I'm wondering whether it's possible to split the XML file into batches of 100 records and then process each 100-record batch independently until the end. My problem is how to split the XML file up easily.

My current idea is to read the XML file line by line until I find a </record> tag, count 100 of them, save those records into a file, and then carry on with the next 100 </record> tags. I'm thinking that reading the file one line at a time might also reduce memory usage, as some of the XML files are massive.

Can anyone suggest another way of doing this, or is this going to be the best way?
tonelm54 Asked:
Ray Paseur Commented:
Yes, you can split an XML file, but whether such a process will work well, or easily, is a data-dependent question.  Please post an example of one of the XML files, or a link to one of the XML files, and we can try to show you how to parse the file.  XML files do not really come in "lines" because whitespace (EOL characters, tabs, blanks, etc.) between the tags is not required by the standard.  It's often omitted to make the XML document smaller, so an entire document may arrive as one very long line.  A multi-line XML document is easier for humans to read, but we can't depend on that sort of structure when we're writing code to handle the XML.  Counting </record> tags line by line is therefore fragile; an XML parser that streams through the document is the safer way to find record boundaries.
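
Since no sample file was posted, here is only a rough sketch of the streaming idea, written in Python for illustration (PHP's XMLReader class supports a similar streaming style). It assumes each record is a <record> element, as the question suggests. The names split_records and write_batches, the <records> wrapper element, and the batch_NNNN.xml file names are made up for this example and are not part of anyone's actual data.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, batch_size=100, record_tag="record"):
    """Yield lists of serialized records, batch_size at a time.

    source is a file name or a binary file object. The "record" tag name
    is an assumption taken from the question.
    """
    batch = []
    # iterparse streams through the document, so it does not matter
    # whether the XML is spread over many lines or sits on one long line.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            elem.tail = None  # drop whitespace that follows the closing tag
            batch.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # free the record's children; an empty shell remains
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final batch, possibly shorter than batch_size


def write_batches(source, prefix="batch", batch_size=100):
    """Write each batch to its own small XML file; return the file names."""
    names = []
    for number, batch in enumerate(split_records(source, batch_size), start=1):
        name = f"{prefix}_{number:04d}.xml"  # hypothetical naming scheme
        with open(name, "w", encoding="utf-8") as handle:
            # <records> is a hypothetical wrapper so each file is well-formed.
            handle.write("<records>\n" + "\n".join(batch) + "\n</records>\n")
        names.append(name)
    return names


if __name__ == "__main__":
    # Tiny in-memory demo: 250 records on a single line, no file needed.
    sample = "<data>" + "<record><id>1</id></record>" * 250 + "</data>"
    for batch in split_records(io.BytesIO(sample.encode("utf-8"))):
        print(len(batch))
```

Each small file could then be imported by its own request, which keeps every request well under the time limit. Whether this fits depends on the real structure of the file, for example whether records are nested or use namespaces.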

If your data provider offers the option of JSON, the file will usually be somewhat smaller, because JSON does not repeat each field name in a closing tag.

In any case, processing large files is not something HTTP requests and PHP scripts were made for, so the best solution may lie in the direction of requesting several smaller files, instead of one large file.

If this is a file that comes from one of your own applications, consider building a file chain -- a collection of files with a signal tag that says whether the end-of-file has been reached.  The signal can say "end-of-file" or it can say the URL of the next XML document in the chain.  Then the PHP script can request each file in succession and process them one-at-a-time, until the end of file has been reached.
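
The file chain could look roughly like the sketch below, again in Python and only as an illustration. It assumes each document carries a <next> element directly under its root, holding either the URL of the next document or the text "end-of-file". The <next> tag, the marker text, the <record> tag, and the process_chain, fetch and handle_record names are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

END_MARKER = "end-of-file"  # hypothetical signal value


def process_chain(first_url, fetch, handle_record):
    """Follow a chain of XML documents and handle every <record> in each.

    fetch(url) must return the document as a string or bytes, and
    handle_record(element) does the real work, e.g. one database insert.
    Returns the number of records handled.
    """
    url = first_url
    visited = set()
    count = 0
    while url and url != END_MARKER:
        if url in visited:
            raise ValueError(f"file chain loops back to {url}")
        visited.add(url)
        root = ET.fromstring(fetch(url))
        for record in root.iter("record"):
            handle_record(record)
            count += 1
        # The signal tag holds the next URL or the end marker;
        # a missing tag is treated as the end of the chain.
        url = (root.findtext("next") or "").strip()
    return count


if __name__ == "__main__":
    # In-memory stand-in for two chained documents.
    documents = {
        "a.xml": "<records><record>1</record><next>b.xml</next></records>",
        "b.xml": "<records><record>2</record><next>end-of-file</next></records>",
    }
    total = process_chain("a.xml", documents.__getitem__, lambda r: print(r.text))
    print(total)
```

With real data, fetch could be a small wrapper around an HTTP request. To stay inside a per-request time limit, each request could process a single link of the chain and hand the next URL on to the following request, rather than looping through the whole chain at once.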
Ray Paseur Commented:
No response to request for test data, but the theory and practice of a solution is explained fully.
Question has a verified solution.
