Solved

Split XML file up

Posted on 2016-08-24
Last Modified: 2016-09-12
I have a large XML file that I'm trying to read into MySQL. The import works, but once I get to a few thousand records I get a FastCGI error. According to my host, I cannot change the time limit behind it, so I have to import the XML document in under 60 seconds.

So I'm wondering if it's possible to split the XML file into batches of 100 records and then process each 100-record batch in its own request until I reach the end. My problem is how to split the XML file up easily.

My current idea is to read the XML file line by line, count each </record> tag I find, and once I reach 100, save those records to a file, then carry on with the next 100. I'm thinking that reading the file one line at a time might also reduce memory usage, as some of the XML files are massive.

Can anyone suggest another way of doing this, or is this going to be the best way?
Question by:tonelm54

Accepted Solution

by: Ray Paseur (earned 500 total points, awarded by participants)
ID: 41769365
Yes, you can split an XML file, but whether such a process will work well, or easily, is a data-dependent question.  Please post an example of one of the XML files, or a link to one of the XML files, and we can try to show you how to parse the file.  XML files do not really come in "lines" because whitespace (EOL characters, tabs, blanks, etc.) between the tags is optional in XML.  It's often omitted to make the XML document smaller, so an entire document may arrive as one very long line.  A multi-line XML document is easier for humans to read, but we can't depend on that sort of structure when we're writing code to handle the XML.
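
Without sample data this can only be a rough sketch, but a streaming parser is one way to split the file without depending on line breaks. The example below is in Python, using the standard library's iterparse, purely to illustrate the idea; PHP's XMLReader offers a similar streaming model. It assumes the records are elements named "record" (taken from the </record> tag in the question), that they sit directly under the root element, and that the document uses no XML namespaces. The function name split_records is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag="record", batch_size=100):
    """Yield XML strings, each wrapping up to batch_size records in the root tag."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    batch = []
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Serialize the finished record, then discard it from the tree
            # so memory use does not grow with the size of the file.
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.clear()
            if len(batch) == batch_size:
                yield f"<{root.tag}>{''.join(batch)}</{root.tag}>"
                batch = []
    if batch:  # final, possibly shorter, batch
        yield f"<{root.tag}>{''.join(batch)}</{root.tag}>"


if __name__ == "__main__":
    # Tiny in-memory document with no line breaks at all.
    sample = b"<data><record>a</record><record>b</record><record>c</record></data>"
    for number, chunk in enumerate(split_records(io.BytesIO(sample), batch_size=2)):
        print(number, chunk)
```

Each yielded string could be written to its own file and imported in a separate request. Attributes on the root element are not carried over in this sketch.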

If your data provider offers the option of JSON, the file will be somewhat smaller.

In any case, processing large files is not something HTTP requests and PHP scripts were made for, so the best solution may lie in the direction of requesting several smaller files, instead of one large file.

If this is a file that comes from one of your own applications, consider building a file chain -- a collection of files with a signal tag that says whether the end-of-file has been reached.  The signal can say "end-of-file" or it can say the URL of the next XML document in the chain.  Then the PHP script can request each file in succession and process them one-at-a-time, until the end of file has been reached.
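
To make the file chain idea concrete, here is a minimal sketch in Python. Everything about the format is an assumption: the signal tag is called "next", the end signal is the literal text "end-of-file", and records are elements named "record". The fetch argument is a hypothetical callable that returns the XML text for a URL, so the HTTP client is left open.

```python
import xml.etree.ElementTree as ET

END_MARKER = "end-of-file"  # assumed end signal; any agreed value would do


def walk_chain(first_url, fetch, handle_record, max_files=1000):
    """Process each file in the chain in turn; return the number of files read."""
    url = first_url
    files_read = 0
    while url != END_MARKER:
        if files_read >= max_files:
            raise RuntimeError("chain exceeded max_files; possible loop")
        root = ET.fromstring(fetch(url))
        for record in root.iter("record"):
            handle_record(record)
        files_read += 1
        # A missing or empty <next> tag is treated as the end of the chain.
        url = (root.findtext("next") or "").strip() or END_MARKER
    return files_read


if __name__ == "__main__":
    # Two in-memory "files" stand in for documents fetched over HTTP.
    files = {
        "a.xml": "<data><record>1</record><next>b.xml</next></data>",
        "b.xml": "<data><record>2</record><next>end-of-file</next></data>",
    }
    walk_chain("a.xml", files.__getitem__, lambda r: print(r.text))
```

If each file is kept small enough, one file can be handled per request, which keeps every request well inside the 60-second limit.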

Expert Comment

by:Ray Paseur
ID: 41793917
No response to the request for test data, but the theory and practice of a solution are explained fully.