Kim Walker asked:

Problem reading .csv file from zip archive using PHP

I'm having a problem reading the contents of a .csv file from a zip archive in PHP. The zip archive contains several .csv files and is approximately 5K in size. My script works when looping through echoing the file names, but as soon as I add the line to read the contents of a file, the page hangs up and eventually results in a "Network Error (tcp_error)" "Operation timed out."

Here is my code. It's pretty basic and practically copied from the documentation.
if (is_file($rpt) ) {
	if ($zip = zip_open($rpt) ) {
		while ($zip_entry = zip_read($zip) ) {
			if (zip_entry_name($zip_entry) == 'config.csv') {
				if (zip_entry_open($zip,$zip_entry) ) {
					if ($buff =  zip_entry_read($zip_entry) ) {
						echo $buff;
					}
					zip_entry_close($zip_entry);
				}
			} else {
				echo zip_entry_name($zip_entry)."\n";
			}
		}
		zip_close($zip);
	}
}
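
For comparison, the same lookup can be sketched with PHP's ZipArchive class instead of the zip_* functions used above. This is only a sketch and not the approach taken in the question: the helper name readZipEntry is hypothetical, and it assumes the zip extension is loaded and that the archive is a standard zip file.

```php
<?php
/**
 * Hypothetical helper: return the full contents of one entry in a zip
 * archive, or null if the archive cannot be opened or the entry is missing.
 */
function readZipEntry(string $archivePath, string $entryName): ?string
{
    $zip = new ZipArchive();
    // open() returns true on success and an integer error code on failure.
    if ($zip->open($archivePath) !== true) {
        return null;
    }
    // getFromName() returns the whole entry, or false if it is not found.
    $contents = $zip->getFromName($entryName);
    $zip->close();
    return $contents === false ? null : $contents;
}

// Usage with the names from the question:
// $buff = readZipEntry($rpt, 'config.csv');
```

Note that zip_entry_read() reads only 1024 bytes unless a length is passed, whereas getFromName() returns the entire entry.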


Here are the contents of the zip file I'm trying to read:


This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Please post a link to your test data, and I'll see if I can give you a tested and working code sample.  Thanks, ~Ray

rinfo, I copied your solution, removed lines 1, 2, 24-27, and added a closing parenthesis to line 3, and inserted in place of the 16 lines I posted in my question. It produced the same results.

Thanks to Ray_Paseur, however, I've discovered the problem is in the archive I'm trying to unzip. I redacted the files in the archive and re-uploaded them to the server and both our scripts generated the appropriate output from the redacted files.

The archive is produced by a service provider and uploaded automatically by them every thirty minutes. I need to write a script to access those files and append the contents to a database that is used to generate a dynamic report. The data is incremental, so if I miss one archive, the dynamic report is inaccurate.

Can you suggest a reason that PHP might not be able to unzip the archive but I can unzip it on my local computer? I can even decompress the archive and re-compress it without modification on my local computer, upload it to the server and execute the script without error. But of course, I can't do that every thirty minutes.

Ray_Paseur, is there a way to upload the original archive and delete it after you've looked at it? It contains personally identifiable information, so I'm reluctant to upload it permanently.
I am pretty sure I can delete the file for you, but I hope you can give us test data instead of live data -- well-crafted test data is a requirement for successful programming.
This is data from test subjects all of whom are adults and none of the data is confidential. But I would appreciate it if we can delete it when the question is closed.
It appears that I can't upload the file to EE. I gave up when it hadn't finished after several minutes. So I've posted it where I can delete it myself when the time comes. I should have thought of that earlier.
Thanks, Ray_Paseur. Miraculously, the script is working now -- though it does push the 60-second timeout limit and the re-zipped archive processes almost instantaneously. I'm going to increase the time limit and proceed as is. But I will contact our service provider to see if they have any alternatives to offer. I'll post an update if anything changes in the next couple of days. But I expect to close the question with my own resolution and, since it's a new month, award some points for effort.

Glad you've got a working solution, but I still sense a disconnect here.  You can make PHP scripts survive and run as long as you want with set_time_limit(). I wonder why there is a difference in time between two ZIP archives of essentially the same data.  This sounds like a bug in the ZIP extension!
You're absolutely right, Ray_Paseur. I felt the same way. Now it appears that the server has been serving the last rendered page instead of the error. It seemed that no matter what changes I made to my script, I was getting the exact same results. I just looked at my error logs and this is what I found repeated over and over.
[Sun Sep 01 16:41:41 2013] [warn] [client] mod_fcgid: read data timeout in 45 seconds
[Sun Sep 01 16:41:41 2013] [error] [client] Premature end of script headers: process_report.php


Only after stopping and restarting Apache am I again getting the timeout errors in my browser even after increasing the timeout to 5 minutes.

This would not have been a viable solution anyway when I start to get these reports every 30 minutes for 10-15 different clients.

Now it's time for my service provider to start providing better service!
Good information, Slick812. Unfortunately, my 7z program doesn't give me any details of this nature. Can you suggest a free app that would?
I have the 7z program and I know that it will handle many, many ZIP file configurations, but I do not see any file analysis in that program. Sorry, I cannot recommend any programs at this time, and I just cannot take the time now. However, you seem like a capable developer, and web searches are my greatest programming resource and "savior" in doing code work. A search for zip file analysis may turn up something.
The producer of this file may have some info about it, but I have seen some Bzip2 files with a .zip extension, and I know that PHP has a separate decompression extension for Bzip2. Worth a shot, maybe?
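
One way to check the Bzip2 idea without a separate analysis tool is to look at the first few bytes of the file, since each container format starts with its own signature. The following is a minimal sketch under that assumption; the helper name detectArchiveFormat is hypothetical, and it only recognizes the three formats mentioned in this thread.

```php
<?php
/**
 * Hypothetical helper: guess the container format of a file from its
 * leading bytes, regardless of its extension.
 */
function detectArchiveFormat(string $path): string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return 'unreadable';
    }
    $head = (string) fread($handle, 6);
    fclose($handle);

    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip';   // local file header of a non-empty zip archive
    }
    if (strncmp($head, "BZh", 3) === 0) {
        return 'bzip2';
    }
    if (strncmp($head, "7z\xBC\xAF\x27\x1C", 6) === 0) {
        return '7z';
    }
    return 'unknown';
}

// Usage (the path is a placeholder):
// echo detectArchiveFormat('/path/to/report.zip');
```

If this reports something other than zip for a file named .zip, that would explain why the zip functions struggle with it. A result of zip does not rule out an unusual compression method inside the archive.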
My service provider has responded that they use "7za" and included a file named 7za in their response. They also included a php file they described as "code that we have used to unzip." The php file contains nearly 750 lines of code. The only comments appear to be lines of code they've disabled. On line 713 is this function definition:
function unZipOnLinux($domain,$sourceFileName,$destinationPath){
  $destinationPath = $destinationPath.'/';
  $directoryPos = strrpos($sourceFileName,'/');
  $directory = substr($sourceFileName,0,$directoryPos+1);
  $dir = opendir( $directory );
  $info = pathinfo($sourceFileName);
  if ( strtolower($info['extension']) == 'zip' ) {
    echo '7za e '.$sourceFileName .'  -o'. $destinationPath.'<br>';
    //system('unzip -q '.$sourceFileName .'  -d '. $destinationPath);
    system('/var/www/vhosts/'.$domain.'/httpdocs/dashboard/upload/7za e '.$sourceFileName .'  -o'. $destinationPath);
  }
  closedir( $dir );
}


Do I upload the 7za file to my working folder?

They appear to extract a path from the $sourceFileName variable (see lines 3 and 4 in the code above). So do I submit the entire path to the zipped file or just the relative path from the 7za file?

I've searched for examples or commentary of using 7za in php on linux and have come up empty. What little I have found uses shell_exec.

Any advice would be helpful.
I read your last post, and to me the important line of code is this:
7za e '.$sourceFileName .'  -o'. $destinationPath

which corresponds to this line:
echo '7za e '.$sourceFileName .'  -o'. $destinationPath.'<br>';

To me this looks like it calls a Linux executable (7za) through system(), where e is the extract command and -o sets the output directory. It was substituted for this commented-out line:
system('unzip -q '.$sourceFileName .'  -d '. $destinationPath);
which uses the standard unzip program.

I would think that 7za is the 7z archive program for Linux, and that instead of using the .7z file extension, they use the .zip extension.
I do know that the 7z archive file format is NOT compatible with the standard .zip archive file format. You can set 7z to produce the standard zip file format, but apparently they did not do this.

For you to use this line:
system('/var/www/vhosts/'.$domain.'/httpdocs/dashboard/upload/7za e '.$sourceFileName .'  -o'. $destinationPath);

on your server, you would need to have the 7za archive program installed, and on a file path that your PHP can reach with one of the PHP system function calls, like:
shell_exec('7za e '.$sourceFileName .'  -o'. $destinationPath);
exec('7za e '.$sourceFileName .'  -o'. $destinationPath);
system('7za e '.$sourceFileName .'  -o'. $destinationPath);
Not all of these functions are available (maybe none) in various PHP/Linux setups, and they differ somewhat in what they return.
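
As a rough sketch of the calls above, the command can be built with both paths quoted for the shell, and the availability of exec() can be checked before it is used. The helper names buildExtractCommand and execIsAvailable are hypothetical, the paths are placeholders, and the sketch assumes 7za is on the PATH of the web server user.

```php
<?php
/**
 * Hypothetical helper: build a 7za extract command with both paths quoted
 * for the shell.
 */
function buildExtractCommand(string $archive, string $destination): string
{
    // "e" extracts without directory structure, "-o" sets the output
    // directory (no space before the path), "-y" answers yes to prompts.
    return '7za e ' . escapeshellarg($archive)
        . ' -o' . escapeshellarg($destination) . ' -y';
}

/**
 * Hypothetical helper: true if exec() exists and is not listed in the
 * disable_functions setting of php.ini.
 */
function execIsAvailable(): bool
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists('exec') && !in_array('exec', $disabled, true);
}

// Usage (paths are placeholders):
// if (execIsAvailable()) {
//     exec(buildExtractCommand('/path/to/report.zip', '/path/to/out'), $lines, $exitCode);
//     // $exitCode is 0 when 7za reports success.
// }
```

exec() is used here rather than system() because it hands back the output lines and the exit code instead of printing straight to the page.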

In this system call it looks like the 7za executable is installed in this directory:
/var/www/vhosts/'.$domain.'/httpdocs/dashboard/upload/

I could be wrong about that, since this seems like a highly unusual place to have an archive program in Linux.

Anyhow, if you are not so familiar with the PHP system functions on Linux, you might start out with something easy:
$output = shell_exec('ls -lart');
echo "<pre>$output</pre>"; // from the manual; runs the shell command "ls" to list files in a directory

just to see if shell_exec() works. You can use the various Linux install methods to get and install 7z, but you may find better help for this than mine in the EE Linux section.

I just found this, which may shed some light -
I finally found a site with instructions for installing the p7zip package properly for my CentOS linux installation. With proper installation, I don't need to include a path to the 7za bin file. I can expand the archive with the following php command:
system('7za e /var/www/vhosts/*.zip');


This expands the files to the same folder as the archive where I can process them and delete them.
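
The process-and-delete step could look something like the sketch below. It is only an illustration: the helper name consumeCsvFiles is hypothetical, it assumes comma-separated files with double-quote enclosures, and the database append is left out.

```php
<?php
/**
 * Hypothetical helper: read every .csv file in a directory into arrays of
 * rows, deleting each file after it has been read.
 *
 * @return array<string, array> parsed rows keyed by file name
 */
function consumeCsvFiles(string $directory): array
{
    $result = [];
    foreach (glob(rtrim($directory, '/') . '/*.csv') ?: [] as $file) {
        $handle = fopen($file, 'r');
        if ($handle === false) {
            continue; // leave unreadable files in place
        }
        $rows = [];
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            $rows[] = $row;
        }
        fclose($handle);
        $result[basename($file)] = $rows;
        unlink($file); // remove the extracted file once its rows are in memory
    }
    return $result;
}

// Usage (the directory is a placeholder):
// $reports = consumeCsvFiles('/path/to/extracted');
```

Deleting each file only after it has been read means a failed run leaves the unread files behind for the next attempt.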
I've split the points according to how much your comment contributed to my own resolution. I doubt if I'd have solved this without your comments. Thanks.