Solved

Why does this process the first file, then stop?

Posted on 2014-09-29
Last Modified: 2014-09-30
I've got a directory with two JSON files in it. The code below looks at the directory and then decompresses them one by one while simultaneously updating a database that keeps track of what files have been done, when they started being processed and when they finished.

It works!

But it does the first file, then just quits. When you look at the database that's keeping tabs on what's being done, I have this:

File Name                             | Start Time          | End Time
00_8ptcd6jgjn201311070000_day.json.gz | 2014-09-29 21:00:51 | 0000-00-00 00:00:00
00_8ptcd6jgjn201311060000_day.json.gz | 2014-09-29 21:01:39 | 2014-09-29 21:01:51

At 21:00:51, the first file started and nothing happened. Then the second file started and I can see the JSON file in the directory just as it's supposed to be. Why did the first file not decompress? What am I missing?

Here's my code:

<?php
$dir_name = 'JSON/';
if ($dh = opendir("$dir_name"))
{
  while (($file = readdir($dh)) !== false)
  {
    //omitting the system default of listing "." and ".."
		if ($file!="."&&$file!="..")
		{
			//make sure we're only reading files with a .gz extension
			$info = new SplFileInfo($file);
			if($info->getExtension()=="gz")
			{
				//at this point, look to see if the name of that file is in the database and needs to be processed
				$daniel = "select file_name from raw_files where file_name='$file'";
				$daniel_query=mysqli_query($cxn, $daniel);
					if(!$daniel_query)
					{
					$rats=mysqli_errno($cxn).': '.mysqli_error($cxn);
					die($rats);
					}
				$daniel_count=mysqli_num_rows($daniel_query);
					if(!$daniel_count>0)
					{
					//insert current date and time into your raw_files table
					$now= date('Y-m-d H:i:s');
					$nelson="insert into raw_files (file_name, start_time) value('$file', '$now')";
					$nelson_query=mysqli_query($cxn, $nelson);
						if(!$nelson_query)
						{
						$nuts=mysqli_errno($cxn).': '.mysqli_error($cxn);
						die($nuts);
						}
					$novie_id = $cxn->insert_id;
					//here's your decompression code
					$file_name = $file;
					// Raising this value may increase performance
					$buffer_size = 4096; // read 4kb at a time
					$out_file_name = str_replace('.gz', '', $file_name); 
					// Open our files (in binary mode)
					$the_file = gzopen($file_name, 'rb');
					$out_file = fopen('JSON/'.$out_file_name, 'wb'); 
					// Keep repeating until the end of the input file
						while(!gzeof($the_file)) 
						{
						// Read buffer-size bytes
						// Both fwrite and gzread are binary-safe
						  fwrite($out_file, gzread($the_file, $buffer_size));
						}  
					// Files are done, close files
					fclose($out_file);
					gzclose($the_file);
					//here's where you update the raw_files database with a time it was completed
					$right_now= date('Y-m-d H:i:s');
					$brice="update raw_files set end_time = '$right_now' where id=$novie_id";
					$brice_query=mysqli_query($cxn, $brice)
					or die("Brice didn't happen.");
				}
			//here's where you're doing your parsing and putting things into the verizon table
			//$the_new_file=str_replace('.gz',"",$file);
			//echo $the_new_file;
			//start
			//sleep(10);
			}
		}
	}
}
closedir($dh);
echo "done!";

?> 

Question by:brucegust
16 Comments
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 40350800
Maybe the first file isn't GZipped, even though the extension has .gz?

Try adding:
echo __LINE__;



...to various parts AFTER the insert queries and then run it. You should be able to see where the line #s stop (first file) and restart (second file) - that might give some insight.
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40350879
As a general rule it's wise to test the return values from PHP functions.  It looks like the script does not test the return value from $the_file = gzopen($file_name, 'rb');.  You might also want to add error_reporting(E_ALL) to the top of the script.  If you have these gz files on a public-facing server where we can test, we would welcome the URL of the directory, and we could test the script with some breakpoints and diagnostics.

Some interesting user-contributed notes on this page:
http://php.net/manual/en/function.gzread.php

You might also consider using scandir() since it will let you get the files in a predictable order.
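A minimal sketch of the scandir() suggestion, assuming a hypothetical helper named list_gz_files() (not part of the original script). It filters on the extension and on is_file() rather than relying on "." and ".." being the first two entries, and it checks the return value of scandir() as recommended above:

```php
<?php
// Hypothetical helper: return the names of the .gz files in $dir, sorted.
function list_gz_files($dir)
{
    $names = scandir($dir); // sorted in ascending order by default
    if ($names === false) {
        throw new RuntimeException("Unable to read directory $dir");
    }

    $gz = array();
    foreach ($names as $name) {
        // scandir() returns bare names, so build the full path for is_file().
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (is_file($path) && pathinfo($name, PATHINFO_EXTENSION) === 'gz') {
            $gz[] = $name;
        }
    }
    return $gz;
}
```

In the script from the question this would be called as list_gz_files('JSON'), and each returned name still needs the directory prepended before it is opened.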
 

Author Comment

by:brucegust
ID: 40350909
Yo, Gonzo!

I'm not sure I'm following you. I added the thing you suggested and I got something like 404040done.

I was able to identify something though, tell me if this doesn't help better determine where things are breaking down.

I commented out some things and renamed some variables in an effort to figure out what was going on. Here's the code as it looks now:

<?php
$dir_name = 'JSON/';
if ($dh = opendir("$dir_name"))
{
  while (($file = readdir($dh)) !== false)
  {
    //omitting the system default of listing "." and ".."
		if ($file!="."&&$file!="..")
		{
			//make sure we're only reading files with a .gz extension
			$info = new SplFileInfo($file);
			if($info->getExtension()=="gz")
			{
				//at this point, look to see if the name of that file is in the database and needs to be processed
				$daniel = "select file_name from raw_files where file_name='$file'";
				$daniel_query=mysqli_query($cxn, $daniel);
					if(!$daniel_query)
					{
					$rats=mysqli_errno($cxn).': '.mysqli_error($cxn);
					die($rats);
					}
				$daniel_count=mysqli_num_rows($daniel_query);
					if(!$daniel_count>0)
					{
					//insert current date and time into your raw_files table
					/*$now= date('Y-m-d H:i:s');
					$nelson="insert into raw_files (file_name, start_time) value('$file', '$now')";
					$nelson_query=mysqli_query($cxn, $nelson);
						if(!$nelson_query)
						{
						$nuts=mysqli_errno($cxn).': '.mysqli_error($cxn);
						die($nuts);
						}
					$novie_id = $cxn->insert_id;*/
					//here's your decompression code
					// Raising this value may increase performance
					$buffer_size = 4096; // read 4kb at a time
					$out_file_name = str_replace('.gz', '',$file); 
					// Open our files (in binary mode)
					$the_file = gzopen($out_file_name, 'rb');
					$out_file = fopen('JSON/'.$out_file_name, 'wb'); 
					// Keep repeating until the end of the input file
						while(!gzeof($file)) 
						{
						// Read buffer-size bytes
						// Both fwrite and gzread are binary-safe
						  fwrite($out_file, gzread($file, $buffer_size));
						}  
					// Files are done, close files
					fclose($out_file);
					gzclose($the_file);
					//here's where you update the raw_files database with a time it was completed
					/*$right_now= date('Y-m-d H:i:s');
					$brice="update raw_files set end_time = '$right_now' where id=$novie_id";
					$brice_query=mysqli_query($cxn, $brice)
					or die("Brice didn't happen.");*/
				}
			//here's where you're doing your parsing and putting things into the verizon table
			//$the_new_file=str_replace('.gz',"",$file);
			//echo $the_new_file;
			//start
			//sleep(10);
			}
		}
	}
}
closedir($dh);
echo "done!";

?> 



When I do "echo $file" I get "00_8ptcd6jgjn201309050000_day.json.gz"

Perfect!

But when I run the code, I get "Warning: gzopen(00_8ptcd6jgjn201309050000_day.json): failed to open stream: No such file or directory in C:\wamp\www\json\decompress.php on line 66" which is this part of the code:

                              $out_file_name = str_replace('.gz', '',$file);
                              // Open our files (in binary mode)
                              $the_file = gzopen($out_file_name, 'rb');

Specifically, "$the_file"

When I go out to the directory, I see 00_8ptcd6jgjn201309050000_day.json, so the file is there, yet the page says that it doesn't exist.

What do you think?

 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 150 total points
ID: 40350942
So __LINE__ is a special value in PHP. It will represent the line number that it's currently on. So for example, let's say I had this script (line number is on the left):

1     $foo = "bar";
2     echo __LINE__ . "\n";
3     echo $foo . "\n";
4     echo __LINE__ . "\n";

Running that should output:
2
bar
4

By outputting the __LINE__ in different parts of your script (around the area where you suspect a problem), you can sometimes gain an idea of the path that PHP is taking when it runs your file. It will tell you which lines PHP is hitting when it is running.

So your output, for example, tells me that you probably added that code once to line 40. I forgot to mention that you should add line breaks to make it a little easier to read.

It's just a little bit of a debugging trick that can help you narrow down the problem sometimes. Moving on...

In your original snippet, you had:
$the_file = gzopen($file_name, 'rb');

Now you are trying to gzopen the output file:
$the_file = gzopen($out_file_name, 'rb');

Might be a typo?
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40351015
You might try it something like this.  You will be able to see the variables at certain points in the process, and the script should stop with an error message if something is completely out of whack.

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
    echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if ($daniel_count)
    {
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
    }
    else
    {
        trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
    }

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen($file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

 

Author Comment

by:brucegust
ID: 40351020
Not a typo, just trying to figure out why the code doesn't "see" the JSON file that was supposedly just "opened."

The error that I'm getting is at line 40. It's there where I get Warning: gzopen(00_8ptcd6jgjn201309050000_day.json): failed to open stream: No such file or directory in C:\wamp\www\json\decompress.php on line 66

Why doesn't it see the file when I can see it in the directory?
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 40351029
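Your fopen BELOW the gzopen is what creates the output file. That's why you see it but gzopen doesn't: at the moment gzopen runs, the file does not exist yet. That said, you shouldn't be gzopen-ing the output file in the first place; gzopen should read the .gz input.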
 

Author Comment

by:brucegust
ID: 40351049
Ray!

Here's the portion of code that you wrote that I experimented with:

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";



    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen($file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);


}

echo "done!";
?>



I figured I've got a ninja writing the decompressing code - that's the thing that's killing me right now. So, using that, this is the error I got:
[Attached screenshot: ray.png]
 

Author Comment

by:brucegust
ID: 40351050
What do you think? Where am I blowing it?
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 40351075
Change:
$inp_handle = gzopen($file, 'rb');

To:
$inp_handle = gzopen('JSON/'.$file, 'rb');
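The reason for the change: readdir() and scandir() return bare file names without the directory, so a relative name is resolved against the script's working directory rather than the folder that was scanned. A minimal sketch of that behavior, assuming a hypothetical scratch directory in place of the thread's 'JSON/' folder:

```php
<?php
// Hypothetical scratch directory standing in for the thread's 'JSON/' folder.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'gz_path_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'sample.json.gz', gzencode('{"ok":true}'));

foreach (scandir($dir) as $file) {
    if ($file === '.' || $file === '..') {
        continue;
    }
    // $file holds only the name ("sample.json.gz"), with no directory part.
    $bare_exists = file_exists($file);                              // resolved against getcwd()
    $full_exists = file_exists($dir . DIRECTORY_SEPARATOR . $file); // resolved inside $dir
    echo $file, ': bare=', var_export($bare_exists, true),
         ' full=', var_export($full_exists, true), PHP_EOL;
}
```

Unless a file with the same name happens to sit in the working directory, the bare lookup fails while the prefixed one succeeds, which matches the "No such file or directory" warning reported earlier.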
 

Author Comment

by:brucegust
ID: 40351086
Gonzo!

After implementing your suggestion I get:

line 52 expects two parameters...

The line in question is $data=gzread($inp_handle).

Here's the code with your recommendations...

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('JSON/'.$file, 'rb'); 
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

?> 

 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40351116
How can you get an error on line 52 in a script that has only 44 lines?  Are you sure you're testing the right script?
 
LVL 109

Accepted Solution

by:Ray Paseur
Ray Paseur earned 350 total points
ID: 40351122
Let's try this one...
<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
    echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if ($daniel_count)
    {
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
    }
    else
    {
        trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
    }

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('JSON/' . $file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/' . $out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle, $buffer_size);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

 

Author Comment

by:brucegust
ID: 40352018
That'll do it, Ray!

What was I not doing right?
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 40352106
Thanks for the points.  I don't really know what might have been wrong - when the error says line 52 but the script only has 44 lines, I don't read the code at all - I just try to produce something that I think might work.  

As a general rule, more data visualization is better when you're trying to debug some code, so you'll often see a lot of echo and var_dump() statements in my programming.  

As another general rule, the if() statement without the else{} control structure is often a path to confusion.  It's like saying "If something happens do this, but ignore the facts if something didn't happen."  That kind of selective way of thinking about facts leads to assumptions that often fail in unit tests.   If you like geek jokes, you'll appreciate this one:

The QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 40352181
You didn't include the buffer size in your gzread call. gzread() needs two arguments: the handle and the number of bytes to read.
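For reference, gzread() takes the file handle plus a length in bytes, and the length is required. A minimal sketch of the decompression step with that argument supplied and the return values tested, assuming a hypothetical helper named gunzip_file() (the name and the exceptions are illustrative, not from the accepted script):

```php
<?php
// Hypothetical helper: decompress the gzip file $src into the plain file $dest.
function gunzip_file($src, $dest, $buffer_size = 4096)
{
    $in = gzopen($src, 'rb');
    if ($in === false) {
        throw new RuntimeException("Unable to gzopen $src");
    }

    $out = fopen($dest, 'wb');
    if ($out === false) {
        gzclose($in);
        throw new RuntimeException("Unable to fopen $dest");
    }

    while (!gzeof($in)) {
        // The second argument is the maximum number of uncompressed bytes to read.
        $chunk = gzread($in, $buffer_size);
        if ($chunk === false || $chunk === '') {
            break; // stop on a read error instead of looping forever
        }
        fwrite($out, $chunk);
    }

    fclose($out);
    gzclose($in);
}
```

With the layout from this thread the call would look like gunzip_file('JSON/' . $file, 'JSON/' . $out_file_name), with both paths carrying the directory prefix.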


Question has a verified solution.
