

Why does this process the first file, then stop?

I've got a directory with two gzipped JSON files in it. The code below reads the directory and decompresses the files one by one, while updating a database table that tracks which files have been processed, when processing started, and when it finished.

It works!

But it does the first file, then just quits. When you look at the database that's keeping tabs on what's being done, I have this:

File Name | Start Time | End Time
00_8ptcd6jgjn201311070000_day.json.gz | 2014-09-29 21:00:51 | 0000-00-00 00:00:00
00_8ptcd6jgjn201311060000_day.json.gz | 2014-09-29 21:01:39 | 2014-09-29 21:01:51

At 21:00:51, the first file started and nothing happened. Then the second file started and I can see the JSON file in the directory just as it's supposed to be. Why did the first file not decompress? What am I missing?

Here's my code:

<?php
$dir_name = 'JSON/';
if ($dh = opendir("$dir_name"))
{
  while (($file = readdir($dh)) !== false)
  {
    //omitting the system default of listing "." and ".."
		if ($file!="."&&$file!="..")
		{
			//make sure we're only reading files with a .gz extension
			$info = new SplFileInfo($file);
			if($info->getExtension()=="gz")
			{
				//at this point, look to see if the name of that file is in the database and needs to be processed
				$daniel = "select file_name from raw_files where file_name='$file'";
				$daniel_query=mysqli_query($cxn, $daniel);
					if(!$daniel_query)
					{
					$rats=mysqli_errno($cxn).': '.mysqli_error($cxn);
					die($rats);
					}
				$daniel_count=mysqli_num_rows($daniel_query);
					if(!$daniel_count>0)
					{
					//insert current date and time into your raw_files table
					$now= date('Y-m-d H:i:s');
					$nelson="insert into raw_files (file_name, start_time) value('$file', '$now')";
					$nelson_query=mysqli_query($cxn, $nelson);
						if(!$nelson_query)
						{
						$nuts=mysqli_errno($cxn).': '.mysqli_error($cxn);
						die($nuts);
						}
					$novie_id = $cxn->insert_id;
					//here's your decompression code
					$file_name = $file;
					// Raising this value may increase performance
					$buffer_size = 4096; // read 4kb at a time
					$out_file_name = str_replace('.gz', '', $file_name); 
					// Open our files (in binary mode)
					$the_file = gzopen($file_name, 'rb');
					$out_file = fopen('JSON/'.$out_file_name, 'wb'); 
					// Keep repeating until the end of the input file
						while(!gzeof($the_file)) 
						{
						// Read buffer-size bytes
						// Both fwrite and gzread are binary-safe
						  fwrite($out_file, gzread($the_file, $buffer_size));
						}  
					// Files are done, close files
					fclose($out_file);
					gzclose($the_file);
					//here's where you update the raw_files database with a time it was completed
					$right_now= date('Y-m-d H:i:s');
					$brice="update raw_files set end_time = '$right_now' where id=$novie_id";
					$brice_query=mysqli_query($cxn, $brice)
					or die("Brice didn't happen.");
				}
			//here's where you're doing your parsing and putting things into the verizon table
			//$the_new_file=str_replace('.gz',"",$file);
			//echo $the_new_file;
			//start
			//sleep(10);
			}
		}
	}
}
closedir($dh);
echo "done!";

?> 


Asked by: brucegust

gr8gonzo (Consultant) commented:
Maybe the first file isn't GZipped, even though the extension has .gz?

Try adding:
echo __LINE__;



...to various parts AFTER the insert queries and then run it. You should be able to see where the line #s stop (first file) and restart (second file) - that might give some insight.

Ray Paseur commented:
As a general rule it's wise to test the return values from PHP functions.  It looks like the script does not test the return value from $the_file = gzopen($file_name, 'rb');.  You might also want to add error_reporting(E_ALL) to the top of the script.  If you have these gz files on a public-facing server where we can test, we would welcome the URL of the directory, and we could test the script with some breakpoints and diagnostics.
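
As a minimal sketch of that advice, the return value of gzopen() can be checked before any reading starts. The helper name open_gz_or_fail() is hypothetical and not from this thread; the sketch relies only on gzopen() returning false on failure.

```php
<?php
error_reporting(E_ALL);

/**
 * Hypothetical helper: open a .gz file for reading, or throw with a clear message.
 */
function open_gz_or_fail($path)
{
    // Checking is_file() first reports a missing file without a PHP warning
    if (!is_file($path)) {
        throw new RuntimeException("No such file: $path");
    }

    $handle = gzopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Unable to gzopen: $path");
    }

    return $handle;
}
```

Called as open_gz_or_fail('JSON/' . $file), a wrong path would stop the script with the exact path it tried, rather than leaving a half-finished row in the tracking table.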

Some interesting user-contributed notes on this page:
http://php.net/manual/en/function.gzread.php

You might also consider using scandir() since it will let you get the files in a predictable order.
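
A rough sketch of the scandir() approach, assuming a hypothetical helper named list_gz_files(). scandir() returns names sorted in ascending alphabetical order by default, and the helper returns full paths rather than bare file names.

```php
<?php
/**
 * Hypothetical helper: list the .gz files in a directory, sorted by name.
 * Returns full paths so they can be passed straight to gzopen().
 */
function list_gz_files($dir)
{
    $names = scandir($dir); // ascending alphabetical order by default
    if ($names === false) {
        return array();
    }

    $paths = array();
    foreach ($names as $name) {
        $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
        // Keep regular files ending in .gz; this also skips "." and ".."
        if (is_file($path) && pathinfo($name, PATHINFO_EXTENSION) === 'gz') {
            $paths[] = $path;
        }
    }

    return $paths;
}
```

Filtering with is_file() avoids relying on "." and ".." always being the first two entries of the scandir() result.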

brucegust (PHP Developer, Author) commented:
Yo, Gonzo!

I'm not sure I'm following you. I added the thing you suggested and I got something like 404040done.

I was able to identify something though, tell me if this doesn't help better determine where things are breaking down.

I commented out some things and renamed some variables in an effort to figure out what was going on. Here's the code as it looks now:

<?php
$dir_name = 'JSON/';
if ($dh = opendir("$dir_name"))
{
  while (($file = readdir($dh)) !== false)
  {
    //omitting the system default of listing "." and ".."
		if ($file!="."&&$file!="..")
		{
			//make sure we're only reading files with a .gz extension
			$info = new SplFileInfo($file);
			if($info->getExtension()=="gz")
			{
				//at this point, look to see if the name of that file is in the database and needs to be processed
				$daniel = "select file_name from raw_files where file_name='$file'";
				$daniel_query=mysqli_query($cxn, $daniel);
					if(!$daniel_query)
					{
					$rats=mysqli_errno($cxn).': '.mysqli_error($cxn);
					die($rats);
					}
				$daniel_count=mysqli_num_rows($daniel_query);
					if(!$daniel_count>0)
					{
					//insert current date and time into your raw_files table
					/*$now= date('Y-m-d H:i:s');
					$nelson="insert into raw_files (file_name, start_time) value('$file', '$now')";
					$nelson_query=mysqli_query($cxn, $nelson);
						if(!$nelson_query)
						{
						$nuts=mysqli_errno($cxn).': '.mysqli_error($cxn);
						die($nuts);
						}
					$novie_id = $cxn->insert_id;*/
					//here's your decompression code
					// Raising this value may increase performance
					$buffer_size = 4096; // read 4kb at a time
					$out_file_name = str_replace('.gz', '',$file); 
					// Open our files (in binary mode)
					$the_file = gzopen($out_file_name, 'rb');
					$out_file = fopen('JSON/'.$out_file_name, 'wb'); 
					// Keep repeating until the end of the input file
						while(!gzeof($file)) 
						{
						// Read buffer-size bytes
						// Both fwrite and gzread are binary-safe
						  fwrite($out_file, gzread($file, $buffer_size));
						}  
					// Files are done, close files
					fclose($out_file);
					gzclose($the_file);
					//here's where you update the raw_files database with a time it was completed
					/*$right_now= date('Y-m-d H:i:s');
					$brice="update raw_files set end_time = '$right_now' where id=$novie_id";
					$brice_query=mysqli_query($cxn, $brice)
					or die("Brice didn't happen.");*/
				}
			//here's where you're doing your parsing and putting things into the verizon table
			//$the_new_file=str_replace('.gz',"",$file);
			//echo $the_new_file;
			//start
			//sleep(10);
			}
		}
	}
}
closedir($dh);
echo "done!";

?> 



When I do "echo $file" I get "00_8ptcd6jgjn201309050000_day.json.gz"

Perfect!

But when I run the code, I get "Warning: gzopen(00_8ptcd6jgjn201309050000_day.json): failed to open stream: No such file or directory in C:\wamp\www\json\decompress.php on line 66" which is this part of the code:

                              $out_file_name = str_replace('.gz', '',$file);
                              // Open our files (in binary mode)
                              $the_file = gzopen($out_file_name, 'rb');

Specifically, "$the_file"

When I go out to the directory, I see 00_8ptcd6jgjn201309050000_day.json, so the file is there, yet the page says that it doesn't exist.

What do you think?
 
gr8gonzo (Consultant) commented:
So __LINE__ is a special value in PHP. It will represent the line number that it's currently on. So for example, let's say I had this script (line number is on the left):

1     $foo = "bar";
2     echo __LINE__ . "\n";
3     echo $foo . "\n";
4     echo __LINE__ . "\n";

Running that should output:
2
bar
4

By outputting the __LINE__ in different parts of your script (around the area where you suspect a problem), you can sometimes gain an idea of the path that PHP is taking when it runs your file. It will tell you which lines PHP is hitting when it is running.

So your output, for example, tells me that you probably added that code once to line 40. I forgot to mention that you should add line breaks to make it a little easier to read.

It's just a little bit of a debugging trick that can help you narrow down the problem sometimes. Moving on...
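
A small sketch of the same trick with line breaks and a label, so the markers don't run together the way "404040done" did. The helper trace_line() and the file names are hypothetical, made up for illustration.

```php
<?php
/**
 * Hypothetical helper: format one trace marker per line.
 * __LINE__ is passed in by the caller, so it reports the caller's line number.
 */
function trace_line($line, $message)
{
    return 'line ' . $line . ': ' . $message . PHP_EOL;
}

// Made-up file names, used only to drive the loop
$files = array('first.json.gz', 'second.json.gz');

foreach ($files as $file) {
    echo trace_line(__LINE__, "starting $file");
    // ... the insert query would go here ...
    echo trace_line(__LINE__, "insert done for $file");
    // ... the decompression would go here ...
    echo trace_line(__LINE__, "finished $file");
}
```

When the script runs in a browser, PHP_EOL will not show up as a visible line break; wrapping the output in a pre element or viewing the page source keeps the markers on separate lines.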

In your original snippet, you had:
$the_file = gzopen($file_name, 'rb');

Now you are trying to gzopen the output file:
$the_file = gzopen($out_file_name, 'rb');

Might be a typo?

Ray Paseur commented:
You might try it something like this.  You will be able to see the variables at certain points in the process, and the script should stop with an error message if something is completely out of whack.

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
    echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if ($daniel_count)
    {
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
    }
    else
    {
        trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
    }

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen($file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

brucegust (PHP Developer, Author) commented:
Not a typo, just trying to figure out why the code doesn't "see" the JSON file that was supposedly just "opened."

The error that I'm getting is at line 40. It's there where I get Warning: gzopen(00_8ptcd6jgjn201309050000_day.json): failed to open stream: No such file or directory in C:\wamp\www\json\decompress.php on line 66

Why doesn't it see the file when I can see it in the directory?

gr8gonzo (Consultant) commented:
Your fopen BELOW the gzopen is what creates the output file. That's why you see it but gzopen doesn't. That said, you shouldn't be gzopen-ing the output file...
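
A small sketch of keeping the two names apart: gzopen() reads the compressed input, and fopen() writes the decompressed output. output_path_for() is a hypothetical helper, not part of the thread's code. It strips only a trailing ".gz", whereas str_replace('.gz', '', ...) would remove every occurrence in the name.

```php
<?php
/**
 * Hypothetical helper: derive the output path from the .gz input path.
 */
function output_path_for($gzPath)
{
    // Anchor the pattern at the end so only the final ".gz" is removed
    return preg_replace('/\.gz$/', '', $gzPath);
}

$in_path  = 'JSON/00_8ptcd6jgjn201309050000_day.json.gz'; // pass this one to gzopen()
$out_path = output_path_for($in_path);                     // pass this one to fopen()
```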

brucegust (PHP Developer, Author) commented:
Ray!

Here's the portion of code that you wrote that I experimented with:

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";



    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen($file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);


}

echo "done!";
?>



I figured I've got a ninja writing the decompressing code - that's the thing that's killing me right now. So, using that, this is the error I got:
[Attached screenshot: ray.png]

brucegust (PHP Developer, Author) commented:
What do you think? Where am I blowing it?
 
gr8gonzo (Consultant) commented:
Change:
$inp_handle = gzopen($file, 'rb');

To:
$inp_handle = gzopen('JSON/'.$file, 'rb');
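
The reason the prefix matters: scandir() and readdir() return bare file names, and a bare name is resolved against the script's current working directory rather than the directory that was scanned. A minimal sketch, using a throwaway temp directory and a made-up file name so it can run anywhere:

```php
<?php
// Build a throwaway directory (hypothetical names, for illustration only)
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'json_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'sample.json.gz', gzencode('{"ok":true}'));

foreach (scandir($dir) as $name) {
    if ($name === '.' || $name === '..') {
        continue;
    }

    // $name is just "sample.json.gz"; the directory has to be added back
    $path = $dir . DIRECTORY_SEPARATOR . $name;

    var_dump(file_exists($name)); // looked up relative to the working directory
    var_dump(file_exists($path)); // looked up inside the scanned directory
}
```

The same applies to SplFileInfo, fopen() and gzopen(): each of them sees only the string it is given.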

brucegust (PHP Developer, Author) commented:
Gonzo!

After implementing your suggestion I get:

line 52 expects two parameters...

The line in question is $data=gzread($inp_handle).

Here's the code with your recommendations...

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('JSON/'.$file, 'rb'); 
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

?> 

Ray Paseur commented:
How can you get an error on line 52 in a script that has only 44 lines?  Are you sure you're testing the right script?

Ray Paseur commented:
Let's try this one...
<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
    echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if (!$daniel_count)
    {
        // Not in raw_files yet: record the start time before decompressing
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
    }
    else
    {
        // Already recorded by an earlier run: skip it instead of processing it twice
        echo PHP_EOL . "SKIPPING $file (ALREADY IN raw_files TABLE)";
        continue;
    }

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('JSON/' . $file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/' . $out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle, $buffer_size);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

brucegust (PHP Developer, Author) commented:
That'll do it, Ray!

What was I not doing right?

Ray Paseur commented:
Thanks for the points.  I don't really know what might have been wrong - when the error says line 52 but the script only has 44 lines, I don't read the code at all - I just try to produce something that I think might work.  

As a general rule, more data visualization is better when you're trying to debug some code, so you'll often see a lot of echo and var_dump() statements in my programming.  

As another general rule, the if() statement without the else{} control structure is often a path to confusion.  It's like saying "If something happens do this, but ignore the facts if something didn't happen."  That kind of selective way of thinking about facts leads to assumptions that often fail in unit tests.   If you like geek jokes, you'll appreciate this one:

The QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.

gr8gonzo (Consultant) commented:
You didn't include the buffer size in your gzread() call; it requires the length as its second argument.
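
Pulling the thread's fixes together, here is a hedged sketch of the decompression step on its own. gunzip_file() is a hypothetical helper name; it assumes the caller passes full paths (directory included) and leaves the database bookkeeping to the surrounding script.

```php
<?php
/**
 * Hypothetical helper: decompress $gzPath into $outPath, reading up to
 * $bufferSize bytes of uncompressed data per gzread() call.
 * Returns true on success, false if a file could not be opened or read.
 */
function gunzip_file($gzPath, $outPath, $bufferSize = 4096)
{
    $in = gzopen($gzPath, 'rb');
    if ($in === false) {
        return false;
    }

    $out = fopen($outPath, 'wb');
    if ($out === false) {
        gzclose($in);
        return false;
    }

    $ok = true;
    while (!gzeof($in)) {
        // The second argument (length) is required by gzread()
        $chunk = gzread($in, $bufferSize);
        if ($chunk === false || fwrite($out, $chunk) === false) {
            $ok = false;
            break;
        }
    }

    fclose($out);
    gzclose($in);

    return $ok;
}
```

In the thread's loop this would be called as gunzip_file('JSON/' . $file, 'JSON/' . $out_file_name, $buffer_size), with the end_time update run only when it returns true.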