Solved

Why does this process the first file, then stop?

Posted on 2014-09-29
Last Modified: 2014-09-30
I've got a directory with two JSON files in it. The code below looks at the directory and then decompresses them one by one while simultaneously updating a database that keeps track of what files have been done, when they started being processed and when they finished.

It works!

But it does the first file, then just quits. When you look at the database that's keeping tabs on what's being done, I have this:

File Name                             | Start Time          | End Time
00_8ptcd6jgjn201311070000_day.json.gz | 2014-09-29 21:00:51 | 0000-00-00 00:00:00
00_8ptcd6jgjn201311060000_day.json.gz | 2014-09-29 21:01:39 | 2014-09-29 21:01:51

At 21:00:51, the first file started and nothing happened. Then the second file started and I can see the JSON file in the directory just as it's supposed to be. Why did the first file not decompress? What am I missing?

Here's my code:

<?php
$dir_name = 'JSON/';
if ($dh = opendir("$dir_name"))
{
  while (($file = readdir($dh)) !== false)
  {
    //omitting the system default of listing "." and ".."
		if ($file!="."&&$file!="..")
		{
			//make sure we're only reading files with a .gz extension
			$info = new SplFileInfo($file);
			if($info->getExtension()=="gz")
			{
				//at this point, look to see if the name of that file is in the database and needs to be processed
				$daniel = "select file_name from raw_files where file_name='$file'";
				$daniel_query=mysqli_query($cxn, $daniel);
					if(!$daniel_query)
					{
					$rats=mysqli_errno($cxn).': '.mysqli_error($cxn);
					die($rats);
					}
				$daniel_count=mysqli_num_rows($daniel_query);
					if(!$daniel_count>0)
					{
					//insert current date and time into your raw_files table
					$now= date('Y-m-d H:i:s');
					$nelson="insert into raw_files (file_name, start_time) value('$file', '$now')";
					$nelson_query=mysqli_query($cxn, $nelson);
						if(!$nelson_query)
						{
						$nuts=mysqli_errno($cxn).': '.mysqli_error($cxn);
						die($nuts);
						}
					$novie_id = $cxn->insert_id;
					//here's your decompression code
					$file_name = $file;
					// Raising this value may increase performance
					$buffer_size = 4096; // read 4kb at a time
					$out_file_name = str_replace('.gz', '', $file_name); 
					// Open our files (in binary mode)
					$the_file = gzopen($file_name, 'rb');
					$out_file = fopen('JSON/'.$out_file_name, 'wb'); 
					// Keep repeating until the end of the input file
						while(!gzeof($the_file)) 
						{
						// Read buffer-size bytes
						// Both fwrite and gzread are binary-safe
						  fwrite($out_file, gzread($the_file, $buffer_size));
						}  
					// Files are done, close files
					fclose($out_file);
					gzclose($the_file);
					//here's where you update the raw_files database with a time it was completed
					$right_now= date('Y-m-d H:i:s');
					$brice="update raw_files set end_time = '$right_now' where id=$novie_id";
					$brice_query=mysqli_query($cxn, $brice)
					or die("Brice didn't happen.");
				}
			//here's where you're doing your parsing and putting things into the verizon table
			//$the_new_file=str_replace('.gz',"",$file);
			//echo $the_new_file;
			//start
			//sleep(10);
			}
		}
	}
}
closedir($dh);
echo "done!";

?> 

Question by:brucegust
16 Comments
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 40350800
Maybe the first file isn't GZipped, even though the extension has .gz?

Try adding:
echo __LINE__;


...to various parts AFTER the insert queries and then run it. You should be able to see where the line #s stop (first file) and restart (second file) - that might give some insight.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40350879
As a general rule it's wise to test the return values from PHP functions.  It looks like the script does not test the return value from $the_file = gzopen($file_name, 'rb');.  You might also want to add error_reporting(E_ALL) to the top of the script.  If you have these gz files on a public-facing server where we can test, we would welcome the URL of the directory, and we could test the script with some breakpoints and diagnostics.

Some interesting user-contributed notes on this page:
http://php.net/manual/en/function.gzread.php

You might also consider using scandir() since it will let you get the files in a predictable order.
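
A minimal sketch of the two suggestions above: testing what gzopen() returns, and listing the directory with scandir(). The helper name open_gz_or_fail() and the temporary directory are assumptions made up for this example; they are not part of the original script.

```php
<?php
// Hypothetical helper: stop with a clear message when gzopen() fails.
function open_gz_or_fail($path)
{
    // gzopen() returns false on failure; @ hides the warning so we report it ourselves.
    $handle = @gzopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Unable to gzopen $path");
    }
    return $handle;
}

// Create a throwaway directory with one gzipped file so the sketch is self-contained.
$dir = sys_get_temp_dir() . '/gz_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/sample.json.gz', gzencode('{"ok":true}'));

$decoded = array();
// scandir() returns the names in ascending alphabetical order by default.
foreach (scandir($dir) as $name) {
    if (substr($name, -3) !== '.gz') {
        continue; // also skips "." and ".."
    }
    $handle = open_gz_or_fail($dir . '/' . $name);
    $decoded[$name] = gzread($handle, 4096);
    gzclose($handle);
}
print_r($decoded);

// Clean up the temporary files.
unlink($dir . '/sample.json.gz');
rmdir($dir);
```

With a check like this, a failed open stops the script at the point of failure instead of surfacing later as a read loop that never produces output.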
 

Author Comment

by:brucegust
ID: 40350909
Yo, Gonzo!

I'm not sure I'm following you. I added the thing you suggested and I got something like 404040done.

I was able to identify something, though. Tell me whether this helps pin down where things are breaking down.

I commented out some things and renamed some variables in an effort to figure out what was going on. Here's the code as it looks now:

<?php
$dir_name = 'JSON/';
if ($dh = opendir("$dir_name"))
{
  while (($file = readdir($dh)) !== false)
  {
    //omitting the system default of listing "." and ".."
		if ($file!="."&&$file!="..")
		{
			//make sure we're only reading files with a .gz extension
			$info = new SplFileInfo($file);
			if($info->getExtension()=="gz")
			{
				//at this point, look to see if the name of that file is in the database and needs to be processed
				$daniel = "select file_name from raw_files where file_name='$file'";
				$daniel_query=mysqli_query($cxn, $daniel);
					if(!$daniel_query)
					{
					$rats=mysqli_errno($cxn).': '.mysqli_error($cxn);
					die($rats);
					}
				$daniel_count=mysqli_num_rows($daniel_query);
					if(!$daniel_count>0)
					{
					//insert current date and time into your raw_files table
					/*$now= date('Y-m-d H:i:s');
					$nelson="insert into raw_files (file_name, start_time) value('$file', '$now')";
					$nelson_query=mysqli_query($cxn, $nelson);
						if(!$nelson_query)
						{
						$nuts=mysqli_errno($cxn).': '.mysqli_error($cxn);
						die($nuts);
						}
					$novie_id = $cxn->insert_id;*/
					//here's your decompression code
					// Raising this value may increase performance
					$buffer_size = 4096; // read 4kb at a time
					$out_file_name = str_replace('.gz', '',$file); 
					// Open our files (in binary mode)
					$the_file = gzopen($out_file_name, 'rb');
					$out_file = fopen('JSON/'.$out_file_name, 'wb'); 
					// Keep repeating until the end of the input file
						while(!gzeof($file)) 
						{
						// Read buffer-size bytes
						// Both fwrite and gzread are binary-safe
						  fwrite($out_file, gzread($file, $buffer_size));
						}  
					// Files are done, close files
					fclose($out_file);
					gzclose($the_file);
					//here's where you update the raw_files database with a time it was completed
					/*$right_now= date('Y-m-d H:i:s');
					$brice="update raw_files set end_time = '$right_now' where id=$novie_id";
					$brice_query=mysqli_query($cxn, $brice)
					or die("Brice didn't happen.");*/
				}
			//here's where you're doing your parsing and putting things into the verizon table
			//$the_new_file=str_replace('.gz',"",$file);
			//echo $the_new_file;
			//start
			//sleep(10);
			}
		}
	}
}
closedir($dh);
echo "done!";

?> 


When I do "echo $file" I get "00_8ptcd6jgjn201309050000_day.json.gz"

Perfect!

But when I run the code, I get "Warning: gzopen(00_8ptcd6jgjn201309050000_day.json): failed to open stream: No such file or directory in C:\wamp\www\json\decompress.php on line 66" which is this part of the code:

                              $out_file_name = str_replace('.gz', '',$file);
                              // Open our files (in binary mode)
                              $the_file = gzopen($out_file_name, 'rb');

Specifically, "$the_file"

When I go out to the directory, I see 00_8ptcd6jgjn201309050000_day.json, so the file is there, yet the page says that it doesn't exist.

What do you think?
 
LVL 34

Assisted Solution

by:gr8gonzo
gr8gonzo earned 150 total points
ID: 40350942
So __LINE__ is a special value in PHP. It will represent the line number that it's currently on. So for example, let's say I had this script (line number is on the left):

1     $foo = "bar";
2     echo __LINE__ . "\n";
3     echo $foo . "\n";
4     echo __LINE__ . "\n";

Running that should output:
2
bar
4

By outputting the __LINE__ in different parts of your script (around the area where you suspect a problem), you can sometimes gain an idea of the path that PHP is taking when it runs your file. It will tell you which lines PHP is hitting when it is running.

So your output, for example, tells me that you probably added that code once to line 40. I forgot to mention that you should add line breaks to make it a little easier to read.

It's just a little bit of a debugging trick that can help you narrow down the problem sometimes. Moving on...

In your original snippet, you had:
$the_file = gzopen($file_name, 'rb');

Now you are trying to gzopen the output file:
$the_file = gzopen($out_file_name, 'rb');

Might be a typo?
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40351015
You might try it something like this.  You will be able to see the variables at certain points in the process, and the script should stop with an error message if something is completely out of whack.

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
    echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if ($daniel_count)
    {
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
    }
    else
    {
        trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
    }

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen($file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

 

Author Comment

by:brucegust
ID: 40351020
Not a typo, just trying to figure out why the code doesn't "see" the JSON file that was supposedly just "opened."

The error that I'm getting is at line 40. It's there where I get Warning: gzopen(00_8ptcd6jgjn201309050000_day.json): failed to open stream: No such file or directory in C:\wamp\www\json\decompress.php on line 66

Why doesn't it see the file when I can see it in the directory?
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 40351029
Your fopen BELOW the gzopen is what creates the output file. That's why you see it but gzopen doesn't. That said, you shouldn't be gzopen-ing the output file...
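
A small sketch of that point, assuming nothing beyond standard PHP: opening a path with fopen() in 'wb' mode creates an empty file if none exists, so the file shows up in the directory even though nothing has been decompressed into it. The temporary path is made up for the demo.

```php
<?php
// Made-up temporary path; uniqid() makes it very unlikely the file already exists.
$path = sys_get_temp_dir() . '/fopen_demo_' . uniqid() . '.json';

$before = file_exists($path);   // nothing there yet
$out    = fopen($path, 'wb');   // 'w' mode creates the file when it is missing
clearstatcache();               // make sure the next checks are not served from the stat cache
$after  = file_exists($path);   // the (empty) file now exists
$size   = filesize($path);      // nothing has been written to it yet

var_dump($before); // bool(false)
var_dump($after);  // bool(true)
var_dump($size);   // int(0)

fclose($out);
unlink($path);
```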
 

Author Comment

by:brucegust
ID: 40351049
Ray!

Here's the portion of code that you wrote that I experimented with:

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";



    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen($file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);


}

echo "done!";
?>


I figured I've got a ninja writing the decompressing code - that's the thing that's killing me right now. So, using that, this is the error I got:
ray.png
 

Author Comment

by:brucegust
ID: 40351050
What do you think? Where am I blowing it?
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 40351075
Change:
$inp_handle = gzopen($file, 'rb');

To:
$inp_handle = gzopen('JSON/'.$file, 'rb');
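
The reason for the change, sketched under the assumption that the script runs from a directory other than the one being scanned: scandir() and readdir() return bare file names, so a name passed straight to gzopen() is looked up in the current working directory. The temporary directory and file name below are made up for the demo and stand in for 'JSON/'.

```php
<?php
// Throwaway directory standing in for 'JSON/' (the names here are made up for the demo).
$dir  = sys_get_temp_dir() . '/path_demo_' . uniqid() . '/';
$name = 'demo_' . uniqid() . '.json.gz';
mkdir($dir);
file_put_contents($dir . $name, gzencode('[]'));

foreach (scandir($dir) as $file) {
    if ($file === '.' || $file === '..') {
        continue;
    }
    // scandir() and readdir() return bare file names, without the directory part.
    echo "NAME: $file", PHP_EOL;

    // A bare name is resolved against the current working directory, not $dir.
    $bare_found = file_exists($file);
    // Prefixing the directory points at the real file.
    $full_found = file_exists($dir . $file);

    var_dump($bare_found); // bool(false) unless the same name also exists in the working directory
    var_dump($full_found); // bool(true)
}

// Clean up the temporary files.
unlink($dir . $name);
rmdir($dir);
```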
 

Author Comment

by:brucegust
ID: 40351086
Gonzo!

After implementing your suggestion I get:

line 52 expects two parameters...

The line in question is $data=gzread($inp_handle).

Here's the code with your recommendations...

<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('JSON/'.$file, 'rb'); 
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

?> 

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40351116
How can you get an error on line 52 in a script that has only 44 lines?  Are you sure you're testing the right script?
 
LVL 108

Accepted Solution

by:Ray Paseur
Ray Paseur earned 350 total points
ID: 40351122
Let's try this one...
<?php // demo/temp_brucegust.php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
    echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if ($daniel_count)
    {
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
    }
    else
    {
        trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
    }

    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('JSON/' . $file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/' . $out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle, $buffer_size);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);

    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}

echo "done!";

 

Author Comment

by:brucegust
ID: 40352018
That'll do it, Ray!

What was I not doing right?
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40352106
Thanks for the points.  I don't really know what might have been wrong - when the error says line 52 but the script only has 44 lines, I don't read the code at all - I just try to produce something that I think might work.  

As a general rule, more data visualization is better when you're trying to debug some code, so you'll often see a lot of echo and var_dump() statements in my programming.  

As another general rule, the if() statement without the else{} control structure is often a path to confusion.  It's like saying "If something happens do this, but ignore the facts if something didn't happen."  That kind of selective way of thinking about facts leads to assumptions that often fail in unit tests.   If you like geek jokes, you'll appreciate this one:

The QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 40352181
You didn't include the buffer size in your gzread call.
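
For reference, a minimal sketch of a chunked decompression loop in which gzread() gets both of its arguments, the handle and a length. The helper name gunzip_file() and the temporary files are assumptions made up for this example; the accepted script above does the same thing inline.

```php
<?php
// Hypothetical helper, not from the thread: gunzip $src into $dst in fixed-size chunks.
function gunzip_file($src, $dst, $buffer_size = 4096)
{
    $in = gzopen($src, 'rb');
    if ($in === false) {
        return false;
    }
    $out = fopen($dst, 'wb');
    if ($out === false) {
        gzclose($in);
        return false;
    }
    $ok = true;
    while (!gzeof($in)) {
        // gzread() needs both arguments: the handle and the maximum number of bytes to read.
        $chunk = gzread($in, $buffer_size);
        if ($chunk === false) {
            $ok = false; // read error: stop and report failure
            break;
        }
        fwrite($out, $chunk);
    }
    fclose($out);
    gzclose($in);
    return $ok;
}

// Round trip on throwaway files so the sketch runs on its own.
$original = str_repeat('{"n":1}', 2000);
$src = tempnam(sys_get_temp_dir(), 'gz_');
$dst = tempnam(sys_get_temp_dir(), 'js_');
file_put_contents($src, gzencode($original));

$ok = gunzip_file($src, $dst);
$round_trip = file_get_contents($dst);
var_dump($ok && $round_trip === $original);

unlink($src);
unlink($dst);
```

Returning false instead of carrying on lets the caller decide whether to record an end time for the file.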
