Bruce Gust
asked on
Why does this process the first file, then stop?
I've got a directory with two gzipped JSON files in it. The code below reads the directory and decompresses the files one by one while updating a database table that tracks which files have been processed, when processing started, and when it finished.
It works!
But it does the first file, then just quits. When you look at the database that's keeping tabs on what's being done, I have this:
File Name | Start Time | End Time
00_8ptcd6jgjn201311070000_day.json.gz | 2014-09-29 21:00:51 | 0000-00-00 00:00:00
00_8ptcd6jgjn201311060000_day.json.gz | 2014-09-29 21:01:39 | 2014-09-29 21:01:51
At 21:00:51, the first file started and nothing happened. Then the second file started and I can see the JSON file in the directory just as it's supposed to be. Why did the first file not decompress? What am I missing?
Here's my code:
<?php
$dir_name = 'JSON/';
if ($dh = opendir("$dir_name"))
{
    while (($file = readdir($dh)) !== false)
    {
        //omitting the system default of listing "." and ".."
        if ($file!="."&&$file!="..")
        {
            //make sure we're only reading files with a .gz extension
            $info = new SplFileInfo($file);
            if($info->getExtension()=="gz")
            {
                //at this point, look to see if the name of that file is in the database and needs to be processed
                $daniel = "select file_name from raw_files where file_name='$file'";
                $daniel_query=mysqli_query($cxn, $daniel);
                if(!$daniel_query)
                {
                    $rats=mysqli_errno($cxn).': '.mysqli_error($cxn);
                    die($rats);
                }
                $daniel_count=mysqli_num_rows($daniel_query);
                if(!$daniel_count>0)
                {
                    //insert current date and time into your raw_files table
                    $now= date('Y-m-d H:i:s');
                    $nelson="insert into raw_files (file_name, start_time) value('$file', '$now')";
                    $nelson_query=mysqli_query($cxn, $nelson);
                    if(!$nelson_query)
                    {
                        $nuts=mysqli_errno($cxn).': '.mysqli_error($cxn);
                        die($nuts);
                    }
                    $novie_id = $cxn->insert_id;
                    //here's your decompression code
                    $file_name = $file;
                    // Raising this value may increase performance
                    $buffer_size = 4096; // read 4kb at a time
                    $out_file_name = str_replace('.gz', '', $file_name);
                    // Open our files (in binary mode)
                    $the_file = gzopen($file_name, 'rb');
                    $out_file = fopen('JSON/'.$out_file_name, 'wb');
                    // Keep repeating until the end of the input file
                    while(!gzeof($the_file))
                    {
                        // Read buffer-size bytes
                        // Both fwrite and gzread and binary-safe
                        fwrite($out_file, gzread($the_file, $buffer_size));
                    }
                    // Files are done, close files
                    fclose($out_file);
                    gzclose($the_file);
                    //here's where you update the raw_files database with a time it was completed
                    $right_now= date('Y-m-d H:i:s');
                    $brice="update raw_files set end_time = '$right_now' where id=$novie_id";
                    $brice_query=mysqli_query($cxn, $brice)
                    or die("Brice didn't happen.");
                }
                //here's where you're doing your parsing and putting things into the verizon table
                //$the_new_file=str_replace('.gz',"",$file);
                //echo $the_new_file;
                //start
                //sleep(10);
            }
        }
    }
}
closedir($dh);
echo "done!";
?>
As a general rule it's wise to test the return values from PHP functions. It looks like the script does not test the return value from $the_file = gzopen($file_name, 'rb');. You might also want to add error_reporting(E_ALL) to the top of the script. If you have these gz files on a public-facing server where we can test, we would welcome the URL of the directory, and we could test the script with some breakpoints and diagnostics.
Some interesting user-contributed notes on this page:
http://php.net/manual/en/function.gzread.php
You might also consider using scandir() since it will let you get the files in a predictable order.
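To make the scandir() suggestion concrete, here is a minimal sketch, assuming PHP 7 or later for the type hints. The helper name list_gz_names() is hypothetical and not from the thread. It tests scandir()'s return value and keeps only the names ending in .gz, in the ascending order scandir() returns by default.

```php
<?php
// Hypothetical helper (not from the thread): return the bare names of the
// .gz entries in $dir, in the sorted order that scandir() provides.
function list_gz_names(string $dir): array
{
    $entries = scandir($dir);            // ascending order by default
    if ($entries === false) {            // scandir() returns false on failure
        throw new RuntimeException("Unable to read directory: $dir");
    }

    $names = [];
    foreach ($entries as $entry) {
        // "." and ".." have no "gz" extension, so they are skipped here too
        if (pathinfo($entry, PATHINFO_EXTENSION) === 'gz') {
            $names[] = $entry;
        }
    }
    return $names;
}
```

Calling list_gz_names('JSON/') would replace the opendir()/readdir() loop. Note that the names it returns are still bare file names, not paths.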
ASKER
Yo, Gonzo!
I'm not sure I'm following you. I added the thing you suggested and I got something like 404040done.
I was able to identify something, though. Tell me if this helps determine where things are breaking down.
I commented out some things and renamed some variables in an effort to figure out what was going on.
When I do "echo $file" I get "00_8ptcd6jgjn201309050000_day.json.gz"
Perfect!
But when I run the code, I get "Warning: gzopen(00_8ptcd6jgjn201309050000_day.json): failed to open stream: No such file or directory in C:\wamp\www\json\decompress.php on line 66", which points at this part of the code:
$out_file_name = str_replace('.gz', '',$file);
// Open our files (in binary mode)
$the_file = gzopen($out_file_name, 'rb');
Specifically, "$the_file"
When I go out to the directory, I see 00_8ptcd6jgjn201309050000_day.json, so the file is there, yet the page says that it doesn't exist.
What do you think? Here's the full code as it looks now:
<?php
$dir_name = 'JSON/';
if ($dh = opendir("$dir_name"))
{
    while (($file = readdir($dh)) !== false)
    {
        //omitting the system default of listing "." and ".."
        if ($file!="."&&$file!="..")
        {
            //make sure we're only reading files with a .gz extension
            $info = new SplFileInfo($file);
            if($info->getExtension()=="gz")
            {
                //at this point, look to see if the name of that file is in the database and needs to be processed
                $daniel = "select file_name from raw_files where file_name='$file'";
                $daniel_query=mysqli_query($cxn, $daniel);
                if(!$daniel_query)
                {
                    $rats=mysqli_errno($cxn).': '.mysqli_error($cxn);
                    die($rats);
                }
                $daniel_count=mysqli_num_rows($daniel_query);
                if(!$daniel_count>0)
                {
                    //insert current date and time into your raw_files table
                    /*$now= date('Y-m-d H:i:s');
                    $nelson="insert into raw_files (file_name, start_time) value('$file', '$now')";
                    $nelson_query=mysqli_query($cxn, $nelson);
                    if(!$nelson_query)
                    {
                        $nuts=mysqli_errno($cxn).': '.mysqli_error($cxn);
                        die($nuts);
                    }
                    $novie_id = $cxn->insert_id;*/
                    //here's your decompression code
                    // Raising this value may increase performance
                    $buffer_size = 4096; // read 4kb at a time
                    $out_file_name = str_replace('.gz', '',$file);
                    // Open our files (in binary mode)
                    $the_file = gzopen($out_file_name, 'rb');
                    $out_file = fopen('JSON/'.$out_file_name, 'wb');
                    // Keep repeating until the end of the input file
                    while(!gzeof($file))
                    {
                        // Read buffer-size bytes
                        // Both fwrite and gzread and binary-safe
                        fwrite($out_file, gzread($file, $buffer_size));
                    }
                    // Files are done, close files
                    fclose($out_file);
                    gzclose($the_file);
                    //here's where you update the raw_files database with a time it was completed
                    /*$right_now= date('Y-m-d H:i:s');
                    $brice="update raw_files set end_time = '$right_now' where id=$novie_id";
                    $brice_query=mysqli_query($cxn, $brice)
                    or die("Brice didn't happen.");*/
                }
                //here's where you're doing your parsing and putting things into the verizon table
                //$the_new_file=str_replace('.gz',"",$file);
                //echo $the_new_file;
                //start
                //sleep(10);
            }
        }
    }
}
closedir($dh);
echo "done!";
?>
SOLUTION
This solution is only available to members.
You might try it something like this. You will be able to see the variables at certain points in the process, and the script should stop with an error message if something is completely out of whack.
<?php // demo/temp_brucegust.php
error_reporting(E_ALL);
$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);
foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";
    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
    echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";
    if ($daniel_count)
    {
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
    }
    else
    {
        trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
    }
    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);
    $inp_handle = gzopen($file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);
    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);
    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);
    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}
echo "done!";
ASKER
Not a typo, just trying to figure out why the code doesn't "see" the JSON file that was supposedly just "opened."
The error that I'm getting is at line 40. It's there where I get Warning: gzopen(00_8ptcd6jgjn201309050000_day.json): failed to open stream: No such file or directory in C:\wamp\www\json\decompress.php on line 66
Why doesn't it see the file when I can see it in the directory?
Your fopen BELOW the gzopen is what creates the output file. That's why you can see it but gzopen can't: the file doesn't exist yet when gzopen runs. That said, you shouldn't be gzopen-ing the output file at all...
ASKER
Ray!
Here's the portion of code that you wrote that I experimented with:
<?php // demo/temp_brucegust.php
error_reporting(E_ALL);
$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);
foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";
    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);
    $inp_handle = gzopen($file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);
    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);
    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);
}
echo "done!";
?>
I figured I've got a ninja writing the decompressing code - that's the thing that's killing me right now. So, using that, this is the error I got:
ray.png
ASKER
What do you think? Where am I blowing it?
Change:
$inp_handle = gzopen($file, 'rb');
To:
$inp_handle = gzopen('JSON/'.$file, 'rb');
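The reason for that change: scandir() and readdir() hand back bare entry names, not paths, so gzopen($file) looks in the script's working directory rather than in JSON/. A minimal sketch of the idea, assuming PHP 7 or later; the helper name entry_path() is hypothetical and the file name is the one quoted earlier in the thread:

```php
<?php
// Hypothetical helper (not from the thread): join a directory and a bare
// entry name into a path that gzopen()/fopen() can actually find.
function entry_path(string $dir, string $entry): string
{
    // Strip any trailing slash so we never end up with "JSON//name"
    return rtrim($dir, '/\\') . '/' . $entry;
}

$dir  = 'JSON/';
$file = '00_8ptcd6jgjn201309050000_day.json.gz';   // what scandir() returns

echo entry_path($dir, $file), PHP_EOL;
// prints: JSON/00_8ptcd6jgjn201309050000_day.json.gz
```

The same prefix is already used for the output side in fopen('JSON/'.$out_file_name, 'wb'), which is why the decompressed file landed in the right directory while the input could not be found.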
ASKER
Gonzo!
After implementing your suggestion I get:
line 52 expects two parameters...
The line in question is $data=gzread($inp_handle).
Here's the code with your recommendations...
<?php // demo/temp_brucegust.php
error_reporting(E_ALL);
$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);
foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    echo PHP_EOL . "PROCESSING $file";
    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);
    $inp_handle = gzopen('JSON/'.$file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);
    $out_handle = fopen('JSON/'.$out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);
    while(!gzeof($inp_handle))
    {
        $data = gzread($inp_handle);
        fwrite($out_handle, $data);
    }
    fclose($out_handle);
    gzclose($inp_handle);
    $now = date('c');
    $brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
    $brice_query=mysqli_query($cxn, $brice);
    if(!$brice_query)
    {
        var_dump($brice);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
}
echo "done!";
?>
How can you get an error on line 52 in a script that has only 44 lines? Are you sure you're testing the right script?
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ASKER
That'll do it, Ray!
What was I not doing right?
Thanks for the points. I don't really know what might have been wrong - when the error says line 52 but the script only has 44 lines, I don't read the code at all - I just try to produce something that I think might work.
As a general rule, more data visualization is better when you're trying to debug some code, so you'll often see a lot of echo and var_dump() statements in my programming.
As another general rule, the if() statement without the else{} control structure is often a path to confusion. It's like saying "If something happens do this, but ignore the facts if something didn't happen." That kind of selective way of thinking about facts leads to assumptions that often fail in unit tests. If you like geek jokes, you'll appreciate this one:
The QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.
You didn't include the buffer size in your gzread call.
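For what it's worth, the "expects two parameters" warning quoted earlier in the thread was reported against $data = gzread($inp_handle), and gzread() takes both the handle and a length. Here is a minimal sketch of the decompression step with that argument supplied, assuming PHP 7 or later with the zlib extension loaded. The function name gunzip_file() is hypothetical, and both arguments are expected to be full paths.

```php
<?php
// Hypothetical helper (not from the thread): decompress the gzip file at
// $source into $target and return the number of bytes written.
function gunzip_file(string $source, string $target, int $buffer_size = 4096): int
{
    $in = gzopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Unable to gzopen $source");
    }

    $out = fopen($target, 'wb');
    if ($out === false) {
        gzclose($in);
        throw new RuntimeException("Unable to fopen $target");
    }

    $written = 0;
    while (!gzeof($in)) {
        // gzread() needs the handle AND the number of bytes to read
        $chunk = gzread($in, $buffer_size);
        if ($chunk === false) {
            break;                      // read error: stop copying
        }
        $bytes = fwrite($out, $chunk);
        if ($bytes === false) {
            break;                      // write error: stop copying
        }
        $written += $bytes;
    }

    fclose($out);
    gzclose($in);
    return $written;
}
```

In the loop from the thread this would be called as something like gunzip_file('JSON/'.$file, 'JSON/'.$out_file_name), with the database updates kept around it as before.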
Try adding:
...to various parts AFTER the insert queries and then run it. You should be able to see where the line #s stop (first file) and restart (second file) - that might give some insight.