The correct use of the sleep function in php

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

gr8gonzo

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Gonz...

Thanks for taking the time to weigh in. Let me respond by giving you the "back-story" so you can see where all this is going.

The 365 files is a one time "data dump." The files need to be decompressed, parsed and then I'm storing all the info in a single table that's been indexed so as to facilitate efficient queries.

The end user will be entering a date as well as some latitude and longitude values resulting in a recordset that they can then export as a csv file.

Everything that you've seen me struggle with these last few days has as its target that user interface. I rarely work with this much data, so I'm a sponge, trying to soak up all the info I can in order to put together a process that takes all this info and stores it in a way that can be used.

Ray's suggestion resonates as a solid solution and I'm looking forward to popping the hood on that approach and making it work.

First off, however, here's my script, top to bottom. It works, but should you see any room for improvement, I'm all ears. Also, while I understand the logic of Ray's suggestion, I'm google-ing, even as we speak, looking for a tutorial that will walk me through that process.

Bring it!

<?php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    //echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
	//good to go up to here
   // echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if ($daniel_count==0)
    {
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        //echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
     
    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('JSON/' . $file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/' . $out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

		while(!gzeof($inp_handle))
		{
			$data = gzread($inp_handle, $buffer_size);
			fwrite($out_handle, $data);
		}
		fclose($out_handle);
		gzclose($inp_handle);

		//at this point, you've decompressed your file, now you do your parsing
		$the_new_file=str_replace('.gz',"",$file);
		$chunk_size=4096;
		$url="JSON/";
		$url .=$the_new_file;
		$handle=@fopen($url,'r');
			if(!$handle) 
			{
				echo "failed to open JSON file";
			}
		while (!feof($handle)) 
		{
		$buffer = fgets($handle, $chunk_size);
			if(trim($buffer)!=='')
			{
			$obj=json_decode(($buffer), true);
			
			include('clean_up.php');	
			
			$insert = "insert into verizon (actor_id, actor_display_name, posted_time, display_name, geo_coords_0, geo_coords_1, location_name, posted_day) 
			values ('$actor_id', '$actor_display_name', '$posted_time', '$display_name', '$geo_coords_0', '$geo_coords_1', '$location_name', '$posted_day')";
				$insertexe = mysqli_query($cxn, $insert);
				if(!$insertexe) {
				$error = mysqli_errno($cxn).': '.mysqli_error($cxn);
				die($error);
				}
				//echo $row_count.' | '. $obj['actor']['id'].' | '.$obj['actor']['displayName'].' | '.$obj['postedTime'].' | '.$obj['generator']['displayName'].' | '.$obj['geo']['coordinates']['0'].' | '.$obj['geo']['coordinates']['1'].' | '.$obj['location']['name'].' '.$trigger.'<br>';
			}
		}
		fclose($handle);
		//you're done parsing and decompressing. Now we update the raw_files table with the time we completed the processing
		$now = date('c');
		$brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
		$brice_query=mysqli_query($cxn, $brice);
			if(!$brice_query)
			{
				var_dump($brice);
				trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
			}
	}
	else
    {
	//you've already processed this file
    //trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
	continue;
    }
}

echo "done!";
?>

ASKER

Ray, I'm looking and I'm not finding anything that breaks your suggestion down into academically bite sized pieces for this hardcharger to grasp (https://www.google.com/search?q=php+post+method+request).

It seems like, the code that I have currently falls in line with what you're suggesting right up to line 27:

<?php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    //echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
	//good to go up to here
   // echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if ($daniel_count==0)
    {

Yes?

I'm going through the files as they exist in the directory, I'm looking in my "raw_files" table to see if that particular file has been processed and at line 27 I'm doing my parsing and decompressing.

What I hear you saying is that once that file has been processed, instead of continuing with the for loop, I'm going to...

do a redirect to a page where I just initiate the whole thing all over again
do something along the lines of a POST-method request that involves things I don't pretend to understand at this point

Poised on the threshold of greatness. What is that POST-method request and how do I implement it here?

<?php
error_reporting(E_ALL);

$dir = 'JSON/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    //echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name FROM raw_files WHERE file_name='$file'";
    $daniel_query=mysqli_query($cxn, $daniel);
    if(!$daniel_query)
    {
        var_dump($daniel);
        trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
    }
    $daniel_count=mysqli_num_rows($daniel_query);
	//good to go up to here
   // echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";

    if ($daniel_count==0)
    {
        $now = date('c');
        $nelson="INSERT INTO raw_files (file_name, start_time) VALUES ('$file', '$now')";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            var_dump($nelson);
            trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
        }
        $new_id = $cxn->insert_id;
        //echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
     
    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('JSON/' . $file, 'rb');
    if (!$inp_handle) trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);

    $out_handle = fopen('JSON/' . $out_file_name, 'wb');
    if (!$out_handle) trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);

		while(!gzeof($inp_handle))
		{
			$data = gzread($inp_handle, $buffer_size);
			fwrite($out_handle, $data);
		}
		fclose($out_handle);
		gzclose($inp_handle);

		//at this point, you've decompressed your file, now you do your parsing
		$the_new_file=str_replace('.gz',"",$file);
		$chunk_size=4096;
		$url="JSON/";
		$url .=$the_new_file;
		$handle=@fopen($url,'r');
			if(!$handle) 
			{
				echo "failed to open JSON file";
			}
		while (!feof($handle)) 
		{
		$buffer = fgets($handle, $chunk_size);
			if(trim($buffer)!=='')
			{
			$obj=json_decode(($buffer), true);
			
			include('clean_up.php');	
			
			$insert = "insert into verizon (actor_id, actor_display_name, posted_time, display_name, geo_coords_0, geo_coords_1, location_name, posted_day) 
			values ('$actor_id', '$actor_display_name', '$posted_time', '$display_name', '$geo_coords_0', '$geo_coords_1', '$location_name', '$posted_day')";
				$insertexe = mysqli_query($cxn, $insert);
				if(!$insertexe) {
				$error = mysqli_errno($cxn).': '.mysqli_error($cxn);
				die($error);
				}
				//echo $row_count.' | '. $obj['actor']['id'].' | '.$obj['actor']['displayName'].' | '.$obj['postedTime'].' | '.$obj['generator']['displayName'].' | '.$obj['geo']['coordinates']['0'].' | '.$obj['geo']['coordinates']['1'].' | '.$obj['location']['name'].' '.$trigger.'<br>';
			}
		}
		fclose($handle);
		//you're done parsing and decompressing. Now we update the raw_files table with the time we completed the processing
		$now = date('c');
		$brice="UPDATE raw_files SET end_time = '$now' WHERE id=$new_id LIMIT 1";
		$brice_query=mysqli_query($cxn, $brice);
			if(!$brice_query)
			{
				var_dump($brice);
				trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
			}
	}
	else
    {
	//you've already processed this file
    //trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
	continue;
    }
}

echo "done!";
?>

gr8gonzo

Just to clarify, I was agreeing with Ray's general suggestion about using a management kind of process. I was simply expanding on it with some additional suggestions.

Basically, Ray was suggesting a serial/sequential, 1-by-1 loop through all the files, processing one at a time (whichever one has not been processed yet), and restarting the script at the end. There's nothing wrong about it, but you can typically be more efficient than that (by processing more than one file at a time). A serial loop (one after another) is fine when you're looking at small quantities of things or when one item depends on another completing first, but in data import / processing cases, you're often better off doing some parallel processing. Otherwise, you -will- be forced into a 61-hour loop at minimum, when you could be doing all files in a fraction of that time.

@brucegust: Please post some test data. I'll show you what I'm talking about with a code example. It does not have to be serial, one-at-a-time for a 61-hour process, but writing an explanation is going to take longer than just showing you the code. FWIW this is a fairly advanced topic in application design, and also very useful.

ASKER

Morning, guys!

Ray, here's some sample data:

{"id":"tag:search.twitter.com,2005:389903668427763712","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:91239297","link":"http://www.twitter.com/OGkush103","displayName":"WalkingLick74","postedTime":"2009-11-20T01:21:39.000Z","image":"https://si0.twimg.com/profile_images/378800000593715086/755411d8bdc495472c2d7ed50e319582_normal.jpeg","summary":"Self-Made, Self Paid..... I always had the mind to get it like a man, head first bout my younging Ean! #YOLO","links":[{"href":null,"rel":"me"}],"friendsCount":468,"followersCount":677,"listedCount":0,"statusesCount":25504,"twitterTimeZone":"Alaska","verified":false,"utcOffset":"-28800","preferredUsername":"OGkush103","languages":["en"],"location":{"objectType":"place","displayName":"Boston George Crib"},"favoritesCount":26},"verb":"post","postedTime":"2013-10-15T00:00:53.000Z","generator":{"displayName":"Twitter for iPhone","link":"http://twitter.com/download/iphone"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/OGkush103/statuses/389903668427763712","body":"You a killer you on twitter, You'n do NO talking","object":{"objectType":"note","id":"object:search.twitter.com,2005:389903668427763712","summary":"You a killer you on twitter, You'n do NO talking","link":"http://twitter.com/OGkush103/statuses/389903668427763712","postedTime":"2013-10-15T00:00:53.000Z"},"favoritesCount":0,"location":{"objectType":"place","displayName":"Mississippi, US","name":"Mississippi","country_code":"United States","twitter_country_code":"US","link":"https://api.twitter.com/1.1/geo/id/43d2418301bf1a49.json","geo":{"type":"Polygon","coordinates":[[[-91.65500899999999,30.146096],[-91.65500899999999,34.996099],[-88.097888,34.996099],[-88.097888,30.146096]]]}},"geo":{"type":"Point","coordinates":[31.99686058,-88.72688823]},"twitter_entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[]},"twitter_filter_level":"medium","twitter_lang":"en","retweetCount":0,"gnip":{"matching_rules":[{"tag":null}],"language":{"value":"en"}}}

The data that I'm grabbing from the above is documented in the attached clean_up.php file, although I think I'm going to the "twitter id" field as well to ensure my not duplicating records (tag:search.twitter.com,2005:389903668427763712).

While I've no doubt that you can readily discern what my fields are, this url provides a clean "view" of what's there and what I'm grabbing: http://konklone.io/json/
clean-up.php

ASKER

As an aside, I've attached my "process" thus far. Ray, after testing the process on a couple of files, I commented the trigger_error messages out for fear that the process would quit overnite. In hindsight, that may not have been a good move in light of what I saw when I came in this morning.

Bottom line: No errors, but I had 27,596,100 rows of parsed data, but only two days worth of JSON files. Upon closer inspection, the table that I'm using to "manage" the process - a list of the files that need to be parsed with a "start" and "end" time along with a "completed" column - had nothing in the "completed" column that represented a finished process. I'm thinking that means, I looped through the same JSON file over and over again since, having tested this, one JSON file is 450K rows. Two days should be around 1,000,000 rows, not 27,000,000.

I'm going to save what I've got, but here are my marching orders today:

add the twitter id field to my table and check to make sure I'm not getting ready to duplicate a record when I got to insert the parsed data

figure out why on the parse.php file, my "zipped_files" table wasn't properly updated as far as the "completed" column being updated with a "2." I'm thinking in light of my first select statement looking for rows that have not been completed, there's a chance I just kept doing the same file over and over again because of that flaw in the update statement.

Here's my first file that sees the .gz files and decompresses them:

<?php
ini_set('max_execution_time', 144000); //300 seconds = 5 minutes
include("carter.inc");
$cxn = mysqli_connect($host,$user,$password,$database)
or die ("couldn't connect to server");

error_reporting(E_ALL);

$dir = 'E:/verizon/all_files/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);
$daniel_check=0;

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    //echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name, completed, id FROM zipped_files WHERE file_name='$file' and completed=1";
    $daniel_query=mysqli_query($cxn, $daniel);
    $daniel_count=mysqli_num_rows($daniel_query);
	//good to go up to here
   // echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";
		
    if ($daniel_count==1)
    {
	$daniel_row=mysqli_fetch_assoc($daniel_query);
	extract($daniel_row);
        $now = date('c');
        $nelson="update zipped_files set start_time ='$now' where id ='$daniel_row[id]'";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            //var_dump($nelson);
           // trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
		   //if something goes south here, you can't proceed because you won't have a $new_id value
        }
        $new_id = $daniel_row['id'];
        //echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
     
    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('E:/verizon/all_files/' . $file, 'rb');
    if (!$inp_handle)// trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);
	continue;

    $out_handle = fopen('E:/verizon/all_files/' . $out_file_name, 'wb');
    if (!$out_handle) //trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);
	continue;
		while(!gzeof($inp_handle))
		{
			$data = gzread($inp_handle, $buffer_size);
			fwrite($out_handle, $data);
		}
		fclose($out_handle);
		gzclose($inp_handle);
		
	header("Location:parse.php?id=$new_id");	
	exit();
	}
	else
    {
	//you've already processed this file
    //trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
	continue;
    }
}

$message="All files have been processed!";
?>

<!DOCTYPE html>
<html lang="en">
<head>
<title>Twitter Usage Search Page</title>
<link href="style.css" rel="stylesheet" type="text/css" />
</head>

<body>

&nbsp;<span style="font-size:18pt; font-weight:strong;">Twitter JSON Processing Page</span>
<br><br>
This script handles the decompression and parsing of the Twitter JSON Files.
<br><br>
<div id="title">&nbsp;Twitter JSON Parsing Machine<div style="float:right;">click <a href="search.php" style="color:#ffffff;">here</a> to return to the search page&nbsp;</div></div>	<br><br>
<?php echo $message; ?>

</body>

</html>

After it finishes, on line 61, I do a redirect to parse.php. Here's that page:

<?php
ini_set('max_execution_time', 144000); //300 seconds = 5 minutes
include("carter.inc");
$cxn = mysqli_connect($host,$user,$password,$database)
or die ("couldn't connect to server");

error_reporting(E_ALL);

$dir = 'E:/verizon/all_files/';
$arr = scandir($dir);
unset($arr[0]);
unset($arr[1]);
$daniel_check=0;

foreach($arr as $file)
{
    $info = new SplFileInfo($file);
    if($info->getExtension()!= "gz") continue;
    //echo PHP_EOL . "PROCESSING $file";

    $daniel = "SELECT file_name, completed, id FROM zipped_files WHERE file_name='$file' and completed=1";
    $daniel_query=mysqli_query($cxn, $daniel);
    $daniel_count=mysqli_num_rows($daniel_query);
	//good to go up to here
   // echo PHP_EOL . "FOUND $daniel_count DATABASE ROWS FOR $file";
		
    if ($daniel_count==1)
    {
	$daniel_row=mysqli_fetch_assoc($daniel_query);
	extract($daniel_row);
        $now = date('c');
        $nelson="update zipped_files set start_time ='$now' where id ='$daniel_row[id]'";
        $nelson_query=mysqli_query($cxn, $nelson);
        if(!$nelson_query)
        {
            //var_dump($nelson);
           // trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
		   //if something goes south here, you can't proceed because you won't have a $new_id value
        }
        $new_id = $daniel_row['id'];
        //echo PHP_EOL . "INSERTED ID=$new_id INTO raw_files TABLE";
     
    $buffer_size = 4096; // read 4kb at a time
    $out_file_name = str_replace('.gz', '',$file);

    $inp_handle = gzopen('E:/verizon/all_files/' . $file, 'rb');
    if (!$inp_handle)// trigger_error("UNABLE TO GZOPEN $file", E_USER_ERROR);
	continue;

    $out_handle = fopen('E:/verizon/all_files/' . $out_file_name, 'wb');
    if (!$out_handle) //trigger_error("UNABLE TO FOPEN $out_file_name", E_USER_ERROR);
	continue;
		while(!gzeof($inp_handle))
		{
			$data = gzread($inp_handle, $buffer_size);
			fwrite($out_handle, $data);
		}
		fclose($out_handle);
		gzclose($inp_handle);
		
	header("Location:parse.php?id=$new_id");	
	exit();
	}
	else
    {
	//you've already processed this file
    //trigger_error("NO DATA INSERTED FOR $file", E_USER_ERROR);
	continue;
    }
}

$message="All files have been processed!";
?>

<!DOCTYPE html>
<html lang="en">
<head>
<title>Twitter Usage Search Page</title>
<link href="style.css" rel="stylesheet" type="text/css" />
</head>

<body>

&nbsp;<span style="font-size:18pt; font-weight:strong;">Twitter JSON Processing Page</span>
<br><br>
This script handles the decompression and parsing of the Twitter JSON Files.
<br><br>
<div id="title">&nbsp;Twitter JSON Parsing Machine<div style="float:right;">click <a href="search.php" style="color:#ffffff;">here</a> to return to the search page&nbsp;</div></div>	<br><br>
<?php echo $message; ?>

</body>

</html>

It's at line 46 the update statement should've updated the "completed" column to a "2" and it didn't. I've got to figure out why that happened.

Mind you, I'm completely open to suggestions, especially those that allow for a more efficient process. I would love to be able to have this thing wrapped up by tomorrow morning.

ASKER

For some reason the "parse.php" page didn't copy over. I just now saw that. Here's the parse page:

<?php
ini_set('max_execution_time', 144000); //300 seconds = 5 minutes
include("carter.inc");
$cxn = mysqli_connect($host,$user,$password,$database)
or die ("couldn't connect to server");
$vivian="select file_name, id from zipped_files where id='$_GET[id]'";
$vivian_query=mysqli_query($cxn, $vivian)
or die("Couldn't execute query.");
$vivian_row=mysqli_fetch_assoc($vivian_query);
extract($vivian_row);
$file=$vivian_row['file_name'];
$new_id=$vivian_row['id'];

//at this point, you've decompressed your file, now you do your parsing
$the_new_file=str_replace('.gz',"",$file);
$chunk_size=4096;
$url="E:/verizon/all_files/";
$url .=$the_new_file;
$handle=@fopen($url,'r');
	if(!$handle) 
	{
		echo "failed to open JSON file";
	}
while (!feof($handle)) 
{
$buffer = fgets($handle, $chunk_size);
	if(trim($buffer)!=='')
	{
	$obj=json_decode(($buffer), true);
	
	include('clean_up.php');	
	
	$insert = "insert into verizon (actor_id, actor_display_name, posted_time, display_name, geo_coords_0, geo_coords_1, location_name, posted_day) 
	values ('$actor_id', '$actor_display_name', '$posted_time', '$display_name', '$geo_coords_0', '$geo_coords_1', '$location_name', '$posted_day')";
		$insertexe = mysqli_query($cxn, $insert);
		if(!$insertexe) {
		$error = mysqli_errno($cxn).': '.mysqli_error($cxn);
		die($error);
		}
		//echo $row_count.' | '. $obj['actor']['id'].' | '.$obj['actor']['displayName'].' | '.$obj['postedTime'].' | '.$obj['generator']['displayName'].' | '.$obj['geo']['coordinates']['0'].' | '.$obj['geo']['coordinates']['1'].' | '.$obj['location']['name'].' '.$trigger.'<br>';
	}
}
fclose($handle);
//you're done parsing and decompressing. Now we update the zipped_files table with the time we completed the processing
$now = date('c');
$brice="UPDATE zipped_files SET end_time = '$now',
completed=2 WHERE id=$new_id LIMIT 1";
$brice_query=mysqli_query($cxn, $brice);
	if(!$brice_query)
	{
		var_dump($brice);
		trigger_error(mysqli_errno($cxn).': '.mysqli_error($cxn), E_USER_ERROR);
	}
header("Location:breather.php");
exit();
?>

I'm getting a sense that we're lacking consolidation of thought on this problem, and it's turning from a question into an application development project. For that sort of thing you might want to consider hiring a professional application developer.

Let me try to summarize what I believe to be true and ask you for a few other pieces of information so that I have a chance to get some part of this working in my own test environment.

1. You have a data source that enables you to get GZ files. These files, once uncompressed, contain JSON strings that have some sort of Twitter data.
Where can I get the same GZ files, in the same format and quantity?

2. You have a database table zipped_files.
Please post the CREATE TABLE statement.

3. You have a database table verizon.
Please post the CREATE TABLE statement.

4. You have this: include('clean_up.php')
Please post the source for that script.

5. In the most recent postings you've got some extract() statements.
Have you verified that these are necessary and not overwriting any important variables?

Are there any other moving parts or pieces of the puzzle that I'm missing?

ASKER

Ray, I apologize. I can see your point that this is no longer a question and the scope of my inquiry requires more than just a brief word of wisdom.

What I've got is working, although it's slow. We'll keep at it and we'll go from there.

No apology solicited or needed at all. I can't see any reason why it's so slow, or why it should be so slow at all. If I could get to the test data, I could probably show you a design that would be faster. But with only one of the JSON strings, no access to the GZ files, and no information about the database tables, I'm kind of flying blind. If you want to show us those things, please post a new question. Thanks.

ASKER

I'll open up another question and get you the "stuff" you asked for.

Thanks!