chrisj1963

Combine two php scripts - loop

Hi - I have two scripts that I would like to combine.

script1.php lets a user enter a domain name and a list of keywords, then checks whether the domain/URL appears in Google's local search results box (the map area in the Google SERPs).

For example: go to http://www.prontopage.net/localsearch/script1.php

enter   wausaulaw.com in the url text box

enter the following keyword phrases in the text area box
wisconsin mesothelioma attorney
wausau divorce attorney
wausau bankruptsy lawyer

The result is as follows:

KEYWORD PHRASE                           POSITION
wisconsin mesothelioma attorney - 1
wisconsin mesothelioma attorney - 2
wausau divorce attorney                   - 1
wausau divorce attorney                   - 7
wausau bankruptsy lawyer               - not matched
wausau tax attorney                           - not matched

which tells us that wausaulaw.com has a Google local search listing in positions 1 and 2 for "wisconsin mesothelioma attorney", positions 1 and 7 for "wausau divorce attorney", and no listing under "wausau bankruptsy lawyer" or "wausau tax attorney".

I have a second script that takes a single keyword phrase and performs an "intitle:" phrase match search and displays the result.

Please see script2.php. For example, go to http://www.prontopage.net/localsearch/script2.php

Enter    wisconsin mesothelioma attorney
it returns    Pages Indexed: 11,700

Enter    wausau divorce attorney
it returns    Pages Indexed: 3

Enter    wausau bankruptsy attorney
it returns     No Pages Found

Enter    wausau tax attorney
it returns     Pages Indexed: 3

I am wondering if someone can help me combine the two scripts (possibly by having script1 call script2 as a kind of function) to return results as follows:

KEYWORD PHRASE                         POSITION           INTITLE PHRASE RESULTS
wisconsin mesothelioma attorney - 1                       - 11,700
wisconsin mesothelioma attorney - 2                       - 11,700
wausau divorce attorney                   - 1                       -  3
wausau divorce attorney                   - 7                       -  3
wausau bankruptsy lawyer               - not matched   - 3
wausau tax attorney                           - not matched   - Not Found

where columns 1 and 2 come from script1 and column 3 comes from script2.

Both scripts are included in the code snippet.

Help would be greatly appreciated.

script1.php

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Script 1</title>
</head>

<body>
Enter the domain name that you would like to review<br />
(do not enter http://www. Only enter the domain, e.g. yourwebsite.com)
<form method=post action=''>
<input name="url" type="text" /><br /><br />
Enter or Copy / Paste a list of keywords
eg.
divorce attorney
law firm
portland lawyers<br /><br />
<textarea name="keywords" rows="10" cols="50"></textarea>
<input type=submit value="submit"></form>

<?php
ob_start();
ini_set('display_errors', 1);
//put your site's URL here, but leave off the http://www
$url = isset($_POST['url']) ? trim($_POST['url']) : ''; // stays empty until the form is posted

//$url = "planetsocceronline.com";




// above madisonunited.org should generate positive results for a 7 box
//       ecsportscenter.com  should generate positive results for a 3 box (and a 1 box)
//       blainesoccer.org should generate positive results for a 2 box
 
//put all the keywords you want to check here, make sure you leave the single quotes and commas intact for each keyword
//$keywords = array(
//'soccer madison, wi',     // a google search for this term generates a 7 box
//'soccer osseo, wi ',      // a google search for this term generates a 3 box
//'soccer blaine, mn',      // a google search for this term generates a 2 box
//);



    $keywords = array(); // stays empty until the form is posted
    if(isset($_POST['keywords'])){
        $data=$_POST['keywords'];
        // split() was removed in PHP 7; explode() does the same job for a literal "\n".
        // trim() strips the "\r" that browsers send with each textarea line,
        // and array_filter() drops blank lines.
        $keywords=array_filter(array_map('trim', explode("\n",$data)), 'strlen');
        print_r($keywords); // print all the array elements here

    }

 
$descriptions = array(); //funky regular expressions are stored in this array;
// this is to look at the result page to see if a pattern for a 7, 3, 2, or 1 box exists.  Google does not always generate the local search results maps (boxes)
 
//$descriptions[0] = 'style="display: block; padding-bottom: 1px;"><a href="http://(.+?)" class="l" onMouseDown=';  // 7 box
//$descriptions[1] = '<h4 class="r"><a href="http://(.+?)" class="l" onMouseDown='; // all other boxes

$descriptions[0] = '/style="display:block;padding-bottom:1px"><a href="([^"]+)" class=l /i'; // 7 box
$descriptions[1] = '/<h4 class=r><a href="([^"]+)" class=l /i'; // all other boxes



//Just tells us what URL we're checking rankings for
echo "<h2> Google Local SERPS For " .$url. "</h2>";
 
//if(is_array($keywords) && count($keywords)>1){
if(is_array($keywords) && count($keywords)>0){
//start to run through each of the keywords
foreach($keywords as $keyword)  
{
 
	//set our counter at zero, so we can work out what the ranking of the page
	$count = 0;
	 
	//make a URL that we can query Google with
	$search = "http://www.google.com/search?q=" .urlencode($keyword). "&num=10";
	 
	//now go get that page
	$google = file_get_contents($search);
	 
	
	$matched = false;		
	foreach($descriptions as $description)
	{
		preg_match_all($description,$google,$match);
		
		if( count($match[1]) )
		{
			foreach( $match[1] as $value)
			{
				
				$count = $count + 1;
				//check each of the 1 results from google, to see if our URL is in it.
				if(strstr($value, $url)) 
				{
					//if this particular result has our URL in it, print it to the page, along with the ranking ($count variable from above) - this is from within the local search map results
					echo $keyword. " - " . $count . "<br>";
					$matched = true;
				}
			}
			
		}
		
	}
		 
	if(!$matched)
		echo "$keyword  - not matched <br />";
	
	
	//Have a bit of a rest before we go check the next keyword, so we don't get booted from the Goog'
	//sleep(rand(5,10));
}
}
ob_flush();
	 
flush();
 
?>


</body>
</html>

------------------------------------------------------

script2.php

<?php // RAY_temp_chrisj1963.php
error_reporting(E_ALL);


function my_curl($url, $timeout=2, $error_report=FALSE)
{
    $curl = curl_init();

    // HEADERS FROM FIREFOX - APPEARS TO BE A BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt($curl, CURLOPT_URL,            $url);
    curl_setopt($curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6');
    curl_setopt($curl, CURLOPT_HTTPHEADER,     $header);
    curl_setopt($curl, CURLOPT_REFERER,        'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING,       'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER,    TRUE);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($curl, CURLOPT_TIMEOUT,        $timeout);

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);
    $err = curl_errno($curl);
    $inf = curl_getinfo($curl);
    curl_close($curl);

    // ON FAILURE
    if (!$htm)
    {
        // PROCESS ERRORS HERE
        if ($error_report)
        {
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        return FALSE;
    }

    // ON SUCCESS
    return $htm;
}


// THINGS TO LOOK FOR
$rs = '<div id=resultStats>';
$nb = '<nobr>';

// WHERE TO LOOK
$url = 'http://www.google.com/search?q=intitle:';

// IF THERE IS AN ARGUMENT, CALL GOOGLE
if (isset($_GET['s']))
{
    echo "<p><i>Search for " . htmlspecialchars($_GET["s"]) . "</i></p>"; // escape user input before echoing it

    // CONSTRUCT URL
    echo $url = $url . '"' . urlencode($_GET["s"]) . '"';
    $htm = my_curl($url);

    // ACTIVATE THIS TO SEE THE HTML FROM GOOGLE
    // echo htmlentities($htm);

    // USE BREAKPOINTS APPROPRIATE TO THE MOZILLA BROWSER RESPONSE TEXT
    $arr = explode($rs, $htm);
    $arr = @explode($nb, $arr[1]);
    $str = preg_replace('/[^0-9]/', '', $arr[0]);
    $num = number_format((int) $str); // cast so an empty match counts as 0

    echo '<br><br>';

    if ($num == 0)
    {
        echo "No Pages Found";
        die();
    }
    echo "Pages Indexed: $num";
    die();
}
// END OF PHP, PUT UP THE FORM
?>
<form>
<div align="center">
<p>
<input name="s" type="text" id="s" size="50" />
<input type="submit" name="Submit" value="Count" />
</p>
</div>
</form>
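
Since the question asks about calling script2 "as a kind of function", one possible starting point is to move its parsing steps into a function that takes the HTML Google returned and gives back the count. The sketch below assumes the same `<div id=resultStats>` and `<nobr>` markers that script2.php looks for. The name `intitle_count_from_html()` and the sample HTML are made up for illustration, and Google's real markup may differ.

```php
<?php
// Hypothetical helper: extracts the result count from a Google results page.
// Assumes the same markers script2.php searches for; real markup may differ.
function intitle_count_from_html($htm)
{
    $rs = '<div id=resultStats>';
    $nb = '<nobr>';

    // No HTML (failed request) or no stats block: treat as zero results.
    if (!is_string($htm) || strpos($htm, $rs) === false) {
        return 0;
    }

    $arr = explode($rs, $htm);
    $arr = explode($nb, $arr[1]);

    // Keep only the digits, e.g. "About 11,700 results" becomes 11700.
    return (int) preg_replace('/[^0-9]/', '', $arr[0]);
}

// Made-up sample input standing in for a real response.
$sample = '<div id=resultStats>About 11,700 results<nobr> (0.20 seconds)</nobr></div>';
$count  = intitle_count_from_html($sample);
echo $count > 0 ? 'Pages Indexed: ' . number_format($count) : 'No Pages Found';
```

Returning an integer rather than echoing keeps the formatting decision ("Pages Indexed" versus "No Pages Found") with the caller, which is what script1 would need in order to print the count as a third column.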


Marco Gasi

Hi chris.
I'm posting a snippet that seems to work (I set error_reporting to none because I received a number_format error about time_zone).

Bye
script1.php

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title>Script 1</title>
    </head>

    <body>
        Enter the domain name that you would like to review<br />
        (do not enter http://www. Only enter the domain, e.g. yourwebsite.com)
        <form method=post action=''>
            <input name="url" type="text" /><br /><br />
            Enter or Copy / Paste a list of keywords
            eg.
            divorce attorney
            law firm
            portland lawyers<br /><br />
            <textarea name="keywords" rows="10" cols="50"></textarea>
            <input type=submit value="submit" /></form>

        <?php
        error_reporting(0); // E_NONE is not a PHP constant; 0 turns reporting off
        function my_curl($url, $timeout=2, $error_report=FALSE) {
            $curl = curl_init();

            // HEADERS FROM FIREFOX - APPEARS TO BE A BROWSER REFERRED BY GOOGLE
            $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
            $header[] = "Cache-Control: max-age=0";
            $header[] = "Connection: keep-alive";
            $header[] = "Keep-Alive: 300";
            $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
            $header[] = "Accept-Language: en-us,en;q=0.5";
            $header[] = "Pragma: "; // browsers keep this blank.
            // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
            curl_setopt($curl, CURLOPT_URL, $url);
            curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6');
            curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
            curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
            curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
            curl_setopt($curl, CURLOPT_AUTOREFERER, TRUE);
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
            curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
            curl_setopt($curl, CURLOPT_TIMEOUT, $timeout);

            // RUN THE CURL REQUEST AND GET THE RESULTS
            $htm = curl_exec($curl);
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            curl_close($curl);

            // ON FAILURE
            if (!$htm) {
                // PROCESS ERRORS HERE
                if ($error_report) {
                    echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
                    var_dump($inf);
                }
                return FALSE;
            }

            // ON SUCCESS
            return $htm;
        }

        ob_start();
        ini_set('display_errors', 1);
//put your site's URL here, but leave off the http://www
        $url = isset($_POST['url']) ? trim($_POST['url']) : ''; // stays empty until the form is posted

// THINGS TO LOOK FOR
        $rs = '<div id=resultStats>';
        $nb = '<nobr>';

// WHERE TO LOOK
        $url2 = 'http://www.google.com/search?q=intitle:';


        $keywords = array(); // stays empty until the form is posted
        if (isset($_POST['keywords'])) {
            $data = $_POST['keywords'];
            // split() was removed in PHP 7; explode() does the same job for a literal "\n".
            // trim() strips the "\r" that browsers send with each textarea line,
            // and array_filter() drops blank lines.
            $keywords = array_filter(array_map('trim', explode("\n", $data)), 'strlen');
            print_r($keywords); // print all the array elements here
        }
        $descriptions = array(); //funky regular expressions are stored in this array;
// this is to look at the result page to see if a pattern for a 7, 3, 2, or 1 box exists.  Google does not always generate the local search results maps (boxes)
        $descriptions[0] = '/style="display:block;padding-bottom:1px"><a href="([^"]+)" class=l /i'; // 7 box
        $descriptions[1] = '/<h4 class=r><a href="([^"]+)" class=l /i'; // all other boxes
//Just tells us what URL we're checking rankings for
        echo "<h2> Google Local SERPS For " . $url . "</h2>";

//if(is_array($keywords) && count($keywords)>1){
        if (is_array($keywords) && count($keywords) > 0) {
//start to run through each of the keywords
            foreach ($keywords as $keyword) {

                $url2 = $url2 . '"' . urlencode($keyword) . '"';
                $htm = my_curl($url2);

                // ACTIVATE THIS TO SEE THE HTML FROM GOOGLE
                // echo htmlentities($htm);
                // USE BREAKPOINTS APPROPRIATE TO THE MOZILLA BROWSER RESPONSE TEXT
                $arr = explode($rs, $htm);
                $arr = @explode($nb, $arr[1]);
                $str = preg_replace('/[^0-9]/', '', $arr[0]);
                $num = number_format($str);

//                echo '<br><br>';

                if ($num == 0) {
                    $msg = "No Pages Found";
//                    die();
                }else $msg = "Page indexed: ".$num;
//                die();
//                echo "Page indexed: ".$msg."<br>";
                //set our counter at zero, so we can work out what the ranking of the page
                $count = 0;

                //make a URL that we can query Google with
                $search = "http://www.google.com/search?q=" . urlencode($keyword) . "&num=10";

                //now go get that page
                $google = file_get_contents($search);


                $matched = false;
                foreach ($descriptions as $description) {
                    preg_match_all($description, $google, $match);

                    if (count($match[1])) {
                        foreach ($match[1] as $value) {

                            $count = $count + 1;
                            //check each of the 1 results from google, to see if our URL is in it.
                            if (strstr($value, $url)) {
                                //if this particular result has our URL in it, print it to the page, along with the ranking ($count variable from above) - this is from within the local search map results
                                echo $keyword . " - " . $count . "  " . $msg . "<br>";
                                $matched = true;
                            }
                        }
                    }
                }

                if (!$matched)
                    echo "$keyword  - not matched <br />";

                //Have a bit of a rest before we go check the next keyword, so we don't get booted from the Goog'
                //sleep(rand(5,10));
            }
        }
        ob_flush();

        flush();
        ?>


    </body>
</html>
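
The nested matching loop in the snippet above could also be pulled out into a function that returns the positions instead of echoing them, which leaves the caller free to decide what to print for each keyword. This is only a sketch: `find_positions()` is a hypothetical name, the sample HTML is invented to fit the two patterns from the script, and it uses a case-insensitive `stripos()` where the script uses `strstr()`.

```php
<?php
// Hypothetical helper: returns the 1-based positions at which $domain appears
// among the links captured by the given patterns. The counter runs across all
// patterns, as it does in the script.
function find_positions($html, $domain, array $patterns)
{
    $positions = array();
    if ($domain === '' || !is_string($html)) {
        return $positions; // nothing to look for, or the request failed
    }

    $count = 0;
    foreach ($patterns as $pattern) {
        preg_match_all($pattern, $html, $match);
        foreach ($match[1] as $href) {
            $count++;
            // Assumption: a case-insensitive substring test is acceptable here.
            if (stripos($href, $domain) !== false) {
                $positions[] = $count;
            }
        }
    }
    return $positions;
}

// The two patterns from the script.
$patterns = array(
    '/style="display:block;padding-bottom:1px"><a href="([^"]+)" class=l /i', // 7 box
    '/<h4 class=r><a href="([^"]+)" class=l /i',                              // all other boxes
);

// Invented sample markup shaped to match the second pattern.
$sample = '<h4 class=r><a href="http://www.wausaulaw.com/a" class=l >A</a></h4>'
        . '<h4 class=r><a href="http://example.org/" class=l >B</a></h4>'
        . '<h4 class=r><a href="http://wausaulaw.com/b" class=l >C</a></h4>';

print_r(find_positions($sample, 'wausaulaw.com', $patterns));
```

An empty array then means "not matched", so the same per-keyword information (such as the intitle count) can be printed in both cases.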

chrisj1963

ASKER

Hey again Marqus!

I think we're close, but the third-column results are not quite right.

If I go to http://www.prontopage.net/localsearch/script1.php and enter:
(Domain)
wausaulaw.com

(Keywords)
wisconsin mesothelioma attorney
wausau divorce attorney
wausau bankruptcy lawyer
wausau tax attorney

I get:
wisconsin mesothelioma attorney - 1
wisconsin mesothelioma attorney - 2
wausau divorce attorney - 1
wausau divorce attorney - 7
wausau bankruptcy lawyer - not matched
wausau tax attorney - not matched

and then if I go to  http://www.prontopage.net/localsearch/script2.php
and enter the following terms one-by-one I get the following "Pages Indexed" results

wisconsin mesothelioma attorney    11,800
wausau divorce attorney                     3
wausau bankruptcy lawyer                 8
wausau tax attorney                             3

when I go to http://www.prontopage.net/localsearch/marqus.php (your script)
and enter
(Domain)
wausaulaw.com

(Keywords)
wisconsin mesothelioma attorney
wausau divorce attorney
wausau bankruptcy lawyer
wausau tax attorney

The result I get is:  
wisconsin mesothelioma attorney - 1 Page indexed: 11,800
wisconsin mesothelioma attorney - 2 Page indexed: 11,800
wausau divorce attorney - 1 No Pages Found
wausau divorce attorney - 7 No Pages Found
wausau bankruptcy lawyer - not matched
wausau tax attorney - not matched

When it should be:
wisconsin mesothelioma attorney - 2 Page indexed: 11,800
wausau divorce attorney - 1 Page indexed: 3
wausau divorce attorney - 7 Page indexed: 3
wausau bankruptcy lawyer - Page indexed: 8
wausau tax attorney - Page indexed: 3

any thoughts on how to get that to work properly?

Thanks very much for your help!

The final "When it should be" list should have been:

When it should be:
wisconsin mesothelioma attorney - 1 Page indexed: 11,800
wisconsin mesothelioma attorney - 2 Page indexed: 11,800
wausau divorce attorney - 1 Page indexed: 3
wausau divorce attorney - 7 Page indexed: 3
wausau bankruptcy lawyer - Page indexed: 8
wausau tax attorney - Page indexed: 3

Geez... I meant...

When it should be:
wisconsin mesothelioma attorney - 1 Page indexed: 11,800
wisconsin mesothelioma attorney - 2 Page indexed: 11,800
wausau divorce attorney - 1 Page indexed: 3
wausau divorce attorney - 7 Page indexed: 3
wausau bankruptcy lawyer - not matched Page indexed: 8
wausau tax attorney - not matched Page indexed: 3

Sorry.

Hi chris. I'll take a look now and let you know as soon as possible. Bye

Hi chris. Please try the attached code: in my tests some errors have been fixed, but the output is not identical to what you described.

My result is

wisconsin mesothelioma attorney - 1 Page indexed: 11,600
wisconsin mesothelioma attorney - 2 Page indexed: 11,600
wausau divorce attorney - 2 Page indexed: 2
wausau bankruptsy lawyer - not matched No Pages Found

I suspect the differences may be due to the different locations of our servers or of the Google servers, as happened in the other question (do you remember we had different results?)

Let me know how it works.

Hi marqus - I think you forgot to attach the code. Also, when I gave you the second round of keywords I changed the spelling of "bankruptcy attorney" (note the "cy" vs "sy"); I spelled it wrong the first time. I also added "wausau tax attorney", which will explain the difference in the "bankruptcy" result.

Regarding "wausau divorce attorney" (Page indexed 2 versus my 3), the result should match this:

http://www.google.com/search?q=intitle:"wausau+divorce+attorney"  which is 3...

Thanks for your work on this.

Oh, and regarding the server difference: when you search Google from your local machine there may be a difference, but when you search through my server at prontopage.net the results should be the same.
ASKER CERTIFIED SOLUTION
Marco Gasi

This solution is only available to members.
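
For readers without access to the accepted answer, here is a sketch of one way the two symptoms reported above could be addressed. It is not the accepted solution, which is not shown here. It rests on two readings of the earlier snippet: that `$url2` keeps growing because each keyword is appended to it inside the loop, so every search after the first uses a malformed URL, and that the "not matched" branch never prints `$msg`. The helper names `intitle_url()` and `format_rows()` are hypothetical.

```php
<?php
// Hypothetical helper: builds a fresh intitle: URL for one keyword, so the
// base URL is never modified between loop iterations.
function intitle_url($keyword)
{
    $base = 'http://www.google.com/search?q=intitle:';
    // trim() drops the "\r" a textarea line may carry; the quotes are encoded too.
    return $base . urlencode('"' . trim($keyword) . '"');
}

// Hypothetical helper: one output line per matched position, or a single
// "not matched" line, each ending with the intitle count message.
function format_rows($keyword, array $positions, $count)
{
    $msg = $count > 0 ? 'Page indexed: ' . number_format($count) : 'No Pages Found';
    if (count($positions) === 0) {
        return array("$keyword - not matched $msg");
    }
    $rows = array();
    foreach ($positions as $position) {
        $rows[] = "$keyword - $position $msg";
    }
    return $rows;
}

// Usage with values taken from the thread (no network request is made here).
echo intitle_url("wausau divorce attorney\r"), "\n";
foreach (format_rows('wausau divorce attorney', array(1, 7), 3) as $row) {
    echo $row, "\n";
}
foreach (format_rows('wausau bankruptcy lawyer', array(), 8) as $row) {
    echo $row, "\n";
}
```

Inside the keyword loop, a call such as `my_curl(intitle_url($keyword))` would then take the place of the two lines that extend `$url2` and fetch it.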
That works perfectly!!!!

wisconsin mesothelioma attorney - 1 Page indexed: 11,800
wisconsin mesothelioma attorney - 2 Page indexed: 11,800
wausau divorce attorney - 1 Page indexed: 3
wausau divorce attorney - 7 Page indexed: 3
wausau bankruptcy lawyer - not matched Page indexed: 8
wausau tax attorney - not matched Page indexed: 3

Thank you very much!!!!!!!!!!!!!!!!!!!!!
Hip hip hooray! Thanks for the points, chris. Hear from you soon...