chrisj1963

PHP sort when scraping data

Hi,
I have source data here:
http://prontopage.net/a_parse/p_datasource.htm
(note that the Monthly Volume column is unsorted)

I have a data scraper here:
http://prontopage.net/a_parse/p_scrape.php

I have a JavaScript routine that automatically sorts the Monthly Volume column after the data is displayed on the page, but the sort is noticeably delayed after the page loads.

Is there a way to change the scrape.php code so that it sorts the Monthly Volume data as it prints the page, rather than using JavaScript to do the sorting?

Help is appreciated.
<!--<style>
table { text-align: left; border-collapse: collapse; }
tr:hover { background: blue; color: white }
th, td { padding: 7px }
</style>-->
<table>
<thead>
<tr><th>Keyword</th><th>Monthly Volume</th></tr>
</thead>

<tbody><tr><td class=1>mid pines north carolina</td><td class=2>12</td></tr>
<tr><td class=1>southern pines resort</td><td class=2>16</td></tr>
<tr><td class=1>pine needles golf course nc</td><td class=2>28</td></tr>

<tr><td class=1>north carolina golf club</td><td class=2>58</td></tr>
<tr><td class=1>golf southern</td><td class=2>12</td></tr>
<tr><td class=1>and pines inn</td><td class=2>0</td></tr>
<tr><td class=1>north carolina pine trees</td><td class=2>58</td></tr>
<tr><td class=1>nc golf course</td><td class=2>73</td></tr>
<tr><td class=1>and mid to</td><td class=2>0</td></tr>

<tr><td class=1>restaurants in southern pines nc</td><td class=2>58</td></tr>
<tr><td class=1>best golf courses north carolina</td><td class=2>22</td></tr>
<tr><td class=1>needles pine</td><td class=2>58</td></tr>
<tr><td class=1>best golf in north carolina</td><td class=2>12</td></tr>
<tr><td class=1>southern pine north carolina</td><td class=2>16</td></tr>
<tr><td class=1>north carolina golf</td><td class=2>1900</td></tr>

<tr><td class=1>southern pines golf nc</td><td class=2>12</td></tr>
<tr><td class=1>long needle pines</td><td class=2>16</td></tr>
<tr><td class=1>southern pine nc</td><td class=2>58</td></tr>
<tr><td class=1>and golf club southern pines</td><td class=2>0</td></tr>
<tr><td class=1>club lodge</td><td class=2>12</td></tr>
<tr><td class=1>carolina course golf north</td><td class=2>12</td></tr>

<tr><td class=1>pinehurst golf nc</td><td class=2>73</td></tr>
<tr><td class=1>golf courses pinehurst nc</td><td class=2>28</td></tr>
<tr><td class=1>pine needles golf club nc</td><td class=2>12</td></tr>
<tr><td class=1>needles</td><td class=2>33100</td></tr>
<tr><td class=1>golf course north carolina</td><td class=2>58</td></tr>
<tr><td class=1>seagrove resorts</td><td class=2>110</td></tr>

<tr><td class=1>pine's</td><td class=2>58</td></tr>
<tr><td class=1>golf pines</td><td class=2>12</td></tr>
<tr><td class=1>and golf club southern</td><td class=2>0</td></tr>
<tr><td class=1>pine needle golf</td><td class=2>46</td></tr>
<tr><td class=1>mid pines golf resort</td><td class=2>28</td></tr>
<tr><td class=1>lodge country club</td><td class=2>36</td></tr>

<tr><td class=1>and southern pines north</td><td class=2>0</td></tr>
<tr><td class=1>golf nc</td><td class=2>390</td></tr>
<tr><td class=1>and resort in southern</td><td class=2>0</td></tr>
<tr><td class=1>pine needles golf resort</td><td class=2>73</td></tr>
<tr><td class=1>carolina in the pine</td><td class=2>16</td></tr>
<tr><td class=1>north carolina golfing</td><td class=2>36</td></tr>

<tr><td class=1>nc golf courses</td><td class=2>590</td></tr>
<tr><td class=1>golf lessons nc</td><td class=2>28</td></tr>
<tr><td class=1>golf courses in north carolina</td><td class=2>320</td></tr>
<tr><td class=1>southern pines restaurant</td><td class=2>22</td></tr>
<tr><td class=1>carolina golf club</td><td class=2>480</td></tr>
<tr><td class=1>north carolina southern pines</td><td class=2>28</td></tr>

<tr><td class=1>mid pine</td><td class=2>12</td></tr>
<tr><td class=1>pine needles golf</td><td class=2>720</td></tr>
<tr><td class=1>mid carolina country club</td><td class=2>58</td></tr>
<tr><td class=1>at southern pines</td><td class=2>0</td></tr>
<tr><td class=1>golf resorts in north carolina</td><td class=2>110</td></tr>
<tr><td class=1>golf resort nc</td><td class=2>16</td></tr>

<tr><td class=1>golf packages in pinehurst nc</td><td class=2>12</td></tr>
<tr><td class=1>pines nc</td><td class=2>16</td></tr>
<tr><td class=1>pine needles golf club</td><td class=2>170</td></tr>
<tr><td class=1>and mid and</td><td class=2>0</td></tr>
<tr><td class=1>north carolina golf vacations</td><td class=2>320</td></tr>
<tr><td class=1>golf courses in pinehurst nc</td><td class=2>46</td></tr>

<tr><td class=1>pine needles nc</td><td class=2>110</td></tr>
<tr><td class=1>souther pines nc</td><td class=2>28</td></tr>
<tr><td class=1>pine needles golf course</td><td class=2>480</td></tr>
<tr><td class=1>pine resorts</td><td class=2>140</td></tr>
<tr><td class=1>top nc golf courses</td><td class=2>16</td></tr>
<tr><td class=1>& golf club southern pines</td><td class=2>0</td></tr>

<tr><td class=1>mid golf</td><td class=2>16</td></tr>
<tr><td class=1>southern pines country club</td><td class=2>110</td></tr>
<tr><td class=1>courses in nc</td><td class=2>22</td></tr>
<tr><td class=1>nc golf vacations</td><td class=2>91</td></tr>
<tr><td class=1>pineneedle com</td><td class=2>22</td></tr>
<tr><td class=1>pine needles golf course north carolina</td><td class=2>16</td></tr>

<tr><td class=1>the carolina golf club</td><td class=2>73</td></tr>
<tr><td class=1>1005 midland road southern</td><td class=2>0</td></tr>
<tr><td class=1>pine north carolina</td><td class=2>16</td></tr>
<tr><td class=1>carolina pines inn</td><td class=2>22</td></tr>
<tr><td class=1>best golf courses in nc</td><td class=2>28</td></tr>
<tr><td class=1>lodge pine</td><td class=2>22</td></tr>

<tr><td class=1>pines of carolina</td><td class=2>320</td></tr>
<tr><td class=1>a pine needle</td><td class=2>0</td></tr>
<tr><td class=1>north carolina golf clubs</td><td class=2>28</td></tr>
<tr><td class=1>the pine needles</td><td class=2>22</td></tr>
<tr><td class=1>mid pines golf course</td><td class=2>91</td></tr>
<tr><td class=1>golf resorts in nc</td><td class=2>28</td></tr>

<tr><td class=1>north carolina golf courses</td><td class=2>1300</td></tr>
<tr><td class=1>golf course nc</td><td class=2>58</td></tr>
<tr><td class=1>southern pine needles</td><td class=2>12</td></tr>
<tr><td class=1>mid southern com</td><td class=2>22</td></tr>
<tr><td class=1>southern pines</td><td class=2>1900</td></tr>
<tr><td class=1>mid pines country club</td><td class=2>16</td></tr>

<tr><td class=1>carolina golf club nc</td><td class=2>12</td></tr>
<tr><td class=1>north carolina golf packages</td><td class=2>480</td></tr>
<tr><td class=1>golf packages in north carolina</td><td class=2>73</td></tr>
<tr><td class=1>mid pines</td><td class=2>320</td></tr>
<tr><td class=1>south pines north carolina</td><td class=2>12</td></tr>
<tr><td class=1>north carolina pine</td><td class=2>28</td></tr>

<tr><td class=1>golf courses nc</td><td class=2>170</td></tr>
<tr><td class=1>golf resort north carolina</td><td class=2>36</td></tr>
<tr><td class=1>southern pines hotels</td><td class=2>9900</td></tr>
<tr><td class=1>southern pines nc 28387</td><td class=2>36</td></tr>
<tr><td class=1>golf courses north carolina</td><td class=2>210</td></tr>
<tr><td class=1>pine golf clubs</td><td class=2>36</td></tr>

<tr><td class=1>club mid</td><td class=2>73</td></tr>
</tbody>
</table>

chrisj1963

ASKER

here is the php page.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

<style type="text/css">
body { color: white; background: #52616F; }
a { color: white; }
</style>

<!--code for select radio button-->
<script type="text/javascript">  
  function   selectproduct_idfunc(text) {
  document.getElementById("selectproduct_id" + document.getElementById("selectproduct_id").value).style.border = "2px solid white";
  document.getElementById("selectproduct_id" + text).style.border = "3px dotted #ffcc00";
  document.getElementById("selectproduct_id").value = text;
}
function displayValue(element)
{
        var h2text = element.parentNode.parentNode.getElementsByTagName('h2')[0].innerHTML;
        document.getElementById('selectproduct_display_dummy').value = h2text.substr(0, h2text.length);
}
</script>
<!--end code for select radio button-->

  <title>frequency decoder ~ table sort (revisited) demo</title>
  <link href="sort3_files/demo.css" rel="stylesheet" type="text/css">
</head>
<br /><br /> <br /><br />
<form name="form1" id="form1" method="get" action=""> 
  <div align="center"> 
    <p>  
      <input name="s" type="text" id="s" size="50" /> 
      <input type="submit" name="Submit" value="Find Related Keywords" /> 
    </p> 
  </div> 
</form> 

 <input type="text" class="txtbox" id="selectproduct_display_dummy" value="1" readonly /><br>
 <input type="hidden" class="txtbox" id="selectproduct_id" name="product_id" value="1" readonly /><br>
 <!--<input type="submit" value="Submit" id="submit">-->
 
<?php // RAY_temp_scrape.php

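// TEST $_GET DIRECTLY - A VARIABLE THAT WAS JUST ASSIGNED A STRING ALWAYS PASSES isset()
$s = isset($_GET['s']) ? str_replace(' ', '+', $_GET['s']) : null;
if(isset($s)) 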
{ 
    //*******F I R S T   P A R T********* 
    //Find the number of pages indexed for the searched term 
    //*From previous part 
    // ESCAPE THE USER-SUPPLIED TERM BEFORE ECHOING IT INTO THE PAGE
    echo "<p><i>Search for " . htmlspecialchars($s) . "</i></p>"; 

    $s=urlencode($s); 



//error_reporting(E_ALL);
echo "<pre>\n"; // IMPROVE READABILITY
// $s="bus company";
// FROM THE OP - READ THE GOOGLE PAGE HTML WITH CURL
//$url="http://www.prontopage.net/testhtml2.htm";
// the p=10 below is dummy because I had to put something in there..
$url="http://www.prontopage.net/a_parse/p_datasource.htm";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
// curl_error() NEEDS THE CURL HANDLE AS ITS ARGUMENT
$file=curl_exec ($ch) or die(curl_error($ch));
curl_close ($ch);

 
// CREATE AN ARRAY FROM THE HTML-----------------------------1. Keyword
$arr = explode("<td class=1>", $file);
// DISCARD THE UNWANTED STUFF AT THE TOP OF THE HTML
unset($arr[0]);
// TIDY UP EACH ELEMENT OF THE ARRAY
foreach ($arr as $ptr => $string)
{
// LOCATE THE END OF USEFUL DATA
    $poz = strpos($string, "</td>");
// END OF DATA NOT FOUND - SKIP THIS ELEMENT
    if ($poz === FALSE)
    {
        unset($arr[$ptr]);
        continue;
    }
// REMOVE USELESS TRAILING DATA AND REPAIR HTML (XML) ENTITIES
    $arr[$ptr] = substr($string, 0, $poz);
    $arr[$ptr] = str_replace('&amp;', '&', $arr[$ptr]);
}
 
// CREATE AN ARRAY FROM THE HTML-----------------------------2. Monthly Volume
$arr2 = explode("<td class=2>", $file);
// DISCARD THE UNWANTED STUFF AT THE TOP OF THE HTML
unset($arr2[0]);
// TIDY UP EACH ELEMENT OF THE ARRAY
foreach ($arr2 as $ptr => $string2)
{
// LOCATE THE END OF USEFUL DATA
    $poz = strpos($string2, "</td>");
// END OF DATA NOT FOUND - SKIP THIS ELEMENT
    if ($poz === FALSE)
    {
        unset($arr2[$ptr]);
        continue;
    }
	

// REMOVE USELESS TRAILING DATA AND REPAIR HTML (XML) ENTITIES
    $arr2[$ptr] = substr($string2, 0, $poz);
    $arr2[$ptr] = str_replace('&amp;', '&', $arr2[$ptr]);
}

 
// ACTIVATE THIS TO SEE THE ARRAY
// var_dump($arr);
//CJ set up the table

echo "<form id=\"multiForm\" name=\"theForm\" method=\"POST\" action=\"http://www.prontopage.net/member/signup.php\" >";
        echo "<body class=\"\">";
		echo "<h1>Keyword Estimated Monthly Volume in Google</h1>";
		echo "<h2>Limited to 100 Results</h2>";
        echo "<table id=\"test1\" class=\"sortable-onload-3-reverse rowstyle-alt no-arrow\" border=\"0\" cellpadding=\"0\" cellspacing=\"0\">";
		echo "<caption>Data Courtesy of seoquake (updated monthly)</caption>";
  		echo "<thead>";
    	echo "<tr>";
		echo "<th style=\"-moz-user-select: none;\" class=\"sortable-keep fd-column-0\"><a title=\"Sort on Position\" href=\"#\">Number</a></th>";
        echo "<th style=\"-moz-user-select: none;\" class=\"fd-column-1 sortable-text  forwardSort reverseSort\"><a title=\"Sort on Keyword\" href=\"#\">Keyword</a></th>";
    	echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-2\"><a title=\"Sort on Monthly Volume\" href=\"#\">Monthly Volume</a></th>";
        echo "<th style=\"-moz-user-select: none;\" class=\"sortable-date-dmy fd-column-3\"><a title=\"Sort on Release Date\" href=\"#\"></a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-4\"><a title=\"Sort on Weekly Gross\" href=\"#\">Weekly Gross</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-5\"><a title=\"Sort on Change\" href=\"#\">Change</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-6\"><a title=\"Sort on Theaters\" href=\"#\">Theaters</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-7\"><a title=\"Sort on Per Theater\" href=\"#\">Per Theater</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-8\"><a title=\"Sort on Gross\" href=\"#\">Gross</a></th>";
        echo "</tr>";
        echo "</thead>";
        echo "<tbody>";

        //echo "<tr><th>Word #</th>";
       // echo "<th>Keyword</th>";
        //echo "<th>Wordtracker Volume</th>";
        //echo "<th>google Volume</th></tr>";
//Put results in a table


//for($i=1; $i<=count($arr); $i++){
//version below limits list to X
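// CAP THE LIST AT 1000 ROWS, BUT NEVER READ PAST THE END OF EITHER ARRAY
$limit = min(1000, count($arr), count($arr2));
for($i=1;$i<=$limit;$i++){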

		//$go=$arr[$i] * $arr2[$i];
		echo "<tr><td>";
        echo $i;
        echo "</td><td>";
        echo "<div><h2>".$arr[$i]."</h2></div>";
        echo "</td><td>";
        echo $arr2[$i];
        echo "</td><td>";
       	//echo "<input type=\"radio\" id=\" selectproduct_id \"onClick=\" selectproduct_idfunc";
		echo "<img src=\"green.jpg\" id=\"selectproduct_id$i\"onClick=\"selectproduct_idfunc";
		echo "('$i');";
		echo "displayValue(this);\" />" ;
       echo "</td></tr>";

 
}

echo "</tbody>";
echo "</table>";
echo "<script type=\"text/javascript\" src=\"sort3_files/tablesort.js\"></script>";


}
?>

<?php

echo "</body></html>";

?>


I think the best way to do this would be to rearrange how the $arr array gets built: make the key of each element the keyword string, and the value of each element the count (the monthly volume). Then you can sort the array by value while preserving the key=>value pairs; PHP's asort() and arsort() do this.
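
A minimal sketch of that idea, not a drop-in replacement for the page above. It assumes the rows look like the posted data source (`<td class=1>` followed by `<td class=2>`). The `$file` string here is a small hypothetical sample standing in for the cURL result, `$volumes` is an illustrative name, and the single `preg_match_all()` call is swapped in for the two `explode()` passes so that each keyword stays paired with its own volume.

```php
<?php
// Hypothetical sample standing in for the HTML that cURL returns in $file.
$file = '<tr><td class=1>nc golf course</td><td class=2>73</td></tr>'
      . '<tr><td class=1>needles</td><td class=2>33100</td></tr>'
      . '<tr><td class=1>and pines inn</td><td class=2>0</td></tr>'
      . '<tr><td class=1>golf nc</td><td class=2>390</td></tr>';

// Capture each keyword/volume pair in one pass so the two columns cannot drift apart.
preg_match_all(
    '#<td class=1>(.*?)</td>\s*<td class=2>(\d+)</td>#s',
    $file,
    $matches,
    PREG_SET_ORDER
);

// Key = keyword string, value = monthly volume as an integer.
$volumes = array();
foreach ($matches as $m) {
    $keyword = str_replace('&amp;', '&', $m[1]);
    $volumes[$keyword] = (int) $m[2];
}

// arsort() sorts by value, highest first, and keeps each keyword attached to its volume.
arsort($volumes);

// Print the rows already sorted, so no JavaScript sort is needed on page load.
$i = 0;
foreach ($volumes as $keyword => $volume) {
    $i++;
    echo "<tr><td>$i</td><td>" . htmlspecialchars($keyword) . "</td><td>$volume</td></tr>\n";
}
```

One caveat with keying on the keyword: array keys are unique, so if the source ever lists the same keyword twice, the later row overwrites the earlier one. Rows with equal volumes may also come out in a different relative order on PHP versions before 8.0, where the sort is not stable.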
ASKER CERTIFIED SOLUTION
Ray Paseur
This solution is only available to members.
Thanks for the help.  I have struggled for quite a while trying to integrate this code with the code I posted as a reference, so I will be posting a follow-up. You are welcome to comment there if you would like to provide a solution. Thanks again.
Thanks for the points!  Best, ~Ray