PHP sort when scraping data

Hi,
I have source data here:
http://prontopage.net/a_parse/p_datasource.htm
(note that the Monthly Volume column is unsorted)

I have a data scraper here:
http://prontopage.net/a_parse/p_scrape.php

I have JavaScript that automatically sorts the Monthly Volume column after the data is displayed on the page, but the sort is delayed/slow after the page loads.

Is there a way to change the scrape.php code so that it sorts the Monthly Volume data before printing it to the page, rather than using JavaScript to do the sorting?

Help is appreciated.
<!--<style>
table { text-align: left; border-collapse: collapse; }
tr:hover { background: blue; color: white }
th, td { padding: 7px }
</style>-->
<table>
<thead>
<tr><th>Keyword</th><th>Monthly Volume</th></tr>
</thead>

<tbody><tr><td class=1>mid pines north carolina</td><td class=2>12</td></tr>
<tr><td class=1>southern pines resort</td><td class=2>16</td></tr>
<tr><td class=1>pine needles golf course nc</td><td class=2>28</td></tr>

<tr><td class=1>north carolina golf club</td><td class=2>58</td></tr>
<tr><td class=1>golf southern</td><td class=2>12</td></tr>
<tr><td class=1>and pines inn</td><td class=2>0</td></tr>
<tr><td class=1>north carolina pine trees</td><td class=2>58</td></tr>
<tr><td class=1>nc golf course</td><td class=2>73</td></tr>
<tr><td class=1>and mid to</td><td class=2>0</td></tr>

<tr><td class=1>restaurants in southern pines nc</td><td class=2>58</td></tr>
<tr><td class=1>best golf courses north carolina</td><td class=2>22</td></tr>
<tr><td class=1>needles pine</td><td class=2>58</td></tr>
<tr><td class=1>best golf in north carolina</td><td class=2>12</td></tr>
<tr><td class=1>southern pine north carolina</td><td class=2>16</td></tr>
<tr><td class=1>north carolina golf</td><td class=2>1900</td></tr>

<tr><td class=1>southern pines golf nc</td><td class=2>12</td></tr>
<tr><td class=1>long needle pines</td><td class=2>16</td></tr>
<tr><td class=1>southern pine nc</td><td class=2>58</td></tr>
<tr><td class=1>and golf club southern pines</td><td class=2>0</td></tr>
<tr><td class=1>club lodge</td><td class=2>12</td></tr>
<tr><td class=1>carolina course golf north</td><td class=2>12</td></tr>

<tr><td class=1>pinehurst golf nc</td><td class=2>73</td></tr>
<tr><td class=1>golf courses pinehurst nc</td><td class=2>28</td></tr>
<tr><td class=1>pine needles golf club nc</td><td class=2>12</td></tr>
<tr><td class=1>needles</td><td class=2>33100</td></tr>
<tr><td class=1>golf course north carolina</td><td class=2>58</td></tr>
<tr><td class=1>seagrove resorts</td><td class=2>110</td></tr>

<tr><td class=1>pine's</td><td class=2>58</td></tr>
<tr><td class=1>golf pines</td><td class=2>12</td></tr>
<tr><td class=1>and golf club southern</td><td class=2>0</td></tr>
<tr><td class=1>pine needle golf</td><td class=2>46</td></tr>
<tr><td class=1>mid pines golf resort</td><td class=2>28</td></tr>
<tr><td class=1>lodge country club</td><td class=2>36</td></tr>

<tr><td class=1>and southern pines north</td><td class=2>0</td></tr>
<tr><td class=1>golf nc</td><td class=2>390</td></tr>
<tr><td class=1>and resort in southern</td><td class=2>0</td></tr>
<tr><td class=1>pine needles golf resort</td><td class=2>73</td></tr>
<tr><td class=1>carolina in the pine</td><td class=2>16</td></tr>
<tr><td class=1>north carolina golfing</td><td class=2>36</td></tr>

<tr><td class=1>nc golf courses</td><td class=2>590</td></tr>
<tr><td class=1>golf lessons nc</td><td class=2>28</td></tr>
<tr><td class=1>golf courses in north carolina</td><td class=2>320</td></tr>
<tr><td class=1>southern pines restaurant</td><td class=2>22</td></tr>
<tr><td class=1>carolina golf club</td><td class=2>480</td></tr>
<tr><td class=1>north carolina southern pines</td><td class=2>28</td></tr>

<tr><td class=1>mid pine</td><td class=2>12</td></tr>
<tr><td class=1>pine needles golf</td><td class=2>720</td></tr>
<tr><td class=1>mid carolina country club</td><td class=2>58</td></tr>
<tr><td class=1>at southern pines</td><td class=2>0</td></tr>
<tr><td class=1>golf resorts in north carolina</td><td class=2>110</td></tr>
<tr><td class=1>golf resort nc</td><td class=2>16</td></tr>

<tr><td class=1>golf packages in pinehurst nc</td><td class=2>12</td></tr>
<tr><td class=1>pines nc</td><td class=2>16</td></tr>
<tr><td class=1>pine needles golf club</td><td class=2>170</td></tr>
<tr><td class=1>and mid and</td><td class=2>0</td></tr>
<tr><td class=1>north carolina golf vacations</td><td class=2>320</td></tr>
<tr><td class=1>golf courses in pinehurst nc</td><td class=2>46</td></tr>

<tr><td class=1>pine needles nc</td><td class=2>110</td></tr>
<tr><td class=1>souther pines nc</td><td class=2>28</td></tr>
<tr><td class=1>pine needles golf course</td><td class=2>480</td></tr>
<tr><td class=1>pine resorts</td><td class=2>140</td></tr>
<tr><td class=1>top nc golf courses</td><td class=2>16</td></tr>
<tr><td class=1>& golf club southern pines</td><td class=2>0</td></tr>

<tr><td class=1>mid golf</td><td class=2>16</td></tr>
<tr><td class=1>southern pines country club</td><td class=2>110</td></tr>
<tr><td class=1>courses in nc</td><td class=2>22</td></tr>
<tr><td class=1>nc golf vacations</td><td class=2>91</td></tr>
<tr><td class=1>pineneedle com</td><td class=2>22</td></tr>
<tr><td class=1>pine needles golf course north carolina</td><td class=2>16</td></tr>

<tr><td class=1>the carolina golf club</td><td class=2>73</td></tr>
<tr><td class=1>1005 midland road southern</td><td class=2>0</td></tr>
<tr><td class=1>pine north carolina</td><td class=2>16</td></tr>
<tr><td class=1>carolina pines inn</td><td class=2>22</td></tr>
<tr><td class=1>best golf courses in nc</td><td class=2>28</td></tr>
<tr><td class=1>lodge pine</td><td class=2>22</td></tr>

<tr><td class=1>pines of carolina</td><td class=2>320</td></tr>
<tr><td class=1>a pine needle</td><td class=2>0</td></tr>
<tr><td class=1>north carolina golf clubs</td><td class=2>28</td></tr>
<tr><td class=1>the pine needles</td><td class=2>22</td></tr>
<tr><td class=1>mid pines golf course</td><td class=2>91</td></tr>
<tr><td class=1>golf resorts in nc</td><td class=2>28</td></tr>

<tr><td class=1>north carolina golf courses</td><td class=2>1300</td></tr>
<tr><td class=1>golf course nc</td><td class=2>58</td></tr>
<tr><td class=1>southern pine needles</td><td class=2>12</td></tr>
<tr><td class=1>mid southern com</td><td class=2>22</td></tr>
<tr><td class=1>southern pines</td><td class=2>1900</td></tr>
<tr><td class=1>mid pines country club</td><td class=2>16</td></tr>

<tr><td class=1>carolina golf club nc</td><td class=2>12</td></tr>
<tr><td class=1>north carolina golf packages</td><td class=2>480</td></tr>
<tr><td class=1>golf packages in north carolina</td><td class=2>73</td></tr>
<tr><td class=1>mid pines</td><td class=2>320</td></tr>
<tr><td class=1>south pines north carolina</td><td class=2>12</td></tr>
<tr><td class=1>north carolina pine</td><td class=2>28</td></tr>

<tr><td class=1>golf courses nc</td><td class=2>170</td></tr>
<tr><td class=1>golf resort north carolina</td><td class=2>36</td></tr>
<tr><td class=1>southern pines hotels</td><td class=2>9900</td></tr>
<tr><td class=1>southern pines nc 28387</td><td class=2>36</td></tr>
<tr><td class=1>golf courses north carolina</td><td class=2>210</td></tr>
<tr><td class=1>pine golf clubs</td><td class=2>36</td></tr>

<tr><td class=1>club mid</td><td class=2>73</td></tr>
</tbody>
</table>
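Every data row above has the same shape (a class=1 cell holding the keyword, then a class=2 cell holding the volume), so both cells can be captured together and sorted on the server before anything is printed. The sketch below swaps in preg_match_all() and usort() in place of the two explode() passes used later in this thread; it assumes the row markup stays exactly as shown, and the sample string is just three rows copied from the table.

```php
<?php
// Three rows copied from the data page above (sample input only).
$html = <<<HTML
<tbody><tr><td class=1>mid pines north carolina</td><td class=2>12</td></tr>
<tr><td class=1>needles</td><td class=2>33100</td></tr>
<tr><td class=1>north carolina golf</td><td class=2>1900</td></tr>
</tbody>
HTML;

// Capture keyword ($match[1]) and volume ($match[2]) from each row in one pass.
// Assumes the exact "<td class=1>...</td><td class=2>...</td>" layout.
preg_match_all('~<td class=1>(.*?)</td><td class=2>(\d+)</td>~', $html, $matches, PREG_SET_ORDER);

$rows = array();
foreach ($matches as $match) {
    $rows[] = array(
        'keyword' => html_entity_decode($match[1]), // e.g. "&amp;" becomes "&"
        'volume'  => (int) $match[2],               // integer, so it sorts as a number
    );
}

// Highest monthly volume first (the <=> operator needs PHP 7 or later).
usort($rows, function ($a, $b) {
    return $b['volume'] <=> $a['volume'];
});

foreach ($rows as $row) {
    echo $row['volume'], "\t", $row['keyword'], "\n";
}
```

Because the volumes are cast to integers before sorting, 33100 lands above 1900 instead of being compared as text.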


Asked by chrisj1963
chrisj1963 (Author) commented:
Here is the PHP page:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

<style type="text/css">
body { color: white; background: #52616F; }
a { color: white; }
</style>

<!--code for select radio button-->
<script type="text/javascript">  
  function   selectproduct_idfunc(text) {
  document.getElementById("selectproduct_id" + document.getElementById("selectproduct_id").value).style.border = "2px solid white";
  document.getElementById("selectproduct_id" + text).style.border = "3px dotted #ffcc00";
  document.getElementById("selectproduct_id").value = text;
}
function displayValue(element)
{
        var h2text = element.parentNode.parentNode.getElementsByTagName('h2')[0].innerHTML;
        document.getElementById('selectproduct_display_dummy').value = h2text.substr(0, h2text.length);
}
</script>
<!--end code for select radio button-->

  <title>frequency decoder ~ table sort (revisited) demo</title>
  <link href="sort3_files/demo.css" rel="stylesheet" type="text/css">
</head>
<br /><br /> <br /><br />
<form name="form1" id="form1" method="get" action=""> 
  <div align="center"> 
    <p>  
      <input name="s" type="text" id="s" size="50" /> 
      <input type="submit" name="Submit" value="Find Related Keywords" /> 
    </p> 
  </div> 
</form> 

 <input type="text" class="txtbox" id="selectproduct_display_dummy" value="1" readonly /><br>
 <input type="hidden" class="txtbox" id="selectproduct_id" name="product_id" value="1" readonly /><br>
 <!--<input type="submit" value="Submit" id="submit">-->
 
<?php // RAY_temp_scrape.php

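// ONLY RUN THE SCRAPER WHEN A SEARCH TERM WAS SUBMITTED
if (isset($_GET['s']))
{
    $s = $_GET['s'];
    //*******F I R S T   P A R T*********
    //Find the number of pages indexed for the searched term
    //*From previous part
    // ESCAPE THE USER INPUT BEFORE WRITING IT BACK INTO THE PAGE
    echo "<p><i>Search for " . htmlspecialchars($s) . "</i></p>";

    // urlencode() ALREADY CONVERTS SPACES TO '+', SO NO str_replace() IS NEEDED
    $s = urlencode($s);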



//error_reporting(E_ALL);
echo "<pre>\n"; // IMPROVE READABILITY
// $s="bus company";
// FROM THE OP - READ THE GOOGLE PAGE HTML WITH CURL
//$url="http://www.prontopage.net/testhtml2.htm";
$url="http://www.prontopage.net/a_parse/p_datasource.htm";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
// curl_error() NEEDS THE CURL HANDLE TO REPORT WHAT WENT WRONG
$file=curl_exec($ch) or die(curl_error($ch));
curl_close ($ch);

 
// CREATE AN ARRAY FROM THE HTML-----------------------------1. Keyword
$arr = explode("<td class=1>", $file);
// DISCARD THE UNWANTED STUFF AT THE TOP OF THE HTML
unset($arr[0]);
// TIDY UP EACH ELEMENT OF THE ARRAY
foreach ($arr as $ptr => $string)
{
// LOCATE THE END OF USEFUL DATA
    $poz = strpos($string, "</td>");
// END OF DATA NOT FOUND - SKIP THIS ELEMENT
    if ($poz === FALSE)
    {
        unset($arr[$ptr]);
        continue;
    }
// REMOVE USELESS TRAILING DATA AND REPAIR HTML (XML) ENTITIES
    $arr[$ptr] = substr($string, 0, $poz);
    $arr[$ptr] = str_replace('&amp;', '&', $arr[$ptr]);
}
 
// CREATE AN ARRAY FROM THE HTML-----------------------------2. Monthly Volume
$arr2 = explode("<td class=2>", $file);
// DISCARD THE UNWANTED STUFF AT THE TOP OF THE HTML
unset($arr2[0]);
// TIDY UP EACH ELEMENT OF THE ARRAY
foreach ($arr2 as $ptr => $string2)
{
// LOCATE THE END OF USEFUL DATA
    $poz = strpos($string2, "</td>");
// END OF DATA NOT FOUND - SKIP THIS ELEMENT
    if ($poz === FALSE)
    {
        unset($arr2[$ptr]);
        continue;
    }
	

// REMOVE USELESS TRAILING DATA AND REPAIR HTML (XML) ENTITIES
    $arr2[$ptr] = substr($string2, 0, $poz);
    $arr2[$ptr] = str_replace('&amp;', '&', $arr2[$ptr]);
}

 
// ACTIVATE THIS TO SEE THE ARRAY
// var_dump($arr);
//CJ set up the table

echo "<form id=\"multiForm\" name=\"theForm\" method=\"POST\" action=\"http://www.prontopage.net/member/signup.php\" >";
        echo "<body class=\"\">";
		echo "<h1>Keyword Estimated Monthly Volume in Google</h1>";
		echo "<h2>Limited to 100 Results</h2>";
        echo "<table id=\"test1\" class=\"sortable-onload-3-reverse rowstyle-alt no-arrow\" border=\"0\" cellpadding=\"0\" cellspacing=\"0\">";
		echo "<caption>Data Courtesy of seoquake (updated monthly)</caption>";
  		echo "<thead>";
    	echo "<tr>";
		echo "<th style=\"-moz-user-select: none;\" class=\"sortable-keep fd-column-0\"><a title=\"Sort on Position\" href=\"#\">Number</a></th>";
        echo "<th style=\"-moz-user-select: none;\" class=\"fd-column-1 sortable-text  forwardSort reverseSort\"><a title=\"Sort on Keyword\" href=\"#\">Keyword</a></th>";
    	echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-2\"><a title=\"Sort on Monthly Volume\" href=\"#\">Monthly Volume</a></th>";
        echo "<th style=\"-moz-user-select: none;\" class=\"sortable-date-dmy fd-column-3\"><a title=\"Sort on Release Date\" href=\"#\"></a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-4\"><a title=\"Sort on Weekly Gross\" href=\"#\">Weekly Gross</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-5\"><a title=\"Sort on Change\" href=\"#\">Change</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-6\"><a title=\"Sort on Theaters\" href=\"#\">Theaters</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-7\"><a title=\"Sort on Per Theater\" href=\"#\">Per Theater</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-8\"><a title=\"Sort on Gross\" href=\"#\">Gross</a></th>";
        echo "</tr>";
        echo "</thead>";
        echo "<tbody>";

        //echo "<tr><th>Word #</th>";
       // echo "<th>Keyword</th>";
        //echo "<th>Wordtracker Volume</th>";
        //echo "<th>google Volume</th></tr>";
//Put results in a table


// LIMIT THE LIST TO 100 ROWS (AS THE HEADING SAYS) WITHOUT READING PAST THE END OF THE ARRAYS
$max = min(100, count($arr), count($arr2));
for($i=1;$i<=$max;$i++){

		echo "<tr><td>";
        echo $i;
        echo "</td><td>";
        echo "<div><h2>".$arr[$i]."</h2></div>";
        echo "</td><td>";
        echo $arr2[$i];
        echo "</td><td>";
       	//echo "<input type=\"radio\" id=\" selectproduct_id \"onClick=\" selectproduct_idfunc";
		echo "<img src=\"green.jpg\" id=\"selectproduct_id$i\" onClick=\"selectproduct_idfunc";
		echo "('$i');";
		echo "displayValue(this);\" />" ;
       echo "</td></tr>";

 
}

echo "</tbody>";
echo "</table>";
echo "<script type=\"text/javascript\" src=\"sort3_files/tablesort.js\"></script>";


}
?>

<?php

echo "</body></html>";

?>
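If you would rather keep the two parallel arrays from the page above, one option (not something posted in this thread) is to sort them together with array_multisort() just before the output loop. A minimal sketch, using hypothetical sample values in the same shape as $arr (keywords) and $arr2 (volume strings), both keyed from 1:

```php
<?php
// Hypothetical sample data shaped like the scraper's arrays: keys start at 1
// because the code above unsets element 0 after explode().
$arr  = array(1 => 'mid pines north carolina', 2 => 'north carolina golf', 3 => 'needles', 4 => 'and pines inn');
$arr2 = array(1 => '12', 2 => '1900', 3 => '33100', 4 => '0');

// Cast the volumes to integers so they sort numerically, not as text.
$volumes  = array_map('intval', $arr2);
$keywords = $arr;

// Sort by volume, highest first; $keywords is reordered in step.
// Equal volumes fall back to comparing the keywords (ascending).
// Note: array_multisort() renumbers numeric keys starting at 0.
array_multisort($volumes, SORT_DESC, SORT_NUMERIC, $keywords);

// Restore 1-based keys so the existing for ($i = 1; ...) loop keeps working.
if (count($keywords) > 0) {
    $arr  = array_combine(range(1, count($keywords)), $keywords);
    $arr2 = array_combine(range(1, count($volumes)), $volumes);
}

for ($i = 1; $i <= count($arr); $i++) {
    echo $i, ' ', $arr[$i], ' ', $arr2[$i], "\n";
}
```

The sort runs once on the server, so the rows arrive in the browser already in volume order.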

Ray Paseur commented:
I think the best approach is to change the way the $arr array gets built, so that the key of each element is the keyword string and the value of each element is the count. Then you can sort the array while preserving the key=>value pairs (for example with arsort()).
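To illustrate the idea with a few rows from the data page: once each keyword is a key and its volume is the value, arsort() orders the pairs by value, highest first, while keeping each keyword attached to its own count. The sample array below is hand-typed, not scraped.

```php
<?php
// Keyword => volume pairs, as they might look after scraping.
// The values are strings because they come straight out of the HTML.
$volumes = array(
    'mid pines north carolina' => '12',
    'north carolina golf'      => '1900',
    'needles'                  => '33100',
    'southern pines resort'    => '16',
);

// Sort by value, highest first, keeping keys attached to their values.
// SORT_NUMERIC makes the numeric comparison explicit.
arsort($volumes, SORT_NUMERIC);

foreach ($volumes as $keyword => $count) {
    echo "$keyword: $count\n";
}
```

Two PHP behaviours to keep in mind with this layout: a keyword that appears twice overwrites the earlier entry, and a keyword made only of digits becomes an integer key.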
Ray Paseur commented:
Try this and see if it helps.  Best regards, ~Ray
<?php // RAY_temp_chrisj1963.php
error_reporting(E_ALL);
echo "<pre>\n";

// TEST DATA
$url="http://www.prontopage.net/a_parse/p_datasource.htm";

// READ THE TEST DATA - SEE ALSO PHP FUNCTION file_get_contents()
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
$htm = curl_exec($ch) or die(curl_error($ch));
curl_close ($ch);

// LOCATE THE TABLE BODY SO WE CAN PROCESS EACH TABLE-ROW SEPARATELY
$arg = '<tbody>';
$poz = strpos($htm, $arg) + strlen($arg);
$htm = substr($htm, $poz);

// LOCATE THE END OF THE TABLE BODY
$arr = explode('</tbody>', $htm);
$htm = $arr[0];

// REMOVE UNWANTED WHITESPACE
$htm = preg_replace('/\s\s+/', ' ', $htm);
$htm = preg_replace('/\n/', '', $htm);

// CREATE AN ARRAY FROM THE TABLE ROWS
$arr = explode('</tr>', $htm);

// ACTIVATE THIS TO SEE THE RAW DATA
// var_dump($arr);

// ITERATE OVER THE EXPLODED HTML
$new = array();
foreach ($arr as $ptr => $str)
{
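    // SKIP BLANK FRAGMENTS (e.g. AFTER THE LAST </tr>)
    if (trim($str) === '') continue;

    // BREAK UP ON THE END-DATA TAG
    $thing = explode('</td>', $str);
    // SKIP ANY ROW THAT DOES NOT HAVE BOTH A KEYWORD CELL AND A VOLUME CELL
    if (count($thing) < 2) continue;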

    // ISOLATE THE KEY AND VALUE
    $key = trim(strip_tags($thing[0]));
    $val = trim(strip_tags($thing[1]));

    // SAVE IN THE NEW ARRAY
    $new[$key] = $val;
}
// ACTIVATE THIS TO SEE THE UNSORTED DATA
// var_dump($new);

// SORT THE NEW ARRAY - SEE http://us3.php.net/manual/en/array.sorting.php
arsort($new);
print_r($new);
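The script above ends with print_r($new) for inspection. To get the sorted pairs onto the page instead, one way is to loop over $new and emit table rows, in place of the for loop in the original scrape.php. A minimal sketch with a hand-typed stand-in for $new (already sorted), escaping the keyword since it comes from scraped HTML:

```php
<?php
// Stand-in for $new after arsort(): keyword => monthly volume.
$new = array(
    'needles'                    => '33100',
    'north carolina golf'        => '1900',
    '& golf club southern pines' => '0',
);

$rows = '';
$i = 0;
foreach ($new as $keyword => $volume) {
    $i++;
    // Escape the keyword ("&" becomes "&amp;") and cast the volume to an integer.
    $rows .= '<tr><td>' . $i . '</td><td>' . htmlspecialchars((string) $keyword)
           . '</td><td>' . (int) $volume . "</td></tr>\n";
}

echo "<table>\n<tbody>\n" . $rows . "</tbody>\n</table>\n";
```

Because foreach walks the array in its sorted order, the rows are printed already sorted, so no JavaScript sort should be needed on page load.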

chrisj1963 (Author) commented:
Thanks for the help. I have struggled for quite a while to integrate this code with the code I posted as a reference, so I will post a follow-up question. You are welcome to comment there if you would like to provide a solution. Thanks again.
Ray Paseur commented:
Thanks for the points!  Best, ~Ray