php sort when scraping data

Hi,
I have source data here:
http://prontopage.net/a_parse/p_datasource.htm
(note that the Monthly Volume column is unsorted)

I have a data scraper here:
http://prontopage.net/a_parse/p_scrape.php

I have a JavaScript that automatically sorts the Monthly Volume column after the data is displayed, but the sort runs noticeably late, after the page has loaded.

Is there a way to change the p_scrape.php code so that it sorts the Monthly Volume data as it prints it to the page, rather than relying on JavaScript to do the sorting?

Help is appreciated.
<!--<style>
table { text-align: left; border-collapse: collapse; }
tr:hover { background: blue; color: white }
th, td { padding: 7px }
</style>-->
<table>
<thead>
<tr><th>Keyword</th><th>Monthly Volume</th></tr>
</thead>

<tbody><tr><td class=1>mid pines north carolina</td><td class=2>12</td></tr>
<tr><td class=1>southern pines resort</td><td class=2>16</td></tr>
<tr><td class=1>pine needles golf course nc</td><td class=2>28</td></tr>

<tr><td class=1>north carolina golf club</td><td class=2>58</td></tr>
<tr><td class=1>golf southern</td><td class=2>12</td></tr>
<tr><td class=1>and pines inn</td><td class=2>0</td></tr>
<tr><td class=1>north carolina pine trees</td><td class=2>58</td></tr>
<tr><td class=1>nc golf course</td><td class=2>73</td></tr>
<tr><td class=1>and mid to</td><td class=2>0</td></tr>

<tr><td class=1>restaurants in southern pines nc</td><td class=2>58</td></tr>
<tr><td class=1>best golf courses north carolina</td><td class=2>22</td></tr>
<tr><td class=1>needles pine</td><td class=2>58</td></tr>
<tr><td class=1>best golf in north carolina</td><td class=2>12</td></tr>
<tr><td class=1>southern pine north carolina</td><td class=2>16</td></tr>
<tr><td class=1>north carolina golf</td><td class=2>1900</td></tr>

<tr><td class=1>southern pines golf nc</td><td class=2>12</td></tr>
<tr><td class=1>long needle pines</td><td class=2>16</td></tr>
<tr><td class=1>southern pine nc</td><td class=2>58</td></tr>
<tr><td class=1>and golf club southern pines</td><td class=2>0</td></tr>
<tr><td class=1>club lodge</td><td class=2>12</td></tr>
<tr><td class=1>carolina course golf north</td><td class=2>12</td></tr>

<tr><td class=1>pinehurst golf nc</td><td class=2>73</td></tr>
<tr><td class=1>golf courses pinehurst nc</td><td class=2>28</td></tr>
<tr><td class=1>pine needles golf club nc</td><td class=2>12</td></tr>
<tr><td class=1>needles</td><td class=2>33100</td></tr>
<tr><td class=1>golf course north carolina</td><td class=2>58</td></tr>
<tr><td class=1>seagrove resorts</td><td class=2>110</td></tr>

<tr><td class=1>pine's</td><td class=2>58</td></tr>
<tr><td class=1>golf pines</td><td class=2>12</td></tr>
<tr><td class=1>and golf club southern</td><td class=2>0</td></tr>
<tr><td class=1>pine needle golf</td><td class=2>46</td></tr>
<tr><td class=1>mid pines golf resort</td><td class=2>28</td></tr>
<tr><td class=1>lodge country club</td><td class=2>36</td></tr>

<tr><td class=1>and southern pines north</td><td class=2>0</td></tr>
<tr><td class=1>golf nc</td><td class=2>390</td></tr>
<tr><td class=1>and resort in southern</td><td class=2>0</td></tr>
<tr><td class=1>pine needles golf resort</td><td class=2>73</td></tr>
<tr><td class=1>carolina in the pine</td><td class=2>16</td></tr>
<tr><td class=1>north carolina golfing</td><td class=2>36</td></tr>

<tr><td class=1>nc golf courses</td><td class=2>590</td></tr>
<tr><td class=1>golf lessons nc</td><td class=2>28</td></tr>
<tr><td class=1>golf courses in north carolina</td><td class=2>320</td></tr>
<tr><td class=1>southern pines restaurant</td><td class=2>22</td></tr>
<tr><td class=1>carolina golf club</td><td class=2>480</td></tr>
<tr><td class=1>north carolina southern pines</td><td class=2>28</td></tr>

<tr><td class=1>mid pine</td><td class=2>12</td></tr>
<tr><td class=1>pine needles golf</td><td class=2>720</td></tr>
<tr><td class=1>mid carolina country club</td><td class=2>58</td></tr>
<tr><td class=1>at southern pines</td><td class=2>0</td></tr>
<tr><td class=1>golf resorts in north carolina</td><td class=2>110</td></tr>
<tr><td class=1>golf resort nc</td><td class=2>16</td></tr>

<tr><td class=1>golf packages in pinehurst nc</td><td class=2>12</td></tr>
<tr><td class=1>pines nc</td><td class=2>16</td></tr>
<tr><td class=1>pine needles golf club</td><td class=2>170</td></tr>
<tr><td class=1>and mid and</td><td class=2>0</td></tr>
<tr><td class=1>north carolina golf vacations</td><td class=2>320</td></tr>
<tr><td class=1>golf courses in pinehurst nc</td><td class=2>46</td></tr>

<tr><td class=1>pine needles nc</td><td class=2>110</td></tr>
<tr><td class=1>souther pines nc</td><td class=2>28</td></tr>
<tr><td class=1>pine needles golf course</td><td class=2>480</td></tr>
<tr><td class=1>pine resorts</td><td class=2>140</td></tr>
<tr><td class=1>top nc golf courses</td><td class=2>16</td></tr>
<tr><td class=1>& golf club southern pines</td><td class=2>0</td></tr>

<tr><td class=1>mid golf</td><td class=2>16</td></tr>
<tr><td class=1>southern pines country club</td><td class=2>110</td></tr>
<tr><td class=1>courses in nc</td><td class=2>22</td></tr>
<tr><td class=1>nc golf vacations</td><td class=2>91</td></tr>
<tr><td class=1>pineneedle com</td><td class=2>22</td></tr>
<tr><td class=1>pine needles golf course north carolina</td><td class=2>16</td></tr>

<tr><td class=1>the carolina golf club</td><td class=2>73</td></tr>
<tr><td class=1>1005 midland road southern</td><td class=2>0</td></tr>
<tr><td class=1>pine north carolina</td><td class=2>16</td></tr>
<tr><td class=1>carolina pines inn</td><td class=2>22</td></tr>
<tr><td class=1>best golf courses in nc</td><td class=2>28</td></tr>
<tr><td class=1>lodge pine</td><td class=2>22</td></tr>

<tr><td class=1>pines of carolina</td><td class=2>320</td></tr>
<tr><td class=1>a pine needle</td><td class=2>0</td></tr>
<tr><td class=1>north carolina golf clubs</td><td class=2>28</td></tr>
<tr><td class=1>the pine needles</td><td class=2>22</td></tr>
<tr><td class=1>mid pines golf course</td><td class=2>91</td></tr>
<tr><td class=1>golf resorts in nc</td><td class=2>28</td></tr>

<tr><td class=1>north carolina golf courses</td><td class=2>1300</td></tr>
<tr><td class=1>golf course nc</td><td class=2>58</td></tr>
<tr><td class=1>southern pine needles</td><td class=2>12</td></tr>
<tr><td class=1>mid southern com</td><td class=2>22</td></tr>
<tr><td class=1>southern pines</td><td class=2>1900</td></tr>
<tr><td class=1>mid pines country club</td><td class=2>16</td></tr>

<tr><td class=1>carolina golf club nc</td><td class=2>12</td></tr>
<tr><td class=1>north carolina golf packages</td><td class=2>480</td></tr>
<tr><td class=1>golf packages in north carolina</td><td class=2>73</td></tr>
<tr><td class=1>mid pines</td><td class=2>320</td></tr>
<tr><td class=1>south pines north carolina</td><td class=2>12</td></tr>
<tr><td class=1>north carolina pine</td><td class=2>28</td></tr>

<tr><td class=1>golf courses nc</td><td class=2>170</td></tr>
<tr><td class=1>golf resort north carolina</td><td class=2>36</td></tr>
<tr><td class=1>southern pines hotels</td><td class=2>9900</td></tr>
<tr><td class=1>southern pines nc 28387</td><td class=2>36</td></tr>
<tr><td class=1>golf courses north carolina</td><td class=2>210</td></tr>
<tr><td class=1>pine golf clubs</td><td class=2>36</td></tr>

<tr><td class=1>club mid</td><td class=2>73</td></tr>
</tbody>
</table>

chrisj1963 (Author) commented:
Here is the PHP page.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

<style type="text/css">
body { color: white; background: #52616F; }
a { color: white; }
</style>

<!--code for select radio button-->
<script type="text/javascript">  
  function   selectproduct_idfunc(text) {
  document.getElementById("selectproduct_id" + document.getElementById("selectproduct_id").value).style.border = "2px solid white";
  document.getElementById("selectproduct_id" + text).style.border = "3px dotted #ffcc00";
  document.getElementById("selectproduct_id").value = text;
}
function displayValue(element)
{
        var h2text = element.parentNode.parentNode.getElementsByTagName('h2')[0].innerHTML;
        document.getElementById('selectproduct_display_dummy').value = h2text.substr(0, h2text.length);
}
</script>
<!--end code for select radio button-->

<style type="text/css">
body { color: white; background: #52616F; }
a { color: white; }
</style>
  <title>frequency decoder ~ table sort (revisited) demo</title>
  <link href="sort3_files/demo.css" rel="stylesheet" type="text/css">
</head>
<br /><br /> <br /><br />
<form name="form1" id="form1" method="get" action=""> 
  <div align="center"> 
    <p>  
      <input name="s" type="text" id="s" size="50" /> 
      <input type="submit" name="Submit" value="Find Related Keywords" /> 
    </p> 
  </div> 
</form> 

 <input type="text" class="txtbox" id="selectproduct_display_dummy" value="1" readonly /><br>
 <input type="hidden" class="txtbox" id="selectproduct_id" name="product_id" value="1" readonly /><br>
 <!--<input type="submit" value="Submit" id="submit">-->
 
<?php // RAY_temp_scrape.php

$s = isset($_GET['s']) ? $_GET['s'] : ''; // AVOID AN UNDEFINED-INDEX NOTICE WHEN NO SEARCH WAS SUBMITTED
$s = str_replace(' ','+',$s);
if(isset($s)) 
{ 
    //*******F I R S T   P A R T********* 
    //Find the number of pages indexed for the searched term 
    //*From previous part 
    echo "<p><i>Search for " . htmlspecialchars($s) . "</i></p>"; // ESCAPE USER INPUT BEFORE PRINTING IT

    $s=urlencode($s); 



//error_reporting(E_ALL);
echo "<pre>\n"; // IMPROVE READABILITY
// $s="bus company";
// FROM THE OP - READ THE GOOGLE PAGE HTML WITH CURL
//$url="http://www.prontopage.net/testhtml2.htm";
// the p=10 below is dummy because I had to put something in there..
$url="http://www.prontopage.net/a_parse/p_datasource.htm";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
$file=curl_exec ($ch) or die(curl_error($ch));
curl_close ($ch);

 
// CREATE AN ARRAY FROM THE HTML-----------------------------1. Keyword
$arr = explode("<td class=1>", $file);
// DISCARD THE UNWANTED STUFF AT THE TOP OF THE HTML
unset($arr[0]);
// TIDY UP EACH ELEMENT OF THE ARRAY
foreach ($arr as $ptr => $string)
{
// LOCATE THE END OF USEFUL DATA
    $poz = strpos($string, "</td>");
// END OF DATA NOT FOUND - SKIP THIS ELEMENT
    if ($poz === FALSE)
    {
        unset($arr[$ptr]);
        continue;
    }
// REMOVE USELESS TRAILING DATA AND REPAIR HTML (XML) ENTITIES
    $arr[$ptr] = substr($string, 0, $poz);
    $arr[$ptr] = str_replace('&amp;', '&', $arr[$ptr]);
}
 
// CREATE AN ARRAY FROM THE HTML-----------------------------2. Monthly Volume
$arr2 = explode("<td class=2>", $file);
// DISCARD THE UNWANTED STUFF AT THE TOP OF THE HTML
unset($arr2[0]);
// TIDY UP EACH ELEMENT OF THE ARRAY
foreach ($arr2 as $ptr => $string2)
{
// LOCATE THE END OF USEFUL DATA
    $poz = strpos($string2, "</td>");
// END OF DATA NOT FOUND - SKIP THIS ELEMENT
    if ($poz === FALSE)
    {
        unset($arr2[$ptr]);
        continue;
    }
	

// REMOVE USELESS TRAILING DATA AND REPAIR HTML (XML) ENTITIES
    $arr2[$ptr] = substr($string2, 0, $poz);
    $arr2[$ptr] = str_replace('&amp;', '&', $arr2[$ptr]);
}

 
// ACTIVATE THIS TO SEE THE ARRAY
// var_dump($arr);
//CJ set up the table

echo "<form id=\"multiForm\" name=\"theForm\" method=\"POST\" action=\"http://www.prontopage.net/member/signup.php\" >";
        echo "<body class=\"\">";
		echo "<h1>Keyword Estimated Monthly Volume in Google</h1>";
		echo "<h2>Limited to 100 Results</h2>";
        echo "<table id=\"test1\" class=\"sortable-onload-3-reverse rowstyle-alt no-arrow\" border=\"0\" cellpadding=\"0\" cellspacing=\"0\">";
		echo "<caption>Data Courtesy of seoquake (updated monthly)</caption>";
  		echo "<thead>";
    	echo "<tr>";
		echo "<th style=\"-moz-user-select: none;\" class=\"sortable-keep fd-column-0\"><a title=\"Sort on Position\" href=\"#\">Number</a></th>";
        echo "<th style=\"-moz-user-select: none;\" class=\"fd-column-1 sortable-text  forwardSort reverseSort\"><a title=\"Sort on Keyword\" href=\"#\">Keyword</a></th>";
    	echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-2\"><a title=\"Sort on Monthly Volume\" href=\"#\">Monthly Volume</a></th>";
        echo "<th style=\"-moz-user-select: none;\" class=\"sortable-date-dmy fd-column-3\"><a title=\"Sort on Release Date\" href=\"#\"></a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-4\"><a title=\"Sort on Weekly Gross\" href=\"#\">Weekly Gross</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-5\"><a title=\"Sort on Change\" href=\"#\">Change</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-numeric fd-column-6\"><a title=\"Sort on Theaters\" href=\"#\">Theaters</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-7\"><a title=\"Sort on Per Theater\" href=\"#\">Per Theater</a></th>";
       // echo "<th style=\"-moz-user-select: none;\" class=\"sortable-currency fd-column-8\"><a title=\"Sort on Gross\" href=\"#\">Gross</a></th>";
        echo "</tr>";
        echo "</thead>";
        echo "<tbody>";

        //echo "<tr><th>Word #</th>";
       // echo "<th>Keyword</th>";
        //echo "<th>Wordtracker Volume</th>";
        //echo "<th>google Volume</th></tr>";
//Put results in a table


//for($i=1; $i<=count($arr); $i++){
//version below limits list to X
for($i=1; $i<=1000 && isset($arr[$i], $arr2[$i]); $i++){ // STOP AT THE END OF THE SCRAPED DATA

		//$go=$arr[$i] * $arr2[$i];
		echo "<tr><td>";
        echo $i;
        echo "</td><td>";
        echo "<div><h2>".$arr[$i]."</h2></div>";
        echo "</td><td>";
        echo $arr2[$i];
        echo "</td><td>";
       	//echo "<input type=\"radio\" id=\" selectproduct_id \"onClick=\" selectproduct_idfunc";
		echo "<img src=\"green.jpg\" id=\"selectproduct_id$i\" onClick=\"selectproduct_idfunc";
		echo "('$i');";
		echo "displayValue(this);\" />" ;
       echo "</td></tr>";

 
}

echo "</tbody>";
echo "</table>";
echo "<script type=\"text/javascript\" src=\"sort3_files/tablesort.js\"></script>";


}
?>

<?php

echo "</body></html>";

?>

Ray Paseur commented:
I think the best way to do this is to change how the $arr array gets built: make the key of each element the keyword string and the value of each element the count. Then you can sort the array while preserving the key=>value pairs.
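
To make the idea concrete, here is a minimal sketch of that array shape, using four rows copied from the data source. The variable name $volumes is only for illustration. Note that using the keyword as the key assumes each keyword appears once; a repeated keyword would overwrite the earlier entry.

```php
<?php
// Sample rows copied from the data source; keyword => monthly volume.
$volumes = array(
    'mid pines north carolina' => 12,
    'north carolina golf'      => 1900,
    'needles'                  => 33100,
    'nc golf course'           => 73,
);

// arsort() sorts by value, highest first, and keeps each key attached to its value.
arsort($volumes);

foreach ($volumes as $keyword => $volume) {
    echo $keyword . ': ' . $volume . "\n";
}
```

This should list "needles" first and "mid pines north carolina" last.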
Ray Paseur commented:
Try this and see if it helps.  Best regards, ~Ray
<?php // RAY_temp_chrisj1963.php
error_reporting(E_ALL);
echo "<pre>\n";

// TEST DATA
$url="http://www.prontopage.net/a_parse/p_datasource.htm";

// READ THE TEST DATA - SEE ALSO PHP FUNCTION file_get_contents()
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
$htm = curl_exec ($ch) or die(curl_error($ch));
curl_close ($ch);

// LOCATE THE TABLE BODY SO WE CAN PROCESS EACH TABLE-ROW SEPARATELY
$arg = '<tbody>';
$poz = strpos($htm, $arg) + strlen($arg);
$htm = substr($htm, $poz);

// LOCATE THE END OF THE TABLE BODY
$arr = explode('</tbody>', $htm);
$htm = $arr[0];

// REMOVE UNWANTED WHITESPACE
$htm = preg_replace('/\s\s+/', ' ', $htm);
$htm = preg_replace('/\n/', '', $htm);

// CREATE AN ARRAY FROM THE TABLE ROWS
$arr = explode('</tr>', $htm);

// ACTIVATE THIS TO SEE THE RAW DATA
// var_dump($arr);

// ITERATE OVER THE EXPLODED HTML
$new = array();
foreach ($arr as $ptr => $str)
{
    if (trim($str) === '') continue; // SKIP EMPTY OR WHITESPACE-ONLY CHUNKS

    // BREAK UP ON THE END-DATA TAG
    $thing = explode('</td>', $str);

    // ISOLATE THE KEY AND VALUE
    $key = trim(strip_tags($thing[0]));
    $val = trim(strip_tags($thing[1]));

    // SAVE IN THE NEW ARRAY
    $new[$key] = $val;
}
// ACTIVATE THIS TO SEE THE UNSORTED DATA
// var_dump($new);

// SORT THE NEW ARRAY - SEE http://us3.php.net/manual/en/array.sorting.php
arsort($new);
print_r($new);
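
Since the original question was about printing the sorted data to the page, here is one possible way to turn a sorted keyword => volume array into table rows instead of calling print_r(). This is only a sketch: render_rows() is a hypothetical helper that is not part of either script above, the three sample rows stand in for the scraped $new array, and the markup is simplified (no h2 wrapper, no image column).

```php
<?php
// Hypothetical helper (not in the original scripts): build table rows
// from a sorted keyword => volume array, stopping after $limit rows.
function render_rows(array $sorted, $limit = 100)
{
    $html = '';
    $n = 0;
    foreach ($sorted as $keyword => $volume) {
        if ($n >= $limit) {
            break;
        }
        $n++;
        $html .= '<tr><td>' . $n . '</td>'
               . '<td>' . htmlspecialchars($keyword) . '</td>'
               . '<td>' . (int) $volume . '</td></tr>' . "\n";
    }
    return $html;
}

// Stand-in for the $new array built by the scraping code (values are strings there too).
$new = array('golf nc' => '390', 'needles' => '33100', 'club mid' => '73');
arsort($new);

echo "<table>\n<tbody>\n" . render_rows($new) . "</tbody>\n</table>\n";
```

Because the rows are already in descending order when they reach the browser, the JavaScript on-load sort would no longer be needed for the initial display.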

chrisj1963 (Author) commented:
Thanks for the help.  I have struggled for quite a while trying to integrate this code with the code I posted as a reference, so I will be posting a follow-up. You are welcome to comment there if you would like to provide a solution. Thanks again.
Ray Paseur commented:
Thanks for the points!  Best, ~Ray