Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 165
  • Last Modified:

Bot page for 2.6mil items, need help with this one

We have a script that we formerly used to let bots "crawl" our site. This is the only thing they were allowed to crawl on our site.

Now we have 2.6mil records in a database table, although there are only 2.06mil distinct records, and we want to allow the bots to "see" each of the "inventory_part_number" value only for each of those records.

This has caused a problem with the current script because of multiple bots accessing the script at the same time. It literally brings mySQL down to it's knees. Here is a shortened version, I've taken out all the HTML and added comments where needed.

How can I improve this to where multiple bots won't cause a problem? Or, on another hand, how can I turn this into a script that can create static HTML pages for each page of data in a particular directory that we will let the bots have access to?

<?php
  // Removed HTML before the php start tag above

  set_time_limit(0);
    
  $connection = @mysql_connect("localhost", "db_admin", "db_pwd");
  if($connection===FALSE)
    die('Could not connect: ' . mysql_error());
  
  if(!mysql_select_db("listings", $connection))
    die('Could not connect: ' . mysql_error());

  // how many rows to show per page
  $rowsPerPage = 10000;
  
  // by default we show first page
  $pageNum = 1;
  
  // if $_GET['page'] defined, use it as page number
  if(isset($_GET['page'])) {
    $pageNum = $_GET['page'];
  }

  // counting the offset
  $offset = ($pageNum - 1) * $rowsPerPage;
  
  $query = "SELECT inventory_part_number FROM linv_inventory
            ORDER BY inventory_part_number LIMIT $offset, $rowsPerPage";

  $result = mysql_query($query) or die('Error, main select query failed');

  echo "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
  echo "  <tr>\n";
  while($row = mysql_fetch_array($result)) {
    echo "  <tr>\n";
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';
    echo "  </tr>\n";
  } // while($row = mysql_fetch_array($result))

  echo "</table>\n";

  // how many rows we have in database
  $query   = "SELECT COUNT(inventory_part_number) AS numrows FROM linv_inventory";
  $result  = mysql_query($query) or die('Error, count query failed');
  $row     = mysql_fetch_array($result, MYSQL_ASSOC);
  $numrows = $row['numrows'];

  // how many pages we have when using paging?
  $maxPage = ceil($numrows/$rowsPerPage);
  
  // print the link to access each page
  $self = $_SERVER['PHP_SELF'];
  $nav  = '';
  
  for($page = 1; $page <= $maxPage; $page++) {
     if ($page == $pageNum) {
       $nav .= " $page "; // no need to create a link to current page
     } else {
       $nav .= " <a href=\"$self?page=$page\">$page</a> ";
     } // if ($page == $pageNum)
  } // for($page = 1; $page <= $maxPage; $page++)
  
  // creating previous and next link
  // plus the link to go straight to
  // the first and last page
  if ($pageNum > 1) {
    $page  = $pageNum - 1;
    $prev  = " <a href=\"$self?page=$page\">[Prev]</a> ";
    $first = " <a href=\"$self?page=1\">[First Page]</a> ";
  } else {
    $prev  = '&nbsp;'; // we're on page one, don't print previous link
    $first = '&nbsp;'; // nor the first page link
  } // if ($pageNum > 1)
  
  if ($pageNum < $maxPage) {
    $page = $pageNum + 1;
    $next = " <a href=\"$self?page=$page\">[Next]</a> ";
    $last = " <a href=\"$self?page=$maxPage\">[Last Page]</a> ";
  } else {
    $next = '&nbsp;'; // we're on the last page, don't print next link
    $last = '&nbsp;'; // nor the last page link
  } // if ($pageNum < $maxPage)
  
  // print the navigation link
  echo  $prev . $nav . $next ;
  
  // and close the database connection
  mysql_close($connection);
  
  // Removed HTML after the php end tag below
?>

Open in new window

0
EddieShipman
Asked:
EddieShipman
1 Solution
 
tsmgeekCommented:
dont let bots access the script directly as this is sure to end up in problems

1. make bots access a static file instead of the php file
2. edit script so it first writes all its output into a tmp file
3. rename the tmp file into the other static file which the bots access directly, as you are doing a rename its just an inode change and happens imediatly and the next time
4. put script in crontab so its run at set intervals
0
 
EddieShipmanAuthor Commented:
Building static pages from CSV create using select into OUTFILE
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now