• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 167
  • Last Modified:

Bot page for 2.6mil items, need help with this one

We have a script that we formerly used to let bots "crawl" our site. This is the only thing they were allowed to crawl on our site.

Now we have 2.6mil records in a database table, although there are only 2.06mil distinct records, and we want to allow the bots to "see" each of the "inventory_part_number" value only for each of those records.

This has caused a problem with the current script because of multiple bots accessing the script at the same time. It literally brings mySQL down to it's knees. Here is a shortened version, I've taken out all the HTML and added comments where needed.

How can I improve this to where multiple bots won't cause a problem? Or, on another hand, how can I turn this into a script that can create static HTML pages for each page of data in a particular directory that we will let the bots have access to?

<?php
  // Removed HTML before the php start tag above

  set_time_limit(0);
    
  $connection = @mysql_connect("localhost", "db_admin", "db_pwd");
  if($connection===FALSE)
    die('Could not connect: ' . mysql_error());
  
  if(!mysql_select_db("listings", $connection))
    die('Could not connect: ' . mysql_error());

  // how many rows to show per page
  $rowsPerPage = 10000;
  
  // by default we show first page
  $pageNum = 1;
  
  // if $_GET['page'] defined, use it as page number
  if(isset($_GET['page'])) {
    $pageNum = $_GET['page'];
  }

  // counting the offset
  $offset = ($pageNum - 1) * $rowsPerPage;
  
  $query = "SELECT inventory_part_number FROM linv_inventory
            ORDER BY inventory_part_number LIMIT $offset, $rowsPerPage";

  $result = mysql_query($query) or die('Error, main select query failed');

  echo "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
  echo "  <tr>\n";
  while($row = mysql_fetch_array($result)) {
    echo "  <tr>\n";
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';
    echo "  </tr>\n";
  } // while($row = mysql_fetch_array($result))

  echo "</table>\n";

  // how many rows we have in database
  $query   = "SELECT COUNT(inventory_part_number) AS numrows FROM linv_inventory";
  $result  = mysql_query($query) or die('Error, count query failed');
  $row     = mysql_fetch_array($result, MYSQL_ASSOC);
  $numrows = $row['numrows'];

  // how many pages we have when using paging?
  $maxPage = ceil($numrows/$rowsPerPage);
  
  // print the link to access each page
  $self = $_SERVER['PHP_SELF'];
  $nav  = '';
  
  for($page = 1; $page <= $maxPage; $page++) {
     if ($page == $pageNum) {
       $nav .= " $page "; // no need to create a link to current page
     } else {
       $nav .= " <a href=\"$self?page=$page\">$page</a> ";
     } // if ($page == $pageNum)
  } // for($page = 1; $page <= $maxPage; $page++)
  
  // creating previous and next link
  // plus the link to go straight to
  // the first and last page
  if ($pageNum > 1) {
    $page  = $pageNum - 1;
    $prev  = " <a href=\"$self?page=$page\">[Prev]</a> ";
    $first = " <a href=\"$self?page=1\">[First Page]</a> ";
  } else {
    $prev  = '&nbsp;'; // we're on page one, don't print previous link
    $first = '&nbsp;'; // nor the first page link
  } // if ($pageNum > 1)
  
  if ($pageNum < $maxPage) {
    $page = $pageNum + 1;
    $next = " <a href=\"$self?page=$page\">[Next]</a> ";
    $last = " <a href=\"$self?page=$maxPage\">[Last Page]</a> ";
  } else {
    $next = '&nbsp;'; // we're on the last page, don't print next link
    $last = '&nbsp;'; // nor the last page link
  } // if ($pageNum < $maxPage)
  
  // print the navigation link
  echo  $prev . $nav . $next ;
  
  // and close the database connection
  mysql_close($connection);
  
  // Removed HTML after the php end tag below
?>

Open in new window

0
Eddie Shipman
Asked:
Eddie Shipman
1 Solution
 
tsmgeekCommented:
dont let bots access the script directly as this is sure to end up in problems

1. make bots access a static file instead of the php file
2. edit script so it first writes all its output into a tmp file
3. rename the tmp file into the other static file which the bots access directly, as you are doing a rename its just an inode change and happens imediatly and the next time
4. put script in crontab so its run at set intervals
0
 
Eddie ShipmanAll-around developerAuthor Commented:
Building static pages from CSV create using select into OUTFILE
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now