?
Solved

Bot page for 2.6mil items, need help with this one

Posted on 2011-03-23
2
Medium Priority
?
159 Views
Last Modified: 2012-05-11
We have a script that we formerly used to let bots "crawl" our site. This is the only thing they were allowed to crawl on our site.

Now we have 2.6mil records in a database table, although there are only 2.06mil distinct records, and we want to allow the bots to "see" each of the "inventory_part_number" value only for each of those records.

This has caused a problem with the current script because of multiple bots accessing the script at the same time. It literally brings mySQL down to it's knees. Here is a shortened version, I've taken out all the HTML and added comments where needed.

How can I improve this to where multiple bots won't cause a problem? Or, on another hand, how can I turn this into a script that can create static HTML pages for each page of data in a particular directory that we will let the bots have access to?

<?php
  // Removed HTML before the php start tag above

  set_time_limit(0);
    
  $connection = @mysql_connect("localhost", "db_admin", "db_pwd");
  if($connection===FALSE)
    die('Could not connect: ' . mysql_error());
  
  if(!mysql_select_db("listings", $connection))
    die('Could not connect: ' . mysql_error());

  // how many rows to show per page
  $rowsPerPage = 10000;
  
  // by default we show first page
  $pageNum = 1;
  
  // if $_GET['page'] defined, use it as page number
  if(isset($_GET['page'])) {
    $pageNum = $_GET['page'];
  }

  // counting the offset
  $offset = ($pageNum - 1) * $rowsPerPage;
  
  $query = "SELECT inventory_part_number FROM linv_inventory
            ORDER BY inventory_part_number LIMIT $offset, $rowsPerPage";

  $result = mysql_query($query) or die('Error, main select query failed');

  echo "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
  echo "  <tr>\n";
  while($row = mysql_fetch_array($result)) {
    echo "  <tr>\n";
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';
    echo "  </tr>\n";
  } // while($row = mysql_fetch_array($result))

  echo "</table>\n";

  // how many rows we have in database
  $query   = "SELECT COUNT(inventory_part_number) AS numrows FROM linv_inventory";
  $result  = mysql_query($query) or die('Error, count query failed');
  $row     = mysql_fetch_array($result, MYSQL_ASSOC);
  $numrows = $row['numrows'];

  // how many pages we have when using paging?
  $maxPage = ceil($numrows/$rowsPerPage);
  
  // print the link to access each page
  $self = $_SERVER['PHP_SELF'];
  $nav  = '';
  
  for($page = 1; $page <= $maxPage; $page++) {
     if ($page == $pageNum) {
       $nav .= " $page "; // no need to create a link to current page
     } else {
       $nav .= " <a href=\"$self?page=$page\">$page</a> ";
     } // if ($page == $pageNum)
  } // for($page = 1; $page <= $maxPage; $page++)
  
  // creating previous and next link
  // plus the link to go straight to
  // the first and last page
  if ($pageNum > 1) {
    $page  = $pageNum - 1;
    $prev  = " <a href=\"$self?page=$page\">[Prev]</a> ";
    $first = " <a href=\"$self?page=1\">[First Page]</a> ";
  } else {
    $prev  = '&nbsp;'; // we're on page one, don't print previous link
    $first = '&nbsp;'; // nor the first page link
  } // if ($pageNum > 1)
  
  if ($pageNum < $maxPage) {
    $page = $pageNum + 1;
    $next = " <a href=\"$self?page=$page\">[Next]</a> ";
    $last = " <a href=\"$self?page=$maxPage\">[Last Page]</a> ";
  } else {
    $next = '&nbsp;'; // we're on the last page, don't print next link
    $last = '&nbsp;'; // nor the last page link
  } // if ($pageNum < $maxPage)
  
  // print the navigation link
  echo  $prev . $nav . $next ;
  
  // and close the database connection
  mysql_close($connection);
  
  // Removed HTML after the php end tag below
?>

Open in new window

0
Comment
Question by:EddieShipman
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
2 Comments
 
LVL 5

Accepted Solution

by:
tsmgeek earned 2000 total points
ID: 35201401
dont let bots access the script directly as this is sure to end up in problems

1. make bots access a static file instead of the php file
2. edit script so it first writes all its output into a tmp file
3. rename the tmp file into the other static file which the bots access directly, as you are doing a rename its just an inode change and happens imediatly and the next time
4. put script in crontab so its run at set intervals
0
 
LVL 26

Author Closing Comment

by:EddieShipman
ID: 35235586
Building static pages from CSV create using select into OUTFILE
0

Featured Post

On Demand Webinar: Networking for the Cloud Era

Ready to improve network connectivity? Watch this webinar to learn how SD-WANs and a one-click instant connect tool can boost provisions, deployment, and management of your cloud connection.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Foreword (July, 2015) Since I first wrote this article, years ago, a great many more people have begun using the internet.  They are coming online from every part of the globe, learning, reading, shopping and spending money at an ever-increasing ra…
Since pre-biblical times, humans have sought ways to keep secrets, and share the secrets selectively.  This article explores the ways PHP can be used to hide and encrypt information.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.
Suggested Courses

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question