Eddie Shipman

asked on

Helping speed up script

I'm trying to speed up a script that we use to let bots crawl our site. This is the only page that the bots are allowed to access.

We have over 2 million records in our database and we want one column from the table `linv_inventory` shown in an HTML table paged in 10,000 row increments.
  // calculate the row offset for the requested page
  // $rowsPerPage = 10000
  $offset = ($pageNum - 1) * $rowsPerPage;
  $query = "SELECT `inventory_part_number` FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage";

  $result = mysql_query($query) or die('Error, query failed');

Which of these would be faster? The timing results that I have gotten have really been inconclusive.

In the first, we query the first 10,000 records, extract them into an array, then
iterate through the array in a foreach.
  $element = array();
  while ($row = mysql_fetch_array($result)) {
    $element[] = trim($row[0]);
  }

  echo "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
  echo "  <tr>\n";
  $i = 0;
  foreach ($element as $cell) {
    echo '<td>' . $cell . '</td>';
    if (++$i % 5 == 0) {   // close the row after every fifth cell
      echo "  </tr>\n";
      echo "  <tr>\n";
    }
  } // foreach ($element as $cell)
  echo "  </tr>\n";
  echo "</table>\n";

In the second, we just iterate over the result set using mysql_fetch_array().
  while ($row = mysql_fetch_array($result)) {
    echo "  <tr>\n";
    echo '<td>' . $row['inventory_part_number'] . '</td>';
    // fetch five more rows to fill out the remaining cells of this table row
    for ($c = 0; $c < 5; $c++) {
      $row = mysql_fetch_array($result);
      if ($row === false) {
        break;
      }
      echo '<td>' . $row['inventory_part_number'] . '</td>';
    }
    echo "  </tr>\n";
  } // while ($row = mysql_fetch_array($result))

Are there any other means I could use to make this faster? This script has been causing problems because of the number of bots hitting it simultaneously, which slows my database to a crawl.

ASKER CERTIFIED SOLUTION
Swafnil
Eddie Shipman

ASKER

1. Already have index on that column.
2. Yes, I forgot about using that instead.

Output buffering, hmmm, never thought of that, either.
Are you saying that I should lower my page size value from 10000 to, say, 2000?
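
Something like this, I take it? Just sketching what I understood; this is only the existing loop wrapped in ob_start()/ob_end_flush(), with one part number per row to keep it short:

  ob_start();  // buffer all the echoed HTML instead of sending many small chunks
  echo "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
  while ($row = mysql_fetch_array($result)) {
    echo '  <tr><td>' . $row['inventory_part_number'] . "</td></tr>\n";
  }
  echo "</table>\n";
  ob_end_flush();  // send the whole buffer to the client in one go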

As far as the quotes go, I inherited this code and didn't change much of it. It had been working flawlessly for months, then all of a sudden the bots started hitting it more frequently and, in the case of Google, with more than one bot at a time, and it was bringing my server to its knees.
SOLUTION
As shefen stated, if your item IDs do not change too often, consider generating static pages for the bots and restricting them to those pages. Another option would be to restrict bots from crawling too frequently, or to use a caching mechanism like memcache.
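
Roughly like this, as a sketch (the cache key, the 60-second TTL and the build_inventory_page() helper are placeholders, not your actual code):

  // cache each rendered page for a short time so simultaneous bots
  // reuse the same result instead of each re-running the query
  $cache = new Memcache();
  $cache->connect('localhost', 11211);

  $key  = 'inventory_page_' . $pageNum;      // made-up cache key
  $html = $cache->get($key);
  if ($html === false) {
    $html = build_inventory_page($pageNum);  // hypothetical helper: runs the query and builds the table HTML
    $cache->set($key, $html, 0, 60);         // flags = 0, expire after 60 seconds
  }
  echo $html;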
Customers can upload inventory files at will so it can literally change within seconds.
You know, on second thought, it may be OK to just generate daily statics.
I can run a cron job to do it and then I wouldn't have to worry.
I can put the files in one directory and then limit the bots to it.
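
Something along these lines is what I have in mind: a crontab entry like 0 3 * * * php /path/to/generate_bot_pages.php (script name and paths are just placeholders), with the script dumping each 10,000-row page to a static file:

  // generate_bot_pages.php - rough sketch, assumes an open mysql connection
  $rowsPerPage = 10000;
  $total = mysql_result(mysql_query("SELECT COUNT(*) FROM `linv_inventory`"), 0);
  $pages = ceil($total / $rowsPerPage);
  for ($page = 1; $page <= $pages; $page++) {
    $offset = ($page - 1) * $rowsPerPage;
    $result = mysql_query("SELECT `inventory_part_number` FROM `linv_inventory`
                           ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage");
    $html = "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
    while ($row = mysql_fetch_row($result)) {
      $html .= '  <tr><td>' . trim($row[0]) . "</td></tr>\n";
    }
    $html .= "</table>\n";
    file_put_contents("/var/www/bots/page_$page.html", $html); // placeholder directory
  }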
You'll have to decide whether you want to keep allowing robots to crawl so frequently (which will mean higher hardware costs, since you'll need to upgrade as soon as you reach a critical mass of bot accesses) or go for a caching mechanism with fewer updates to the inventory.
I would recommend checking your access logs and building statistics on how frequently the page is accessed, combining that with performance data gathered in parallel, and also looking at whether the load is on the database or the web-server side.
In your case you are dealing with a near-real-time system with a high data volume, so topics like clustering, database tuning and performance monitoring could be worth reading up on!
Just had an idea this morning which could be worth a try:

Because you only need a single column from each row, you can GROUP_CONCAT() the values on the database side and return the result as a single row:

// page in a subquery first, then GROUP_CONCAT that page into a single value
$query = "SELECT GROUP_CONCAT(`inventory_part_number` ORDER BY `inventory_part_number`) " .
         "FROM (SELECT `inventory_part_number` FROM `linv_inventory` " .
         "ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage) AS page";

$res = mysql_query($query);
if ($res) {
  $row = mysql_fetch_row($res);
  // the values are comma-separated; explode() them to get all part numbers
  $numbers = explode(',', $row[0]);
}

That results in a single statement and merge operation on the database side, but it reduces the number of fetches over the database connection and the amount of data transferred between the database and the MySQL client.
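
If it helps, once you have $numbers back you could split it into rows of six cells with array_chunk() before echoing the table (just a sketch of the rendering side):

  echo "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
  foreach (array_chunk($numbers, 6) as $chunk) {
    // one table row per chunk of six part numbers
    echo '  <tr><td>' . implode('</td><td>', $chunk) . "</td></tr>\n";
  }
  echo "</table>\n";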

I wonder, if this page is used solely by bots, why you are using tables at all? And how do these results show up when someone searches for them on Google?
Good catch, Swafnil. I'll give that a try.
Gonna have to rethink how to create statics from the array, though.
How do I set the group_concat_max_len? It keeps coming up with only 1024 chars.
You have to set the group_concat_max_len before querying the database:

mysql_query("SET SESSION group_concat_max_len = 1048576"); // capped by max_allowed_packet
$query = "SELECT GROUP_CONCAT(`inventory_part_number` ORDER BY `inventory_part_number`) " .
         "FROM (SELECT `inventory_part_number` FROM `linv_inventory` " .
         "ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage) AS page";

$res = mysql_query($query);

To set the maximum packet length, see:

http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_allowed_packet
Nah, that ain't gonna work either; it only sends 1024 KB. My max_allowed_packet can't be modified.
Is your MySQL database hosted, or are you in charge of it? If I read the documentation right, you can also set max_allowed_packet system-wide, so you wouldn't need to change the packet size inside the script (which, if I understood you correctly, didn't work).
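If you do control the server, the system-wide setting goes into the [mysqld] section of my.cnf and takes effect after a restart (the 16M here is only an example value):

  [mysqld]
  max_allowed_packet = 16M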
If you have no way to change the setting, you can work with subselects and GROUP_CONCAT to combine a batch of inventory_part_numbers into one field of up to about 1 MB, and then continue with the next row of combined inventory_part_numbers, as sketched below.
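As a sketch (the batch size of 5000 is arbitrary, and MySQL does not strictly guarantee the evaluation order of user variables, so treat it only as a starting point):

  // number the rows with a user variable and GROUP_CONCAT them in batches of
  // 5000, so every returned row stays well below group_concat_max_len
  mysql_query("SET @rownum := -1");
  $query = "SELECT GROUP_CONCAT(`inventory_part_number` ORDER BY `inventory_part_number`) AS numbers
            FROM (SELECT `inventory_part_number`,
                         FLOOR((@rownum := @rownum + 1) / 5000) AS batch
                  FROM `linv_inventory`
                  ORDER BY `inventory_part_number`) AS batched
            GROUP BY batch";
  $res = mysql_query($query);
  while ($row = mysql_fetch_row($res)) {
    $numbers = explode(',', $row[0]);
    // ... render this batch of part numbers ...
  }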
Another option would be to mysqldump the numbers to a directory on a regular basis and use the dump files instead of direct database access. I'll investigate a bit more about best practices on the topic.
No need. I think I'll tweak what we have.
Am rethinking the entire thing.