
Helping speed up script

I'm trying to speed up a script that we use to let bots crawl our site. This is the only page that the bots are allowed to access.

We have over 2 million records in our database and we want one column from the table `linv_inventory` shown in an HTML table paged in 10,000 row increments.
  // compute the offset for the requested page; the (int) cast keeps $pageNum from being abused for SQL injection
  // $rowsPerPage = 10000
  $offset = ((int)$pageNum - 1) * $rowsPerPage;
  $query = "SELECT `inventory_part_number` FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage";

  $result = mysql_query($query) or die('Error, query failed');


Which of these would be faster? The timing results that I have gotten have really been inconclusive.
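A minimal timing harness can help make the comparison reproducible (this is just a sketch using microtime(true); the two function names in the commented-out calls are placeholders for the variants below):

```php
<?php
// Rough timing harness: runs each variant several times and averages,
// since a single run is too noisy to compare.
function timeIt($label, $fn, $runs = 5) {
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $total += microtime(true) - $start;
    }
    printf("%s: %.4f s avg over %d runs\n", $label, $total / $runs, $runs);
}

// timeIt('array + foreach', 'renderViaArray');   // variant 1
// timeIt('direct fetch',    'renderDirect');     // variant 2
```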

In the first, we query the first 10,000 records, extract them into an array, then
iterate through the array in a foreach.
  $element = array();   // note: array(true) would add a stray boolean as the first cell
  while($row = mysql_fetch_array($result))
    $element[] = trim($row[0]);
  
  echo "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
  echo "  <tr>\n";
  $i = 0;
  foreach($element as $cell) {
    // close the row and open a new one after every 5 cells
    if($i > 0 && $i % 5 == 0) {
      echo "  </tr>\n";
      echo "  <tr>\n";
    }
    echo '<td>' . $cell . '</td>';
    $i++;
  } // foreach($element as $cell)
  echo "  </tr>\n";
  echo "</table>\n";



In the second, we just iterate over the result set using mysql_fetch_array().
  while($row = mysql_fetch_array($result)) {
    echo "  <tr>\n";
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';
    echo "  </tr>\n";
  } // while($row = mysql_fetch_array($result))



Are there any other means I could use to make this faster? This script has been causing problems: the number of bots hitting it simultaneously has been bringing my database down to a crawl.

Asked by EddieShipman
 
SwafnilCommented:
Neither statement is ideal. The first simply wastes memory by building an unneeded intermediate array. The second doesn't make sense at all: the repeated mysql_fetch_array() calls inside the while loop don't improve performance, and they even generate unnecessary notices (and error log entries, depending on your webserver settings) once a fetch returns false.

Here are some hints on how to increase performance:
1. create an index for the used query on MySQL-side:
CREATE INDEX ix_linv_part ON linv_inventory (inventory_part_number);



2. Use mysql_fetch_row() instead of mysql_fetch_array()
(mysql_fetch_array() offers some advantages - which you don't need in your scenario - but is a bit slower)
  // $rowsPerPage = 10000
  $offset = ($pageNum - 1) * $rowsPerPage;
  $query = "SELECT `inventory_part_number` FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage";

  echo '<table width="100%" cellspacing="2" cellpadding="3">'."\n";
  echo '<tr>'."\n";
  $rows = 0;

  $result = mysql_query($query) or die('Error, query failed');
  while($row = mysql_fetch_row($result)) {
    // close the row and open a new one after every 5 cells
    if($rows > 0 && $rows % 5 == 0) {
      echo "  </tr>\n";
      echo "  <tr>\n";
    }
    echo '<td>' . $row[0] . '</td>';
    $rows++;
  }
  echo "  </tr>\n";
  echo "</table>\n";



Some more things to consider for the last pieces of performance gains:
* use single quotes wherever possible, e.g. when outputting through print(); single-quoted strings aren't parsed for variables or escape sequences and are output directly
* deactivate output buffering to reduce load on the webserver side
* consider smaller page sizes because large pages cause longer wait periods through document generation and transmission
 
EddieShipmanAuthor Commented:
1. Already have index on that column.
2. Yes, I forgot about using that instead.

Output buffering, hmmm, never thought of that, either.
Are you saying that I should lower my page size value from 10000 to, say, 2000?

As far as the quotes go, I inherited this code and didn't change much of it. It had been working flawlessly for months, then all of a sudden the bots
started hitting it more frequently and, in the case of Google, with more than one bot at a time, and it was bringing my server to its knees.
 
shefenCommented:
How often does the data change?
Have you considered periodically generating static pages for the bots to read?
 
SwafnilCommented:
As shefen stated, if your item IDs don't change too often, consider generating static pages for the bots and restricting them to those generated pages. Another option would be to keep bots from crawling too frequently, or to use a caching mechanism like memcache.
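The memcache idea could look roughly like this (a cache-aside sketch using the old Memcache extension; buildPageHtml() is a placeholder for your existing query-and-render code, and localhost:11211 is an assumed server address):

```php
<?php
// Cache each rendered page so concurrent bots hit the cache instead of
// the database. Assumes the PHP "memcache" extension and a memcached
// server on localhost:11211; $pageNum comes from the existing paging code.
$cache = new Memcache();
$cache->connect('localhost', 11211);

$key  = 'bot_page_' . (int)$pageNum;
$html = $cache->get($key);

if ($html === false) {
    $html = buildPageHtml($pageNum);   // placeholder: your query + table loop
    $cache->set($key, $html, 0, 300);  // cache for 5 minutes
}
echo $html;
```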
 
EddieShipmanAuthor Commented:
Customers can upload inventory files at will so it can literally change within seconds.
 
EddieShipmanAuthor Commented:
You know, on another thought, it may be ok to just generate daily statics.
I can run a cron job to do it and then I wouldn't have to worry.
I can put the files in one directory and then limit the bots to it.
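Something like this for the cron job (a sketch; writePages() and the paths are made up for illustration):

```php
<?php
// Write the part numbers into static HTML files, $rowsPerPage rows per
// file, into a directory the bots are restricted to. writePages() is a
// helper invented here for illustration.
function writePages(array $numbers, $rowsPerPage, $outDir) {
    $pages = array_chunk($numbers, $rowsPerPage);
    foreach ($pages as $i => $chunk) {
        $n = $i + 1;
        file_put_contents("$outDir/page$n.html",
            "<html><body><pre>\n" . implode("\n", $chunk) . "\n</pre></body></html>\n");
    }
    return count($pages);
}

// In the cron script, feed it straight from the table:
// $numbers = array();
// $result = mysql_query("SELECT `inventory_part_number` FROM `linv_inventory` ORDER BY `inventory_part_number`");
// while ($row = mysql_fetch_row($result)) { $numbers[] = trim($row[0]); }
// writePages($numbers, 10000, '/var/www/botpages');
```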
 
SwafnilCommented:
You'll have to decide whether to keep allowing robots to crawl this frequently (which means higher hardware costs: you'll need to upgrade as soon as you reach a critical mass of bot accesses) or go for a caching mechanism with fewer updates to the inventory.
I would recommend checking your access logs and building statistics on access frequency, combining that with performance data gathered in parallel, and also checking whether the load is on the database or the webserver side.
You are dealing with a near-realtime data system with a high data volume, so topics like clustering, database tuning and performance monitoring are worth reading up on!
 
SwafnilCommented:
Just had an idea this morning which could be worth a try:

because you only need a single column from each row, you can group_concat() your fields on the database side and return the result in just one line:

// note: LIMIT must go in a subquery; otherwise GROUP_CONCAT() aggregates the whole table into one row
$query= "SELECT GROUP_CONCAT(`inventory_part_number` ORDER BY `inventory_part_number`) FROM (SELECT `inventory_part_number` FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage) AS page";

$res= mysql_query($query);
if ($res) {
  $row= mysql_fetch_row($res);
  // the values are comma separated, explode() them to get all part numbers
  $numbers= explode(',', $row[0]);
}


That runs a single statement and merge operation on the database side, but it reduces the number of round trips on the database connection and the amount of data transferred between the database and the MySQL client.

I wonder: if this page is solely used by bots, why are you using tables? And how are these search results presented when you search for them on Google?
 
EddieShipmanAuthor Commented:
Good catch, Swafnil. I'll give that a try.
 
EddieShipmanAuthor Commented:
Gonna have to rethink how to create statics from the array, though.
 
EddieShipmanAuthor Commented:
How do I set the group_concat_max_len? It keeps coming up with only 1024 chars.
 
SwafnilCommented:
You have to set the group_concat_max_len before querying the database:

mysql_query("SET SESSION group_concat_max_len = 1048576"); // depending on max_allowed_packet
$query= "SELECT GROUP_CONCAT(`inventory_part_number`) FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage";

$res= mysql_query($query);



To set the maximum packet length, see:

http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_allowed_packet
 
EddieShipmanAuthor Commented:
Nah, that ain't gonna work either; it only sends 1024 KB, and my max_allowed_packet can't be modified.
 
SwafnilCommented:
Is your MySQL database hosted or are you in charge of it? If I got the documentation right, you can also set the max_allowed_packet setting system-wide, so you won't need to try to change the packet size inside the script (which didn't work if I got you right).
If you have no way to change the setting, you can work with subselects and group_concat to combine a set of inventory_part_numbers in one field to a maximum length of about 1 MB and then continue with the next row of combined inventory_part_numbers.
Another option would be to mysqldump the numbers to a directory on a regular basis and use the dump files instead of direct database access. I'll investigate a bit more about best practices on the topic.
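The subselect idea could be sketched like this (client-side windows, each small enough to stay under group_concat_max_len; $chunk is a new tuning knob introduced here, while $offset and $rowsPerPage come from the original paging code):

```php
<?php
// Fetch one page in smaller windows, one GROUP_CONCAT field per window,
// so each concatenated result stays well below group_concat_max_len.
$chunk   = 1000;       // rows per window; tune against group_concat_max_len
$numbers = array();

for ($pos = $offset; $pos < $offset + $rowsPerPage; $pos += $chunk) {
    $query = "SELECT GROUP_CONCAT(`inventory_part_number`) FROM "
           . "(SELECT `inventory_part_number` FROM `linv_inventory` "
           . " ORDER BY `inventory_part_number` LIMIT $pos, $chunk) AS w";
    $res = mysql_query($query) or die('Error, query failed');
    $row = mysql_fetch_row($res);
    if ($row && $row[0] !== null) {
        // values come back comma separated, as in the earlier snippet
        $numbers = array_merge($numbers, explode(',', $row[0]));
    }
}
```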
 
EddieShipmanAuthor Commented:
No need. I think I'll tweak what we have.
 
EddieShipmanAuthor Commented:
Am rethinking the entire thing.
