Solved

Help speeding up a script

Posted on 2011-03-21
Last Modified: 2012-05-11
I'm trying to speed up a script that we use to let bots crawl our site. This is the only page that the bots are allowed to access.

We have over 2 million records in our database, and we want one column from the table `linv_inventory` shown in an HTML table, paged in 10,000-row increments.
  // compute the offset for the requested page
  // $rowsPerPage = 10000
  $offset = ($pageNum - 1) * $rowsPerPage;
  $query = "SELECT `inventory_part_number` FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage";

  $result = mysql_query($query) or die('Error, query failed');


Which of the following would be faster? The timing results I have gotten have been inconclusive.

In the first, we query the first 10,000 records, extract them into an array, then
iterate through the array in a foreach.
  $element = array();
  while ($row = mysql_fetch_array($result))
    $element[] = trim($row[0]);

  echo "<table width=\"100%\" cellspacing=\"2\" cellpadding=\"3\">\n";
  echo "  <tr>\n";
  $i = 0;
  foreach ($element as $cell) {
    echo '<td>' . $cell . '</td>';
    $i++;
    if ($i % 5 == 0) {   // close the row after every fifth cell
      echo "  </tr>\n";
      echo "  <tr>\n";
    }
  } // foreach ($element as $cell)
  echo "  </tr>\n";
  echo "</table>\n";



In the second, we iterate over the result set directly with mysql_fetch_array(), fetching five more rows inline to fill out each table row.
  while($row = mysql_fetch_array($result)) {
    echo "  <tr>\n";
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';$row = mysql_fetch_array($result);
    echo '<td>' . $row['inventory_part_number'] . '</td>';
    echo "  </tr>\n";
  } // while($row = mysql_fetch_array($result))



Are there any other means I could use to make this faster? This script has been causing problems due to the number of bots that hit it simultaneously, bringing my database down to a crawl.

Question by:EddieShipman
 
LVL 7

Accepted Solution

by:
Swafnil earned 425 total points
ID: 35182453
Neither statement is ideal. The first example simply wastes memory by building an unneeded array; the second doesn't make sense at all, because the repeated mysql_fetch_array() calls inside the while loop don't improve performance and will even trigger unnecessary notices (and error-log entries, depending on your webserver settings) once the result set runs out mid-row.

Here are some hints on how to increase performance:
1. Create an index on the MySQL side for the column used by the query:
CREATE INDEX ix_linv_part ON linv_inventory (inventory_part_number);



2. Use mysql_fetch_row() instead of mysql_fetch_array()
(mysql_fetch_array() offers some advantages - which you don't need in your scenario - but is a bit slower)
  // $rowsPerPage = 10000
  $offset = ($pageNum - 1) * $rowsPerPage;
  $query = "SELECT `inventory_part_number` FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage";

  echo '<table width="100%" cellspacing="2" cellpadding="3">'."\n";
  echo '<tr>'."\n";
  $rows = 0;

  $result = mysql_query($query) or die('Error, query failed');
  while ($row = mysql_fetch_row($result)) {
    echo '<td>' . $row[0] . '</td>';
    $rows++;
    if ($rows % 5 == 0) {   // close the row after every fifth cell
      echo "  </tr>\n";
      echo "  <tr>\n";
    }
  }
  echo "  </tr>\n";
  echo "</table>\n";



Some more things to consider for the last pieces of performance gain (a short sketch follows this list):
* Use single quotes wherever possible, e.g. when outputting through print(); single-quoted strings are not parsed for variables and are output directly.
* Deactivate output buffering to reduce memory load on the webserver side.
* Consider smaller page sizes, because large pages cause longer wait periods for document generation and transmission.
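
A minimal sketch of those three hints together; the ob_* calls are standard PHP, and the 2,000-row page size is only an illustration, not a figure from this thread:
  // Flush and disable any active output buffers so the generated HTML
  // streams to the client immediately instead of piling up in memory.
  while (ob_get_level() > 0) {
    ob_end_flush();
  }

  $rowsPerPage = 2000; // a smaller page than the original 10000

  // Single-quoted strings are echoed as-is; PHP never scans them
  // for variables or escape sequences (apart from \' and \\).
  $cell = 'ABC-123';
  echo '<td>' . $cell . '</td>';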
 
LVL 26

Author Comment

by:EddieShipman
ID: 35183086
1. Already have index on that column.
2. Yes, I forgot about using that instead.

Output buffering, hmmm, never thought of that, either.
Are you saying that I should lower my page size value from 10000 to, say, 2000?

As far as the quotes go, I inherited this code and didn't change much of it. It had been working flawlessly for months; then all of a sudden the bots started hitting it more frequently and, in the case of Google, with more than one bot at a time, and it was bringing my server to its knees.
 
LVL 1

Assisted Solution

by:shefen
shefen earned 75 total points
ID: 35183132
How often does the data change?
Have you considered periodically generating static pages for the bots to read?
 
LVL 7

Expert Comment

by:Swafnil
ID: 35183280
As shefen stated, in case your item IDs do not change too often, consider generating static pages for the bots and restricting them to those pages. Another option would be to restrict the bots from crawling so frequently, or to use a caching mechanism like memcached; a rough sketch follows.
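
A minimal sketch of the caching idea, using the old Memcache extension; the host, port, key prefix, the 300-second TTL and the build_page_html() helper are all assumptions for illustration:
  // Cache each generated bot page for a few minutes so simultaneous
  // crawlers share a single database hit instead of each running the query.
  $memcache = new Memcache();
  $memcache->connect('localhost', 11211);

  $key  = 'bot_page_' . $pageNum;         // $pageNum as in the original script
  $html = $memcache->get($key);
  if ($html === false) {                  // cache miss: build the page once...
    $html = build_page_html($pageNum);    // hypothetical helper running the query
    $memcache->set($key, $html, 0, 300);  // ...and keep it for five minutes
  }
  echo $html;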
 
LVL 26

Author Comment

by:EddieShipman
ID: 35183345
Customers can upload inventory files at will so it can literally change within seconds.
 
LVL 26

Author Comment

by:EddieShipman
ID: 35183358
You know, on second thought, it may be OK to just generate daily statics.
I can run a cron job to do it and then I wouldn't have to worry.
I can put the files in one directory and then limit the bots to it.
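
For what it's worth, a rough sketch of what that cron-driven generator might look like; the connection details, the /var/www/bots/ output directory and the file naming are hypothetical:
  // Nightly cron script: dump the part-number column into static HTML
  // pages inside a directory that the bots are restricted to.
  $link = mysql_connect('localhost', 'dbuser', 'dbpass') or die(mysql_error());
  mysql_select_db('mydb', $link) or die(mysql_error());

  $rowsPerPage = 10000;
  $total = mysql_result(mysql_query('SELECT COUNT(*) FROM `linv_inventory`'), 0);
  $pages = ceil($total / $rowsPerPage);

  for ($page = 0; $page < $pages; $page++) {
    $offset = $page * $rowsPerPage;
    $result = mysql_query("SELECT `inventory_part_number` FROM `linv_inventory`
                           ORDER BY `inventory_part_number`
                           LIMIT $offset, $rowsPerPage");
    $html = "<html><body><table>\n";
    while ($row = mysql_fetch_row($result)) {
      $html .= '<tr><td>' . htmlspecialchars($row[0]) . "</td></tr>\n";
    }
    $html .= "</table></body></html>\n";
    file_put_contents('/var/www/bots/parts_' . ($page + 1) . '.html', $html);
  }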
 
LVL 7

Expert Comment

by:Swafnil
ID: 35183499
You'll have to decide whether you want to keep allowing robots to crawl this frequently (which will mean higher hardware costs, since you'll need to upgrade as soon as you reach a critical mass of bot accesses) or go for a caching mechanism with fewer updates to the inventory.
I would recommend checking your access logs and building statistics on the frequency of accesses, combining that with performance data gathered in parallel, and also checking whether the load is on the database side or the webserver side.
In your case you are dealing with a near-realtime system with a high data volume, so topics like clustering, database tuning and performance monitoring could be worth reading up on!
 
LVL 7

Expert Comment

by:Swafnil
ID: 35186968
Just had an idea this morning which could be worth a try:

Because you only need a single column from each row, you can GROUP_CONCAT() the values on the database side and get the whole page back as a single row:

$query= "SELECT GROUP_CONCAT(`inventory_part_number`) FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage";

$res= mysql_query($query);
if ($res) {
  $row= mysql_fetch_row($res);
  // the values are comma separated, explode() them to get all part numbers
  $numbers= explode(',', $row[0]);
}


That means a single statement and one merge operation on the database side, but far fewer fetches over the database connection and less protocol overhead between the database and the MySQL client.

I wonder, if this page is used solely by bots, why you are using tables at all? And how are these results presented when someone finds them on Google?

 
LVL 26

Author Comment

by:EddieShipman
ID: 35189826
Good catch, Swafnil. I'll give that a try.
 
LVL 26

Author Comment

by:EddieShipman
ID: 35189839
Gonna have to rethink how to create statics from the array, though.
 
LVL 26

Author Comment

by:EddieShipman
ID: 35191505
How do I set the group_concat_max_len? It keeps coming up with only 1024 chars.
 
LVL 7

Expert Comment

by:Swafnil
ID: 35191910
You have to set the group_concat_max_len before querying the database:

mysql_query("SET SESSION group_concat_max_len = 1048576"); // depending on max_allowed_packet
$query= "SELECT GROUP_CONCAT(`inventory_part_number`) FROM `linv_inventory` ORDER BY `inventory_part_number` LIMIT $offset, $rowsPerPage";

$res= mysql_query($query);



To set the maximum packet length, see:

http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_allowed_packet
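
If you do have the privileges, a hedged sketch of raising the limit; the 16 MB value is only an example:
  // Requires the SUPER privilege; new connections pick up the larger size.
  mysql_query("SET GLOBAL max_allowed_packet = 16777216");
  // Alternatively, set it permanently in my.cnf under [mysqld]
  // (followed by a server restart):
  //   max_allowed_packet = 16M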
 
LVL 26

Author Comment

by:EddieShipman
ID: 35192291
Nah, that ain't gonna work either; it only sends 1024 KB. My max_allowed_packet can't be modified.
 
LVL 7

Expert Comment

by:Swafnil
ID: 35192845
Is your MySQL database hosted, or are you in charge of it? If I read the documentation correctly, you can also set max_allowed_packet system-wide, so you wouldn't need to change the packet size inside the script (which didn't work, if I understood you correctly).
If you have no way to change the setting, you can work with subselects and GROUP_CONCAT to combine batches of inventory_part_numbers into one field up to a maximum length of about 1 MB each, and then continue with the next batch; a sketch follows.
Another option would be to mysqldump the numbers to a directory on a regular basis and use the dump files instead of direct database access. I'll investigate a bit more about best practices on the topic.
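
A rough sketch of that batching idea under the 1 MB ceiling; the 500-row chunk size and the variable names are assumptions:
  // Fetch the column in GROUP_CONCAT batches of $chunk rows each, so no
  // single concatenated result can outgrow group_concat_max_len.
  $chunk = 500;
  $allNumbers = array();
  for ($offset = 0; ; $offset += $chunk) {
    $query = "SELECT GROUP_CONCAT(`inventory_part_number`) FROM
              (SELECT `inventory_part_number` FROM `linv_inventory`
               ORDER BY `inventory_part_number`
               LIMIT $offset, $chunk) AS page";
    $res = mysql_query($query);
    if (!$res) break;                   // query failed
    $row = mysql_fetch_row($res);
    if ($row[0] === null) break;        // no rows left at this offset
    $parts = explode(',', $row[0]);
    $allNumbers = array_merge($allNumbers, $parts);
    if (count($parts) < $chunk) break;  // last, partial batch
  }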
 
LVL 26

Author Comment

by:EddieShipman
ID: 35193390
No need. I think I'll tweak what we have.
 
LVL 26

Author Closing Comment

by:EddieShipman
ID: 35198920
Am rethinking the entire thing.
