• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1361

poor performance of serialize()

I'm hitting a problem with this function on PHP 4.3.10 and FreeBSD 4.5.
When I try to serialize a very large variable (about 10MB), it takes 16 seconds of processing.
But I have no such problem on Linux: the same variable takes just 1 second.

Any ideas on how to increase the speed of serialization? Or any PHP code to use instead of this built-in function?

Thanks.
Asked by buasuwan

4 Solutions
 
Marcus BointonCommented:
That's a lot of data. Can you avoid serializing it at all? Can you save it into a separate file and maintain it yourself?
0
 
buasuwanAuthor Commented:
Could you please show me the PHP code to do that?
0
 
PromethylCommented:
We can assume this is an array, correct?

What about importing it to a database?

This chunk of data is big enough to warrant it, even on another MySQL server that's not local but in the data centre.
0
 
buasuwanAuthor Commented:
>We can assume this is an array, correct?
That's right, a very huge array.

>What about importing it to a database?
I think storing it in files is better for my system. I'd rather avoid using MySQL to store these.
0
 
inq123Commented:
Hi buasuwan,

You can try and see if storing the data in $_SESSION is faster for you. AFAIK it's a different way of serialization, which may or may not be faster, but worth a try.

Nonetheless, it is better to avoid serializing such a big amount of data, as others pointed out. One way might be to store the long strings in your huge array as resource ids, with a file (or better, a DB table) mapping resource ids to strings. Presumably, even though you have a huge array, most of the space is actually taken up by long strings, so this would save you a lot of serialization time. This sort of thing is routinely done in commercial software (for this and other reasons).

Cheers!
0
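The resource-id idea above can be sketched roughly as follows. This is a minimal illustration, not inq123's actual code; the helper names, the `@res:` marker, and the 1KB threshold are all invented for the example.

```php
<?php
// Sketch: before serializing, swap long strings out of the array for short
// resource ids and keep the bodies in a side store that is written once.
// A real implementation would need a collision-safe marker, not "@res:".

function extract_long_strings(array $data, array &$store, $threshold = 1024)
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $data[$key] = extract_long_strings($value, $store, $threshold);
        } elseif (is_string($value) && strlen($value) >= $threshold) {
            $id = count($store);
            $store[$id] = $value;       // long body lives in the side store
            $data[$key] = "@res:$id";   // array keeps only a short id
        }
    }
    return $data;
}

function restore_long_strings(array $data, array $store)
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $data[$key] = restore_long_strings($value, $store);
        } elseif (is_string($value) && preg_match('/^@res:(\d+)$/', $value, $m)) {
            $data[$key] = $store[(int)$m[1]];
        }
    }
    return $data;
}

// Usage: serialize only the slimmed array; the side store changes rarely,
// so it can be written once to its own file (or a DB table).
$huge  = array('body' => str_repeat('lorem ', 500), 'title' => 'short');
$store = array();
$slim  = extract_long_strings($huge, $store);
$dir   = sys_get_temp_dir();
file_put_contents("$dir/slim.ser",  serialize($slim));
file_put_contents("$dir/store.ser", serialize($store));
```

The win comes from serializing the small `$slim` array on every hit while the bulky `$store` sits untouched on disk.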
 
PromethylCommented:
Find out if it's your iowait or processing. Is it the reading off the HDD, or the memory/proc time?

Perhaps store your array in memory. See if it gets significantly faster.

What's the iowait look like on this box?
0
 
PromethylCommented:
(Is there 10 megs of memory free. Perhaps you're swapping.)
0
 
Marcus BointonCommented:
I think sessions use serialize internally anyway, so I don't think that's a win.
As inq123 said, it doesn't seem reasonable that this is something you should have to do every time, so it may be possible to design things so that you don't need to do it at all. If you don't like MySQL, you could try a local DB such as SQLite. What's in this 10MB of data?
0
 
buasuwanAuthor Commented:
First of all, I stored this data in the session, but my code hit the 32MB memory limit. So I have to compress the data using serialize() and gzcompress().
Second, this data was retrieved from many tables in MySQL. I don't want to store it back in the DB; I'd like to reduce the DB processing.

Is it possible to write PHP code that does what serialize() does, but faster?
0
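For reference, the serialize()+gzcompress() round trip described above might look like this sketch (the function names are invented for illustration; compression level 6 is an assumption):

```php
<?php
// Minimal sketch of the pack/unpack cycle: compress the serialized array
// before putting it in the session, reverse the steps on read.

function pack_for_session(array $data)
{
    return gzcompress(serialize($data), 6); // level 6: speed/size trade-off
}

function unpack_from_session($blob)
{
    return unserialize(gzuncompress($blob));
}

$data   = array('items' => range(1, 1000), 'note' => str_repeat('x', 10000));
$packed = pack_for_session($data);
$copy   = unpack_from_session($packed);
// $copy now equals $data; repetitive data like this compresses very well,
// which is what keeps the session under the memory limit.
```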
 
_GeG_Commented:
There have been performance issues with serialize()/unserialize() in recent PHP versions.

Try whether this code is faster. I think your code looks like this at the moment:
<?php
$a=array(....whatever...);
$fp=fopen('store.file', 'w'); // or sql
fwrite($fp,serialize($a));
fclose($fp);
?>

next page:
<?php
$a=unserialize(file_get_contents('store.file'));
// process $a
//...
?>

maybe this is quicker:
<?php
$a=array(....whatever...);
ob_start();
var_export($a);
$b='<?php $a='.ob_get_clean().'; ?>';
$fp=fopen('store.file', 'w'); // or sql
fwrite($fp,$b);
fclose($fp);
?>

next page:
<?php
require('store.file');
// process $a
//...
?>

0
 
_GeG_Commented:
btw this works only if there are no classes in the array
0
 
Marcus BointonCommented:
> Second, this data was retrieved from many tables in MySQL.

So why not create a temporary table and store it all in there? It's exactly the kind of thing they're for.
0
 
_GeG_Commented:
I think a temporary table wouldn't help, because AFAIK they are deleted when the connection is closed. Also, it seems the speed of serialize() is the problem, not the speed of the storage device.
0
 
Marcus BointonCommented:
This is rather my point - don't serialize the data, serialize pointers to it. If temp table retention is really an issue, just use a real table instead (use a hash value in its name to prevent cross-session clashes), and drop it when you're finished. Chances are, the PHP script doesn't actually need the entire 10 Mb of data for every page (if at all), so however efficient you get serialization and compression working, there's some wasted effort going on, and MySQL is far more efficient at reading data than PHP is at serializing.
0
 
buasuwanAuthor Commented:
Sorry, I forgot to tell you that there are many classes in that array.

For GeG:
As you mentioned, that means I can't use your code, right?

For Squinky:
I understand what you said, but storing it in a table is my last choice.

0
 
_GeG_Commented:
Surprisingly, it seems to work for classes as well, using a code construct that's new to me. I've only tested it on PHP 5.
0
 
_GeG_Commented:
don't forget to include the class definition before you include your storage file
0
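To make _GeG_'s caveat concrete: var_export() renders an object as a `ClassName::__set_state(...)` call, so the class must be defined before the stored file is included, and on PHP 5 and later the class must also provide a static `__set_state()` method. A sketch in modern syntax (the `Product` class is invented for the example):

```php
<?php
// var_export() round trip for an object. The class definition, including
// __set_state(), must be loaded before the stored file is required.

class Product
{
    public $name;
    public $price;

    public static function __set_state(array $props)
    {
        $p = new Product();
        $p->name  = $props['name'];
        $p->price = $props['price'];
        return $p;
    }
}

$p = new Product();
$p->name  = 'widget';
$p->price = 9.99;

// Write the exported value as includable PHP code.
$file = sys_get_temp_dir() . '/store.file';
file_put_contents($file, '<?php $a = ' . var_export($p, true) . ';');

// "Next page": the Product definition above must already be loaded.
require $file;
// $a is a Product instance again.
```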
 
Marcus BointonCommented:
I still don't quite believe that you need to get 10Mb of data for every single page hit. There must be a more efficient way - you've still not told us what's in this array other than 'objects'.
0
 
PromethylCommented:
How about storing the 10 meg in memory?

And why compress it? Faster iowait?

0
 
_GeG_Commented:
Yes, I agree with Squinky. IMO it only makes sense to have 10MB of data if you want to process it further, and in that case the processing is probably better left to the database.
0
 
PromethylCommented:
Is all the data required for each page hit? Can you break it up to several object/serialized pairs... for specific tasks?
0
 
Marcus BointonCommented:
From what I've read so far, it sounds like the first time a visitor hits the page, it goes through a large and complex setup procedure that results in a 10Mb array of 'stuff'. Each subsequent hit involves accessing this array, thus avoiding the setup overhead. However, the overhead of serializing/unserializing the array is killing the performance. I think it's entirely logical to keep this chunk in the database - then you don't have to serialize it at all, and the performance problem just goes away. What's more, the database will be far faster than PHP at handling this chunk of data anyway, so I'd expect a large performance improvement. If it happens that all these array items are class instances, which in turn contain large chunks of data, then just don't keep the data in the class - point to it in the database instead. It's reasonably easy to virtualize the storage of the data inside the class so it looks like it's really in there.
0
 
buasuwanAuthor Commented:
That's right, the 10MB array is the result that's kept in the session for the next pages.

To GeG:
    I have to test var_export() first; I use several PHP versions (4.3.6, 4.3.8, 4.3.9, 4.3.10 across 4 webservers).

To Promethyl:
    If I don't compress it, I get an out-of-memory problem instead.

To Squinky:
    How do I convert an array to text without using serialize()? What function do you use?
0
 
PromethylCommented:
> If I do not compress it I will get an out-of-memory problem instead.

That's probably the first hint that you're moving too much data for an average webpage hit.

Are you trying to take load off the database? Are you using this serialize method as a form of file-based caching?

Are we even sure it's gzcompress or serialize that has the bad performance? Is it the IOWait time?

Michael
0
 
buasuwanAuthor Commented:
To Michael,
>Is it the IOWait time?
Sorry, I'm not sure. How do I check this?
0
 
PromethylCommented:
Put your code (to test) in the test block. Put the gzcompress call in there, try one run with just the file write, and one with the serialize call in the block, and see which takes the longest. Report the execution times of the various blocks back here.

Also, top will show you iowait, if you have shell access.


<?php
/**
 * Simple function to replicate PHP 5 behaviour
 */
function microtime_float()
{
   list($usec, $sec) = explode(" ", microtime());
   return ((float)$usec + (float)$sec);
}

$time_start = microtime_float();

// Do your test here...


$time_end = microtime_float();
$time = $time_end - $time_start;

echo "Did nothing in $time seconds\n";
?>
0
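Filling in the harness above, a comparison run might look like the following sketch. The sample data is made up (substitute the real 10MB variable to get meaningful numbers), and it uses `microtime(true)`, which is PHP 5+; on PHP 4 use the `microtime_float()` helper above.

```php
<?php
// Time serialize() against the var_export() alternative on sample data.

$data = array_fill(0, 10000, str_repeat('a', 100)); // ~1MB of strings

$t0 = microtime(true);
$s  = serialize($data);
$t1 = microtime(true);

ob_start();
var_export($data);
$v  = ob_get_clean();
$t2 = microtime(true);

printf("serialize: %.4fs  var_export: %.4fs\n", $t1 - $t0, $t2 - $t1);
```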
 
_GeG_Commented:
var_export() exists since php 4.2
0
 
Marcus BointonCommented:
> How to convert array() to text without using serialize()? what function do you use?

I don't. What I'm suggesting is that you make your array smaller (so the serialize overhead will effectively disappear) by taking large data items out of it and leaving them in the database. You _still_ have not told us what's in your array!
0
 
buasuwanAuthor Commented:
To Promethyl:
   I already ran a test of serialize() with my sample data.
5.46797800 second(s) with serialize() alone and 5.65010715 second(s) with gzcompress() added.

To GeG:
   var_export() takes 2.82100606 second(s) for this data; it's better, as you mentioned.

To Squinky:
   If I make the array smaller as you suggest, to reduce the overhead, and it works, then I still have no need to store this information in the database.
>You _still_ have not told us what's in your array!
I'm so sorry. It contains the runtime data and some objects.

To All:
   From all your suggestions, my conclusions are:
1. Reduce the size of that array. I think 80% of the array is the result of runtime processing, and only the static data will be removed.
2. Call serialize() with the smaller array. I have to code my_own_serialize() to do this; if anyone codes it for me, I will give 2000 points.
3. If the 2nd solution works and gives better performance, storing to the database or not is not the main point.

Thanks.
Any comments? I will accept the answers within 30 minutes.
0
 
Marcus BointonCommented:
> If it works, I still have no need to store this information in the database.

It's not really a matter of whether you _need_ to store this in the database, it's whether it's a good solution to your performance problem. If putting it in the database could give you a 100x speedup, what are your other objections to putting it there?

> It contains the runtime data and some objects.

I meant - what kind of data. 10mb is a lot of data to build one web page from - you're obviously not going to deliver it all - so why do you need so much local data? Does it contain large bodies of text? PDFs? Images?
0
 
buasuwanAuthor Commented:
>what are your other objections to putting it there?
I get a lot of table-locking problems in the MySQL server; that's why I avoid storing the runtime results in the database.

>Does it contain large bodies of text? PDFs? Images?
No. It contains an array of runtime-calculated data, rules, conditions and more for the 200+ products at the time the user submits a check. The result is stored in the session for viewing at 20 products/page.
0
 
buasuwanAuthor Commented:
BTW, it is not all local data; the main sources come from the Suppliers at runtime (via XML).
0
 
Marcus BointonCommented:
> I get alot of table locking problem in MySQL server, that why I avoid to store the runtime result into database.

It sounds like you're not using transactions, which would make most of that just go away. It sounds like you'd be doing mostly reading anyway?

> It is not all local data, the main sources come from the Suppliers in runtime(via XML).

You pull data from them live with every initialisation hit? No caching?

So you're storing data for >200 items in your session, but only ever viewing 20 at once, and serializing and unserializing the whole lot every time you only want to look at 20 items, so you're always doing at least 10 times more work than is necessary? It's all very well optimizing serialization for marginal improvements, but there's got to be a more efficient design.
0
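Squinky's point about only viewing 20 of 200+ items suggests an alternative layout: split the result set into page-sized chunks so each hit only unserializes what it displays. A sketch (the helper names are invented; the 20-per-page figure comes from the thread):

```php
<?php
// Store the big result set once as page-sized files; each page view then
// reads and unserializes one small chunk instead of the whole array.

function store_in_chunks(array $items, $dir, $perPage = 20)
{
    $pages = array_chunk($items, $perPage, true);
    foreach ($pages as $n => $chunk) {
        file_put_contents("$dir/page.$n.ser", serialize($chunk));
    }
    return count($pages);
}

function load_page($dir, $n)
{
    return unserialize(file_get_contents("$dir/page.$n.ser"));
}

// 200+ items are written once, but each hit touches only one small file.
$dir = sys_get_temp_dir();
store_in_chunks(range(1, 200), $dir);
$page0 = load_page($dir, 0); // the first 20 items only
```

Each page view then does at most 1/10th of the original serialization work, which is exactly the waste Squinky identified.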
 
inq123Commented:
Gee, this thread's getting so long that I couldn't read it all. Anyway, Squinky, AFAIK $_SESSION does not use serialize() internally, so buasuwan, it's worth a try. _GeG_, I thought that serialization only affects methods, not class data. Class data is always serializable AFAIK, although I'm not too sure about older versions.
0
 
buasuwanAuthor Commented:
To Squinky,
>You pull data from them live with every initialisation hit? No caching?
Yes, this live data changes every second. Caching will not help in this case.

>So you're storing data for >200 items in your session, but only ever viewing 20 at once, ...
I understand what you said. I thought about that before.

To inq123,
I store this data in the session.

Thanks for all the comments.
0
 
Marcus BointonCommented:
> I thought that serialize call affects methods only, not class data.

No, quite the reverse. Serialization is only applied to data. If you look in a session file at a serialized object, you'll find that all it contains is a note of what class it is and the data stored in it. Class methods are never stored because they're part of the class definition, not the instantiated object. This is why you have to have your class definitions loaded before you session_start or unserialize.

I really think these points should have been split. While I've been banging on about the overall design, everyone else has had good input too.
0
