buasuwan

asked on

poor performance of serialize()

I'm hitting a problem with this function on PHP 4.3.10 and FreeBSD 4.5.
When I try to serialize a very large variable (about 10MB), it takes 16 seconds of processing.
But I have no problem on a Linux platform: just 1 second for the same variable.

Any ideas for speeding up serialization? Or any PHP code I could use instead of this built-in function?

Thanks.
Marcus Bointon

That's a lot of data - can you avoid serializing it at all? Can you save it into a separate file and maintain it yourself?
buasuwan

ASKER

Could you please show me the PHP code to do that?
We can assume this is an array, correct?

What about importing it to a database?

This is a big enough chunk of data that it's warranted... even to another MySQL server, not local but in the DC.
>We can assume this is an array, correct?
That's right, a very large array.

>What about importing it to a database?
I think storing it in files is better for my system. I'd rather avoid using MySQL to store these.



Hi buasuwan,

You can try storing the data in $_SESSION and see if it's faster for you. AFAIK it uses a different serialization mechanism, which may or may not be faster, but it's worth a try.

Nonetheless, it's better to avoid serializing such a large amount of data, as others have pointed out. One way might be to replace the long strings in your huge array with resource ids, keeping a file (or better, a DB table) that maps resource ids back to strings. Presumably, even in a huge array, most of the space is actually taken up by long strings, so this would save you a lot of serialization time. This sort of thing is routinely done in commercial software (for this and other reasons).
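A minimal sketch of this resource-id idea, assuming the heavy items are strings. All function names and the placeholder token are invented for illustration; a real version would persist `$store` to its own file or table and guard against the token appearing in real data:

```php
<?php
// Sketch of inq123's suggestion: pull long strings out of the array,
// replace them with tiny placeholder tokens, and keep the strings in
// a side store keyed by id. Only the shrunken array gets serialized.
function extract_strings(&$data, &$store, $minLen = 256)
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            extract_strings($data[$key], $store, $minLen);
        } elseif (is_string($value) && strlen($value) >= $minLen) {
            $id = count($store);
            $store[$id] = $value;
            $data[$key] = "\x01RES:$id"; // small placeholder token
        }
    }
}

function restore_strings(&$data, $store)
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            restore_strings($data[$key], $store);
        } elseif (is_string($value) && substr($value, 0, 5) === "\x01RES:") {
            $data[$key] = $store[(int)substr($value, 5)];
        }
    }
}
?>
```

You would then serialize only the shrunken array (fast) and write `$store` out separately, reading back individual strings only when a page actually needs them.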

Cheers!
Find out if it's your iowait or processing. Is it the reading off the HDD, or the memory/CPU time?

Perhaps store your array in memory and see if it gets significantly faster.

What does the iowait look like on this box?
(Is there 10MB of memory free? Perhaps you're swapping.)
I think sessions use serialize internally anyway, so I don't think that's a win.
As inq123 said, it doesn't seem reasonable that this is something you should have to do every time, so it may be possible to design it in such a way that you don't need to. If you don't like MySQL, you could try a local DB, such as SQLite. What's in this 10MB of data?
First of all, I stored this data in the session, but my code hit the 32MB memory limit, so I have to compress the data using serialize() and gzcompress().
Second, this data was retrieved from many tables in MySQL; I don't want to store it back in the DB, and I'd like to reduce the DB load.

Is it possible to write code in PHP that does what serialize() does, but faster?
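For reference, the compress-into-session approach described above can be sketched like this (helper names are invented, and it assumes session_start() has already been called so $_SESSION persists between requests):

```php
<?php
// Sketch of buasuwan's current approach: compress the serialized blob
// before putting it in the session, and reverse the steps on the way
// out. Helper names are invented for illustration.
function session_pack($key, $value)
{
    // Level 6 is a middle-ground compression level; tune as needed.
    $_SESSION[$key] = gzcompress(serialize($value), 6);
}

function session_unpack($key)
{
    return unserialize(gzuncompress($_SESSION[$key]));
}
?>
```

This trades CPU time (both serialize and gzcompress run on every request) for memory, which is exactly the trade-off being questioned in this thread.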
SOLUTION
_GeG_
BTW, this works only if there are no classes in the array.
> Second, this data was retrieved from many tables in MySQL.

So why not create a temporary table and store it all in there? It's exactly the kind of thing they're for.
I think a temporary table wouldn't help, because AFAIK they're deleted when the connection is closed. Also it seems that the speed of serialize() is the problem, not the speed of the storage device.
This is rather my point - don't serialize the data, serialize pointers to it. If temp table retention is really an issue, just use a real table instead (use a hash value in its name to prevent cross-session clashes), and drop it when you're finished. Chances are the PHP script doesn't actually need the entire 10MB of data for every page (if at all), so however efficiently you get serialization and compression working, there's some wasted effort going on, and MySQL is far more efficient at reading data than PHP is at serializing.
Sorry, I forgot to tell you that there are many classes in that array.

For GeG,
As you mentioned, I can't use your code then, right?

For Squinky,
I understand what you said, but storing it in a table is my last choice.

Surprisingly, it seems to work for classes as well, using a code construct that is new to me. I've only tested it on PHP 5.
Don't forget to include the class definition before you include your storage file.
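A sketch of the var_export() storage pattern being discussed (file and helper names are invented; note that in PHP 5, objects can only be rebuilt from var_export() output if their class defines __set_state(), and as noted above the class definitions must be loaded before the file is included):

```php
<?php
// Sketch: dump the array as parseable PHP source with var_export(),
// then include the file to read it back. Reading is just a PHP parse,
// which is what makes this faster than unserialize() for big arrays.
function save_array($file, $data)
{
    $php = '<?php $stored = ' . var_export($data, true) . '; ?>';
    $fp = fopen($file, 'w');
    fwrite($fp, $php);
    fclose($fp);
}

function load_array($file)
{
    // include() runs in this function's scope, so $stored lands here.
    include $file;
    return $stored;
}
?>
```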
I still don't quite believe that you need to get 10MB of data for every single page hit. There must be a more efficient way - you still haven't told us what's in this array other than 'objects'.
How about storing the 10MB in memory?

And why compress it? Faster iowait?

Yes, I agree with Squinky; IMO it only makes sense to have 10MB of data if you want to process it further, and in that case the processing is probably better left to the database.
SOLUTION
From what I've read so far, it sounds like the first time a visitor hits the page, it goes through a large and complex setup procedure that results in a 10Mb array of 'stuff'. Each subsequent hit involves accessing this array, thus avoiding the setup overhead. However, the overhead of serializing/unserializing the array is killing the performance. I think it's entirely logical to keep this chunk in the database - then you don't have to serialize it at all, and the performance problem just goes away. What's more, the database will be far faster than PHP at handling this chunk of data anyway, so I'd expect a large performance improvement. If it happens that all these array items are class instances, which in turn contain large chunks of data, then just don't keep the data in the class - point to it in the database instead. It's reasonably easy to virtualize the storage of the data inside the class so it looks like it's really in there.
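Squinky's "virtualize the storage" idea could be sketched with PHP 5 magic methods (this assumes PHP 5; the class, property, and method names are all invented, and the database call is stubbed out):

```php
<?php
// Sketch: keep bulk data out of the serialized object. Only a small
// id lives in the instance; the heavy payload is fetched on demand
// and deliberately excluded from serialization via __sleep().
class ProductResult
{
    public $id;               // small: cheap to serialize
    private $payload = null;  // heavy: re-fetched lazily, never stored

    public function __construct($id) { $this->id = $id; }

    // Keep only $id when serialized.
    public function __sleep() { return array('id'); }

    // Called for inaccessible properties: lazily load the payload.
    public function __get($name)
    {
        if ($name === 'payload') {
            if ($this->payload === null) {
                $this->payload = $this->fetchPayload();
            }
            return $this->payload;
        }
        return null;
    }

    private function fetchPayload()
    {
        // Stand-in for e.g. SELECT payload FROM results WHERE id = ...
        return "loaded-for-{$this->id}";
    }
}
?>
```

With this shape, serializing 200 such objects stores only 200 small ids, and each page fetches payloads only for the items it actually displays.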
That's right, the 10MB array is the result to be kept in the session for the next pages.

To GeG,
    I have to test this var_export() function first; I use many PHP versions (4.3.6, 4.3.8, 4.3.9, 4.3.10 across 4 webservers).

To Promethyl,
    If I don't compress it, I get an out-of-memory problem instead.

To Squinky,
    How do I convert an array to text without using serialize()? What function do you use?
> If I do not compress it I will get the out of memory problem instead.

Probably the first hint that you're moving too much data for an average webpage hit.

Are you trying to take the load off the database? Are you using this serialize method as a form of file-based caching?

Are we even sure it's gzfile or serialize with the bad performance? Is it the iowait time?

Michael
To Michael,
> Is it the iowait time?
Sorry, I'm not sure. How do I check this?
Put your code (to test) in the test block. Put the gzfile command in there, try one with just file, and one with the serialize command in the block, and see which takes the longest. Report the execution times of the various blocks back here.

Also, top will give you iowait, if you have shell access.


<?php
/**
 * Simple function to replicate PHP 5 behaviour
 */
function microtime_float()
{
   list($usec, $sec) = explode(" ", microtime());
   return ((float)$usec + (float)$sec);
}

$time_start = microtime_float();

// Do your test here...


$time_end = microtime_float();
$time = $time_end - $time_start;

echo "Did nothing in $time seconds\n";
?>
var_export() has existed since PHP 4.2.
> How to convert array() to text without using serialize()? what function do you use?

I don't. What I'm suggesting is that you make your array smaller (so the serialize overhead will effectively disappear) by taking large data items out of it and leaving them in the database. You _still_ have not told us what's in your array!
To Promethyl,
   I already ran a test of serialize() with my sample data:
5.46797800 seconds using only serialize(), and 5.65010715 seconds with gzcompress() added.

To GeG,
   var_export() takes 2.82100606 seconds for this data; it's better, as you mentioned.

To Squinky,
   If I make the array smaller, as you suggest, to reduce the overhead - and if that works - I still have no need to store this information in the database.
>You _still_ have not told us what's in your array!
I'm so sorry. It contains the runtime data and some objects.

To All,
   From all your suggestions, I have these conclusions:
1. Reduce the size of that array. I think 80% of the array is the result of runtime processing, and only the static data will be removed.
2. Call serialize() with the smaller array. I have to code my_own_serialize() to do this - or if anyone codes it for me, I'll give 2000 points.
3. If the 2nd solution works and gets better performance, storing or not storing in the database is not the main point.

Thanks,
Any comments? I will accept the answers within 30 minutes.






> If it works, I still has no need to store this information to database.

It's not really a matter of whether you _need_ to store this in the database, it's whether it's a good solution to your performance problem. If putting it in the database could give you a 100x speedup, what are your other objections to putting it there?

> It contains the runtime data and some objects.

I meant - what kind of data? 10MB is a lot of data to build one web page from - you're obviously not going to deliver it all - so why do you need so much local data? Does it contain large bodies of text? PDFs? Images?
>what are your other objections to putting it there?
I get a lot of table-locking problems in the MySQL server; that's why I avoid storing the runtime results in the database.

>Does it contain large bodies of text? PDFs? Images?
No. It contains an array of runtime-calculated data, rules, conditions and more for the 200+ products at the time the user submits a check, and the result is stored in the session for viewing at 20 products/page.
BTW, it's not all local data; the main sources come from the suppliers at runtime (via XML).
ASKER CERTIFIED SOLUTION
SOLUTION
To Squinky,
> You pull data from them live with every initialisation hit? No caching?
Yes, this live data changes every second; caching will not help in this case.

> So you're storing data for >200 items in your session, but only ever viewing 20 at once, ...
I understand what you said. I thought about it before.

To inq123,
I store this data in the session.

Thanks for all the comments.
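Squinky's ">200 items stored but only 20 viewed" point could be sketched like this: keep only the product ids in the session and rebuild the visible page slice on demand. All names here are invented, and fetch_product_result() stands in for whatever per-product lookup or recalculation applies:

```php
<?php
// Sketch: session holds only a list of ids; each page request
// materialises just the 20 items it will actually display.
function fetch_product_result($id)
{
    // Stand-in for a DB lookup or per-product recalculation.
    return "result-$id";
}

function get_page($ids, $page, $perPage = 20)
{
    $slice = array_slice($ids, $page * $perPage, $perPage);
    $rows = array();
    foreach ($slice as $id) {
        $rows[$id] = fetch_product_result($id);
    }
    return $rows;
}
?>
```

This keeps the serialized session payload to a few kilobytes of ids instead of 10MB of results.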
> I thought that serialize call affects methods only, not class data.

No, quite the reverse. Serialization is only applied to data. If you look in a session file at a serialized object, you'll find that all it contains is a note of what class it is and the data stored in it. Class methods are never stored because they're part of the class definition, not the instantiated object. This is why you have to have your class definitions loaded before you session_start or unserialize.
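A quick way to see this for yourself (the class name is invented):

```php
<?php
// Demonstration of the point above: the serialized form contains only
// the class name and the instance data, never the methods.
class Demo
{
    var $a = 1;
    var $b = 'two';
    function heavyMethod() { /* never stored */ }
}

echo serialize(new Demo());
// Something like: O:4:"Demo":2:{s:1:"a";i:1;s:1:"b";s:3:"two";}
?>
```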

I really think these points should have been split. While I've been banging on about the overall design, everyone else has had good input too.