Steve Bink (United States of America), ASKER:

python performance...am I missing something?

The background for the specific test case is here: https://www.experts-exchange.com/questions/27781765/python-type-conversion-exhibiting-unexpected-behavior.html.  

I ran some benchmarks to find the best performer among several methods, and was getting back some astounding results.  Running the same code in PHP yields much better performance.  I've previously read that PHP-vs-Python is a bit of a mixed bag, with Python performing better in data handling, and PHP performing better with I/O.  These results are showing quite a different picture, and the Python code is not even executing the type conversion.  Can anyone point out where I've gone wrong?

For Python:
$> python
Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> def tryit():
...   tin=time.time()
...   a="97757589545391667"
...   for x in range(1,10000001):
...     try:
...       b=a
...     except:
...       pass
...   tout=time.time()
...   print tout-tin
...
>>> tryit()
1.29326200485
>>> tryit()
0.987454891205
>>> tryit()
0.976771116257
>>> tryit()
0.977509975433
>>> tryit()
1.36146092415



For PHP:
$> php -v
PHP 5.3.3 (cli) (built: May  3 2012 17:33:17)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies


$> php -r '$in=microtime();$a = "97757589545391667";for ($x=0;$x++;$x<10000001) { $b=(int)$a;} $out=microtime();echo number_format($out-$in,8);'
0.00006000

$> php -r '$in=microtime();$a = "97757589545391667";for ($x=0;$x++;$x<10000001) { $b=(int)$a;} $out=microtime();echo number_format($out-$in,8);'
0.00006700

$> php -r '$in=microtime();$a = "97757589545391667";for ($x=0;$x++;$x<10000001) { $b=(int)$a;} $out=microtime();echo number_format($out-$in,8);'
0.00008900

$> php -r '$in=microtime();$a = "97757589545391667";for ($x=0;$x++;$x<10000001) { $b=(int)$a;} $out=microtime();echo number_format($out-$in,8);'
0.00006000

$> php -r '$in=microtime();$a = "97757589545391667";for ($x=0;$x++;$x<10000001) { $b=(int)$a;} $out=microtime();echo number_format($out-$in,8);'
0.00005900

$> php -r '$in=microtime();$a = "97757589545391667";for ($x=0;$x++;$x<10000001) { $b=(int)$a;} $out=microtime();echo number_format($out-$in,8);'
0.00006000


gelonida (France), ASKER CERTIFIED SOLUTION:
[Solution text not available.]

pepr:

I suggest using the same loop body in both languages, for example:
    for x in xrange(1,10000001):
        b = int(a)


and running the code outside interactive mode. (I have not tested this, but interactive mode could slow down execution.)

Also, the integer type may be implemented differently in Python, as Python's int has almost no practical limit on the size of the number. I do not know how the PHP int is implemented.
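To illustrate the integer-size point, here is a minimal sketch assuming Python 3, where Python 2's separate int/long types are merged into one arbitrary-precision int. It checks that the benchmark string fits in a signed 64-bit integer (what a 64-bit PHP build uses natively) but not in 32 bits, and shows that Python simply keeps going past 64 bits. The helper `fits_in_signed_bits` is hypothetical, written only for this illustration.

```python
# Sketch assuming Python 3, where int is arbitrary precision.
# (In Python 2.6, int() silently returns a `long` once a value exceeds sys.maxint.)

def fits_in_signed_bits(value, bits):
    """Return True if value fits in a signed two's-complement integer of `bits` bits."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1

a = "97757589545391667"
b = int(a)

print(fits_in_signed_bits(b, 64))   # True: the benchmark value fits in 64 bits
print(fits_in_signed_bits(b, 32))   # False: it does not fit in 32 bits

# Past 64 bits, Python does not overflow; it just stores more internal digits.
big = int("9" * 40)
print(big + 1 == 10**40)            # True
```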
Steve Bink (United States of America), ASKER:

Here's the Python test using xrange() instead of range():

$> python
Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> def tryit():
...   tin=time.time()
...   a="97757589545391667"
...   for x in xrange(1,10000001):
...     try:
...       b=a
...     except:
...       pass
...   tout=time.time()
...   print tout-tin
...
>>> tryit()
0.629939079285
>>> tryit()
0.681260108948
>>> tryit()
0.61185503006
>>> tryit()
0.590117931366
>>> tryit()
0.5790579319
>>> tryit()
0.570842981339
>>> tryit()
0.597903966904

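The gain from xrange() is expected: in Python 2, range(1, 10000001) first builds a list of ten million int objects, while xrange() produces them one at a time. Python 3's range behaves like the old xrange. A small Python 3 sketch using sys.getsizeof (which reports only the container's own size, not the int objects it refers to) shows the difference:

```python
import sys

N = 10**6
lazy = range(N)          # Python 3 range: a small object that computes values on demand
eager = list(range(N))   # what Python 2's range() returned: a fully built list

print(sys.getsizeof(lazy))           # a few dozen bytes, independent of N
print(sys.getsizeof(eager))          # megabytes for the list's pointer array alone
print(len(lazy) == len(eager) == N)  # True
```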


While the results are better than range(), I still consider the performance embarrassing.  Running it outside of the interactive console was even worse, probably because the initialization overhead is now included in the timer:

$> cat tryit.py
import time
tin=time.time()
a="97757589545391667"
for x in xrange(1,10000001):
  try:
    b=a
  except:
    pass
tout=time.time()
print tout-tin

$> python tryit.py
1.59175992012

$> python tryit.py
1.33616805077

$> python tryit.py
1.46257305145

$> python tryit.py
1.36383104324

$> python tryit.py
1.40599513054

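A likely reason the script version is slower, offered as a hedged explanation that was not measured in this thread: the timer starts after the interpreter has loaded, so startup is not included. In CPython, however, variables inside a function are "fast locals" stored in an array, while module-level variables live in a dictionary, so every read and write at module level costs a dict lookup. The sketch below, assuming Python 3.4+ for dis.get_instructions, compiles the same loop both ways and compares the store opcodes.

```python
import dis

LOOP_SRC = """
a = "97757589545391667"
for x in range(10):
    b = a
"""

def store_opcodes(code):
    """Return the set of STORE_* opcode names used by a code object."""
    return {ins.opname for ins in dis.get_instructions(code)
            if ins.opname.startswith("STORE")}

# Module level: names are stored in (and looked up from) a dict.
module_code = compile(LOOP_SRC, "<module>", "exec")

# Function level: the same loop uses array-indexed "fast" locals.
def tryit():
    a = "97757589545391667"
    for x in range(10):
        b = a

print(sorted(store_opcodes(module_code)))     # expect STORE_NAME
print(sorted(store_opcodes(tryit.__code__)))  # expect STORE_FAST (newer versions may add STORE_FAST_* variants)
```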


I'll be running more benchmarks with various portions of code as I move forward, but this is very disconcerting. If this represents actual performance for something as simple as variable assignment, let alone a type conversion, there is no way I can justify continuing to endorse Python for future development.

My only other thought is that there must be something I'm missing.  There just can't be this much of a community centered around a language with this kind of performance profile.
pepr, SOLUTION:
[Solution text not available.]

gelonida (France):

pepr hit the nail on the head.

You messed up the 'for' statement in PHP: the condition and the increment are swapped, so the loop body never runs.

Just try with:
 php -r '$in=microtime();$a = "97757589545391667";for ($x=0;$x<10000001;$x++) { $b=(int)$a;} $out=microtime();echo number_format($out-$in,8);'


and you'll get slightly different numbers.

Please note as well that a really smart compiler might detect that
$b=(int)$a;
gives the same result on every iteration and compute it only once.


PHP is still faster, though.
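To make the PHP loop bug concrete: a C-style for (init; cond; step) tests cond before every iteration. In the original command for ($x=0;$x++;$x<10000001), the condition slot holds $x++, a post-increment that returns the old value 0, which is false, so the body never runs and only an empty loop was timed. The Python sketch below emulates that evaluation order; c_style_for and post_increment are hypothetical helpers written only for this illustration.

```python
def c_style_for(state, cond, step, body):
    """Emulate C/PHP `for (init; cond; step) { body }` after `init` has set up `state`.

    `cond`, `step`, and `body` are callables that receive the mutable `state` dict.
    Returns how many times the body ran.
    """
    runs = 0
    while cond(state):        # condition is tested before each iteration
        body(state)
        runs += 1
        step(state)           # step runs after the body
    return runs

def post_increment(state):
    """Emulate PHP's `$x++`: increment x, but return its OLD value."""
    old = state["x"]
    state["x"] += 1
    return old

N = 1000

# Correct order: for ($x=0; $x<N; $x++)
good = c_style_for({"x": 0}, lambda s: s["x"] < N, post_increment, lambda s: None)

# Swapped order, as in the original command: for ($x=0; $x++; $x<N)
bad = c_style_for({"x": 0}, post_increment, lambda s: s["x"] < N, lambda s: None)

print(good, bad)  # 1000 0
```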
gelonida (France), SOLUTION:
[Solution text not available.]

Steve Bink (United States of America), ASKER:

>>> I get occasionally even negative numbers as output. This sounds rather improbable.

LOL!  I concur!

OK, so I've revisited the test suite.  My apologies for the bad for() construct...the first tests were done off-the-cuff, and I didn't check the code too closely.  It has been corrected, and microtime(true) has been used to standardize the PHP reporting.  I've also changed the test to collect an average over 100 iterations.  I then ran the test a dozen or so times for each test case, and selected the high/low of each set for reporting.  
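Why the original PHP timings could come out tiny or even negative (a hedged reconstruction, not measured in the thread): without an argument, PHP's microtime() returns a string of the form "msec sec", and when two such strings are subtracted, PHP converts each one using only its leading number, i.e. the fractional-second part. microtime(true) returns a float instead. The Python sketch below imitates that leading-number conversion with a regular expression; php_leading_number is a hypothetical, simplified stand-in for PHP's numeric-string rules, not an exact port.

```python
import re

def php_leading_number(s):
    """Simplified imitation of PHP's numeric-string conversion:
    use the leading decimal number of the string, or 0 if there is none."""
    m = re.match(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)", s)
    return float(m.group(0)) if m else 0.0

def microtime_to_float(s):
    """What microtime(true) would give: whole seconds plus the fractional part."""
    msec, sec = s.split()
    return int(sec) + float(msec)

# Two hypothetical microtime() readings taken across a one-second boundary.
t_in = "0.95000000 1341234567"
t_out = "0.12000000 1341234568"

wrong = php_leading_number(t_out) - php_leading_number(t_in)   # only the fractions
right = microtime_to_float(t_out) - microtime_to_float(t_in)   # full timestamps

print(round(wrong, 2), round(right, 2))  # -0.83 0.17
```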

For Python, I limited the test to the interactive console because a) it provides better performance, and b) the same benchmarks run under mod_python come closest to the console results.

The only other caveat is that all of these tests were run on a live web server under its usual load.  Besides increasing the individual test times, this also increases the stddev.  During my collection of statistics, I threw out any immediately obvious high-end outliers in favor of an additional cycle.

The PHP code, run from bash command line:
php -r '$a="5";$totaltime=0;$totalcount=0;for ($z=0;$z<100;$z++) { $totalcount++; $timein=microtime(true); for ($x=0;$x<1000000;$x++) { <<<command>>> } $timeout=microtime(true); $totaltime+=($timeout-$timein); } echo "count:$totalcount, time:$totaltime, avg:",$totaltime/$totalcount;'


The Python code, run from console by calling tryit():
def tryit():
  import time
  a="5"
  totaltime=0
  totalcount=0
  for z in xrange(0,100):
    totalcount=totalcount+1
    timein=time.time()
    for x in xrange(0,1000000):
      <<<command>>>
    timeout=time.time()
    totaltime=totaltime+(timeout-timein)
  print "count:{s}, time:{t}, avg:{u}".format(s=totalcount,t=totaltime,u=totaltime/totalcount)


The test cases are loop iteration, simple assignment, assignment with type conversion, and (for python only) assignment with type conversion using try/except.  In each test case, "<<<command>>>" was replaced with the appropriate command for the test.  For the exception test case, the command text was:
try:
  b=int(a)
except:
  pass

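For comparison, Python's standard timeit module does the repeat-and-average bookkeeping of the harness above, and it also runs the statement inside a generated function, so the variables are fast locals. A minimal sketch assuming Python 3; the bench helper and its defaults are illustrative, and the numbers in this thread were not produced with it. The try/except case here catches ValueError rather than using a bare except.

```python
import timeit

def bench(stmt, setup='a = "5"', repeat=100, number=1000000):
    """Time `stmt` `repeat` times, each run executing it `number` times.

    Returns (low, high, average) seconds per run, mirroring the
    low/high columns of the results table.
    """
    times = timeit.repeat(stmt=stmt, setup=setup, repeat=repeat, number=number)
    return min(times), max(times), sum(times) / len(times)

if __name__ == "__main__":
    # Small repeat/number values so the sketch finishes quickly.
    for stmt in ("pass", "b = a", "b = int(a)",
                 "try:\n    b = int(a)\nexcept ValueError:\n    pass"):
        low, high, avg = bench(stmt, repeat=5, number=100000)
        print("{0!r}: low={1:.5f} high={2:.5f} avg={3:.5f}".format(stmt, low, high, avg))
```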

The results, as the average duration in seconds of a single 1,000,000-iteration loop (L = lowest and H = highest average among the runs of each test case):

Command                PHP (L)   PHP (H)   Python (L)   Python (H)
'' / pass              0.04447   0.05967   0.02393      0.03052
b=a                    0.05744   0.06737   0.03471      0.03693
b=(int)a / b=int(a)    0.11619   0.12020   0.58733      0.60940
with try/except        --        --        0.69335      0.72633

The first two cases are what I had come to expect from my previous reading - Python is better at raw processing.  The third case is what surprised me.  Not only does PHP beat Python, but Python shows it absolutely sucks at type conversion - ~17x slower than non-conversion, and ~5x slower than PHP.  I think this highlights the source of my question.
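The ratios quoted above can be checked directly against the table. A quick Python 3 calculation using the low and high columns:

```python
# Figures from the results table (seconds per 1,000,000-iteration loop), as (low, high).
python_assign = (0.03471, 0.03693)   # b=a
python_int    = (0.58733, 0.60940)   # b=int(a)
php_int       = (0.11619, 0.12020)   # $b=(int)$a

ratios = {}
for label, i in (("low", 0), ("high", 1)):
    vs_assign = python_int[i] / python_assign[i]   # conversion vs plain assignment
    vs_php = python_int[i] / php_int[i]            # Python conversion vs PHP conversion
    ratios[label] = (vs_assign, vs_php)
    print("{0}: int() takes {1:.1f}x as long as b=a, {2:.1f}x as long as PHP".format(
        label, vs_assign, vs_php))
```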

The try/except test was not very surprising, considering pepr's (correct!) assertion that the construct is costly. I am a little dismayed that it adds so much to the cycle, considering that Python is built around exceptions for error handling. Even with just "try: pass", a test cycle reported 0.04377/0.04578 for low/high.
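On the try/except side, here is a sketch of the conversion idiom from the background question, assuming Python 3: catch ValueError (what int() raises for a malformed string) instead of using a bare except:, which would also swallow unrelated errors such as KeyboardInterrupt. Entering a try block that does not raise is comparatively cheap (and close to free in CPython 3.11+); raising and catching an exception is the expensive path. The name to_int_or_default is hypothetical.

```python
def to_int_or_default(text, default=None):
    """Convert `text` to int; return `default` if it is not a valid integer literal."""
    try:
        return int(text)
    except ValueError:   # int() raises ValueError for strings like "abc" or "5.0"
        return default

print(to_int_or_default("97757589545391667"))  # 97757589545391667
print(to_int_or_default("abc"))                # None
print(to_int_or_default("abc", 0))             # 0
```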

gelonida mentioned that some languages may store dual representations of numeric values.  I'd like to see some documentation on this - the idea is new to me - but one more test shows there is something extra going on in Python when a is an integer value.  The PHP results do not look much different.  This test is based on simple assignment:
a as integer           0.05742   0.06275   0.03783      0.04245


When taking a substring:
a="abcdefg"            0.33876   0.38429   0.10930      0.11866


For PHP, the command was "$b=substr($a,3,3)". For Python, "b=a[3:6]". PHP's dismal performance highlights the difference between a PHP function call and a Python slice. Again, not unexpected, and one of the things I really like about Python.
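To spell out the slice syntax: PHP's substr($a, 3, 3) takes a start offset and a length, whereas a Python slice takes a start and an end index, so the equivalent is a[3:6]. (a[3,6] with a comma would raise a TypeError, since string indices must be integers or slices.) php_substr is a hypothetical helper that handles only non-negative arguments; PHP's negative-offset rules are not reproduced.

```python
def php_substr(s, start, length):
    """Rough equivalent of PHP substr($s, $start, $length) for non-negative start/length."""
    return s[start:start + length]

a = "abcdefg"
b = a[3:6]                       # Python slice: indices 3, 4, 5
print(b)                         # def
print(php_substr(a, 3, 3) == b)  # True
```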

Time to go find more benchmarks, I suppose. There is now a whole host of things I need to learn about Python's performance. If anyone has recommended reading for me, I would surely appreciate it.
Steve Bink (United States of America), ASKER:

In case anyone is curious, I went back and ran the same Python tests using mod_python.  The results were generally equivalent to the console, with only a slight delay.  Good job, mod_python!
gelonida (France):

@routinet.

Unfortunately, I have no idea why Python's integer handling is so much slower, or whether there are any tricks to speed it up.
I don't know much more about Python internals, and I'm afraid I can't help you much further. I will keep following this question, though, and I'll definitely come back if I have another idea.

Regarding my mention of certain programming languages storing both a numeric and a string representation for a variable:

I do not think that this is what's going on with PHP. At least a quick search turned up nothing to support this theory, and some small profiling experiments with variations of the code only seem to indicate that PHP is somehow more efficient at integer conversion (perhaps, as pepr suggested, because Python has some overhead for handling integers larger than 64 bits).


Now, just for your info (this won't help with your question, though):
If I remember correctly, I read about this some years ago in an article about Perl, but I couldn't find a good link to it.

The following article seems to indicate that Perl variables have both an NV (numeric value) and a PV (pointer value, i.e. string) representation.
http://modperlbook.org/html/10-1-2-2-Numerical-versus-string-access-to-variables.html
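As an illustration of the dual-representation idea, here is a hypothetical Python class (not how CPython, PHP, or Perl actually store values) that keeps the original string and caches the parsed integer the first time it is requested, so repeated numeric use pays for the conversion only once, loosely like Perl's separate string and numeric slots.

```python
class DualValue(object):
    """Hypothetical value holding a string form plus a lazily cached int form."""

    def __init__(self, text):
        self.text = text          # string representation (like Perl's PV)
        self._number = None       # numeric representation (like Perl's IV/NV), filled on demand
        self.conversions = 0      # how many times int() was actually called

    def as_str(self):
        return self.text

    def as_int(self):
        if self._number is None:  # convert only on first numeric use
            self._number = int(self.text)
            self.conversions += 1
        return self._number

v = DualValue("97757589545391667")
total = sum(v.as_int() for _ in range(1000))
print(v.conversions)  # 1
```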
pepr:

On integers: as far as I know, PHP uses a fixed memory size for integers (32 or 64 bits, depending on the target platform). This way it can probably use standard, lower-level, highly optimized routines for converting a string to a number.

Python, on the other hand, does not limit the size of an integer, so the conversion may be more specialized and more complex. You could try profiling work with integers (arithmetic operations rather than conversions). For integers that fit into 32 or 64 bits (depending on the platform), I would expect similar timings in PHP and Python.
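Following that suggestion, a sketch (Python 3, standard timeit module; the choice of statements is an assumption) that times integer arithmetic separately from string-to-int conversion, for a value that fits in 64 bits and one that does not. No results are claimed here; the point is the shape of the measurement.

```python
import timeit

SMALL = "97757589545391667"   # fits in a signed 64-bit integer
LARGE = "9" * 40              # needs more than 64 bits

def profile(number=100000):
    """Return {label: seconds} for `number` executions of each statement."""
    cases = {
        "int() small":   ("b = int(s)", "s = %r" % SMALL),
        "int() large":   ("b = int(s)", "s = %r" % LARGE),
        "add small int": ("b = n + 1", "n = int(%r)" % SMALL),
        "add large int": ("b = n + 1", "n = int(%r)" % LARGE),
    }
    return {label: timeit.timeit(stmt, setup=setup, number=number)
            for label, (stmt, setup) in cases.items()}

if __name__ == "__main__":
    for label, seconds in sorted(profile().items()):
        print("{0:15s} {1:.5f}".format(label, seconds))
```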

For Python web applications, WSGI is now preferred over mod_python. It is said to be faster, but I am not an expert on that. Just for information.
gelonida (France):

I'm not sure whether mod_wsgi is faster or not. But as far as I know, mod_python was already marked as obsolete about two years ago (see for example http://stackoverflow.com/questions/3319545/mod-wsgi-mod-python-or-just-cgi ).
mod_wsgi is rather different to mod_python.

So, independent of performance, it might make sense in the long run to switch to mod_wsgi for new scripts
(WSGI apps can also be served by servers other than Apache, if ever required).

Please note that many people don't code against pure WSGI, but use some higher-level libraries or frameworks (werkzeug, Django, ...) to abstract away at least some aspects of WSGI.
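For reference, the WSGI interface itself is small: an application is a callable that takes the request environ dict and a start_response callable, and returns an iterable of byte strings (PEP 3333). A minimal sketch assuming Python 3 and the standard-library wsgiref package; hello_app and call_app are illustrative names. call_app invokes the app directly, without a server, using a simplified start_response that does not return the write() callable a real server provides.

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    """Minimal WSGI application: answers every request with plain text."""
    body = "Hello from {0}\n".format(environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(app, path="/"):
    """Invoke a WSGI app directly and collect its status, headers, and body."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)   # fills in the other required CGI/WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body

status, headers, body = call_app(hello_app, "/demo")
print(status)  # 200 OK
print(body)    # b'Hello from /demo\n'
```

To serve it for real during development, wsgiref.simple_server.make_server("127.0.0.1", 8000, hello_app).serve_forever() starts a simple server; under Apache, mod_wsgi would load the same callable.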
Steve Bink (United States of America), ASKER:

While I understand mod_python is old hat, there is no opportunity in the near future to migrate to mod_wsgi.  So far, I'm not having any issues I can directly attribute to mod_python, so it is not a priority.  We are migrating to a new server soon, though, so I'll at least have the opportunity to have it installed parallel for playing/development along with Python 3.