Steve Bink

python type conversion exhibiting unexpected behavior

Can someone please explain why this is?  And more importantly, how to correct it...

$> python
Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a='97757589545391667'
>>> a
'97757589545391667'
>>> int(a)
97757589545391667
>>> float(a)
97757589545391664.0

kevinhigg

Good day.  So the question is why the float doesn't match the int?  I am not a Python expert, but it seems that this is described in several sources.  Very interesting reading; enjoy!

http://docs.python.org/tutorial/floatingpoint.html
http://docs.python.org/library/stdtypes.html
http://stackoverflow.com/questions/5997027/python-rounding-error-with-float-numbers
Steve Bink

ASKER

I understand about precision in floating points, but this example is particularly irksome.  Perhaps a little more explanation will help towards a solution.

I inherited a ~7-year-old code base.  To generate the profile id of a user, they implemented some voodoo with time.time() and random numbers to arrive at some large integer value.  This integer value is stored in a text field in MySQL (text...not even varchar...but I digress).  Sometime before that value gets to my new code, it *may* have received a float-style notation (i.e., adding '.0' to the end).  My code must ensure the value is an integer.  So I have a fairly simple function:

def clean_integer(x):
    if type(x) is int: return x
    try:
        y=int(float(x))
    except:
        y=0
    return y


I certainly expect precision errors in the fractional portion, but I considered it moot since I would ideally be discarding it without changing the integer portion.  But this particular number will not co-operate.  So far, it is the only one I've found, though I'm sure there will be others.

Python seems to lack a float-to-int conversion.  How can I address this issue?
SOLUTION
gelonida
Here are three ways (with different results) to detect an int, depending on how you define an int:

numstrings = [ 
    '7757589545391667', 
    '7757589545391667.0',  
    '7757589545391667.3',  
] 

# detect with decimal.Decimal, whether a huge number is an integer
from decimal import Decimal
one = Decimal('1')
for numstr in numstrings:
    as_decimal = Decimal(numstr)
    if as_decimal == as_decimal.quantize(one):
        print("%s is an integer" % numstr)

# just check for the presence of a decimal point
for numstr in numstrings:
    if '.' in numstr:
        print("%s has a '.', thus it is a float" % numstr)


# or use a regexp (only zeros allowed after the decimal point)
import re
my_re = re.compile(r'\d+(\.0*|)$')
for numstr in numstrings:
    if my_re.match(numstr):
        print("%s is an integer" % numstr)

ASKER CERTIFIED SOLUTION
Every programming language that uses IEEE doubles to represent floating-point values should have the same problem if you try to parse the string
"97757589545391667" into a float (with double precision).
A double has a 53-bit significand, so not every integer above 2**53 (roughly 9.0e15) can be stored exactly. This number has too many significant digits and is therefore rounded to the nearest representable value.
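
To make the "too many digits" point concrete, here is a minimal sketch (my own illustration, not something posted in this thread). It assumes Python floats are IEEE 754 doubles, which is the case on common platforms; the variable names are made up for the example.

```python
# Assumption: Python floats are IEEE 754 doubles (53-bit significand).
a = '97757589545391667'

exact = int(a)            # Python integers have arbitrary precision
rounded = int(float(a))   # this path goes through a double

limit = 2 ** 53           # above this, not every integer fits in a double

print(exact > limit)      # True: the value is beyond the exact-integer range
print(exact - rounded)    # 3: the double landed on ...664 instead of ...667

# 2**53 + 1 is the first integer that a double cannot hold
print(float(limit) == float(limit + 1))   # True
```

At this magnitude (between 2**56 and 2**57) neighbouring doubles are 16 apart, which is why the last digits change.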

Languages like SQL do not have this problem, as they can use a fixed point representation.


Out of curiosity, I would like to know which other languages you are talking about.

I tested for example with C:
#include <stdio.h>
#include <stdlib.h>

int main(void){
    char *s = "97757589545391667";
    char *e;
    unsigned long int l = strtoul(s,&e,10);
    double d = atof(s);

    printf("%s\n", s);
    printf("%lu\n", l);
    printf("%lf\n", d);
}

and got the output:

97757589545391667
97757589545391667
97757589545391664.000000

Converting a huge 'float string' to a float (double precision) and then to an int should lose precision in any language that uses the IEEE double representation for floating-point numbers.

Remove, for example, the first two digits of your number and you'll see that it works again in both Python and C, as the number can then be fully represented as a double without losing precision.
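
A quick way to try that is a round-trip check. This is only a sketch: the helper name `survives_float_roundtrip` is hypothetical, and it assumes floats are IEEE doubles.

```python
def survives_float_roundtrip(numstr):
    """Return True if int -> float -> int gives back the same integer."""
    n = int(numstr)
    return int(float(n)) == n

# The original 17-digit value does not survive the trip through a double
print(survives_float_roundtrip('97757589545391667'))   # False

# With the first two digits removed it is below 2**53 and survives
print(survives_float_roundtrip('757589545391667'))     # True
```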
SOLUTION
@routinet

Your suggested solution code probably works, but I would not use it,
as it probably works only in your use case.
Anybody trying to reuse your function in another context might get some unpleasant surprises.

So some comments.

Why do you write
int(str(x).split('.')[0])
instead of
int(x.split('.')[0])

Do you really expect that the function receives an integer, a string, or a float value?

If not (it receives only strings and integers), then you can remove the str() call, as you already have a string.

If yes, then the code works only in some special cases.
Just try

str(1000000000000.)
and you'll see that you get '1e+12' (Python 2's str() keeps only 12 significant digits for floats).
int(str(1000000000000.))  would cause an error.
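
To see the failure without depending on how a particular Python version formats floats, this small sketch feeds the exponent-style string in directly. The helper name `split_to_int` is only for illustration.

```python
def split_to_int(text):
    """Take everything before the first '.' and hand it to int()."""
    return int(text.split('.')[0])

# A plain 'integer.0' string works fine
print(split_to_int('97757589545391667.0'))   # 97757589545391667

# An exponent-style string has no '.' to split on, and int() rejects it
try:
    split_to_int('1e+12')
    failed = False
except ValueError:
    failed = True
print(failed)   # True
```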

So I suggest you use the following snippet.

from decimal import Decimal

def clean_integer(x):
    if type(x) is int: return x # int will be handled here
    try:
        return int(Decimal(str(x))) # strings AND floats would be handled here
        #return int(Decimal(x)) # ONLY strings would be handled here
    except (TypeError, ValueError, ArithmeticError):
        # Decimal raises decimal.InvalidOperation (an ArithmeticError) for
        # non-numeric strings such as 'stuff', so TypeError alone is not enough
        # perhaps it would be better to raise an exception and not just silently return 0
        # I don't know your code, but perhaps something to think of
        return 0
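
For reference, here is a short sketch of how Decimal itself behaves on the kinds of input discussed in this thread. The inputs are taken from the posts above; nothing here is specific to the asker's code base.

```python
from decimal import Decimal, InvalidOperation

# Large 'float-style' strings convert exactly, with no detour through a double
print(int(Decimal('97757589545391667.0')))   # 97757589545391667

# A fractional part is truncated toward zero
print(int(Decimal('7757589545391667.3')))    # 7757589545391667

# Exponent notation is understood as well
print(int(Decimal('1e+12')))                 # 1000000000000

# Non-numeric text raises decimal.InvalidOperation, not TypeError
try:
    Decimal('stuff')
    raised = None
except InvalidOperation as exc:
    raised = type(exc).__name__
print(raised)   # InvalidOperation
```

So any wrapper that wants to fall back to 0 for garbage input has to catch InvalidOperation (or its base class ArithmeticError).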

$> php -v
PHP 5.3.3 (cli) (built: May  3 2012 17:33:17)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

$> php -r '$a = "97757589545391667.0";echo (int)$a;'
97757589545391667

$> php -r '$a = "97757589545391667";echo (int)$a;'
97757589545391667

$> php -r '$a = "stuff";echo (int)$a;'
0


That is exactly the behavior I want.

The purpose of clean_integer() is to whitelist any (presumed) numeric data point I think is suspect at the time I have to use it.  Since I am not familiar with most of the code base yet, just about everything is suspect.  My code is always designed to initialize variables using whitelisting, so the rest of my code knows what to expect and how to fail gracefully.  From what I've seen, the rest of the code is designed to work properly, provided it is passed the proper values.  Unfortunately, the idea of a proper value in the old code is somewhat inconsistent, at best, and odd things can make it into what should be a string representation of an integer.  Should my code fail because clean_integer() returns 0, then I can go back and correct the old code that passed the bad data to my code originally.

After running some benchmarks, I'm going to go with the method I posted last (using str().split('.')[0]).  The performance on any method requiring instantiation (i.e., Decimal, re, etc.) was just too horrible to consider implementing.  Like orders-of-magnitude kind of horrible.  I've posted another question regarding my benchmarking method at https://www.experts-exchange.com/questions/27784998/python-performance-am-I-missing-something.html.
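
For anyone who wants to repeat the comparison, here is a rough timeit sketch. It is not the benchmark referred to above (that code is not shown in this thread); the function names and the repeat counts are assumptions, and the numbers will depend heavily on the interpreter. Note that Python 3.3 and later ship a C implementation of decimal, so the gap may be much smaller there than on Python 2.6.

```python
import timeit
from decimal import Decimal

VALUE = '97757589545391667.0'

def via_split(x):
    # string-slicing approach: keep everything before the first '.'
    return int(str(x).split('.')[0])

def via_decimal(x):
    # Decimal approach: exact parse, then truncate to int
    return int(Decimal(str(x)))

def measure(func, repeats=3, number=10000):
    """Best-of-N wall time, in seconds, for `number` calls of func(VALUE)."""
    return min(timeit.repeat(lambda: func(VALUE), repeat=repeats, number=number))

split_time = measure(via_split)
decimal_time = measure(via_decimal)

print("split:   %.4f s" % split_time)
print("decimal: %.4f s" % decimal_time)
```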

Thanks to all for the input.
I used the str().split('.')[0] method, wrapped in a try/except clause, mostly for performance reasons.  This will certainly not catch all data that *could* work, but it should suffice for now.
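
The final code was not posted here, so the following is only a guess at its shape, based on the description above: str().split('.')[0] wrapped in try/except, returning 0 on failure. The name `clean_integer_split` and the choice of caught exceptions are assumptions.

```python
def clean_integer_split(x):
    """Hypothetical reconstruction of the split-based cleaner described above."""
    try:
        # Keep only the text before the first '.', then parse it as an integer
        return int(str(x).split('.')[0])
    except (TypeError, ValueError):
        # Anything that does not look like an integer falls back to 0
        return 0

print(clean_integer_split('97757589545391667.0'))   # 97757589545391667
print(clean_integer_split('97757589545391667'))     # 97757589545391667
print(clean_integer_split('stuff'))                 # 0
print(clean_integer_split('1e+12'))                 # 0
```

The last line shows the kind of data that *could* work but is not caught: exponent notation is rejected and silently becomes 0.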