# Python type conversion exhibiting unexpected behavior

Can someone please explain why this is?  And more importantly, how to correct it...

```
$> python
Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:14)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
>>> a='97757589545391667'
>>> a
'97757589545391667'
>>> int(a)
97757589545391667
>>> float(a)
97757589545391664.0
```
kevinhigg

Good day.  So the question is why the float doesn't match the int?  I am not a Python expert, but it seems that this is described in several sources.  Very interesting reading; enjoy!

http://docs.python.org/tutorial/floatingpoint.html
http://docs.python.org/library/stdtypes.html
http://stackoverflow.com/questions/5997027/python-rounding-error-with-float-numbers

I understand about precision in floating points, but this example is particularly irksome.  Perhaps a little more explanation will help towards a solution.

I inherited a ~7-year-old code base.  To generate the profile id of a user, they implemented some voodoo with time.time() and random numbers to arrive at some large integer value.  This integer value is stored in a text field in MySQL (text...not even varchar...but I digress).  Sometime before that value gets to my new code, it *may* have received a float-style notation (i.e., adding '.0' to the end).  My code must ensure the value is an integer.  So I have a fairly simple function:

```
def clean_integer(x):
    if type(x) is int: return x
    try:
        y=int(float(x))
    except:
        y=0
    return y
```

I certainly expect precision errors in the fractional portion, but I considered it moot since I would ideally be discarding it without changing the integer portion.  But this particular number will not co-operate.  So far, it is the only one I've found, though I'm sure there will be others.

Python seems to lack a float-to-int conversion.  How can I address this issue?
gelonida

Here are three ways (with different results) to detect an int, depending on what you define as an int:

```
numstrings = [
    '7757589545391667',
    '7757589545391667.0',
    '7757589545391667.3',
]

# detect with decimal.Decimal whether a huge number is an integer
from decimal import Decimal
one = Decimal('1')
for numstr in numstrings:
    as_decimal = Decimal(numstr)
    if as_decimal == as_decimal.quantize(one):
        print("%s is an integer" % numstr)

# just check for the presence of a decimal point
for numstr in numstrings:
    if '.' in numstr:
        print("%s has a '.', thus it is a float" % numstr)

# or use a regexp (only zeros allowed after the decimal point)
import re
my_re = re.compile(r'\d+(\.0*|)$')
for numstr in numstrings:
    if my_re.match(numstr):
        print("%s is an integer" % numstr)
```

Normally, every programming language that uses IEEE doubles to represent floating-point values should have the same problem if you try to parse the string
"97757589545391667" into a float (with double precision).
The number has too many significant digits and therefore cannot be represented at full precision as a float: a double has a 53-bit significand, so not every integer above 2^53 (9007199254740992) is representable.
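
A minimal sketch of that limit, using the profile id from the question. The name `LIMIT` is only illustrative and is not part of anyone's code in this thread:

```python
# Integers up to 2**53 are exactly representable as doubles; beyond that, gaps appear.
LIMIT = 2 ** 53  # 9007199254740992

value = int("97757589545391667")       # exact: Python ints have arbitrary precision
as_float = float("97757589545391667")  # rounded to the nearest representable double

print(value > LIMIT)                     # True: the id lies beyond the exact range
print(int(as_float))                     # 97757589545391664, matching the session above
print(float(LIMIT + 1) == float(LIMIT))  # True: 2**53 + 1 is not representable
```

At this magnitude neighbouring doubles are 16 apart, which is why the last digits change even though there is no fractional part.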

Languages like SQL do not have this problem, as they can use a fixed point representation.

Out of curiosity, I would like to know which other languages you are talking about.

I tested for example with C:
```
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *s = "97757589545391667";
    char *e;
    unsigned long int l = strtoul(s, &e, 10);  /* exact where long is 64 bits wide */
    double d = atof(s);                        /* rounded to the nearest double */

    printf("%s\n", s);
    printf("%lu\n", l);
    printf("%lf\n", d);
    return 0;
}
```
and got the output:

97757589545391667
97757589545391667
97757589545391664.000000

Converting a huge 'float-string' to a float (double precision) and then to an int will lose precision in any language that uses the IEEE double representation for floating-point numbers.

Remove, for example, the first two digits of your number and you'll see that it works again in Python and in C, as the number can now be fully represented as a double without losing precision.
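
A quick way to check this in Python is to compare the round trip through `float` against the exact integer. The helper name `survives_float` is made up for this illustration:

```python
def survives_float(digits):
    """Return True if the integer string is unchanged by a trip through float."""
    return int(float(digits)) == int(digits)

print(survives_float("97757589545391667"))  # False: too many significant digits
print(survives_float("757589545391667"))    # True: first two digits removed
```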

@routinet

Your suggested solution probably works, but I would not use it,
as it probably works only in your use case.
Anybody trying to reuse your function in another context might get some unpleasant surprises.

Why do you write
int(str(x).split('.')[0])
and not
int(x.split('.')[0])

Do you really expect the function to receive an integer, a string, or a float value?

If not (it receives only strings and integers), then you can remove the str() call, since you already have a string.

If so, then the code works only in some special cases.
Just try

str(1000000000000.)
and you'll see that you get '1e+12'.
int(str(1000000000000.)) would raise an error.
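
The failure mode can be reproduced with a small sketch. The wrapper name `split_to_int` is hypothetical, and the point at which `str()` switches to exponent notation depends on the Python version, so `1e16` is used here as a value that is assumed to trigger it on both Python 2.6 and Python 3:

```python
def split_to_int(x):
    # Hypothetical wrapper around the int(str(x).split('.')[0]) idiom under discussion.
    return int(str(x).split('.')[0])

print(split_to_int("97757589545391667.0"))  # 97757589545391667: strings work fine
print(split_to_int(1234.0))                 # 1234: small floats work too

try:
    split_to_int(1e16)  # str(1e16) is '1e+16', which int() cannot parse
except ValueError as exc:
    print("failed: %s" % exc)
```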

So I suggest you use the following snippet.

```
from decimal import Decimal

def clean_integer(x):
    if type(x) is int: return x # int will be handled here
    try:
        return int(Decimal(str(x))) # strings AND floats would be handled here
        #return int(Decimal(x)) # ONLY strings would be handled here
    except (TypeError, ValueError, ArithmeticError):
        # Decimal raises InvalidOperation (an ArithmeticError) for non-numeric strings
        # perhaps it would be better to raise an exception and not just silently return 0
        # I don't know your code, but perhaps something to think of
        return 0
```

```
$> php -v
PHP 5.3.3 (cli) (built: May  3 2012 17:33:17)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

$> php -r '$a = "97757589545391667.0";echo (int)$a;'
97757589545391667

$> php -r '$a = "97757589545391667";echo (int)$a;'
97757589545391667

$> php -r '$a = "stuff";echo (int)$a;'
0
```

That is exactly the behavior I want.

The purpose of clean_integer() is to whitelist any (presumed) numeric data point I think is suspect at the time I have to use it.  Since I am not familiar with most of the code base yet, just about everything is suspect.  My code is always designed to initialize variables using whitelisting, so the rest of my code knows what to expect and how to fail gracefully.  From what I've seen, the rest of the code is designed to work properly, provided it is passed the proper values.  Unfortunately, the idea of a proper value in the old code is somewhat inconsistent, at best, and odd things can make it into what should be a string representation of an integer.  Should my code fail because clean_integer() returns 0, then I can go back and correct the old code that passed the bad data to my code originally.

After running some benchmarks, I'm going to go with the method I posted last (using str().split('.')[0]).  The performance on any method requiring instantiation (i.e., Decimal, re, etc.) was just too horrible to consider implementing.  Like orders-of-magnitude kind of horrible.  I've posted another question regarding my benchmarking method at https://www.experts-exchange.com/questions/27784998/python-performance-am-I-missing-something.html.
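
The function itself is not shown in this thread, so the following is only a reconstruction from the expression quoted earlier, assuming the same fall-back-to-0 behavior as the original `clean_integer()`:

```python
def clean_integer(x):
    """Whitelist x as an integer; return 0 for anything unusable."""
    if isinstance(x, int):
        return x
    try:
        # Keep only the text before the decimal point, so no float is involved.
        return int(str(x).split('.')[0])
    except ValueError:
        return 0

print(clean_integer("97757589545391667.0"))  # 97757589545391667
print(clean_integer("97757589545391667"))    # 97757589545391667
print(clean_integer("stuff"))                # 0
```

As noted above, this variant still returns 0 for floats whose string form uses exponent notation, so it suits the string inputs described here rather than arbitrary floats.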

Thanks to all for the input.