Solved

# Shortest integer to string algorithm

Posted on 2004-11-02
Medium Priority
8,699 Views
Hello,
can anybody suggest , the shortest integer to string algorithm.i already have implemented this code.the integer is between 1 to 9999 and is unsigned.assembly code will do but i would be better off with c code and hey no library functions(itoa or sprintf).

char buffer[12];
int i=7876;
char *back;
int divider;
int count=0;

(i/1000)?divider=1000:divider=(i/100)?100:(i/10)?10:1;
back = buffer;

do {
*(back+count) = i/divider + '0';
i=i%divider;
++count;
divider=divider/10;
} while (i);

*(back+count)=0;

i need to make it more concise in terms of instructions in the memory.its purely a memory issue.thanking you all.

elico
0
Question by:software_elico
• 7
• 6
• 6
• +2

LVL 45

Expert Comment

ID: 12480571
>i need to make it more concise in terms of instructions in the memory.its purely a memory issue.thanking you all.
>the integer is between 1 to 9999 and is unsigned

char buffer[12];     >>> can be reduced to 5 bytes
int i=7876;
char *back;
int divider;
int count=0;

(i/1000)?divider=1000:divider=(i/100)?100:(i/10)?10:1;  >> do you have division implemented as an instruction on your target platform? If no, then perhaps you would like to convert this division into a sequence of shift and addition/subtraction operations. This is likely to save some memory and should result in better performance

e.g.
a/10 is same as ((a>>3) - (2*a)) ... which is same as ((a>>3) - (a<<1))

back = buffer;

do {
*(back+count) = i/divider + '0';
i=i%divider;
++count;
divider=divider/10;
} while (i);

*(back+count)=0;
0

LVL 45

Expert Comment

ID: 12480593
Sorry, it seems my mind is wandering

e.g.
a/10 is same as ((a>>3) - (2*a)) ... which is same as ((a>>3) - (a<<1))

is so obviously incorrect, but I hope you get the idea that division and multiplications need to be converted to shifts, additions, and subtractions.
0

LVL 45

Expert Comment

ID: 12480683
0

Author Comment

ID: 12480882
I agree that changing the mul and div operations to shifting bits is much faster but i need a code in just two or three lines maximum.

thanks
0

LVL 45

Expert Comment

ID: 12480960
2-3 lines of code does not necessarily mean that you will have 2-3 assembly instructions corrresponding to them. 1-1 correspondence may not exist.

Here is a modification of your source code whilch "looks" compact but should generate the same amount of instructions or may be even more

char buffer[5];
int i=7876;
int divider;
int count=0;

(i/1000)?divider=1000:divider=(i/100)?100:(i/10)?10:1;

do {
buffer[count++] = i/divider + '0';
divider=divider/10;
} while (i=i%(divider*10));

buffer[count]=0;
0

LVL 16

Accepted Solution

PaulCaswell earned 400 total points
ID: 12481220
It seems we have lost sight of the most laudable requirement of:

>>...it more concise...

The answer is assemble the number from right to left.

i.e.

char buffer[MAX_NUM];

int n = 7876;

// Do something special for 'n==0' case because this will not generate a '0'.
for ( i = MAX_NUM-1; i > 0 && n > 0; i-- )
{
buffer[i] = (n%10)+'0';
n /= 10;
}

// Now &buffer[i] (or is it &buffer[i+1] ... I can never remember) contains the string of length MAX_NUM-i.
0

LVL 45

Expert Comment

ID: 12481265
Hi Paul,

Its always nice to have you or Kent present in the thread :-)
>>i need to make it more concise *in terms of instructions* in the memory.

I thought instructions generated would be of significant impact. Since multiplication and division are significantly more expensive and we know that we would be dividing or mod'ding by 10 (or any known divisor), I was suggesting leveraging that knowledge. :-)

cheers
0

LVL 16

Expert Comment

ID: 12481345
Hi sunny,

>>Its always nice to have you or Kent present in the thread :-)
You are so kind :))

I totally agree with your approach of removing unnecessary multiplication and/or division. This would make significant improvements in code size and speed.

I would recommend using BOTH of our suggestions. An algorithmic change such as mine should be deeply beneficial, especially in this case where such a tangled and limited algorithm has been chosen as first-attempt. I have seen astonishing improvements in code using shifts instead of * or / or %.

I'll leave it to you to demonstrate techniques for implementing '%' using simple adds and shiifts ;).

Paul
0

LVL 45

Expert Comment

ID: 12481400
>I would recommend using BOTH of our suggestions.
I totally agree.

>I'll leave it to you to demonstrate techniques for implementing '%' using simple adds and shiifts ;).
0

LVL 22

Expert Comment

ID: 12483239
If you're going to do this in assembly, you can really win big, as the division instruction is going to give you both the remainder (the next output character), and the quotient, (the next value to divide).

In C, it's somewhat less likely that the compiler will see this optimization.

0

LVL 39

Expert Comment

ID: 12485651
Look at that prog (i used C++ only for output)

#include <iostream>
#include <time.h>
using namespace std;

#define MAX_RUN 50000000
int main()
{
int a;
int i;
int x;
int s = 0;
int t1, t2, t3, t4, t5;

/* first loop ... getting warm only */
for (i = 0; i < MAX_RUN; ++i)
{
a = rand(); a =  (a>>4);  // get random numbers less than 9999
if (a == 1) s++;              // to prevent from optimization
}
t1 = clock();
for (i = 0; i < MAX_RUN; ++i)
{
a = rand(); a =  (a>>4);
if (a == 1) s++;
}
t2 = clock();
for (i = 0; i < MAX_RUN; ++i)
{
a = rand(); a =  (a>>4);
x = a/10;
if (x == 1) s++;
}
t3 = clock();
for (i = 0; i < MAX_RUN; ++i)
{
a = rand(); a =  (a>>4);
x = (a*6554)>>16;          // 6554 = round (2^16 / 10)
// x = (a * round (2^16 / 10) ) / 2^16
if (x == 1) s++;
}
t4 = clock();
for (i = 0; i < MAX_RUN; ++i)
{
a = rand(); a =  (a>>4);
// a*6554 = ((a<<12)+(a<<11)+(a<<8)+(a<<7)+(a<<3)+(a<<2));
x = ((a<<12)+(a<<11)+(a<<8)+(a<<7)+(a<<3)+(a<<2))>>16;
if (x == 1) s++;
}
t5 = clock();
cout << "loop only             = " << t2 - t1 << endl
<< "a/10                    = " << t3 - t2 << endl
<< "a*6554                = " << t4 - t3 << endl
<< "(a<<12 +..)>>16 = " << t5 - t4 << endl;
return s;
}

The result for MAX_RUN = 50 millions (in milliseconds)

loop only              = 875
a/10                     = 907
(a*6554)>>16      = 906
(a<<12 +..)>>16 = 890

Further runs give similar results where any algorithm was fastest at least once and even 'loop only' was slowest once. There might be compilers that have greater differences but i doubt you'll gain any significant advantage for a integer-to-string- converter by using bit shifts or multiplication rather than division and modulo.

So, i think that Paul's code moved to a function and handling 0 value will do it:

char* intToStr(int n, char* buffer, int i)
{
buffer[--i] = 0;
if (n == 0) buffer[--i] = '0';
else for (; n > 0; n/=10) buffer[--i] = (n%10) + '0';
return &buffer[i];
}

Regards, Alex

0

LVL 22

Expert Comment

ID: 12486082
You might want to add a line or two to handle the case of n < 0, and subsume the ==0 case and avoid some underruns:

char* intToStr(int n, char* buffer, int i)
{
buffer[--i] = '\0';
if( n < 0 ) { buffer[ --i ] = '-'; n = -n; }
do {
buffer[--i] = (n%10) + '0';
n /= 10;
} while( n > 0 && i > 0 );
return &buffer[i];
}

---------------------------------

Or if you want real short, here's a recursive one:

char * Out;

#define intToStr( n, Buf ) {  Out = Buf; OutDigit( n ); *Out = '\0'; }

OutDigit( int n )      {
if( n > 9 ) OutDigit( n / 10 );
*Out++ = (  n % 10)  + '0';
}
0

LVL 16

Expert Comment

ID: 12491353
In summary,

If you want completeness, go for Gregs 'intToStr'. Its just about as minimal as you're going to get, does everything mine did while still being a complete solution handling both negative and zero.

For full speed, use the shift maths from Alex instead of the 'n/10' and 'n%10' but watch out for his 'for (; n > 0; n/=10)...'. Sweet though it is, I think there's an extra '/=' in there that's unnecessary. Perhaps we'll debate that because it is a really sweet loop.

For a really zany approach that would probably actually work out to be minimal, use Alex's recursive solution. But put loads of comments in for the poor software engineer who looks at it in a few years time, especially if its in assembler. Changing a global pointer in a recursive function is definitely not thread safe.

Good Luck.

Paul

0

LVL 39

Expert Comment

ID: 12492188
>> I think there's an extra '/=' in there that's unnecessary.

Though  a/10 and a%10 nearly do the same, i couldn't find a simple way to take advantage of that.

That's a working attempt, but needed a few lines more...

char* intToStr(int n, char* buffer, int i)
{
int n10 = n;
buffer[--i] = 0;
if (n == 0) buffer[--i] = '0';
else
for (; n > 0; n10 = n)
{
n /= 10;
buffer[--i] = n10 - ((n<<3) + (n<<1)) + '0';
}
return &buffer[i];
}

Regards, Alex

0

LVL 39

Expert Comment

ID: 12492577
... one line less

char* intToStr(int n, char* buffer, int i)
{
int n10 = n;
buffer[--i] = 0;
if (n == 0) buffer[--i] = '0';
else for (n /= 10; n10 > 0; n10 = n, n /= 10)
buffer[--i] = n10 - ((n<<3) + (n<<1)) + '0';
return &buffer[i];
}
0

LVL 22

Expert Comment

ID: 12493363
Minor enhancement:
If you move the tests to the bottom of the loop you don't need to special-case the ==0.

0

LVL 39

Expert Comment

ID: 12495986
>> you don't need to special-case the ==0.

char* intToStr(int n, char* buffer, int i)
{
int n10 = n/10;
buffer[--i] = '\0';
do buffer[--i] = (n - ((n10<<3) + (n10<<1)) + '0'), n = n10, n10 /= 10;
while(n > 0);
return &buffer[i];
}

You do mean that?

Regards, Alex
0

LVL 16

Expert Comment

ID: 12496015
Magic!!!

BUt doesnt a comma separated expression evaluate to the LAST part, not the first?

Paul
0

LVL 22

Expert Comment

ID: 12496495
Magic!!!

> BUt doesnt a comma separated expression evaluate to the LAST part, not the first?

If so you'll have to invest in a semicolon and a pair of braces.  sniff....

0

LVL 39

Expert Comment

ID: 12496572
>> BUt doesnt a comma separated expression evaluate to the LAST part, not the first?

No, at least not with VC6 compiler as i tested it using that main():

int main()
{
char str[10];
int  n;

while (true)
{
cout << "Enter number ==>";
cin >> n;
if (n < 0)
break;

cout << n << " == " << intToStr(n, str, sizeof(str)) << endl;
str[9] = 'x';
}
return 0;
}

My explanation is that a comma separates statements but not  rvalues . A look to C++ bible  (The C++ programming language from Bjarne Stroustrup) shows the following grammar:

expression:
assignment-expression
expression, assignment-expression

Here yo see that comma separates expressions and the first expression is independent of the second.

Regards, Alex
0

LVL 39

Expert Comment

ID: 12496659
Another argument is that

i = 0, n = 1

wouldn't work correctly as start condition in a for statement then, but it does.

Regards, Alex
0

LVL 22

Expert Comment

ID: 12496691
BTW let's not think we're so hackerly, one of the original MIT HAKMEMS shows a unsigned decimal out routine written in PDP-10 assembly language with the following features:

(1)  It's exactly  SIX assembler instructions long.  (Well, ONE instruction is the system call to print a character)

(2)  If you call it with a parameter of -36, it prints out CR,LF and clears the terminal column counter! (All thru clever arithmetic, there's not a CR,LF constant anywhere in the code)

Now that's impressive!

0

LVL 22

Expert Comment

ID: 12496796

char* intToStr(int n, char* buffer, int i)
{
int n10;
buffer[--i] = '\0';
do buffer[--i] = (n - (((n=n10=n/10)<<3) + (n10<<1)) + '0');
while(n > 0);
return &buffer[i];
}
0

LVL 39

Expert Comment

ID: 12497266
Doesn't work with VC6. The result of

do buffer[--i] = (n - (((n=n10=n/10)<<3) + (n10<<1)) + '0');

evaluates to a negative number

Actually it seems to be a compiler bug as

do buffer[--i] = '0' + n - ((n=n10=n/10)<<3) - (n10<<1) ;

works.

TATATA

0

Author Comment

ID: 12502151
Thanks for the great optimized instructions,My boss agreed on pauls code because it is comaparitively faster on the environment we are using and we do not have a 0 as input so we dont have to handle the situation.

cheers
0

LVL 16

Expert Comment

ID: 12502608
>>Now that's impressive!

This impressed me some time ago. I have posted it before but I 'found' it yesterday and I thought of this question:

#include <stdio.h>
#define r0 return(0);

#define f(X,g,p,N,M)M{return X?a&b?a&b&1<<i&&m(b,a^1<<i,9)>8?i:M:0:9;}main(){\
int a=511,b=a,i=4;for(;X;b^=1<<N)a^=1<<g-'1',g,p(N+'1'),p('\n');r0}  /* 123 */
f(i--&&b&7&&b&56&&b&448&&b&73&&b&146&&b&292&&b  /* John Rickard   */ /* 456 */
&273&&b&84,getchar(),putchar,m(b,a,9),m(a,b,i)) /* xxx@xxxx.xx.xx */ /* 789 */

No points as to what it does, just the fun of finding out. Its not mine, I picked it up some time ago.

Have we got time to investigate the recursive one?

The problem with recursive solutions to this problem is that the natural order turns out to be digits-reversed. The answer is to keep a count of the recursion level and place the characters back in reverse order.

int IntToStr (int i, char * s, int digit = 0, int digits = 0)
{
// On the way up, do the rest of the digits.
if ( i > 9 ) digits += IntToStr(i/10,s+1,digit+1);
// Do our digit.
s[digits-digit] = (i%10) + '0';
// One more digit.
return digits+1;
}

A rather entertaining side-effect of this is that the function actually returns the number of digits generated so you can use:

str[IntToStr(12345, str)] = '\0';

Benefits:

1. There is no assumption in the function that there must be a '\0' terminator.
2. There is no special case for '0'.
3. It should be easy to replace the i/10 and i%10 with shifts without affecting the algorithm.
4. There is no loop.

Paul

0

LVL 16

Expert Comment

ID: 12502680
elico,

Thanks for the points.

All,

Sorry I got all the points for this guys. I felt this was a joint solution.

"The balance may shift but always returns."

Paul
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Preface I don't like visual development tools that are supposed to write a program for me. Even if it is Xcode and I can use Interface Builder. Yes, it is a perfect tool and has helped me a lot, mainly, in the beginning, when my programs were smallâ€¦
Examines three attack vectors, specifically, the different types of malware used in malicious attacks, web application attacks, and finally, network based attacks.  Concludes by examining the means of securing and protecting critical systems and infâ€¦
The goal of this video is to provide viewers with basic examples to understand and use for-loops in the C programming language.
The goal of this video is to provide viewers with basic examples to understand how to create, access, and change arrays in the C programming language.
###### Suggested Courses
Course of the Month16 days, 7 hours left to enroll