• C

Shortest integer to string algorithm

Hello,
        can anybody suggest , the shortest integer to string algorithm.i already have implemented this code.the integer is between 1 to 9999 and is unsigned.assembly code will do but i would be better off with c code and hey no library functions(itoa or sprintf).

char buffer[12];
      int i=7876;  
      char *back;
      int divider;
                int count=0;
   
      (i/1000)?divider=1000:divider=(i/100)?100:(i/10)?10:1;
     back = buffer;
     

     do {
          *(back+count) = i/divider + '0';
          i=i%divider;
              ++count;
              divider=divider/10;
     } while (i);
       
      *(back+count)=0;

i need to make it more concise in terms of instructions in the memory.its purely a memory issue.thanking you all.

elico
software_elicoAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

sunnycoderCommented:
>i need to make it more concise in terms of instructions in the memory.its purely a memory issue.thanking you all.
>the integer is between 1 to 9999 and is unsigned

char buffer[12];     >>> can be reduced to 5 bytes
     int i=7876;  
     char *back;
     int divider;
                int count=0;
   
     (i/1000)?divider=1000:divider=(i/100)?100:(i/10)?10:1;  >> do you have division implemented as an instruction on your target platform? If no, then perhaps you would like to convert this division into a sequence of shift and addition/subtraction operations. This is likely to save some memory and should result in better performance

e.g.
a/10 is same as ((a>>3) - (2*a)) ... which is same as ((a>>3) - (a<<1))

     back = buffer;
     

     do {
          *(back+count) = i/divider + '0';
          i=i%divider;
            ++count;
            divider=divider/10;
     } while (i);
       
      *(back+count)=0;
0
sunnycoderCommented:
Sorry, it seems my mind is wandering

e.g.
a/10 is same as ((a>>3) - (2*a)) ... which is same as ((a>>3) - (a<<1))

is so obviously incorrect, but I hope you get the idea that division and multiplications need to be converted to shifts, additions, and subtractions.
0
Introducing the "443 Security Simplified" Podcast

This new podcast puts you inside the minds of leading white-hat hackers and security researchers. Hosts Marc Laliberte and Corey Nachreiner turn complex security concepts into easily understood and actionable insights on the latest cyber security headlines and trends.

software_elicoAuthor Commented:
I agree that changing the mul and div operations to shifting bits is much faster but i need a code in just two or three lines maximum.

thanks
0
sunnycoderCommented:
2-3 lines of code does not necessarily mean that you will have 2-3 assembly instructions corrresponding to them. 1-1 correspondence may not exist.

Here is a modification of your source code whilch "looks" compact but should generate the same amount of instructions or may be even more

char buffer[5];
     int i=7876;  
     int divider;
     int count=0;
   
     (i/1000)?divider=1000:divider=(i/100)?100:(i/10)?10:1;

     do {
          buffer[count++] = i/divider + '0';
          divider=divider/10;
     } while (i=i%(divider*10));
       
      buffer[count]=0;
0
PaulCaswellCommented:
It seems we have lost sight of the most laudable requirement of:

>>...it more concise...

The answer is assemble the number from right to left.

i.e.

char buffer[MAX_NUM];

int n = 7876;

// Do something special for 'n==0' case because this will not generate a '0'.
for ( i = MAX_NUM-1; i > 0 && n > 0; i-- )
{
 buffer[i] = (n%10)+'0';
 n /= 10;
}

// Now &buffer[i] (or is it &buffer[i+1] ... I can never remember) contains the string of length MAX_NUM-i.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
sunnycoderCommented:
Hi Paul,

Its always nice to have you or Kent present in the thread :-)
>>i need to make it more concise *in terms of instructions* in the memory.

I thought instructions generated would be of significant impact. Since multiplication and division are significantly more expensive and we know that we would be dividing or mod'ding by 10 (or any known divisor), I was suggesting leveraging that knowledge. :-)

cheers
0
PaulCaswellCommented:
Hi sunny,

>>Its always nice to have you or Kent present in the thread :-)
You are so kind :))

I totally agree with your approach of removing unnecessary multiplication and/or division. This would make significant improvements in code size and speed.

I would recommend using BOTH of our suggestions. An algorithmic change such as mine should be deeply beneficial, especially in this case where such a tangled and limited algorithm has been chosen as first-attempt. I have seen astonishing improvements in code using shifts instead of * or / or %.

I'll leave it to you to demonstrate techniques for implementing '%' using simple adds and shiifts ;).

Paul
0
sunnycoderCommented:
>I would recommend using BOTH of our suggestions.
I totally agree.

>I'll leave it to you to demonstrate techniques for implementing '%' using simple adds and shiifts ;).
Already posted a detaaaaaaaaaailed link :-)
0
grg99Commented:
If you're going to do this in assembly, you can really win big, as the division instruction is going to give you both the remainder (the next output character), and the quotient, (the next value to divide).

In C, it's somewhat less likely that the compiler will see this optimization.

0
itsmeandnobodyelseCommented:
Look at that prog (i used C++ only for output)

#include <iostream>
#include <time.h>
using namespace std;
           
#define MAX_RUN 50000000
int main()
{
    int a;
    int i;
    int x;
    int s = 0;
    int t1, t2, t3, t4, t5;

    /* first loop ... getting warm only */
    for (i = 0; i < MAX_RUN; ++i)
    {
        a = rand(); a =  (a>>4);  // get random numbers less than 9999
        if (a == 1) s++;              // to prevent from optimization
    }
    t1 = clock();
    for (i = 0; i < MAX_RUN; ++i)
    {
        a = rand(); a =  (a>>4);
        if (a == 1) s++;
    }
    t2 = clock();  
    for (i = 0; i < MAX_RUN; ++i)
    {
        a = rand(); a =  (a>>4);
        x = a/10;
        if (x == 1) s++;
    }
    t3 = clock();
    for (i = 0; i < MAX_RUN; ++i)
    {
        a = rand(); a =  (a>>4);
        x = (a*6554)>>16;          // 6554 = round (2^16 / 10)
                                               // x = (a * round (2^16 / 10) ) / 2^16
        if (x == 1) s++;
    }
    t4 = clock();
    for (i = 0; i < MAX_RUN; ++i)
    {
        a = rand(); a =  (a>>4);
       // a*6554 = ((a<<12)+(a<<11)+(a<<8)+(a<<7)+(a<<3)+(a<<2));
        x = ((a<<12)+(a<<11)+(a<<8)+(a<<7)+(a<<3)+(a<<2))>>16;
        if (x == 1) s++;
    }
    t5 = clock();
    cout << "loop only             = " << t2 - t1 << endl
           << "a/10                    = " << t3 - t2 << endl
           << "a*6554                = " << t4 - t3 << endl
           << "(a<<12 +..)>>16 = " << t5 - t4 << endl;  
    return s;
}

The result for MAX_RUN = 50 millions (in milliseconds)

loop only              = 875
a/10                     = 907
(a*6554)>>16      = 906
(a<<12 +..)>>16 = 890

Further runs give similar results where any algorithm was fastest at least once and even 'loop only' was slowest once. There might be compilers that have greater differences but i doubt you'll gain any significant advantage for a integer-to-string- converter by using bit shifts or multiplication rather than division and modulo.

So, i think that Paul's code moved to a function and handling 0 value will do it:

char* intToStr(int n, char* buffer, int i)
{
    buffer[--i] = 0;
    if (n == 0) buffer[--i] = '0';
    else for (; n > 0; n/=10) buffer[--i] = (n%10) + '0';
    return &buffer[i];
}

Regards, Alex

0
grg99Commented:
You might want to add a line or two to handle the case of n < 0, and subsume the ==0 case and avoid some underruns:


char* intToStr(int n, char* buffer, int i)
{
    buffer[--i] = '\0';
    if( n < 0 ) { buffer[ --i ] = '-'; n = -n; }
    do {
            buffer[--i] = (n%10) + '0';
            n /= 10;
    } while( n > 0 && i > 0 );
    return &buffer[i];
}

---------------------------------

Or if you want real short, here's a recursive one:

char * Out;

#define intToStr( n, Buf ) {  Out = Buf; OutDigit( n ); *Out = '\0'; }

OutDigit( int n )      {
            if( n > 9 ) OutDigit( n / 10 );
            *Out++ = (  n % 10)  + '0';
      }
0
PaulCaswellCommented:
In summary,

If you want completeness, go for Gregs 'intToStr'. Its just about as minimal as you're going to get, does everything mine did while still being a complete solution handling both negative and zero.

For full speed, use the shift maths from Alex instead of the 'n/10' and 'n%10' but watch out for his 'for (; n > 0; n/=10)...'. Sweet though it is, I think there's an extra '/=' in there that's unnecessary. Perhaps we'll debate that because it is a really sweet loop.

For a really zany approach that would probably actually work out to be minimal, use Alex's recursive solution. But put loads of comments in for the poor software engineer who looks at it in a few years time, especially if its in assembler. Changing a global pointer in a recursive function is definitely not thread safe.

Good Luck.

Paul




0
itsmeandnobodyelseCommented:
>> I think there's an extra '/=' in there that's unnecessary.

Though  a/10 and a%10 nearly do the same, i couldn't find a simple way to take advantage of that.

That's a working attempt, but needed a few lines more...

char* intToStr(int n, char* buffer, int i)
{
    int n10 = n;
    buffer[--i] = 0;
    if (n == 0) buffer[--i] = '0';
    else
        for (; n > 0; n10 = n)
        {
            n /= 10;
            buffer[--i] = n10 - ((n<<3) + (n<<1)) + '0';
        }
    return &buffer[i];
}

Regards, Alex

0
itsmeandnobodyelseCommented:
... one line less

char* intToStr(int n, char* buffer, int i)
{
    int n10 = n;
    buffer[--i] = 0;
    if (n == 0) buffer[--i] = '0';
    else for (n /= 10; n10 > 0; n10 = n, n /= 10)
            buffer[--i] = n10 - ((n<<3) + (n<<1)) + '0';
    return &buffer[i];
}
0
grg99Commented:
Minor enhancement:
  If you move the tests to the bottom of the loop you don't need to special-case the ==0.

0
itsmeandnobodyelseCommented:
>> you don't need to special-case the ==0.

char* intToStr(int n, char* buffer, int i)
{
    int n10 = n/10;
    buffer[--i] = '\0';
    do buffer[--i] = (n - ((n10<<3) + (n10<<1)) + '0'), n = n10, n10 /= 10;
    while(n > 0);
    return &buffer[i];
}

You do mean that?

Regards, Alex
0
PaulCaswellCommented:
Magic!!!

BUt doesnt a comma separated expression evaluate to the LAST part, not the first?

Paul
0
grg99Commented:
Magic!!!

> BUt doesnt a comma separated expression evaluate to the LAST part, not the first?

If so you'll have to invest in a semicolon and a pair of braces.  sniff....

0
itsmeandnobodyelseCommented:
>> BUt doesnt a comma separated expression evaluate to the LAST part, not the first?

No, at least not with VC6 compiler as i tested it using that main():

int main()
{
    char str[10];
    int  n;

    while (true)
    {
        cout << "Enter number ==>";
        cin >> n;
        if (n < 0)
            break;

        cout << n << " == " << intToStr(n, str, sizeof(str)) << endl;
        str[9] = 'x';
    }
    return 0;
}

My explanation is that a comma separates statements but not  rvalues . A look to C++ bible  (The C++ programming language from Bjarne Stroustrup) shows the following grammar:

   expression:
   assignment-expression
   expression, assignment-expression

Here yo see that comma separates expressions and the first expression is independent of the second.

Regards, Alex
0
itsmeandnobodyelseCommented:
Another argument is that

     i = 0, n = 1

wouldn't work correctly as start condition in a for statement then, but it does.

Regards, Alex
0
grg99Commented:
BTW let's not think we're so hackerly, one of the original MIT HAKMEMS shows a unsigned decimal out routine written in PDP-10 assembly language with the following features:

(1)  It's exactly  SIX assembler instructions long.  (Well, ONE instruction is the system call to print a character)

(2)  If you call it with a parameter of -36, it prints out CR,LF and clears the terminal column counter! (All thru clever arithmetic, there's not a CR,LF constant anywhere in the code)


Now that's impressive!




0
grg99Commented:
How about this, two statements shorter, but has tricky evaluation order dependency:

char* intToStr(int n, char* buffer, int i)
{
    int n10;
    buffer[--i] = '\0';
    do buffer[--i] = (n - (((n=n10=n/10)<<3) + (n10<<1)) + '0');
    while(n > 0);
    return &buffer[i];
}
0
itsmeandnobodyelseCommented:
Doesn't work with VC6. The result of  

     do buffer[--i] = (n - (((n=n10=n/10)<<3) + (n10<<1)) + '0');

evaluates to a negative number

Actually it seems to be a compiler bug as

    do buffer[--i] = '0' + n - ((n=n10=n/10)<<3) - (n10<<1) ;

works.

TATATA

   
0
software_elicoAuthor Commented:
Thanks for the great optimized instructions,My boss agreed on pauls code because it is comaparitively faster on the environment we are using and we do not have a 0 as input so we dont have to handle the situation.

cheers
0
PaulCaswellCommented:
>>Now that's impressive!

This impressed me some time ago. I have posted it before but I 'found' it yesterday and I thought of this question:

#include <stdio.h>
#define r0 return(0);

#define f(X,g,p,N,M)M{return X?a&b?a&b&1<<i&&m(b,a^1<<i,9)>8?i:M:0:9;}main(){\
int a=511,b=a,i=4;for(;X;b^=1<<N)a^=1<<g-'1',g,p(N+'1'),p('\n');r0}  /* 123 */
f(i--&&b&7&&b&56&&b&448&&b&73&&b&146&&b&292&&b  /* John Rickard   */ /* 456 */
&273&&b&84,getchar(),putchar,m(b,a,9),m(a,b,i)) /* xxx@xxxx.xx.xx */ /* 789 */

No points as to what it does, just the fun of finding out. Its not mine, I picked it up some time ago.

Have we got time to investigate the recursive one?

The problem with recursive solutions to this problem is that the natural order turns out to be digits-reversed. The answer is to keep a count of the recursion level and place the characters back in reverse order.

int IntToStr (int i, char * s, int digit = 0, int digits = 0)
{
      // On the way up, do the rest of the digits.
      if ( i > 9 ) digits += IntToStr(i/10,s+1,digit+1);
      // Do our digit.
      s[digits-digit] = (i%10) + '0';
      // One more digit.
      return digits+1;
}

A rather entertaining side-effect of this is that the function actually returns the number of digits generated so you can use:

      str[IntToStr(12345, str)] = '\0';

Benefits:

1. There is no assumption in the function that there must be a '\0' terminator.
2. There is no special case for '0'.
3. It should be easy to replace the i/10 and i%10 with shifts without affecting the algorithm.
4. There is no loop.

Paul

0
PaulCaswellCommented:
elico,

Thanks for the points.

All,

Sorry I got all the points for this guys. I felt this was a joint solution.

"The balance may shift but always returns."

Paul
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
C

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.