Link to home
Start Free TrialLog in
Avatar of ankuratvb
ankuratvbFlag for United States of America

asked on

Is % faster than /

I just wanted to know:

Is the mod operation on integers faster than division on integers?
And the reasons for either answer.

Just out of curiosity.
Avatar of booki
booki

ankuratvb,

No, not on IA32.  They both use idiv.  The modulo goes to edx and quotient goes to eax.

b.


Just curious......
Even a SAR will do division by shifting the bits right......correct ?

When is an IDIV used and when is a SAR used ?

Cheers
ASKER CERTIFIED SOLUTION
Avatar of booki
booki

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of ankuratvb

ASKER

Thats what i was also thinking.For mod,you have to do division.

Would it also be the same for floating point numbers as well.
Other than the overhead of the function call for fmod(),would they both execute at the same speed.
anupvijay,

>> Even a SAR will do division by shifting the bits right......correct ?

Yes but only for divisors that are powers of two.

>> When is an IDIV used and when is a SAR used ?

SAR/SHR is used when division is by a divisor that is a power of two and it is an immediate value (constant) or for the '>>' operator.  SAR is used for signed integers.  SHR is used for unsigned integers.

b.
ankuratvb,

>> Would it also be the same for floating point numbers as well.

No, fprem determines a 'partial remainder' through iterative subtraction and may require multiple execution. While division is done with a single instruction, fdiv.  In other words, division will likely be faster.

b.
Avatar of Sjef Bosman
Mod always with IDIV?? I thought that a%32 is equivalent to a&31...

Be careful, doing mod or div with masks and shifts works a bit differently for negative numbers!

Thanx.I have one clarification.

I couldnt get this part:

>*fprem* determines a 'partial remainder' through iterative subtraction and may require >multiple execution.

sjef_bosman,

Yes, you are correct.  With immediate values that are powers of 2, the compiler can optimize and use 'bitwise and'.  Similarly, for arithmetic (integer) operations '*' and '/' that involve immediate values that are powers of two, the compiler can optimize.  However, this is not possible for the general case.

b.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ankuratvb,

// fprem, is implemented something like the following:

bool fprem(float *a, float b) {
    int d = a.exponent - b.exponent; // pseudo-code
    if (d < 64) {
        *a -=  b*(int)(a/b);
        return 0;
    } else {
        *a -= b*(int)(a/b); // not exact.. see intel manual for exact formula
        return 1;
    }
}

// so fmod may need to execute fprem more than once.

float fmod (float a, float b) {
    float r;
    r = a;
    while (fprem(&r,b));
    return r;
}

main () {
    float a,b,rem;
    a = VERYBIG;
    b = VERYSMALL;
    rem = fmod(a,b);
}

b.
stefan73,

Division faster than AND? What can be faster than an AND in this case? Microcode division stuff is usually a repeated subtraction, the MOD comes for free.

Btw, I only reacted on the statement that "For mod, you have to do division." Just meant to show that if DIV can be done by shifting, MOD can be done using AND.

Sjef
Assuming that divisor is in powers of 2

Is IDIV faster or SAR ?
Or is AND faster, if only MOD is required ?

I am not an expert in low level programming...so if that looks stupid please ignore :-))
May be I am talking of CPU cycles here.....but I do not know :-)

Cheers.
>For mod, you have to do division

What i meant was to calculate the mod value,you'd have to perform the *division* operation(not necessarily IDIV),whether it does so by IDIV or SAR is irrevelant.
Run this test program. Div is always faster than mod on my PC. When in doubt, I always use a timed benchmark.

------------------------------------------------------------------------------------------------------------------------
#include <iostream>
#include <time.h>
using namespace std;

int repetitions=10000;

clock_t start_time;                        // time at test start
clock_t time_elapsed;                  // time at test end
int c;

void main()
{
      start_time=clock();                  // test mod time

      for(int i=1; i<repetitions; i++)
      {
            for(int j=1; j<repetitions; j++)
            {
                  c = i%j;      
            }
      }
      time_elapsed = clock()-start_time;
      
      cout << "total time for mod: " << time_elapsed << endl;


      start_time=clock();                  // test div time

      for(i=1; i<repetitions; i++)
      {
            for(int j=1; j<repetitions; j++)
            {
                  c = i/j;      
            }
      }
      time_elapsed = clock()-start_time;
      
      cout << "total time for div: " << time_elapsed << endl;

}
---------------------------------------------------------------------------------------------
oops you dont need the "using namespace" heh. Anyway, no amount of reading theory will help you with really specific performance questions like this (at least IMHO). Make test programs =)
anupvijay,

Assuming IA32...

>> Is IDIV faster or SAR ?

..SAR is much faster.

>> Or is AND faster, if only MOD is required ?

..there is no MOD instruction.  Generally IDIV/DIV is used for integer modulo and 'AND' is much faster.

>> May be I am talking of CPU cycles here.....

Yes, clock cycles.

Fippy_Darkpaw,

>> ..no amount of reading theory will help you with really specific performance questions like this

Though benchmarking does have its place in comparative perfomance, I'm afraid you're sorely underestimating the value of reading.

ankuratvb,

Adding to my previous post on fprem, I understand that fprem rarely requires multiple execution so the main reason floating point division is faster than a floating point modulo operation is that the modulo requires a division, multiplication and a subtraction as pointed out earlier by stefan73.

b.
Fippy_Darkpaw,

I had tried timing programs on this but i got variable results that couldnt conclusively say which is faster.

Try running your program multiple times and see the results:

Here is what i got:

reps=10000
total time for mod: 495
total time for div: 504
total time for mod: 495
total time for div: 492                                                        
total time for mod: 497                                                        
total time for div: 499                                                        

reps=100
total time for mod: 5                                                          
total time for div: 5                                                          
total time for mod: 5                                                          
total time for div: 5                                                          

reps=5000
total time for mod: 128                                                        
total time for div: 123                                                        
total time for mod: 124                                                        
total time for div: 123                                                        
total time for mod: 125                                                        
total time for div: 124                                                        
total time for mod: 123                                                        
total time for div: 123                                                        

reps=10000
total time for mod: 497                                                        
total time for div: 495                                                        
total time for mod: 497                                                        
total time for div: 494                                                        
total time for mod: 493                                                        
total time for div: 493                                                        
                                                                               
>reps=100
should be
reps=1000
The only way to find out what's happening is to disassemble the executable, or to generate assembly. Probably, most likely, in both cases IDIV will be used, and the total number of instructions executed won't be different. If there is a difference, it can only be that additional tests are required for negative numbers or so. In this case, differences are only due to the OS, handling interrupts etc. when the loop is busy. If run under DOS, the results should be more reliable.

And don't use an optimizing compiler, a good one might remove these loops completely since the results are discarded immediately so the loops can be considered useless.

By the way, do you think that the next line is a valid C statement?
    12345;
>12345;
Yes.

The compiler treats the 12345 as a constant value.

All of these will work:
'a';
123.4;
"abcdef";

The compiler would create temp. variables for these constant values and on replacing would look something like:

Taking 12345; as an example:
the compiler 'd create a temp variable say temp_t.

So,the statement would become:

temp_t;

which is a perfectly valid C statement.
Don't confuse the language C and a compiler for the language C. Indeed
    12345;
is a valid C statement, for it is an expression followed by a semicolon, or, in BNF, it is accepted according to the rule
    <expression> ';'
just like in fact a function call like
    func();
Since all functions return a result, that may be discarded, this line is also an expression followed by a semicolon, hence a valid statement.

At least, in the original C (K&R) language. The current C language, with a lot more typing rules in it, might refuse the statement or at least warn about it. Another ideosyncrasy: I suppose nobody uses the comma as statement separator? Yeah, only in for-statements probably, but the following lines are also valid in C:
    12345,6789;
    12345+6789;

:)
--- Fippy_Darkpaw,
---
---- I had tried timing programs on this but i got variable
---- results that couldnt conclusively say  which is faster.

Wow what comp did you run that on? I ran it on my P4 1.8 laptop with 512 RAM. Division is always substantially faster (like 20% or more). DIv vs. mod must be architecture dependent as well.

Oh yeah. And what compiler did you use? I used VC++ 6.0. Perhaps a different compiler optimized differently?
I changed the code above to fit a standard C compiler. I ran it on Linux, but the results were unpredictable because of the other processes running.

Nevertheless, did you try to assemble the program only? I.e. cc -S bookmark.c or comparable? If you look at the code generated for the div or the mod, it looks like this (Linux):

For the div:
.L7:
      movl      -8(%ebp), %eax
      cmpl      repetitions, %eax
      jle      .L10
      leal      -4(%ebp), %eax
      incl      (%eax)
      jmp      .L3
      .p2align 2
.L10:
      movl      -4(%ebp), %edx
      leal      -8(%ebp), %eax
      movl      %eax, -12(%ebp)
      movl      %edx, %eax
      movl      -12(%ebp), %ecx
      cltd
      idivl      (%ecx)
      movl      %edx, c
      leal      -8(%ebp), %eax
      incl      (%eax)
      jmp      .L7

And for the mod:
.L15:
      movl      -8(%ebp), %eax
      cmpl      repetitions, %eax
      jle      .L18
      leal      -4(%ebp), %eax
      incl      (%eax)
      jmp      .L11
      .p2align 2
.L18:
      movl      -4(%ebp), %edx
      leal      -8(%ebp), %eax
      movl      %eax, -12(%ebp)
      movl      %edx, %eax
      movl      -12(%ebp), %ecx
      cltd
      idivl      (%ecx)
      movl      %eax, c
      leal      -8(%ebp), %eax
      incl      (%eax)
      jmp      .L15

What's the difference? Nothing! Except for two move instructions, but they are absolutely equivalent. Please assemble your code only, and study the differences. I'm not convinced anyway ;)

Just a suggestion: put another outer loop over both loops, so you can eliminate program startup-stuff (if any).