Solved: Is % faster than / | Experts Exchange

ankuratvb,

No, not on IA32. They both use idiv. The modulo goes to edx and quotient goes to eax.

b.

Just curious......
Even a SAR will do division by shifting the bits right......correct ?

When is an IDIV used and when is a SAR used ?

Cheers

ankuratvb

ASKER

Thats what i was also thinking.For mod,you have to do division.

Would it also be the same for floating point numbers as well.
Other than the overhead of the function call for fmod(),would they both execute at the same speed.

booki

anupvijay,

>> Even a SAR will do division by shifting the bits right......correct ?

Yes but only for divisors that are powers of two.

>> When is an IDIV used and when is a SAR used ?

SAR/SHR is used when division is by a divisor that is a power of two and it is an immediate value (constant) or for the '>>' operator. SAR is used for signed integers. SHR is used for unsigned integers.

b.

booki

ankuratvb,

>> Would it also be the same for floating point numbers as well.

No, fprem determines a 'partial remainder' through iterative subtraction and may require multiple execution. While division is done with a single instruction, fdiv. In other words, division will likely be faster.

b.

Sjef Bosman

Mod always with IDIV?? I thought that a%32 is equivalent to a&31...

grg99

Be careful, doing mod or div with masks and shifts works a bit differently for negative numbers!

ankuratvb

ASKER

Thanx.I have one clarification.

I couldnt get this part:

>*fprem* determines a 'partial remainder' through iterative subtraction and may require >multiple execution.

booki

sjef_bosman,

Yes, you are correct. With immediate values that are powers of 2, the compiler can optimize and use 'bitwise and'. Similarly, for arithmetic (integer) operations '*' and '/' that involve immediate values that are powers of two, the compiler can optimize. However, this is not possible for the general case.

b.

booki

ankuratvb,

// fprem, is implemented something like the following:

bool fprem(float *a, float b) {
int d = a.exponent - b.exponent; // pseudo-code
if (d < 64) {
*a -= b*(int)(a/b);
return 0;
} else {
*a -= b*(int)(a/b); // not exact.. see intel manual for exact formula
return 1;
}
}

// so fmod may need to execute fprem more than once.

float fmod (float a, float b) {
float r;
r = a;
while (fprem(&r,b));
return r;
}

main () {
float a,b,rem;
a = VERYBIG;
b = VERYSMALL;
rem = fmod(a,b);
}

b.

Sjef Bosman

stefan73,

Division faster than AND? What can be faster than an AND in this case? Microcode division stuff is usually a repeated subtraction, the MOD comes for free.

Btw, I only reacted on the statement that "For mod, you have to do division." Just meant to show that if DIV can be done by shifting, MOD can be done using AND.

Sjef

anupvijay

Assuming that divisor is in powers of 2

Is IDIV faster or SAR ?
Or is AND faster, if only MOD is required ?

I am not an expert in low level programming...so if that looks stupid please ignore :-))
May be I am talking of CPU cycles here.....but I do not know :-)

Cheers.

ankuratvb

ASKER

>For mod, you have to do division

What i meant was to calculate the mod value,you'd have to perform the *division* operation(not necessarily IDIV),whether it does so by IDIV or SAR is irrevelant.

Fippy_Darkpaw

Run this test program. Div is always faster than mod on my PC. When in doubt, I always use a timed benchmark.

------------------------------------------------------------------------------------------------------------------------
#include <iostream>
#include <time.h>
using namespace std;

int repetitions=10000;

clock_t start_time;                        // time at test start
clock_t time_elapsed;                  // time at test end
int c;

void main()
{
      start_time=clock();                  // test mod time

      for(int i=1; i<repetitions; i++)
      {
            for(int j=1; j<repetitions; j++)
            {
                  c = i%j;
            }
      }
      time_elapsed = clock()-start_time;

      cout << "total time for mod: " << time_elapsed << endl;

      start_time=clock();                  // test div time

      for(i=1; i<repetitions; i++)
      {
            for(int j=1; j<repetitions; j++)
            {
                  c = i/j;
            }
      }
      time_elapsed = clock()-start_time;

      cout << "total time for div: " << time_elapsed << endl;

}
---------------------------------------------------------------------------------------------

Fippy_Darkpaw

oops you dont need the "using namespace" heh. Anyway, no amount of reading theory will help you with really specific performance questions like this (at least IMHO). Make test programs =)

booki

anupvijay,

Assuming IA32...

>> Is IDIV faster or SAR ?

..SAR is much faster.

>> Or is AND faster, if only MOD is required ?

..there is no MOD instruction. Generally IDIV/DIV is used for integer modulo and 'AND' is much faster.

>> May be I am talking of CPU cycles here.....

Yes, clock cycles.

Fippy_Darkpaw,

>> ..no amount of reading theory will help you with really specific performance questions like this

Though benchmarking does have its place in comparative perfomance, I'm afraid you're sorely underestimating the value of reading.

ankuratvb,

Adding to my previous post on fprem, I understand that fprem rarely requires multiple execution so the main reason floating point division is faster than a floating point modulo operation is that the modulo requires a division, multiplication and a subtraction as pointed out earlier by stefan73.

b.

ankuratvb

ASKER

Fippy_Darkpaw,

I had tried timing programs on this but i got variable results that couldnt conclusively say which is faster.

Try running your program multiple times and see the results:

Here is what i got:

reps=10000
total time for mod: 495
total time for div: 504
total time for mod: 495
total time for div: 492
total time for mod: 497
total time for div: 499

reps=100
total time for mod: 5
total time for div: 5
total time for mod: 5
total time for div: 5

reps=5000
total time for mod: 128
total time for div: 123
total time for mod: 124
total time for div: 123
total time for mod: 125
total time for div: 124
total time for mod: 123
total time for div: 123

reps=10000
total time for mod: 497
total time for div: 495
total time for mod: 497
total time for div: 494
total time for mod: 493
total time for div: 493

ankuratvb

ASKER

>reps=100
should be
reps=1000

Sjef Bosman

The only way to find out what's happening is to disassemble the executable, or to generate assembly. Probably, most likely, in both cases IDIV will be used, and the total number of instructions executed won't be different. If there is a difference, it can only be that additional tests are required for negative numbers or so. In this case, differences are only due to the OS, handling interrupts etc. when the loop is busy. If run under DOS, the results should be more reliable.

And don't use an optimizing compiler, a good one might remove these loops completely since the results are discarded immediately so the loops can be considered useless.

By the way, do you think that the next line is a valid C statement?
12345;

ankuratvb

ASKER

>12345;
Yes.

The compiler treats the 12345 as a constant value.

All of these will work:
'a';
123.4;
"abcdef";

The compiler would create temp. variables for these constant values and on replacing would look something like:

Taking 12345; as an example:
the compiler 'd create a temp variable say temp_t.

So,the statement would become:

temp_t;

which is a perfectly valid C statement.

Sjef Bosman

Don't confuse the language C and a compiler for the language C. Indeed
12345;
is a valid C statement, for it is an expression followed by a semicolon, or, in BNF, it is accepted according to the rule
<expression> ';'
just like in fact a function call like
func();
Since all functions return a result, that may be discarded, this line is also an expression followed by a semicolon, hence a valid statement.

At least, in the original C (K&R) language. The current C language, with a lot more typing rules in it, might refuse the statement or at least warn about it. Another ideosyncrasy: I suppose nobody uses the comma as statement separator? Yeah, only in for-statements probably, but the following lines are also valid in C:
12345,6789;
12345+6789;

:)

Fippy_Darkpaw

--- Fippy_Darkpaw,
---
---- I had tried timing programs on this but i got variable
---- results that couldnt conclusively say which is faster.

Wow what comp did you run that on? I ran it on my P4 1.8 laptop with 512 RAM. Division is always substantially faster (like 20% or more). DIv vs. mod must be architecture dependent as well.

Fippy_Darkpaw

Oh yeah. And what compiler did you use? I used VC++ 6.0. Perhaps a different compiler optimized differently?

Sjef Bosman

I changed the code above to fit a standard C compiler. I ran it on Linux, but the results were unpredictable because of the other processes running.

Nevertheless, did you try to assemble the program only? I.e. cc -S bookmark.c or comparable? If you look at the code generated for the div or the mod, it looks like this (Linux):

For the div:
.L7:
      movl      -8(%ebp), %eax
      cmpl      repetitions, %eax
      jle      .L10
      leal      -4(%ebp), %eax
      incl      (%eax)
      jmp      .L3
      .p2align 2
.L10:
      movl      -4(%ebp), %edx
      leal      -8(%ebp), %eax
      movl      %eax, -12(%ebp)
      movl      %edx, %eax
      movl      -12(%ebp), %ecx
      cltd
      idivl      (%ecx)
      movl      %edx, c
      leal      -8(%ebp), %eax
      incl      (%eax)
      jmp      .L7

And for the mod:
.L15:
      movl      -8(%ebp), %eax
      cmpl      repetitions, %eax
      jle      .L18
      leal      -4(%ebp), %eax
      incl      (%eax)
      jmp      .L11
      .p2align 2
.L18:
      movl      -4(%ebp), %edx
      leal      -8(%ebp), %eax
      movl      %eax, -12(%ebp)
      movl      %edx, %eax
      movl      -12(%ebp), %ecx
      cltd
      idivl      (%ecx)
      movl      %eax, c
      leal      -8(%ebp), %eax
      incl      (%eax)
      jmp      .L15

What's the difference? Nothing! Except for two move instructions, but they are absolutely equivalent. Please assemble your code only, and study the differences. I'm not convinced anyway ;)

Just a suggestion: put another outer loop over both loops, so you can eliminate program startup-stuff (if any).