• C

# Is % faster than /

I just wanted to know:

Is the mod operation on integers faster than division on integers?
And the reasons for either answer.

Just out of curiosity.
LVL 9
###### Who is Participating?

Commented:
ankuratvb,

That is.. they are the same in terms of speed.  For unsigned integers div is used for both operations.

b.
0

Commented:
ankuratvb,

No, not on IA32.  They both use idiv.  The modulo goes to edx and quotient goes to eax.

b.

0

Commented:
Just curious......
Even a SAR will do division by shifting the bits right......correct ?

When is an IDIV used and when is a SAR used ?

Cheers
0

Author Commented:
Thats what i was also thinking.For mod,you have to do division.

Would it also be the same for floating point numbers as well.
Other than the overhead of the function call for fmod(),would they both execute at the same speed.
0

Commented:
anupvijay,

>> Even a SAR will do division by shifting the bits right......correct ?

Yes but only for divisors that are powers of two.

>> When is an IDIV used and when is a SAR used ?

SAR/SHR is used when division is by a divisor that is a power of two and it is an immediate value (constant) or for the '>>' operator.  SAR is used for signed integers.  SHR is used for unsigned integers.

b.
0

Commented:
ankuratvb,

>> Would it also be the same for floating point numbers as well.

No, fprem determines a 'partial remainder' through iterative subtraction and may require multiple execution. While division is done with a single instruction, fdiv.  In other words, division will likely be faster.

b.
0

Groupware ConsultantCommented:
Mod always with IDIV?? I thought that a%32 is equivalent to a&31...
0

Commented:

Be careful, doing mod or div with masks and shifts works a bit differently for negative numbers!

0

Author Commented:
Thanx.I have one clarification.

I couldnt get this part:

>*fprem* determines a 'partial remainder' through iterative subtraction and may require >multiple execution.

0

Commented:
sjef_bosman,

Yes, you are correct.  With immediate values that are powers of 2, the compiler can optimize and use 'bitwise and'.  Similarly, for arithmetic (integer) operations '*' and '/' that involve immediate values that are powers of two, the compiler can optimize.  However, this is not possible for the general case.

b.
0

Commented:
Hi sjef_bosman,
> Mod always with IDIV?? I thought that a%32 is equivalent to a&31...

That's an optimizer issue. Sure it replaces those by & (%) and >> (/).

Modulo is basically the same as

#define MOD(a,b) (a-(b*(a/b)))

...so division is faster.

Cheers,
Stefan
0

Commented:
ankuratvb,

// fprem, is implemented something like the following:

bool fprem(float *a, float b) {
int d = a.exponent - b.exponent; // pseudo-code
if (d < 64) {
*a -=  b*(int)(a/b);
return 0;
} else {
*a -= b*(int)(a/b); // not exact.. see intel manual for exact formula
return 1;
}
}

// so fmod may need to execute fprem more than once.

float fmod (float a, float b) {
float r;
r = a;
while (fprem(&r,b));
return r;
}

main () {
float a,b,rem;
a = VERYBIG;
b = VERYSMALL;
rem = fmod(a,b);
}

b.
0

Groupware ConsultantCommented:
stefan73,

Division faster than AND? What can be faster than an AND in this case? Microcode division stuff is usually a repeated subtraction, the MOD comes for free.

Btw, I only reacted on the statement that "For mod, you have to do division." Just meant to show that if DIV can be done by shifting, MOD can be done using AND.

Sjef
0

Commented:
Assuming that divisor is in powers of 2

Is IDIV faster or SAR ?
Or is AND faster, if only MOD is required ?

I am not an expert in low level programming...so if that looks stupid please ignore :-))
May be I am talking of CPU cycles here.....but I do not know :-)

Cheers.
0

Author Commented:
>For mod, you have to do division

What i meant was to calculate the mod value,you'd have to perform the *division* operation(not necessarily IDIV),whether it does so by IDIV or SAR is irrevelant.
0

Commented:
Run this test program. Div is always faster than mod on my PC. When in doubt, I always use a timed benchmark.

------------------------------------------------------------------------------------------------------------------------
#include <iostream>
#include <time.h>
using namespace std;

int repetitions=10000;

clock_t start_time;                        // time at test start
clock_t time_elapsed;                  // time at test end
int c;

void main()
{
start_time=clock();                  // test mod time

for(int i=1; i<repetitions; i++)
{
for(int j=1; j<repetitions; j++)
{
c = i%j;
}
}
time_elapsed = clock()-start_time;

cout << "total time for mod: " << time_elapsed << endl;

start_time=clock();                  // test div time

for(i=1; i<repetitions; i++)
{
for(int j=1; j<repetitions; j++)
{
c = i/j;
}
}
time_elapsed = clock()-start_time;

cout << "total time for div: " << time_elapsed << endl;

}
---------------------------------------------------------------------------------------------
0

Commented:
oops you dont need the "using namespace" heh. Anyway, no amount of reading theory will help you with really specific performance questions like this (at least IMHO). Make test programs =)
0

Commented:
anupvijay,

Assuming IA32...

>> Is IDIV faster or SAR ?

..SAR is much faster.

>> Or is AND faster, if only MOD is required ?

..there is no MOD instruction.  Generally IDIV/DIV is used for integer modulo and 'AND' is much faster.

>> May be I am talking of CPU cycles here.....

Yes, clock cycles.

Fippy_Darkpaw,

Though benchmarking does have its place in comparative perfomance, I'm afraid you're sorely underestimating the value of reading.

ankuratvb,

Adding to my previous post on fprem, I understand that fprem rarely requires multiple execution so the main reason floating point division is faster than a floating point modulo operation is that the modulo requires a division, multiplication and a subtraction as pointed out earlier by stefan73.

b.
0

Author Commented:
Fippy_Darkpaw,

I had tried timing programs on this but i got variable results that couldnt conclusively say which is faster.

Try running your program multiple times and see the results:

Here is what i got:

reps=10000
total time for mod: 495
total time for div: 504
total time for mod: 495
total time for div: 492
total time for mod: 497
total time for div: 499

reps=100
total time for mod: 5
total time for div: 5
total time for mod: 5
total time for div: 5

reps=5000
total time for mod: 128
total time for div: 123
total time for mod: 124
total time for div: 123
total time for mod: 125
total time for div: 124
total time for mod: 123
total time for div: 123

reps=10000
total time for mod: 497
total time for div: 495
total time for mod: 497
total time for div: 494
total time for mod: 493
total time for div: 493

0

Author Commented:
>reps=100
should be
reps=1000
0

Groupware ConsultantCommented:
The only way to find out what's happening is to disassemble the executable, or to generate assembly. Probably, most likely, in both cases IDIV will be used, and the total number of instructions executed won't be different. If there is a difference, it can only be that additional tests are required for negative numbers or so. In this case, differences are only due to the OS, handling interrupts etc. when the loop is busy. If run under DOS, the results should be more reliable.

And don't use an optimizing compiler, a good one might remove these loops completely since the results are discarded immediately so the loops can be considered useless.

By the way, do you think that the next line is a valid C statement?
12345;
0

Author Commented:
>12345;
Yes.

The compiler treats the 12345 as a constant value.

All of these will work:
'a';
123.4;
"abcdef";

The compiler would create temp. variables for these constant values and on replacing would look something like:

Taking 12345; as an example:
the compiler 'd create a temp variable say temp_t.

So,the statement would become:

temp_t;

which is a perfectly valid C statement.
0

Groupware ConsultantCommented:
Don't confuse the language C and a compiler for the language C. Indeed
12345;
is a valid C statement, for it is an expression followed by a semicolon, or, in BNF, it is accepted according to the rule
<expression> ';'
just like in fact a function call like
func();
Since all functions return a result, that may be discarded, this line is also an expression followed by a semicolon, hence a valid statement.

At least, in the original C (K&R) language. The current C language, with a lot more typing rules in it, might refuse the statement or at least warn about it. Another ideosyncrasy: I suppose nobody uses the comma as statement separator? Yeah, only in for-statements probably, but the following lines are also valid in C:
12345,6789;
12345+6789;

:)
0

Commented:
--- Fippy_Darkpaw,
---
---- I had tried timing programs on this but i got variable
---- results that couldnt conclusively say  which is faster.

Wow what comp did you run that on? I ran it on my P4 1.8 laptop with 512 RAM. Division is always substantially faster (like 20% or more). DIv vs. mod must be architecture dependent as well.

0

Commented:
Oh yeah. And what compiler did you use? I used VC++ 6.0. Perhaps a different compiler optimized differently?
0

Groupware ConsultantCommented:
I changed the code above to fit a standard C compiler. I ran it on Linux, but the results were unpredictable because of the other processes running.

Nevertheless, did you try to assemble the program only? I.e. cc -S bookmark.c or comparable? If you look at the code generated for the div or the mod, it looks like this (Linux):

For the div:
.L7:
movl      -8(%ebp), %eax
cmpl      repetitions, %eax
jle      .L10
leal      -4(%ebp), %eax
incl      (%eax)
jmp      .L3
.p2align 2
.L10:
movl      -4(%ebp), %edx
leal      -8(%ebp), %eax
movl      %eax, -12(%ebp)
movl      %edx, %eax
movl      -12(%ebp), %ecx
cltd
idivl      (%ecx)
movl      %edx, c
leal      -8(%ebp), %eax
incl      (%eax)
jmp      .L7

And for the mod:
.L15:
movl      -8(%ebp), %eax
cmpl      repetitions, %eax
jle      .L18
leal      -4(%ebp), %eax
incl      (%eax)
jmp      .L11
.p2align 2
.L18:
movl      -4(%ebp), %edx
leal      -8(%ebp), %eax
movl      %eax, -12(%ebp)
movl      %edx, %eax
movl      -12(%ebp), %ecx
cltd
idivl      (%ecx)
movl      %eax, c
leal      -8(%ebp), %eax
incl      (%eax)
jmp      .L15

What's the difference? Nothing! Except for two move instructions, but they are absolutely equivalent. Please assemble your code only, and study the differences. I'm not convinced anyway ;)

Just a suggestion: put another outer loop over both loops, so you can eliminate program startup-stuff (if any).
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.