Solved

Is % faster than /

Posted on 2004-04-22
26
473 Views
Last Modified: 2010-04-15
I just wanted to know:

Is the mod operation on integers faster than division on integers?
And the reasons for either answer.

Just out of curiosity.
0
Comment
Question by:ankuratvb
  • 7
  • 6
  • 5
  • +4
26 Comments
 
LVL 4

Expert Comment

by:booki
ID: 10887435
ankuratvb,

No, not on IA32.  They both use idiv.  The modulo goes to edx and quotient goes to eax.

b.


0
 
LVL 2

Expert Comment

by:anupvijay
ID: 10887495
Just curious......
Even a SAR will do division by shifting the bits right......correct ?

When is an IDIV used and when is a SAR used ?

Cheers
0
 
LVL 4

Accepted Solution

by:
booki earned 30 total points
ID: 10887496
ankuratvb,

That is.. they are the same in terms of speed.  For unsigned integers div is used for both operations.

b.
0
 
LVL 9

Author Comment

by:ankuratvb
ID: 10887545
Thats what i was also thinking.For mod,you have to do division.

Would it also be the same for floating point numbers as well.
Other than the overhead of the function call for fmod(),would they both execute at the same speed.
0
 
LVL 4

Expert Comment

by:booki
ID: 10887547
anupvijay,

>> Even a SAR will do division by shifting the bits right......correct ?

Yes but only for divisors that are powers of two.

>> When is an IDIV used and when is a SAR used ?

SAR/SHR is used when division is by a divisor that is a power of two and it is an immediate value (constant) or for the '>>' operator.  SAR is used for signed integers.  SHR is used for unsigned integers.

b.
0
 
LVL 4

Expert Comment

by:booki
ID: 10887696
ankuratvb,

>> Would it also be the same for floating point numbers as well.

No, fprem determines a 'partial remainder' through iterative subtraction and may require multiple execution. While division is done with a single instruction, fdiv.  In other words, division will likely be faster.

b.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 10887714
Mod always with IDIV?? I thought that a%32 is equivalent to a&31...
0
 
LVL 22

Expert Comment

by:grg99
ID: 10887731

Be careful, doing mod or div with masks and shifts works a bit differently for negative numbers!

0
 
LVL 9

Author Comment

by:ankuratvb
ID: 10887734
Thanx.I have one clarification.

I couldnt get this part:

>*fprem* determines a 'partial remainder' through iterative subtraction and may require >multiple execution.

0
 
LVL 4

Expert Comment

by:booki
ID: 10887796
sjef_bosman,

Yes, you are correct.  With immediate values that are powers of 2, the compiler can optimize and use 'bitwise and'.  Similarly, for arithmetic (integer) operations '*' and '/' that involve immediate values that are powers of two, the compiler can optimize.  However, this is not possible for the general case.

b.
0
 
LVL 12

Assisted Solution

by:stefan73
stefan73 earned 30 total points
ID: 10887994
Hi sjef_bosman,
> Mod always with IDIV?? I thought that a%32 is equivalent to a&31...

That's an optimizer issue. Sure it replaces those by & (%) and >> (/).

Modulo is basically the same as

#define MOD(a,b) (a-(b*(a/b)))

...so division is faster.

Cheers,
Stefan
0
 
LVL 4

Expert Comment

by:booki
ID: 10888023
ankuratvb,

// fprem, is implemented something like the following:

bool fprem(float *a, float b) {
    int d = a.exponent - b.exponent; // pseudo-code
    if (d < 64) {
        *a -=  b*(int)(a/b);
        return 0;
    } else {
        *a -= b*(int)(a/b); // not exact.. see intel manual for exact formula
        return 1;
    }
}

// so fmod may need to execute fprem more than once.

float fmod (float a, float b) {
    float r;
    r = a;
    while (fprem(&r,b));
    return r;
}

main () {
    float a,b,rem;
    a = VERYBIG;
    b = VERYSMALL;
    rem = fmod(a,b);
}

b.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 10888476
stefan73,

Division faster than AND? What can be faster than an AND in this case? Microcode division stuff is usually a repeated subtraction, the MOD comes for free.

Btw, I only reacted on the statement that "For mod, you have to do division." Just meant to show that if DIV can be done by shifting, MOD can be done using AND.

Sjef
0
6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

 
LVL 2

Expert Comment

by:anupvijay
ID: 10888675
Assuming that divisor is in powers of 2

Is IDIV faster or SAR ?
Or is AND faster, if only MOD is required ?

I am not an expert in low level programming...so if that looks stupid please ignore :-))
May be I am talking of CPU cycles here.....but I do not know :-)

Cheers.
0
 
LVL 9

Author Comment

by:ankuratvb
ID: 10888978
>For mod, you have to do division

What i meant was to calculate the mod value,you'd have to perform the *division* operation(not necessarily IDIV),whether it does so by IDIV or SAR is irrevelant.
0
 
LVL 4

Expert Comment

by:Fippy_Darkpaw
ID: 10892717
Run this test program. Div is always faster than mod on my PC. When in doubt, I always use a timed benchmark.

------------------------------------------------------------------------------------------------------------------------
#include <iostream>
#include <time.h>
using namespace std;

int repetitions=10000;

clock_t start_time;                        // time at test start
clock_t time_elapsed;                  // time at test end
int c;

void main()
{
      start_time=clock();                  // test mod time

      for(int i=1; i<repetitions; i++)
      {
            for(int j=1; j<repetitions; j++)
            {
                  c = i%j;      
            }
      }
      time_elapsed = clock()-start_time;
      
      cout << "total time for mod: " << time_elapsed << endl;


      start_time=clock();                  // test div time

      for(i=1; i<repetitions; i++)
      {
            for(int j=1; j<repetitions; j++)
            {
                  c = i/j;      
            }
      }
      time_elapsed = clock()-start_time;
      
      cout << "total time for div: " << time_elapsed << endl;

}
---------------------------------------------------------------------------------------------
0
 
LVL 4

Expert Comment

by:Fippy_Darkpaw
ID: 10892767
oops you dont need the "using namespace" heh. Anyway, no amount of reading theory will help you with really specific performance questions like this (at least IMHO). Make test programs =)
0
 
LVL 4

Expert Comment

by:booki
ID: 10894825
anupvijay,

Assuming IA32...

>> Is IDIV faster or SAR ?

..SAR is much faster.

>> Or is AND faster, if only MOD is required ?

..there is no MOD instruction.  Generally IDIV/DIV is used for integer modulo and 'AND' is much faster.

>> May be I am talking of CPU cycles here.....

Yes, clock cycles.

Fippy_Darkpaw,

>> ..no amount of reading theory will help you with really specific performance questions like this

Though benchmarking does have its place in comparative perfomance, I'm afraid you're sorely underestimating the value of reading.

ankuratvb,

Adding to my previous post on fprem, I understand that fprem rarely requires multiple execution so the main reason floating point division is faster than a floating point modulo operation is that the modulo requires a division, multiplication and a subtraction as pointed out earlier by stefan73.

b.
0
 
LVL 9

Author Comment

by:ankuratvb
ID: 10895853
Fippy_Darkpaw,

I had tried timing programs on this but i got variable results that couldnt conclusively say which is faster.

Try running your program multiple times and see the results:

Here is what i got:

reps=10000
total time for mod: 495
total time for div: 504
total time for mod: 495
total time for div: 492                                                        
total time for mod: 497                                                        
total time for div: 499                                                        

reps=100
total time for mod: 5                                                          
total time for div: 5                                                          
total time for mod: 5                                                          
total time for div: 5                                                          

reps=5000
total time for mod: 128                                                        
total time for div: 123                                                        
total time for mod: 124                                                        
total time for div: 123                                                        
total time for mod: 125                                                        
total time for div: 124                                                        
total time for mod: 123                                                        
total time for div: 123                                                        

reps=10000
total time for mod: 497                                                        
total time for div: 495                                                        
total time for mod: 497                                                        
total time for div: 494                                                        
total time for mod: 493                                                        
total time for div: 493                                                        
                                                                               
0
 
LVL 9

Author Comment

by:ankuratvb
ID: 10895886
>reps=100
should be
reps=1000
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 10897411
The only way to find out what's happening is to disassemble the executable, or to generate assembly. Probably, most likely, in both cases IDIV will be used, and the total number of instructions executed won't be different. If there is a difference, it can only be that additional tests are required for negative numbers or so. In this case, differences are only due to the OS, handling interrupts etc. when the loop is busy. If run under DOS, the results should be more reliable.

And don't use an optimizing compiler, a good one might remove these loops completely since the results are discarded immediately so the loops can be considered useless.

By the way, do you think that the next line is a valid C statement?
    12345;
0
 
LVL 9

Author Comment

by:ankuratvb
ID: 10907825
>12345;
Yes.

The compiler treats the 12345 as a constant value.

All of these will work:
'a';
123.4;
"abcdef";

The compiler would create temp. variables for these constant values and on replacing would look something like:

Taking 12345; as an example:
the compiler 'd create a temp variable say temp_t.

So,the statement would become:

temp_t;

which is a perfectly valid C statement.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 10908182
Don't confuse the language C and a compiler for the language C. Indeed
    12345;
is a valid C statement, for it is an expression followed by a semicolon, or, in BNF, it is accepted according to the rule
    <expression> ';'
just like in fact a function call like
    func();
Since all functions return a result, that may be discarded, this line is also an expression followed by a semicolon, hence a valid statement.

At least, in the original C (K&R) language. The current C language, with a lot more typing rules in it, might refuse the statement or at least warn about it. Another ideosyncrasy: I suppose nobody uses the comma as statement separator? Yeah, only in for-statements probably, but the following lines are also valid in C:
    12345,6789;
    12345+6789;

:)
0
 
LVL 4

Expert Comment

by:Fippy_Darkpaw
ID: 11038184
--- Fippy_Darkpaw,
---
---- I had tried timing programs on this but i got variable
---- results that couldnt conclusively say  which is faster.

Wow what comp did you run that on? I ran it on my P4 1.8 laptop with 512 RAM. Division is always substantially faster (like 20% or more). DIv vs. mod must be architecture dependent as well.

0
 
LVL 4

Expert Comment

by:Fippy_Darkpaw
ID: 11038192
Oh yeah. And what compiler did you use? I used VC++ 6.0. Perhaps a different compiler optimized differently?
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 11038444
I changed the code above to fit a standard C compiler. I ran it on Linux, but the results were unpredictable because of the other processes running.

Nevertheless, did you try to assemble the program only? I.e. cc -S bookmark.c or comparable? If you look at the code generated for the div or the mod, it looks like this (Linux):

For the div:
.L7:
      movl      -8(%ebp), %eax
      cmpl      repetitions, %eax
      jle      .L10
      leal      -4(%ebp), %eax
      incl      (%eax)
      jmp      .L3
      .p2align 2
.L10:
      movl      -4(%ebp), %edx
      leal      -8(%ebp), %eax
      movl      %eax, -12(%ebp)
      movl      %edx, %eax
      movl      -12(%ebp), %ecx
      cltd
      idivl      (%ecx)
      movl      %edx, c
      leal      -8(%ebp), %eax
      incl      (%eax)
      jmp      .L7

And for the mod:
.L15:
      movl      -8(%ebp), %eax
      cmpl      repetitions, %eax
      jle      .L18
      leal      -4(%ebp), %eax
      incl      (%eax)
      jmp      .L11
      .p2align 2
.L18:
      movl      -4(%ebp), %edx
      leal      -8(%ebp), %eax
      movl      %eax, -12(%ebp)
      movl      %edx, %eax
      movl      -12(%ebp), %ecx
      cltd
      idivl      (%ecx)
      movl      %eax, c
      leal      -8(%ebp), %eax
      incl      (%eax)
      jmp      .L15

What's the difference? Nothing! Except for two move instructions, but they are absolutely equivalent. Please assemble your code only, and study the differences. I'm not convinced anyway ;)

Just a suggestion: put another outer loop over both loops, so you can eliminate program startup-stuff (if any).
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
Preface I don't like visual development tools that are supposed to write a program for me. Even if it is Xcode and I can use Interface Builder. Yes, it is a perfect tool and has helped me a lot, mainly, in the beginning, when my programs were small…
The goal of this video is to provide viewers with basic examples to understand opening and writing to files in the C programming language.
The goal of this video is to provide viewers with basic examples to understand how to use strings and some functions related to them in the C programming language.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now