Lescha


Float multiplication - difference between assembly code with and without optimizer

I've recently encountered the following problem. I have a program written in C using Visual Studio 6.0. When run in Release version without optimization, the results of float multiplication are slightly different from those in Release version with Maximize speed.

I tried to look at assembly code, but was unable to find a difference. The problem is that when the program is compiled with optimization it is impossible to debug it, otherwise I would've looked at the registers.

So: is it true that Max Speed Optimizer affects the assembly code? If so, can I turn this specific feature off?

If it will be any help, I will post the relevant piece of source / assembly code by request.

P.S. This is a copy of the same question I posted in other sections. I'm aware of that, so please don't make special comments about it.
stefan73

Hi Lescha,
> is it true that Max Speed Optimizer affects the assembly code?
Of course! Otherwise there would be no improvement.
But the optimizer should still create FP code that fully complies with IEEE-754. Some compilers have options which explicitly create non-compliant code (such as Sun cc with -fast), but the documentation should say so.

Cheers,
Stefan
Lescha

ASKER

Okay, okay, I see where my formulation of the question was misleading.
I rephrase:

Is it true that Max Speed Optimizer affects the assembly code which concerns arithmetic operations, and multiplication in particular? If so, can I turn this specific feature off?

Lescha,
Before I say "of course" again, perhaps let me re-phrase your question - I think I know what you're aiming at:

Is it true that Max Speed Optimizer affects the behavior of arithmetic operations, so that the result can differ from non-optimized code?

If that's what you mean: The behavior of floating-point operations is defined in the IEEE-754 standard (read more at http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html).

This standard regulates the way floating-point operations are handled. Typical examples of non-complying optimizations are:

double x=12345.56789;
for(i=0;i<99;i++)
    array[i] /= x;

Since multiplication is usually faster than division, the optimizer replaces the division by x with a multiplication by the reciprocal x1:

double x=12345.56789;
double x1=1.0/x;
for(i=0;i<99;i++)
    array[i] *= x1;
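
To see why this rewrite is non-compliant, here is a minimal sketch in C. The function names are made up for illustration. Division is a single correctly rounded operation, while the reciprocal version rounds twice (once for 1/x, once for the product), so the two can disagree in the last bit. A well-known case in double precision is 49: 49.0/49.0 is exactly 1.0, but 49.0*(1.0/49.0) is not.

```c
/* Divide directly: one correctly rounded IEEE-754 operation. */
double divide_direct(double value, double divisor)
{
    return value / divisor;
}

/* Multiply by a precomputed reciprocal: two roundings
   (first 1/divisor, then the product). */
double divide_via_reciprocal(double value, double divisor)
{
    double reciprocal = 1.0 / divisor;
    return value * reciprocal;
}

/* Count how many of the n elements would change if the loop
   "array[i] /= x" were rewritten as "array[i] *= x1". */
int count_mismatches(const double *array, int n, double divisor)
{
    int i;
    int mismatches = 0;
    for (i = 0; i < n; i++) {
        if (divide_direct(array[i], divisor) !=
            divide_via_reciprocal(array[i], divisor)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Running count_mismatches over your own data gives a rough idea of how often such a rewrite would change a result. This assumes the compiler itself is not applying the same rewrite to divide_direct.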

The MS documentation MUST mention non-compliant optimizations. Here's an example from Sun's cc man page:
          -fsimple=0
          Permits no simplifying assumptions. Preserves strict
          IEEE 754 conformance.

          -fsimple=1
          Allows conservative simplifications. The resulting code
          does not strictly conform to IEEE 754, but numeric
          results of most programs are unchanged.

          With -fsimple=1, the optimizer can assume the following:
          o  The IEEE 754 default rounding/trapping modes do not
          change after process initialization.
          o  Computations producing no visible result other than
          potential floating-point exceptions may be deleted.
          o  Computations with Infinity or NaNs as operands need
          not propagate NaNs to their results. For example, x*0
          may be replaced by 0.
          o  Computations do not depend on sign of zero.

          With -fsimple=1, the optimizer is not allowed to optimize
          completely without regard to roundoff or exceptions.
          In particular, a floating-point computation cannot be
          replaced by one that produces different results with
          rounding modes held constant at run time.

          -fsimple=2
          Permits aggressive floating point optimizations that
          may cause many programs to produce different numeric
          results due to changes in rounding. For example,
          -fsimple=2 permits the optimizer to attempt replacing
          computations of x/y in a given loop where y and z are
          known to have constant values, with x*z, where z=1/y is
          computed once and saved in a temporary, thereby
          eliminating costly divide operations.

          Even with -fsimple=2, the optimizer still is not
          permitted to introduce a floating point exception in a
          program that otherwise produces none.

This very clearly defines boundaries of optimizer behaviour.


Stefan
Lescha

ASKER

Yeah, okay, so I guess what I am actually asking is this: how can I retain most of the Maximize Speed optimizations, but bar the optimizer from touching the arithmetic?
Lescha

ASKER

I think I'll just post both assembly codes here for you.
Lescha

ASKER

WITHOUT OPTIMIZER

; 346  :                               CurInd.Range = (CurValue.Range - AmbResInput->StripData[NStrip].MinRange)*AmbResInput->MapGrid.InvStep.Range;

      mov      ecx, DWORD PTR ?NStrip@@3KA            ; NStrip
      imul      ecx, 12                              ; 0000000cH
      mov      edx, DWORD PTR _AmbResInput$[ebp]
      fld      DWORD PTR _CurValue$[ebp]
      fsub      DWORD PTR [edx+ecx+3145996]
      mov      eax, DWORD PTR _AmbResInput$[ebp]
      fmul      DWORD PTR [eax+3145984]
      fstp      DWORD PTR _CurInd$[ebp]

; 347  :                               // Calculate the cell index

; 348  :                               IndR = (DWORD)CurInd.Range;

      fld      DWORD PTR _CurInd$[ebp]
      call      __ftol
      mov      DWORD PTR _IndR$[ebp], eax
Lescha

ASKER

WITH OPTIMIZER

; 346  :                               CurInd.Range = (CurValue.Range - AmbResInput->StripData[NStrip].MinRange)*AmbResInput->MapGrid.InvStep.Range;

      fld      DWORD PTR _CurValue$[esp+3192]
      fsub      DWORD PTR [esi+3145996]
      fld      ST(0)
      fmul      DWORD PTR [edi+3145984]

; 347  :                               // Calculate the cell index
; 348  :                               IndD = (DWORD)CurInd.Doppler;

      fld      DWORD PTR _CurInd$[esp+3196]
      call      __ftol

; 349  :                               IndR = (DWORD)CurInd.Range;

      fld      ST(0)
      mov      edi, eax
      call      __ftol

Lescha

ASKER

Do you see any significant difference?
Or do you need more data?
Lescha,
Optimizing is OK, as long as you get the same results. Are yours different?

Stefan
DanRollins
And what does it look like *with* optimization?    And... why does it matter?  What is it about the optimized code that offends you?

-- Dan
Lescha

ASKER

Yes! That's what my question is about! The "random fluctuations" beyond the decimal places of real significance are different!

For example, I can get 25611.2345 without an optimizer and 25611.2367 with an optimizer. This would not matter much, but, of course, sometimes it is 123456.9999 in one case and 123457.0001 in the other case, and this, when floor-ed to an integer gives a different result.

So, again: why are the arithmetical ops different with and without the optimizer?
AFAIK, that looks OK. Keep in mind that Intel FPUs use a stack, so commands like fld ST(0) are faster than accessing memory via a pointer. The relevant commands are fsub, fmul and the __ftol call. fld and fstp are just load and store commands.
Here is an article about floating point optimizations with VC++:

http://www.microsoft.com/indonesia/msdn/floapoint.asp
Extract: The compiler needs to be called with

cl -fp:precise source.cpp
   or
cl /fp:precise source.cpp

Check whether Maximize Speed enables -fp:fast.
Maybe another option would be to generally use 'double' instead of 'float' ... IMO the problems you see come from the fact that values are read back from memory (as float) in the unoptimized code, while they may be taken straight from the FPU stack (as double) in the optimized code ... I think the results won't differ so much when the values in memory are doubles as well.

ZOPPO
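
The effect described above can be reproduced in portable C. This is a sketch under stated assumptions, not the original program: the function names are hypothetical, the volatile float stands in for the fstp/fld round trip through a 32-bit memory slot in the unoptimized listing, and the double stands in for a product that stays in ST(0) at more than float precision in the optimized listing. The inputs are chosen so that the exact product is 1 - 2^-26, which rounds up to 1.0 when stored as a float.

```c
/* Unoptimized pattern: the product is stored to a 32-bit float
   (rounded to 24 significant bits) and reloaded before truncation. */
unsigned long index_via_float_store(float delta, float inv_step)
{
    volatile float stored = delta * inv_step;
    return (unsigned long)stored;
}

/* Optimized pattern: the product is truncated while it still carries
   extra precision. A double is used here as a stand-in for the value
   left in the FPU register. */
unsigned long index_via_wide_register(float delta, float inv_step)
{
    double wide = (double)delta * (double)inv_step;
    return (unsigned long)wide;
}
```

With delta = 8193.0f/8192.0f and inv_step = 8191.0f/8192.0f, the first function returns 1 and the second returns 0. Neither multiplication is wrong; the only difference is when the rounding to float happens relative to the truncation.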
Zoppo,
> use 'double' instead of 'float'
Good point. But I think the decision here is against doubles for saving space. The ASM code above shows that there are pretty big objects on the stack already, so using doubles might not be an option. Or is it, Lescha?

You'd get less noise with doubles.

BTW: Have a look at this nice page here:
http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html

It shows you the exact binary layout of any double or float you enter.

Stefan
hm ... yes, 'space' is one argument ... but dealing with numbers like '25611.2345' will lead to problems anyway, since float's precision is at most about 7 significant digits.

ZOPPO
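
To put a number on the 7-digit limit, here is a small sketch that measures the gap between a float and the next representable float above it. The helper names are made up for illustration, and the code assumes 32-bit IEEE-754 floats and a positive, finite input.

```c
#include <stdint.h>
#include <string.h>

/* Next representable float above x.
   Assumes x is positive, finite, and not the largest float. */
float next_float_up(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the float's bit pattern */
    bits += 1u;                      /* adjacent positive floats differ by 1 */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Distance from x to the next float above it. */
float float_spacing(float x)
{
    return next_float_up(x) - x;
}
```

Near 25611 the spacing is 1/512 (about 0.002), and near 123457 it is 1/128 (about 0.008). So the third and fourth decimal places in values like 25611.2345 or 123456.9999 cannot be represented in a float at all, whatever the optimizer does.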
grg99,
> there are very very very few programs that need more than 32-bits of precision

For a single value, that's true in most cases. But think about error propagation...

Stefan
Stefan, with respect to http://www.microsoft.com/indonesia/msdn/floapoint.asp 

> Beginning with version 8.0 (Visual C++® "Whidbey"),

That's part of the yet-to-be-released Visual Studio 2005 - see the road map at: http://msdn.microsoft.com/vstudio/productinfo/roadmap.aspx
Odd that the version displayed by cl.exe in .NET 2003 is...

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
Copyright (C) Microsoft Corporation 1984-2002. All rights reserved.

....but we all know it as 7.1.
rstaveley,
> version 8.0
Ouch, you're right!

Hmm, maybe the option also works in earlier versions.

Regarding the compiler version: That's probably just the compiler itself, not the Studio.

Stefan
Lescha

ASKER

Wow! That's a hell of a lot of comments! I just increased the points, otherwise I won't have enough to split between all of you guys.

Now, to answer your questions:

1a) I switched to floats because it saves both space and time. Space is obvious; time because on a 32-bit machine the same calculations done in floats are much faster than in doubles. That's also the reason why, for instance, I use DWORD or long where a byte or a short might have sufficed.
1b) With me, time is the decisive factor here, I'm talking about a pretty heavy algorithm and I managed to get it down to about 13ms without the optimization. Cannot go to double, it will (he-he) almost double the time. And, for the same reason, I cannot add the to-double and from-double lines of code.

2) I don't think I can replace truncation with rounding. That would solve the problem, of course, but, unfortunately, will give birth to other edge-effect problems.

3) Dan, what pragma would that be? Can you spell it out for me? Thanks!

4) Can I use a non-optimized DLL with an optimized EXE? Won't it create problems during the link stage?
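
On point 3: the pragma Dan had in mind is not visible in this thread, so the following is only a guess at one workable approach. MSVC provides #pragma optimize, and an empty option string with "off" disables optimization for the functions defined after it until it is switched back on. The function name and parameters below are hypothetical, modelled on line 346 of the listing; the _MSC_VER guards just let the file build with other compilers.

```c
#ifdef _MSC_VER
#pragma optimize("", off)   /* MSVC only: no optimization from here on */
#endif

/* Hypothetical helper: scale a range value to a grid cell index.
   The intermediate is a named float so that, in unoptimized code,
   it is stored as a 32-bit value before being truncated. */
unsigned long range_cell_index(float value, float min_range, float inv_step)
{
    float scaled = (value - min_range) * inv_step;
    return (unsigned long)scaled;
}

#ifdef _MSC_VER
#pragma optimize("", on)    /* MSVC only: restore the command-line settings */
#endif
```

This keeps Maximize Speed for the rest of the file and pays the cost only in the one function where the truncation happens. VC6 also has an /Op switch (improve float consistency); whether that alone is enough for this code has not been tested here.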