Setting precision

I've been learning C for a couple of weeks.
I have to write a program that performs calculations with many math
functions (mainly sin and cos), using a 3-byte floating point format: 8 bits for
the exponent, 1 for the sign and 15 for the mantissa.
How can I set this precision?

Many thanks.


Why not just use float or double as your data types?

pietropaolo (Author) commented:
My goal is to calculate the error I make by using a 3-byte floating
point in a sine/cosine equation instead of an 8-byte floating point.
The online help of my compiler reports that the sin and cos functions accept only an 8-byte double as an argument.
So, what should I do?


The math routines typically return a double as a return value, and take one (or more) arguments that are also doubles.

You can use casting to convert your 3-byte floating point number to the double argument required by the math routines, and use casting to force the result to a 3-byte floating point number.

What your question does not indicate is how you intend to store the 3-byte floating point value. Assume you have typedef'd it as FLOAT_3BYTE.

Then the equation y = sin (x) would be evaluated with 3-byte floating point values in C as:

FLOAT_3BYTE     x, y;

y = sin ((double) x);

This means that x (the 3-byte float you construct) would be converted to a double, the sine routine called, and a double result returned. Since y is of type FLOAT_3BYTE, C would automatically convert the double result to a FLOAT_3BYTE for you.

If you don't like all this casting, then an alternative is to write 'wrapper' functions around the math routines that deal only with the FLOAT_3BYTE data type. For example, here is a sine routine that takes and returns the 3-byte floating type:

FLOAT_3BYTE sin_3byte (FLOAT_3BYTE x)
{
    return (FLOAT_3BYTE) sin ((double) x);
}

I hope this helps. MK


pietropaolo (Author) commented:
Thank you very much for your answer, Mjkajen.

Now the problem is: how can I define the 3-byte floating point
type FLOAT_3BYTE?

Many thanks again!


Here is some code that constructs a 3-byte floating point number and stores it in a double. Please see the notes at the end.

// precision.c

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <float.h>
#include <math.h>

/* Prototypes */
double Build3ByteFloat (int iSign, int iExponent, int iMantissa);
void   report (int iSign, int iExponent, int iMantissa);

int main (void)
{
      report (1, 0, 0);
      report (1, 1, 0);
      report (1, 0, 1);

      report (1, 1, 10);
      report (1, 1, 15);

      return 0;
}

/* Routine to build a 3-byte floating point number that is
** stored in a double.
** Input args:
** iSign = 1 bit (-1 or 1)
** iExponent = 8 bit number
** iMantissa = 15 bit number
** The result is returned as a double.
*/
double Build3ByteFloat (int iSign, int iExponent, int iMantissa)
{
      /*
      ** Check that the input arguments will, in fact, fit into
      ** a double for this machine.
      */
      assert (15 <= DBL_MANT_DIG);
      assert (iMantissa >= 0 && iMantissa < (1 << 15));
      assert (iSign == -1 || iSign == 1);
      assert (abs (iExponent) <= DBL_MAX_EXP);

      /* Use the ldexp () function to do all the work. */
      return iSign * ldexp ((double) iMantissa, iExponent);
}

/*
** Routine to compute and display a 3-byte floating point
** number for debugging purposes.
*/
void report (int iSign, int iExponent, int iMantissa)
{
      double            dResult;

      dResult = Build3ByteFloat (iSign, iExponent, iMantissa);
      printf ("Result for %d, %d, %d, is %e\n",
                  iSign, iExponent, iMantissa, dResult);
}


The above code demonstrates simulating a 3-byte float inside a double. There are subtle problems with this approach. Basically, on the machine I'm using, a double is "bigger" than the 3-byte float. Suppose I add together the two largest 3-byte floating point numbers possible. Technically, this should cause an overflow; however, since these numbers are stored in doubles, an overflow will probably NOT occur. This will therefore give you range and accuracy that are not possible with pure 3-byte floating point numbers.

So, although I've shown how to simulate 3-byte floating point numbers with doubles, this may not be suitable for your task.
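
If the values really must occupy three bytes rather than a double, one possible definition of FLOAT_3BYTE is a small struct with pack and unpack helpers. This is a sketch under assumptions: the bit layout (sign, then exponent, then mantissa, most significant bit first), the meaning of the sign bit (1 = negative) and the helper names Pack3ByteFloat and Unpack3ByteFloat are all made up for illustration, because the question does not specify them.

```c
#include <assert.h>

/* ASSUMPTION: bit layout is sign (1) | exponent (8) | mantissa (15),
   most significant bit first. The question does not specify one. */
typedef struct
{
    unsigned char abByte[3];
} FLOAT_3BYTE;

/* Pack the three fields into 24 bits. */
FLOAT_3BYTE Pack3ByteFloat (unsigned uSign, unsigned uExponent,
                            unsigned uMantissa)
{
    FLOAT_3BYTE   f;
    unsigned long ulBits;

    assert (uSign <= 1u);
    assert (uExponent <= 0xFFu);
    assert (uMantissa <= 0x7FFFu);

    ulBits = ((unsigned long) uSign << 23)
           | ((unsigned long) uExponent << 15)
           | (unsigned long) uMantissa;
    f.abByte[0] = (unsigned char) ((ulBits >> 16) & 0xFFu);
    f.abByte[1] = (unsigned char) ((ulBits >> 8) & 0xFFu);
    f.abByte[2] = (unsigned char) (ulBits & 0xFFu);
    return f;
}

/* Recover the three fields from the packed bytes. */
void Unpack3ByteFloat (FLOAT_3BYTE f, unsigned *puSign,
                       unsigned *puExponent, unsigned *puMantissa)
{
    unsigned long ulBits = ((unsigned long) f.abByte[0] << 16)
                         | ((unsigned long) f.abByte[1] << 8)
                         | (unsigned long) f.abByte[2];

    *puSign     = (unsigned) ((ulBits >> 23) & 0x1u);
    *puExponent = (unsigned) ((ulBits >> 15) & 0xFFu);
    *puMantissa = (unsigned) (ulBits & 0x7FFFu);
}
```

On most compilers sizeof (FLOAT_3BYTE) is 3, although the C standard does not guarantee that. Since a struct cannot be cast to double, the unpacked fields would still have to be converted before calling sin or cos.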

To accurately simulate arithmetic that is different from the native machine, one must also provide the addition, subtraction, multiplication, and division operations. This is a lot of work. For example, one would provide an add function that would "know" how to add two 3-byte floating point numbers, and it would know all of the overflow rules. The same goes for subtraction, multiplication, and division (and, of course, sin, cos, etc.).

Again, this may not be what you need, but it's the 'correct' approach. Please don't think that when a 3-byte float is simulated using a double, arithmetic performed on these doubles will reflect the precision of the 3-byte floats: it will only reflect the precision available in the underlying simulation.

There are arithmetic packages available that simulate arbitrary precision, but these tend to be targeted at large, or exact, precision.




Question has a verified solution.
