Solved

# Setting precision

Posted on 1997-07-04
Hi.
I've been learning C for a couple of weeks.
I have to write a program that does calculations with many math
functions (mainly sin and cos), using a 3-byte floating-point format: 8 bits for
the exponent, 1 for the sign and 15 for the mantissa.
How can I set this precision?

Many thanks.

Pietropaolo
Question by:pietropaolo


Expert Comment

Why not just use float or double as your data types?

-julio

Author Comment

My goal is to calculate the error I make by using a 3-byte floating-point
number in a sine/cosine equation instead of an 8-byte one.
The online help of my compiler says that the sin and cos functions accept only an 8-byte double as an argument.
So, what should I do?

Thanks.

Accepted Solution

mjkajen earned 70 total points

The math routines typically return a double, and take one (or more) arguments that are also doubles.

You can use casting to convert your 3-byte floating point number to the double argument required by the math routines, and use casting to force the result to a 3-byte floating point number.

What your question does not indicate is how you intend to store the 3-byte floating point value. Assume you have typedef'd it as FLOAT_3BYTE.

Then the equation y = sin (x) would be evaluated in 3-byte floating point in C as:

FLOAT_3BYTE     x, y;

y = sin ((double) x);

This means that x (the 3-byte float you construct) would be converted to a double, the sine routine called, and a double result returned. Since y is of type FLOAT_3BYTE, C would automatically convert the double result to a FLOAT_3BYTE for you, provided FLOAT_3BYTE is a typedef for an arithmetic type (a struct would need an explicit conversion routine).

If you don't like all this casting, an alternative is to write 'wrapper' functions around the math routines that deal only with the FLOAT_3BYTE data type. For example, here is a sine routine that takes and returns the 3-byte floating type:

FLOAT_3BYTE sin_3byte (FLOAT_3BYTE x)
{
    /* Convert to double for sin (); the result converts back on return. */
    return sin ((double) x);
}

I hope this helps. MK


Author Comment

Now the problem is: how can I define the 3-byte floating-point
type FLOAT_3BYTE?

Many thanks again!

Pietropaolo.


Expert Comment

Here is some code that constructs a 3-byte floating point number and stores it in a double. Please see the notes at the end.

// precision.c

#include <stdio.h>
#include <stdlib.h>   /* abs () */
#include <assert.h>
#include <float.h>
#include <math.h>

/* Prototypes */
double Build3ByteFloat (int iSign, int iExponent, int iMantissa);
void   report (int iSign, int iExponent, int iMantissa);

int main (void)
{
    report (1, 0, 0);
    report (1, 1, 0);
    report (1, 0, 1);

    report (1, 1, 10);
    report (1, 1, 15);

    return 0;
}

/*
** Routine to build a 3-byte floating point number that is
** stored in a double.
** Input args:
** iSign = 1 bit (-1 or 1)
** iExponent = 8 bit number
** iMantissa = 15 bit number
**
** The result is returned as a double.
*/

double Build3ByteFloat (int iSign, int iExponent, int iMantissa)
{
    /*
    ** Check that the input arguments will, in fact, fit into
    ** a double for this machine.
    */
    assert (15 <= DBL_MANT_DIG);                     /* 15 mantissa bits fit */
    assert (iMantissa >= 0 && iMantissa <= 0x7FFF);  /* 15-bit value */
    assert (iSign == -1 || iSign == 1);
    assert (abs (iExponent) <= DBL_MAX_EXP);

    /*
    ** Use the ldexp () function to do all the work:
    ** the result is iSign * iMantissa * 2^iExponent.
    */
    return iSign * ldexp ((double) iMantissa, iExponent);
}

/*
** Routine to compute and display a 3-byte floating point
** number for debugging purposes.
*/
void report (int iSign, int iExponent, int iMantissa)
{
    double dResult;

    dResult = Build3ByteFloat (iSign, iExponent, iMantissa);

    printf ("Result for %d, %d, %d, is %e\n",
            iSign, iExponent, iMantissa, dResult);
}
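
The code above keeps the value in a double. If the three bytes themselves have to be stored, one possible way is a small struct holding the raw bit fields. This is only a sketch under assumptions: the type name PACKED_3BYTE and the routines Pack3Byte and Unpack3Byte are hypothetical, and the layout (sign in the top bit, then the 8 exponent bits, then the 15 mantissa bits, most significant byte first) is a guess, since the question does not specify it.

```c
#include <assert.h>

/* Hypothetical storage type: exactly three bytes of raw bits. */
typedef struct
{
    unsigned char b[3];
} PACKED_3BYTE;

/* Pack sign (1 bit), exponent (8 bits), mantissa (15 bits). */
PACKED_3BYTE Pack3Byte (unsigned uSign, unsigned uExponent,
                        unsigned uMantissa)
{
    PACKED_3BYTE  f;
    unsigned long ulBits;

    assert (uSign <= 1u);
    assert (uExponent <= 0xFFu);
    assert (uMantissa <= 0x7FFFu);

    /* Assumed layout: [sign:1][exponent:8][mantissa:15] */
    ulBits = ((unsigned long) uSign << 23)
           | ((unsigned long) uExponent << 15)
           | (unsigned long) uMantissa;

    f.b[0] = (unsigned char) ((ulBits >> 16) & 0xFF);
    f.b[1] = (unsigned char) ((ulBits >> 8) & 0xFF);
    f.b[2] = (unsigned char) (ulBits & 0xFF);
    return f;
}

/* Recover the three fields from the packed bytes. */
void Unpack3Byte (PACKED_3BYTE f, unsigned *puSign,
                  unsigned *puExponent, unsigned *puMantissa)
{
    unsigned long ulBits = ((unsigned long) f.b[0] << 16)
                         | ((unsigned long) f.b[1] << 8)
                         | (unsigned long) f.b[2];

    *puSign     = (unsigned) ((ulBits >> 23) & 0x1);
    *puExponent = (unsigned) ((ulBits >> 15) & 0xFF);
    *puMantissa = (unsigned) (ulBits & 0x7FFF);
}
```

The unpacked fields can then be handed to a routine such as Build3ByteFloat to get a double for the math library, after mapping the sign bit to -1 or 1 and applying whatever exponent bias the format uses.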

NOTES:

The above code demonstrates simulating a 3-byte float inside a double. There are subtle problems with this approach. Basically, on the machine I'm using, a double is "bigger" than the 3-byte float. Suppose I add together the two largest 3-byte floating point numbers possible. Technically, this should cause an overflow; however, since these numbers are stored in doubles, an overflow will probably NOT occur. This will therefore give you accuracy that is not possible with pure 3-byte floating point numbers.

So, although I've shown how to simulate 3-byte floating point numbers with doubles, this may not be suitable for your task.

To accurately simulate arithmetic that is different from the native machine, one must also provide the addition, subtraction, multiplication, and division operations. This is a lot of work. For example, one would provide an add function that would "know" how to add two 3-byte floating point numbers, and it would know all of the overflow rules. The same goes for subtraction, multiplication, and division (and, of course, sin, cos, etc.).

Again, this may not be what you need, but it's the 'correct' approach. Please don't think that when a 3-byte float is simulated using a double, arithmetic performed on these doubles will reflect the precision of the 3-byte floats: it will only reflect the precision available in the underlying simulation.

There are arithmetic packages available that simulate arbitrary precision, but these tend to be targeted at large, or exact, precision.

MK

