asked on

Radixless number representation

I need to store big numbers in a radixless manner to decrease the amount of time required to output the number in the given radix. I am having difficulty finding documents that outline how this can be done. Any help is appreciated,

Cheers!
Exceter

ASKER CERTIFIED SOLUTION

jkr

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

wayside

What do you mean by "radixless"?

All representations of a number have to have a radix, otherwise you just have a collection of bits or digits that have ambiguous meaning.

Exceter

ASKER

>> What do you mean by "radixless"?

I mean stored in a manner that allows the number ot be easily converted from base 2 to an arbitrary base. I have thought of using the change of base formula( logb x = loga x / loga b ) but I think the logarithmic function calls would cost me any speed advantages I would have gained over a radix conversion from base 2 by division.

jkr

>>I mean stored in a manner that allows the number ot be easily converted from base 2

Then, why don't you store it as "Base 2" aka "binary" as I suggested?

Avik Dasgupta

>>I need to store big numbers in a radixless manner

The digits used in ur radixless number will be reflecting the radix. If u want to store a number radixless u have to have ur own representation and explanation of ur stored number. If u want to store large numbers then u can convert and store it in a large base to lessen ur space complexity. Time complexity as such is a little fuzzy. It is only because to save space ur going for "radixless" style. Well I do not call it radixless but as u mentioned in ur second post easy base convertion from base 2, It will be a radix of 2 then.
Afterall all physical representation is binary :)
To save space increase base/radix and store the number in that base along with ur original base and converted base. Write a functions for interconversion and use of converted data.

Avik.

Avik Dasgupta

Hi,
just missed out, why don't u try representing them with strings. It will also save space but will require considerable effort to check the validity of a number in a given base and other mathematical operations such as multiplication, division, square root , etc.

Avik.

jhshukla

basically you want a fast mathod to convert from binary numbers (as they are in the machine) to string representation in any radix you supply. well, it is an O(log(your_number)) algorithm.

number%radix -> most significant digit
number/radix -> number
number%radix -> next significant digit
number/radix -> number
...
repeat until (number < radix)

This algorithm is valid only for positive ints,shorts or longs. modification for negative numbers is straight forward. algorithm for floats and doubles requires conversion to strings and then converting it as we traverse it.

jhshukla

ooops. you want LARGE numbers. why don't you use a struct of four ints. you could pour the overflow to next higher int.
|-- int --|-- int --|-- int --|-- int --|
MSB LSB

MSB = MostSignificantBit
LSB = LeastSignificantBit

Mafalda

You could use the STL bitset

#include <bitset>
using std::bitset;

bitset<256> my_256_bits;

Mafalda

Exceter,
I didn't notice at first that you are the asker ...
I suppose that you already investigated this angle ...

Mafalda

By the way if you use a stream you can easily output the numbers using std::hex, std::dec and std:oct
Do you need it for other bases as well ?

Exceter

ASKER

I am working with big numbers so the standard operations for radix conversions will not work with my data type. At present I am using the modulus approach outlined by jhshukla. I output numbers in any base in the range 2-36. I store the numbers in 32 bit unsigned integer and perform the math in 16 bit to avoid carry logic. I perfrom division by conceptualizing each integer as one radix 4294967295 digit, I ise a similar approach when multiplying. I then perform standard long division using operator/ (This only works if I am dividing by a single digit) When converting to decimal I cheat a little by converting to radix 10000 instead of 10 to save time.

However, when I attpemt to output 10000! in decimal the radix conversion takes about 4 times as long as the time it takes to compute the number, depending on the speed of the computer. Output of radices which are a power of 2 is almost instantaneous. I was hoping for an approach, not necessarily an algorithm, that would reduce the amount of time needed to perform the radix conversion. My thought was that perhaps the number could be stored in a form, possibly an expression, that would allow for faster radix conversions. This may or may not be possible. Although, I am always open to faster algorithms.

Cheers!
Exceter

wayside

Check out the Gnu BigNum ( http://www.swox.com/gmp/home.html ) library, which says the folliong about radix conversions. You can get the source code there, too.

Binary to Radix
Conversions from binary to a power-of-2 radix use a simple and fast O(N) bit extraction algorithm.

Conversions from binary to other radices use one of two algorithms. Sizes below GET_STR_PRECOMPUTE_THRESHOLD use a basic O(N^2) method. Repeated divisions by b^n are made, where b is the radix and n is the biggest power that fits in a limb. But instead of simply using the remainder r from such divisions, an extra divide step is done to give a fractional limb representing r/b^n. The digits of r can then be extracted using multiplications by b rather than divisions. Special case code is provided for decimal, allowing multiplications by 10 to optimize to shifts and adds.

Above GET_STR_PRECOMPUTE_THRESHOLD a sub-quadratic algorithm is used. For an input t, powers b^(n*2^i) of the radix are calculated, until a power between t and sqrt(t) is reached. t is then divided by that largest power, giving a quotient which is the digits above that power, and a remainder which is those below. These two parts are in turn divided by the second highest power, and so on recursively. When a piece has been divided down to less than GET_STR_DC_THRESHOLD limbs, the basecase algorithm described above is used.

The advantage of this algorithm is that big divisions can make use of the sub-quadratic divide and conquer division (see Divide and Conquer Division), and big divisions tend to have less overheads than lots of separate single limb divisions anyway. But in any case the cost of calculating the powers b^(n*2^i) must first be overcome.

GET_STR_PRECOMPUTE_THRESHOLD and GET_STR_DC_THRESHOLD represent the same basic thing, the point where it becomes worth doing a big division to cut the input in half. GET_STR_PRECOMPUTE_THRESHOLD includes the cost of calculating the radix power required, whereas GET_STR_DC_THRESHOLD assumes that's already available, which is the case when recursing.

Since the base case produces digits from least to most significant but they want to be stored from most to least, it's necessary to calculate in advance how many digits there will be, or at least be sure not to underestimate that. For GMP the number of input bits is multiplied by chars_per_bit_exactly from mp_bases, rounding up. The result is either correct or one too big.

Examining some of the high bits of the input could increase the chance of getting the exact number of digits, but an exact result every time would not be practical, since in general the difference between numbers 100... and 99... is only in the last few bits and the work to identify 99... might well be almost as much as a full conversion.

mpf_get_str doesn't currently use the algorithm described here, it multiplies or divides by a power of b to move the radix point to the just above the highest non-zero digit (or at worst one above that location), then multiplies by b^n to bring out digits. This is O(N^2) and is certainly not optimal.

The r/b^n scheme described above for using multiplications to bring out digits might be useful for more than a single limb. Some brief experiments with it on the base case when recursing didn't give a noticable improvement, but perhaps that was only due to the implementation. Something similar would work for the sub-quadratic divisions too, though there would be the cost of calculating a bigger radix power.

Another possible improvement for the sub-quadratic part would be to arrange for radix powers that balanced the sizes of quotient and remainder produced, ie. the highest power would be an b^(n*k) approximately equal to sqrt(t), not restricted to a 2^i factor. That ought to smooth out a graph of times against sizes, but may or may not be a net speedup.

Exceter

ASKER

I've read that before. Would you mind translating into english? My geek speak is not the greatest.

jkr

Points to who can prove that "You cannot store numbers without a radix. However, the radix 2 which manifests the binary number system might be the best choice in the digital world" is incorrect.