Solved

How to convert infinite stream of bytes to infinite stream of decadic numerals?

Posted on 2009-02-20
14
Medium Priority
412 Views
Last Modified: 2012-05-06
Hi,

I need to convert a theoretically infinite stream of bytes into a stream of decadic numerals 0-9. (In practice, you do not know how many bytes you will get, and you cannot wait until all of them arrive.)

This task is easy if you want to produce a stream of hexadecimal numerals, because every 4 bits can be directly expressed as one hex numeral. The same probably cannot be done simply with decadic numerals, because there is no natural bit boundary related to decadic numerals.

On the other hand, I do not need a reversible conversion. In other words, the output stream never needs to be converted back to the original. The output need not have the properties of a number with the same value as the input.

For example, it would be nice to be able to convert one byte into a number 0-99 (which obviously means some loss of information). However, I want the numbers 0-99 to appear with the same probability. For example, "byte modulo 100" is not acceptable because the values 0-55 appear with higher probability than the values 56-99 (if the input bytes are random).
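
The bias is easy to check by counting how many of the 256 byte values land on each result. A minimal sketch in Python (the language is an assumption here, the thread does not name one; `counts`, `low` and `high` are illustrative names):

```python
from collections import Counter

# How many of the 256 possible byte values map to each result of "byte modulo 100".
counts = Counter(b % 100 for b in range(256))

low = {counts[v] for v in range(0, 56)}     # hit counts for results 0-55
high = {counts[v] for v in range(56, 100)}  # hit counts for results 56-99

print(low, high)  # {3} {2}
```

Every result in 0-55 is produced by three byte values and every result in 56-99 by only two, so for random input the low results are 1.5 times as likely.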

To be even more specific, I have a stream of bytes that comes as a kind of signature (but I do not know its size in advance). I know that I want, say, 4 pairs of digits like 01 23 45 67 (or 8 pairs, or...).

Thanks,
   Petr
Question by:pepr
14 Comments
 
LVL 85

Accepted Solution

by:ozo
ozo earned 1200 total points
ID: 23689867
Instead of byte modulo 100, byte * 100 / 256 would be more evenly distributed.
Some values would still appear more often than others, but those values would be spread out between 0 and 99 instead of all being concentrated in 0-55.

If you want the overall distribution to be flat, even if particular bytes are uneven, you can compensate for the above in subsequent bytes.
Again, one method that comes to mind is arithmetic coding.
Arithmetic coding can also give you a reversible conversion, but relaxing that requirement may make it even easier, since you can afford to be sloppy with the algorithm.
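
A minimal sketch of the scaling approach in Python, assuming the division is integer (floor) division; `scale_byte` is a hypothetical helper name, not something from the thread. It only covers the per-byte mapping, not the compensation across subsequent bytes:

```python
from collections import Counter

def scale_byte(b):
    """Map a byte 0-255 onto 0-99 by scaling (integer division assumed)."""
    return b * 100 // 256

# Count how many byte values land on each result.
counts = Counter(scale_byte(b) for b in range(256))

# Results that are hit three times instead of twice.
triples = sorted(v for v, c in counts.items() if c == 3)

print(len(counts), set(counts.values()), len(triples))
print(triples[:5], triples[-5:])
```

All 100 results occur, each one two or three times, and the 56 results that occur three times are scattered over the whole range 0-99 rather than packed into 0-55. A two-digit pair can then be formatted with `f"{scale_byte(b):02d}"`.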
 
LVL 53

Assisted Solution

by:Infinity08
Infinity08 earned 200 total points
ID: 23691605
Are these statements right ?

* The byte stream itself cannot be modified (we cannot change its composition on the sending side)
* Every bit (combination of bits) has to contribute (equally) to the stream of decimal digits (the probability of a decimal digit occurring has to be the same for all 10 decimal digits, assuming a random byte stream)
* the same stream of bytes always results in the same stream of decimal digits, but not necessarily the other way around

If so, arithmetic coding seems to violate the second requirement. Unless I misunderstood what ozo meant.

If not, then ignore me ;)
 
LVL 85

Expert Comment

by:ozo
ID: 23691736
How does arithmetic coding seem to violate the second requirement?
(I don't think it violated the second half of that requirement, and if I understood the part about it being ok or even nice to lose information, it doesn't sound like the first part is a requirement)


Or is the requirement for a hashing or checksumming algorithm?
 
LVL 53

Expert Comment

by:Infinity08
ID: 23692002
It depends on whether I understood the requirements correctly. But, if one requirement is that every decimal digit has to have an equal probability of occurring, that seems to clash with the premise of arithmetic coding that says that different symbols have different probabilities of occurring.
 
LVL 85

Expert Comment

by:ozo
ID: 23692138
Arithmetic coding allows you to choose the probabilities, so they can be chosen to be unequal, or they can also be chosen to be equal. When used for compression, you'd want the input probabilities to match the predicted probabilities of each symbol according to your model of the input, and the output probabilities to be equal (and the opposite for decompression).
In this case our model would predict that each byte has equal probability (or that each digit has equal probability, if you think of it the other way).
 
LVL 53

Expert Comment

by:Infinity08
ID: 23692259
>> or they can also be chosen to be equal

Then what's the difference with not using arithmetic coding ? In other words, how can it be used in this situation ?

I'm probably missing something obvious.
 
LVL 85

Expert Comment

by:ozo
ID: 23692562
Arithmetic coding can be thought of as a generalization of converting from one base into another.
The generalization that allows different probabilities may not buy you much here compared to a plain binary-to-decimal conversion, but arithmetic coding also involves a trick that avoids infinite-precision calculations even when dealing with infinite streams, which is the part I was thinking of here.
It also relates to shifting information from one byte to the next so that, over the long run, the deviations from a perfectly flat distribution caused by the mismatch between the ranges 0-99 and 0-255 can be smoothed out.
Arithmetic coding may be overkill for this purpose, but it seemed easier to think of simplifying a well-known algorithm than to invent complications in an unknown one.
(The trick that allows finite-precision calculations means that the probabilities may not perfectly follow the model, but I thought they could still be close enough to equal for practical purposes, although without really understanding the practical purpose of this, I can't be sure that it would be good enough.)
 
LVL 22

Expert Comment

by:NovaDenizen
ID: 23694409
If you treat the incoming random stream as an arithmetically encoded series of decimal digits, then you could use an arithmetic decoder to produce a series of random digits.

Depending on how efficient or precise you want to be, arithmetic coding can be made arbitrarily close to any particular standard of accuracy you want to establish.  You could implement the whole thing in terms of infinite precision rational numbers and you would get exact accuracy.
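
One way to picture the exact, infinite-precision variant is to read the byte stream as the base-256 fraction 0.b1 b2 b3... and to emit a decimal digit as soon as the bytes seen so far pin it down. The following is only a sketch of that idea in Python using exact rationals, not a real arithmetic decoder; `exact_digits` is a hypothetical name:

```python
from fractions import Fraction

def exact_digits(byte_stream):
    """Yield decimal digits of the base-256 fraction 0.b1 b2 b3 ...

    A digit is emitted only once it can no longer change, whatever bytes follow.
    """
    lo, hi = Fraction(0), Fraction(1)  # the value so far lies in [lo, hi)
    for b in byte_stream:
        # Narrow the interval to the b-th of 256 equal slices.
        width = (hi - lo) / 256
        lo = lo + b * width
        hi = lo + width
        while True:
            d = int(lo * 10)        # candidate digit (lo is never negative)
            if hi * 10 > d + 1:     # interval still straddles a digit boundary
                break
            yield d
            # Drop the emitted digit and rescale to [0, 1).
            lo = lo * 10 - d
            hi = hi * 10 - d

print(list(exact_digits(bytes([0x80]))))  # [5, 0]
```

The fractions grow without bound as the stream goes on, which is the cost of exactness and the reason practical arithmetic coders use the finite-precision trick mentioned earlier in the thread.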

However, one theoretical limitation you can't get away from is that this method will never be deterministic.  It's the same idea as using a random bitstream to randomly select one of three possibilities.  00... will unambiguously select the first, and 11... will unambiguously select the third.  But a sequence starting 010101 or 101010 will not establish a winner until we get two 1's or two 0's in a row, which is not guaranteed to happen within a finite time.

So when you use a binary stream to decide on one of 10 alternatives, you similarly will not have deterministic performance.  
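
The three-way example can be sketched by treating the bits as a binary fraction and waiting until the interval it defines fits inside one third of [0, 1). This is a Python illustration under that assumption; `pick_one_of_three` is a hypothetical name:

```python
from fractions import Fraction

def pick_one_of_three(bits):
    """Return (choice, bits_used); choice is None if the bits ran out undecided."""
    lo, hi = Fraction(0), Fraction(1)  # the binary fraction lies in [lo, hi)
    used = 0
    for bit in bits:
        mid = (lo + hi) / 2
        if bit:
            lo = mid
        else:
            hi = mid
        used += 1
        # Decide as soon as the interval fits inside one of the three thirds.
        for k in range(3):
            if Fraction(k, 3) <= lo and hi <= Fraction(k + 1, 3):
                return k, used
    return None, used

print(pick_one_of_three([0, 0]))              # (0, 2)
print(pick_one_of_three([1, 1]))              # (2, 2)
print(pick_one_of_three([0, 1, 0, 1, 0, 1]))  # (None, 6)
```

An alternating input keeps straddling the boundary at 1/3 or 2/3, so the number of bits consumed has no fixed upper bound.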
 
LVL 22

Assisted Solution

by:NovaDenizen
NovaDenizen earned 600 total points
ID: 23694542
So I guess I'm saying that you might as well use a simple "give up and restart" strategy.

Pull 10 bits off the input stream, convert them to an integer from 0 to 1023.
If the resulting integer is less than 1000, say 78, then output the three digits, like '0', '7', '8'.
If instead it was between 1000 and 1023, then throw these bits away and try again.

With this strategy the implementation is very simple.  It will require, on average, 10.24 bits to get 3 decimal digits.  This means that the efficiency is (ln(1000) / ln(2)) / 10.24 = 97.3%
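
The strategy above could be sketched in Python roughly as follows. The bit order (most significant bit first) and the name `digits_from_bytes` are assumptions made for illustration:

```python
import math

def digits_from_bytes(byte_stream):
    """Yield decimal digits, three per accepted 10-bit chunk (values 0-999)."""
    acc = 0    # unread bits, most significant first
    nbits = 0  # how many unread bits acc holds
    for b in byte_stream:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 10:
            nbits -= 10
            chunk = acc >> nbits        # take the top 10 unread bits
            acc &= (1 << nbits) - 1     # keep only the remaining bits
            if chunk < 1000:            # 1000-1023: give up and restart
                yield chunk // 100
                yield chunk // 10 % 10
                yield chunk % 10

# 0x13 0x80 starts with the 10 bits 0001001110, which is 78.
print(list(digits_from_bytes(bytes([0x13, 0x80]))))  # [0, 7, 8]

# Efficiency figure from the post: log2(1000) useful bits per 10.24 consumed.
efficiency = math.log2(1000) / 10.24
print(round(efficiency, 3))  # 0.973
```

The output for a given prefix of the input never changes when more bytes arrive, because each chunk is accepted or rejected on its own.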
 
LVL 53

Expert Comment

by:Infinity08
ID: 23699029
>> then throw these bits away and try again.

What I'm not sure about, is whether this is allowed (pepr ?). If it is, we might as well just use BCD (same principle as what you described, just with even more loss lol).
 
LVL 29

Author Comment

by:pepr
ID: 23709394
For Infinity08 http:#23691605...

>> * The byte stream itself cannot be modified (we cannot change its composition on the sending side)

true

>> * Every bit (combination of bits) has to contribute (equally) to the stream of decimal digits

not necessarily

>>    (the probability of a decimal digit occurring has to be the same for all 10 decimal digits, assuming a random byte stream)
>> * the same stream of bytes always results in the same stream of decimal digits, but not necessarily the other way around

Better to say, the same (consumed) prefix of the stream generates the same prefix of the output stream. The first produced output digits are not affected by the later bytes from the input stream.

The best analogy would be getting the hex digits (the ideal case, a reversible encoding). Only, more decadic digits would be produced, as one decimal digit can express less information than a hex digit.

ozo is right in his guess. I can make it rather sloppy. The byte * 100 / 256 will be sufficient.

NovaDenizen's http:#23694542 would probably be more precise, but I am too lazy to do that ;)

Thanks all for the help.
 
LVL 29

Author Closing Comment

by:pepr
ID: 31549149
Thanks to all of you for helping to clarify and to solve the problem.
  Petr
 
LVL 85

Expert Comment

by:ozo
ID: 23709619
Either byte * 100 / 256, or nibble * 10 / 16, or decle * 1000 / 1024, or throwing away values from 1000 to 1023, or throwing away values from 200 to 255, or throwing away values from 10 to 15, or various other trade-offs could be produced with lossy arithmetic coding with appropriate parameters. Different forms of "give up and restart", such as hex digits A-F that are never output, or arbitrarily and deterministically treating a sequence starting 010101 or 101010 as if we then immediately got two 1's or two 0's in a row, are different trade-offs between finite precision and perfectly uniform output,
which is why I thought the arithmetic coding paradigm might be illuminating on the problem in general.
In some instances, it may be easier to code an algorithm tuned for particular parameters than to code a general algorithm (or not, depending on the algorithm and the particular parameters).
In this case, byte * 100 / 256 is almost certainly easier than a more general conversion.
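
As an illustration of one of the other trade-offs listed, throwing away byte values 200 to 255 gives exactly uniform two-digit pairs at the cost of skipping about 22% of the input bytes. A minimal Python sketch; `pairs_from_bytes` and its `n_pairs` parameter are hypothetical names:

```python
def pairs_from_bytes(byte_stream, n_pairs):
    """Return up to n_pairs two-digit pairs, skipping byte values 200-255."""
    pairs = []
    for b in byte_stream:
        if b >= 200:
            continue  # rejected: keeping it would bias the results 0-55
        # The 200 accepted values cover each of 0-99 exactly twice.
        pairs.append(f"{b % 100:02d}")
        if len(pairs) == n_pairs:
            break
    return " ".join(pairs)

print(pairs_from_bytes(bytes([1, 123, 255, 45, 167, 9]), 4))  # 01 23 45 67
```

If the stream ends before enough bytes have been accepted, this sketch simply returns fewer pairs.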
 
LVL 53

Expert Comment

by:Infinity08
ID: 23709749
Seems I did misunderstand the requirements ;)