Link to home
Start Free TrialLog in
Avatar of Sinclair
Sinclair

asked on

Reversing bytes

I need to write a function that, given a byte, will return the byte with bits in reverse order. For example, if it is given
 0101 1111
the return value would be
 1111 1010
Is there a better way to do this than checking bits in a loop and shifting the source byte and the target byte ?
    I know this sounds like homework, but it isn't. The reason I need to do this is that I want to read data from a device through LPT. I have a function that reads a word, but, inside that word, one byte is reversed. I need to un-reverse it QUICKLY, and the shifting loop takes too much time.
    I realize that I can do this faster in Assembly using the carry register, but I was looking for a way to do this in plain C, with & | ~ ^ and so on.
Avatar of Sinclair
Sinclair

ASKER

Edited text of question
Hmmm, I don't this that you can do it in less than 24 operations:
8 and operations to determine the bits and 8 shift operations to put them in the correct place and 8 to add the parts
See http://www.specs.de/users/danni/appl/soft/mirror/mirror.lst
for the fastest assembly code to do it.
The fastest way is to use a unsigned char array[256] were the reverse info is stored

unisgned char reverse[256]=
{0, 128, 64, 192, 32, 160, ....}
 I think there will be no faster way.
Avatar of ozo
You could do it in C with 4 shifts
but an array lookup is probably the fastest on most architectures
Without this array use this code

if (value & 1)
      reverse=128;
if (value & 2)
      reverse|=64;
if (value & 4)
      reverse|=32;
if (value & 8)
      reverse|=16;
if (value & 16)
      reverse|=8;
if (value & 32)
      reverse|=4;
if (value & 64)
      reverse|=2;
if (value & 128)
      reverse|=1;

BTW: Sinclair, if you look at the assembly code, you will see that there is a faster algorithm. The assembly code takes 17 clock cycli.
But the fastest algorithm in assembly code is the first one which uses a table as I mentioned. A good C compiler will generate a very fast code with the table.
Yes, but if you use a table you can better use a full table of all 256 values and it saves a lot of code.
reva = convtable[a]
Sorry rbr, now I understand your solution. That is what you proposed. I'm out of here. Your answer is the best
Ok, I am really intrigued now. I agree that a table lookup SEEMS like the fastest idea, but it has to get its values from RAM, while other solutions that people gave (using shifts and masks) can work entirely in the registers, which MAY be faster...
Actually, I believe ozo's solution from https://www.experts-exchange.com/Q.10074236 takes the cake:

((input*0x202020202)&0x10884422010)%0x3ff

My "aside" question to ozo is:
How in the world would you come up with that ? Can you explain step-by-step what you did ? I think I understand this conceptually, but I want to be sure... Thanks :-)
This table needs 256 bytes and it should be no problem to hold them permanent in the RAM to avoid paging. And the access to this array is only one assembler instruction. Look at http://www.specs.de/users/danni/appl/soft/mirror/mirror.lst where the table is the fastest algorithm.
input*0x202020202)&0x10884422010)%0x3ff needs one integer multiplikation and division. This wouldn't be faster than a table.
Sorry to be such a pain in the ass, but:

I assume you are referring to
                        ;C:  ACC:
MOV C, ACC.1            ;1 76543210  
RLC A                   ;7 65432101
MOV ACC.2, C            ;7 65432701
MOV C, ACC.3            ;2 65432701
RLC A                   ;6 54327012
MOV ACC.4, C            ;6 54367012
MOV C, ACC.5            ;3 54367012
RLC A                   ;5 43670123
MOV ACC.6, C            ;5 45670123
SWAP A                  ;5 01234567
RET

(quoted from the page).

Will this work on a PC (x86) ? In particular, can you access individual bits like that ("ACC.1"), and is there a SWAP command ? It's been a while since I used Assembly on anything that big...



ASKER CERTIFIED SOLUTION
Avatar of rbr
rbr

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Read the all the comments. ozo and mirkwood think that a table would be the fastest solution on the most architectures.
rbr, what's wrong with the XLAT x86 opcode?
Sinclair..  here is an Assembler function to be called from C
               which will also get the job done..  it's not as                complicated to understand and should be easy
               to incorperate into your code, Assemble it and add
               the Obj when you compile your program...



;-- REVERSE: Reverse the bits in AL -----------------------------------
;
;Usage:  X = _ReverseByte (Y);
;
;Declarations:  unsigned int    X,Y;
;
;Note:  the value in Y must be between 0 and 255 since we are
;       only talking about 1 byte..
;

                PUBLIC  _ReverseByte
_ReverseByte    proc    near
                push    bp              ;save BP register
                mov     bp,sp           ;set BP to SP (stack pointer)

                mov     ax,[bp+4]       ;AX = value of passed parameter

                cmp     ax,0            ;is Prameter is Zero
                je      reverse_099     ;don't waiste your time

                push    bx              ;save BX
                push    cx              ;     CX

                xor     ah,ah           ;clear AH for bit setting
                mov     cx,8            ;8 bits to reverse, we start at LSB
                mov     bl,00000001b    ;walk threw AL with BL lsb..msb
                mov     bh,10000000b    ;walk threw AH with BH msb..lsb

reverse_001:    test    bl,al           ;is AL (bit) set or not
                jz      reverse_002     ;no, skip the AH bit setting
                or      ah,bh           ;yes, set the bit in AH with BH
reverse_002:    shr     bh,1            ;traverse bits from lsb to msb
                shl     bl,1            ;         bits from msb to lsb
                loop    reverse_001     ;repeat for 8 bits

                xchg    al,ah           ;reversed AL is AH, switch into AL
                xor     ah,ah           ;clear high working register

                pop     cx              ;restore destroyed registers
                pop     bx              ;

reverse_099:    pop     bp              ;restore BP
                ret                     ;leave Result in AX
_ReverseByte    endp

any questions...Assemblr@hotmail.com                        
Sorry...made a mistake...

the Usage should say  X = ReverseByte (Y);

Ok, it appears that rbr's answer (a lookup table) is still the fastest, so I will give him the points. I wish I could give them to all you people, because your answers are also very interesting, but the system won't allow that.
Anyway, I am STILL wondering about ozo's solution, so, ozo, if you're reading this, please email me at
bugmaster@earthlink.net  
thanks.
>> but the system won't allow that.
posting a polite request (0-point question) to Linda in the customer support area usally takes care of that.  Make sure to include names and the Q. number.
Does this help?
  http://support.microsoft.com/support/kb/articles/q129/3/15.asp

(The first access of MSDN-online will require a one-time registration process)
I'm not sure what there is to explain about the multiplication solution,
other than observing what it does for each bit, and noting that the result for the sum of bits is the sum of the results for each bit.
basically the multiply duplicates the byte at different offsts,
the and selects bits in key positions, and the moduo collects them.

(BTW, I have seen at least one architecture on which the multiply could potentially be faster than the lookup,
but I don't think I've ever seen a web browser written for that architecture,
so if you're reading this, chances should be good that the lookup will be faster on your machine)
What is this architecture?
Ozo, can you maybe explain the thought process you used to arrive at the multiplication solution ? How does the modulo "collect" the bits ?