• C

Reversing bytes

I need to write a function that, given a byte, will return the byte with bits in reverse order. For example, if it is given
 0101 1111
the return value would be
 1111 1010
Is there a better way to do this than checking bits in a loop and shifting the source byte and the target byte ?
    I know this sounds like homework, but it isn't. The reason I need to do this is that I want to read data from a device through LPT. I have a function that reads a word, but, inside that word, one byte is reversed. I need to un-reverse it QUICKLY, and the shifting loop takes too much time.
    I realize that I can do this faster in Assembly using the carry register, but I was looking for a way to do this in plain C, with & | ~ ^ and so on.
Who is Participating?
No i refer to

;1.Brute force: 6 / 8 Cycle, 259 Byte
0000                   6     MIRROR1:
0000 6001              7             JZ MIRR1                ;0: leave unchanged
0002 83                8             MOVC A, @A+PC           ;mirror 1 to 0FFh (max 255 entries in table!)
0003 22                9     MIRR1:  RET
  0001                10             MB SET 1
                      11             REPT 255                ;use Keil Assembler to expand this macro
                      12               DB ((MB AND 1) SHL 7)+((MB AND 2) SHL 5)+((MB AND 4) SHL 3)+((MB AND 8) SHL 1)+((M
                             B AND 10h) SHR 1)+((MB AND 20h) SHR 3)+((MB AND 40h) SHR 5)+((MB AND 80h) SHR 7)
                      13               MB SET MB + 1         ;;count up to mirror next byte

a good c compiler will generate almost the same code for a table.
SinclairAuthor Commented:
Edited text of question
Hmmm, I don't this that you can do it in less than 24 operations:
8 and operations to determine the bits and 8 shift operations to put them in the correct place and 8 to add the parts
Simplify Active Directory Administration

Administration of Active Directory does not have to be hard.  Too often what should be a simple task is made more difficult than it needs to be.The solution?  Hyena from SystemTools Software.  With ease-of-use as well as powerful importing and bulk updating capabilities.

See http://www.specs.de/users/danni/appl/soft/mirror/mirror.lst
for the fastest assembly code to do it.
The fastest way is to use a unsigned char array[256] were the reverse info is stored

unisgned char reverse[256]=
{0, 128, 64, 192, 32, 160, ....}
 I think there will be no faster way.
You could do it in C with 4 shifts
but an array lookup is probably the fastest on most architectures
Without this array use this code

if (value & 1)
if (value & 2)
if (value & 4)
if (value & 8)
if (value & 16)
if (value & 32)
if (value & 64)
if (value & 128)

BTW: Sinclair, if you look at the assembly code, you will see that there is a faster algorithm. The assembly code takes 17 clock cycli.
But the fastest algorithm in assembly code is the first one which uses a table as I mentioned. A good C compiler will generate a very fast code with the table.
Yes, but if you use a table you can better use a full table of all 256 values and it saves a lot of code.
reva = convtable[a]
Sorry rbr, now I understand your solution. That is what you proposed. I'm out of here. Your answer is the best
SinclairAuthor Commented:
Ok, I am really intrigued now. I agree that a table lookup SEEMS like the fastest idea, but it has to get its values from RAM, while other solutions that people gave (using shifts and masks) can work entirely in the registers, which MAY be faster...
SinclairAuthor Commented:
Actually, I believe ozo's solution from http://www.experts-exchange.com/Q.10074236 takes the cake:


My "aside" question to ozo is:
How in the world would you come up with that ? Can you explain step-by-step what you did ? I think I understand this conceptually, but I want to be sure... Thanks :-)
This table needs 256 bytes and it should be no problem to hold them permanent in the RAM to avoid paging. And the access to this array is only one assembler instruction. Look at http://www.specs.de/users/danni/appl/soft/mirror/mirror.lst where the table is the fastest algorithm.
input*0x202020202)&0x10884422010)%0x3ff needs one integer multiplikation and division. This wouldn't be faster than a table.
SinclairAuthor Commented:
Sorry to be such a pain in the ass, but:

I assume you are referring to
                        ;C:  ACC:
MOV C, ACC.1            ;1 76543210  
RLC A                   ;7 65432101
MOV ACC.2, C            ;7 65432701
MOV C, ACC.3            ;2 65432701
RLC A                   ;6 54327012
MOV ACC.4, C            ;6 54367012
MOV C, ACC.5            ;3 54367012
RLC A                   ;5 43670123
MOV ACC.6, C            ;5 45670123
SWAP A                  ;5 01234567

(quoted from the page).

Will this work on a PC (x86) ? In particular, can you access individual bits like that ("ACC.1"), and is there a SWAP command ? It's been a while since I used Assembly on anything that big...

Read the all the comments. ozo and mirkwood think that a table would be the fastest solution on the most architectures.
rbr, what's wrong with the XLAT x86 opcode?
Sinclair..  here is an Assembler function to be called from C
               which will also get the job done..  it's not as                complicated to understand and should be easy
               to incorperate into your code, Assemble it and add
               the Obj when you compile your program...

;-- REVERSE: Reverse the bits in AL -----------------------------------
;Usage:  X = _ReverseByte (Y);
;Declarations:  unsigned int    X,Y;
;Note:  the value in Y must be between 0 and 255 since we are
;       only talking about 1 byte..

                PUBLIC  _ReverseByte
_ReverseByte    proc    near
                push    bp              ;save BP register
                mov     bp,sp           ;set BP to SP (stack pointer)

                mov     ax,[bp+4]       ;AX = value of passed parameter

                cmp     ax,0            ;is Prameter is Zero
                je      reverse_099     ;don't waiste your time

                push    bx              ;save BX
                push    cx              ;     CX

                xor     ah,ah           ;clear AH for bit setting
                mov     cx,8            ;8 bits to reverse, we start at LSB
                mov     bl,00000001b    ;walk threw AL with BL lsb..msb
                mov     bh,10000000b    ;walk threw AH with BH msb..lsb

reverse_001:    test    bl,al           ;is AL (bit) set or not
                jz      reverse_002     ;no, skip the AH bit setting
                or      ah,bh           ;yes, set the bit in AH with BH
reverse_002:    shr     bh,1            ;traverse bits from lsb to msb
                shl     bl,1            ;         bits from msb to lsb
                loop    reverse_001     ;repeat for 8 bits

                xchg    al,ah           ;reversed AL is AH, switch into AL
                xor     ah,ah           ;clear high working register

                pop     cx              ;restore destroyed registers
                pop     bx              ;

reverse_099:    pop     bp              ;restore BP
                ret                     ;leave Result in AX
_ReverseByte    endp

any questions...Assemblr@hotmail.com                        
Sorry...made a mistake...

the Usage should say  X = ReverseByte (Y);

SinclairAuthor Commented:
Ok, it appears that rbr's answer (a lookup table) is still the fastest, so I will give him the points. I wish I could give them to all you people, because your answers are also very interesting, but the system won't allow that.
Anyway, I am STILL wondering about ozo's solution, so, ozo, if you're reading this, please email me at
>> but the system won't allow that.
posting a polite request (0-point question) to Linda in the customer support area usally takes care of that.  Make sure to include names and the Q. number.
Does this help?

(The first access of MSDN-online will require a one-time registration process)
I'm not sure what there is to explain about the multiplication solution,
other than observing what it does for each bit, and noting that the result for the sum of bits is the sum of the results for each bit.
basically the multiply duplicates the byte at different offsts,
the and selects bits in key positions, and the moduo collects them.

(BTW, I have seen at least one architecture on which the multiply could potentially be faster than the lookup,
but I don't think I've ever seen a web browser written for that architecture,
so if you're reading this, chances should be good that the lookup will be faster on your machine)
What is this architecture?
SinclairAuthor Commented:
Ozo, can you maybe explain the thought process you used to arrive at the multiplication solution ? How does the modulo "collect" the bits ?
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.