Sinclair
asked on
Reversing bytes
I need to write a function that, given a byte, will return the byte with bits in reverse order. For example, if it is given
0101 1111
the return value would be
1111 1010
Is there a better way to do this than checking bits in a loop and shifting the source byte and the target byte ?
I know this sounds like homework, but it isn't. The reason I need to do this is that I want to read data from a device through LPT. I have a function that reads a word, but, inside that word, one byte is reversed. I need to un-reverse it QUICKLY, and the shifting loop takes too much time.
I realize that I can do this faster in Assembly using the carry register, but I was looking for a way to do this in plain C, with & | ~ ^ and so on.
0101 1111
the return value would be
1111 1010
Is there a better way to do this than checking bits in a loop and shifting the source byte and the target byte ?
I know this sounds like homework, but it isn't. The reason I need to do this is that I want to read data from a device through LPT. I have a function that reads a word, but, inside that word, one byte is reversed. I need to un-reverse it QUICKLY, and the shifting loop takes too much time.
I realize that I can do this faster in Assembly using the carry register, but I was looking for a way to do this in plain C, with & | ~ ^ and so on.
Hmmm, I don't this that you can do it in less than 24 operations:
8 and operations to determine the bits and 8 shift operations to put them in the correct place and 8 to add the parts
8 and operations to determine the bits and 8 shift operations to put them in the correct place and 8 to add the parts
See http://www.specs.de/users/danni/appl/soft/mirror/mirror.lst
for the fastest assembly code to do it.
for the fastest assembly code to do it.
The fastest way is to use a unsigned char array[256] were the reverse info is stored
unisgned char reverse[256]=
{0, 128, 64, 192, 32, 160, ....}
I think there will be no faster way.
unisgned char reverse[256]=
{0, 128, 64, 192, 32, 160, ....}
I think there will be no faster way.
You could do it in C with 4 shifts
but an array lookup is probably the fastest on most architectures
but an array lookup is probably the fastest on most architectures
Without this array use this code
if (value & 1)
reverse=128;
if (value & 2)
reverse|=64;
if (value & 4)
reverse|=32;
if (value & 8)
reverse|=16;
if (value & 16)
reverse|=8;
if (value & 32)
reverse|=4;
if (value & 64)
reverse|=2;
if (value & 128)
reverse|=1;
if (value & 1)
reverse=128;
if (value & 2)
reverse|=64;
if (value & 4)
reverse|=32;
if (value & 8)
reverse|=16;
if (value & 16)
reverse|=8;
if (value & 32)
reverse|=4;
if (value & 64)
reverse|=2;
if (value & 128)
reverse|=1;
BTW: Sinclair, if you look at the assembly code, you will see that there is a faster algorithm. The assembly code takes 17 clock cycli.
But the fastest algorithm in assembly code is the first one which uses a table as I mentioned. A good C compiler will generate a very fast code with the table.
Yes, but if you use a table you can better use a full table of all 256 values and it saves a lot of code.
reva = convtable[a]
reva = convtable[a]
Sorry rbr, now I understand your solution. That is what you proposed. I'm out of here. Your answer is the best
ASKER
Ok, I am really intrigued now. I agree that a table lookup SEEMS like the fastest idea, but it has to get its values from RAM, while other solutions that people gave (using shifts and masks) can work entirely in the registers, which MAY be faster...
ASKER
Actually, I believe ozo's solution from https://www.experts-exchange.com/Q.10074236 takes the cake:
((input*0x202020202)&0x108 84422010)% 0x3ff
My "aside" question to ozo is:
How in the world would you come up with that ? Can you explain step-by-step what you did ? I think I understand this conceptually, but I want to be sure... Thanks :-)
((input*0x202020202)&0x108
My "aside" question to ozo is:
How in the world would you come up with that ? Can you explain step-by-step what you did ? I think I understand this conceptually, but I want to be sure... Thanks :-)
This table needs 256 bytes and it should be no problem to hold them permanent in the RAM to avoid paging. And the access to this array is only one assembler instruction. Look at http://www.specs.de/users/danni/appl/soft/mirror/mirror.lst where the table is the fastest algorithm.
input*0x202020202)&0x10884 422010)%0x 3ff needs one integer multiplikation and division. This wouldn't be faster than a table.
ASKER
Sorry to be such a pain in the ass, but:
I assume you are referring to
;C: ACC:
MOV C, ACC.1 ;1 76543210
RLC A ;7 65432101
MOV ACC.2, C ;7 65432701
MOV C, ACC.3 ;2 65432701
RLC A ;6 54327012
MOV ACC.4, C ;6 54367012
MOV C, ACC.5 ;3 54367012
RLC A ;5 43670123
MOV ACC.6, C ;5 45670123
SWAP A ;5 01234567
RET
(quoted from the page).
Will this work on a PC (x86) ? In particular, can you access individual bits like that ("ACC.1"), and is there a SWAP command ? It's been a while since I used Assembly on anything that big...
I assume you are referring to
;C: ACC:
MOV C, ACC.1 ;1 76543210
RLC A ;7 65432101
MOV ACC.2, C ;7 65432701
MOV C, ACC.3 ;2 65432701
RLC A ;6 54327012
MOV ACC.4, C ;6 54367012
MOV C, ACC.5 ;3 54367012
RLC A ;5 43670123
MOV ACC.6, C ;5 45670123
SWAP A ;5 01234567
RET
(quoted from the page).
Will this work on a PC (x86) ? In particular, can you access individual bits like that ("ACC.1"), and is there a SWAP command ? It's been a while since I used Assembly on anything that big...
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Read the all the comments. ozo and mirkwood think that a table would be the fastest solution on the most architectures.
rbr, what's wrong with the XLAT x86 opcode?
Sinclair.. here is an Assembler function to be called from C
which will also get the job done.. it's not as complicated to understand and should be easy
to incorperate into your code, Assemble it and add
the Obj when you compile your program...
;-- REVERSE: Reverse the bits in AL -------------------------- ---------
;
;Usage: X = _ReverseByte (Y);
;
;Declarations: unsigned int X,Y;
;
;Note: the value in Y must be between 0 and 255 since we are
; only talking about 1 byte..
;
PUBLIC _ReverseByte
_ReverseByte proc near
push bp ;save BP register
mov bp,sp ;set BP to SP (stack pointer)
mov ax,[bp+4] ;AX = value of passed parameter
cmp ax,0 ;is Prameter is Zero
je reverse_099 ;don't waiste your time
push bx ;save BX
push cx ; CX
xor ah,ah ;clear AH for bit setting
mov cx,8 ;8 bits to reverse, we start at LSB
mov bl,00000001b ;walk threw AL with BL lsb..msb
mov bh,10000000b ;walk threw AH with BH msb..lsb
reverse_001: test bl,al ;is AL (bit) set or not
jz reverse_002 ;no, skip the AH bit setting
or ah,bh ;yes, set the bit in AH with BH
reverse_002: shr bh,1 ;traverse bits from lsb to msb
shl bl,1 ; bits from msb to lsb
loop reverse_001 ;repeat for 8 bits
xchg al,ah ;reversed AL is AH, switch into AL
xor ah,ah ;clear high working register
pop cx ;restore destroyed registers
pop bx ;
reverse_099: pop bp ;restore BP
ret ;leave Result in AX
_ReverseByte endp
any questions...Assemblr@hotma il.com
which will also get the job done.. it's not as complicated to understand and should be easy
to incorperate into your code, Assemble it and add
the Obj when you compile your program...
;-- REVERSE: Reverse the bits in AL --------------------------
;
;Usage: X = _ReverseByte (Y);
;
;Declarations: unsigned int X,Y;
;
;Note: the value in Y must be between 0 and 255 since we are
; only talking about 1 byte..
;
PUBLIC _ReverseByte
_ReverseByte proc near
push bp ;save BP register
mov bp,sp ;set BP to SP (stack pointer)
mov ax,[bp+4] ;AX = value of passed parameter
cmp ax,0 ;is Prameter is Zero
je reverse_099 ;don't waiste your time
push bx ;save BX
push cx ; CX
xor ah,ah ;clear AH for bit setting
mov cx,8 ;8 bits to reverse, we start at LSB
mov bl,00000001b ;walk threw AL with BL lsb..msb
mov bh,10000000b ;walk threw AH with BH msb..lsb
reverse_001: test bl,al ;is AL (bit) set or not
jz reverse_002 ;no, skip the AH bit setting
or ah,bh ;yes, set the bit in AH with BH
reverse_002: shr bh,1 ;traverse bits from lsb to msb
shl bl,1 ; bits from msb to lsb
loop reverse_001 ;repeat for 8 bits
xchg al,ah ;reversed AL is AH, switch into AL
xor ah,ah ;clear high working register
pop cx ;restore destroyed registers
pop bx ;
reverse_099: pop bp ;restore BP
ret ;leave Result in AX
_ReverseByte endp
any questions...Assemblr@hotma
Sorry...made a mistake...
the Usage should say X = ReverseByte (Y);
the Usage should say X = ReverseByte (Y);
ASKER
Ok, it appears that rbr's answer (a lookup table) is still the fastest, so I will give him the points. I wish I could give them to all you people, because your answers are also very interesting, but the system won't allow that.
Anyway, I am STILL wondering about ozo's solution, so, ozo, if you're reading this, please email me at
bugmaster@earthlink.net
thanks.
Anyway, I am STILL wondering about ozo's solution, so, ozo, if you're reading this, please email me at
bugmaster@earthlink.net
thanks.
>> but the system won't allow that.
posting a polite request (0-point question) to Linda in the customer support area usally takes care of that. Make sure to include names and the Q. number.
posting a polite request (0-point question) to Linda in the customer support area usally takes care of that. Make sure to include names and the Q. number.
Does this help?
http://support.microsoft.com/support/kb/articles/q129/3/15.asp
(The first access of MSDN-online will require a one-time registration process)
http://support.microsoft.com/support/kb/articles/q129/3/15.asp
(The first access of MSDN-online will require a one-time registration process)
I'm not sure what there is to explain about the multiplication solution,
other than observing what it does for each bit, and noting that the result for the sum of bits is the sum of the results for each bit.
basically the multiply duplicates the byte at different offsts,
the and selects bits in key positions, and the moduo collects them.
(BTW, I have seen at least one architecture on which the multiply could potentially be faster than the lookup,
but I don't think I've ever seen a web browser written for that architecture,
so if you're reading this, chances should be good that the lookup will be faster on your machine)
other than observing what it does for each bit, and noting that the result for the sum of bits is the sum of the results for each bit.
basically the multiply duplicates the byte at different offsts,
the and selects bits in key positions, and the moduo collects them.
(BTW, I have seen at least one architecture on which the multiply could potentially be faster than the lookup,
but I don't think I've ever seen a web browser written for that architecture,
so if you're reading this, chances should be good that the lookup will be faster on your machine)
What is this architecture?
ASKER
Ozo, can you maybe explain the thought process you used to arrive at the multiplication solution ? How does the modulo "collect" the bits ?
ASKER