So far, I have been able to optimize the nibbles-to-hex converter somewhat.
The original was:
and eax,0x0F0F0F0F // eax holds four low nibbles, one per byte (ah is used as the example below)
mov ecx,eax // save a copy
add eax,0x06060606 // ah is low nibble + 6
and eax,0x10101010 // ah is 0x10 if the low nibble was > 9
shr eax,2 // ah is 4 if the low nibble was > 9
add ecx,eax // += 4 if the low nibble was > 9
shr eax,1 // ah is 2 if the low nibble was > 9
add ecx,eax // += 2 if the low nibble was > 9
shr eax,1 // ah is 1 if the low nibble was > 9
add ecx,eax // += 1 if the low nibble was > 9 (total +7)
add ecx,0x30303030 // += '0' (convert to 0-9 and A-F)
The new one is (identical functionality):
and eax,$0F0F0F0F // mask nibbles
lea ecx,[eax+$80808080-$0A0A0A0A] // set a guard bit in each byte, then subtract 10
shr ecx,4 // get part of the subtraction which is significant for us
not ecx // invert the offset nibbles
and ecx,$07070707 // mask offset: 7 if the nibble was > 9, else 0
lea ecx,[$30303030+ecx+eax] // ecx = $30303030 + eax + ecx
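For anyone who wants to check the arithmetic outside the assembler, here is a minimal C sketch of the same steps. It assumes the offset is taken from the low nibble of each byte after the shift (mask 0x07070707); the function name `nibbles_to_hex` is made up for illustration and is not part of the routine above.

```c
#include <stdint.h>

/* Convert four nibbles (one per byte, each 0..15) into four ASCII hex digits.
   Hypothetical helper that mirrors the instruction sequence step by step. */
static uint32_t nibbles_to_hex(uint32_t x)
{
    x &= 0x0F0F0F0Fu;                             /* and: mask nibbles */
    uint32_t t = x + (0x80808080u - 0x0A0A0A0Au); /* lea: each byte is 0x76 + n */
    t >>= 4;                                      /* shr: low nibble is 7 (n < 10) or 8 (n >= 10) */
    t = ~t;                                       /* not: low nibble is 8 (n < 10) or 7 (n >= 10) */
    t &= 0x07070707u;                             /* and: 0 (n < 10) or 7 (n >= 10) */
    return x + t + 0x30303030u;                   /* lea: add '0' plus the A-F offset */
}
```

Feeding it 0x00090A0F should give the bytes '0', '9', 'A', 'F'.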
More to come ;-)
by: AvonWyss Posted on 2002-03-24 at 12:28:40 ID: 6892495
absong, in fact, this was exactly the point of my earlier comment, where I proposed adding those additional bits to make sure that the subtraction does not influence more than one nibble at a time.
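To illustrate the borrow problem, here is a small C sketch (the function names are hypothetical). It compares the subtraction with and without a guard bit in bit 7 of each byte. Since each nibble is below 0x10, OR-ing in 0x80808080 is the same as adding it.

```c
#include <stdint.h>

/* With guard bits: bit 7 of each result byte stays set iff that nibble is >= 10.
   A byte can never drop below 0x76, so no borrow crosses a byte boundary. */
static uint32_t ge10_flags_guarded(uint32_t nibbles)
{
    return ((nibbles | 0x80808080u) - 0x0A0A0A0Au) & 0x80808080u;
}

/* Without guard bits: a byte below 10 borrows from its upper neighbour,
   so that neighbour ends up one too small. */
static uint32_t sub10_unguarded(uint32_t nibbles)
{
    return nibbles - 0x0A0A0A0Au;
}
```

With the input 0x00090A0F, the unguarded version yields 0xF5 in the top byte instead of 0xF6, because the 0x09 byte below it borrowed. The guarded version flags only the two bytes holding 0x0A and 0x0F.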
dan, I have already thought about faster ways to assemble the hex bytes into a 32-bit "block". However, since the nibbles are interleaved across two registers, it's pretty hard to find a better way of assembling them than byte access. Of course, if anyone has ideas, I'll be glad to hear them, but I'm afraid that matching or beating the speed of the simple lookup method will be pretty hard.
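To make the assembly problem concrete, here is a rough C sketch of both approaches. It assumes the high nibbles and low nibbles of a 32-bit value are converted in two separate words and then interleaved byte by byte. All names (`nibbles_to_hex`, `hex32_swar`, `hex32_lookup`) are hypothetical, and this is only an illustration, not the routine discussed in this thread.

```c
#include <stdint.h>

/* Four nibbles (one per byte) to four ASCII hex digits, as in the new converter. */
static uint32_t nibbles_to_hex(uint32_t x)
{
    x &= 0x0F0F0F0Fu;
    uint32_t t = ~((x + 0x76767676u) >> 4) & 0x07070707u; /* 7 where nibble >= 10 */
    return x + t + 0x30303030u;
}

/* Parallel conversion: high and low nibbles end up in two separate words,
   so the eight output characters must be interleaved with byte access. */
static void hex32_swar(uint32_t v, char out[9])
{
    uint32_t hi = nibbles_to_hex(v >> 4); /* digits 0, 2, 4, 6 */
    uint32_t lo = nibbles_to_hex(v);      /* digits 1, 3, 5, 7 */
    for (int i = 0; i < 4; i++) {
        int shift = 24 - 8 * i;
        out[2 * i]     = (char)(hi >> shift);
        out[2 * i + 1] = (char)(lo >> shift);
    }
    out[8] = '\0';
}

/* The simple lookup method, for comparison. */
static void hex32_lookup(uint32_t v, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 8; i++)
        out[i] = digits[(v >> (28 - 4 * i)) & 0xFu];
    out[8] = '\0';
}
```

Both functions should produce the same string for any input. The interleaving loop in `hex32_swar` is the part that is hard to do without byte access.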