for the original challenge
o *abel* dropped by, said "hmmmmm...." then dropped off the end of the earth.
o *AvonWyss* submitted a lookup-based algorithm that kicked butt.
o *Pavlik* submitted straight C code which beat all ASM to that point. The *compiler* (helped by an excellent, efficient algorithm) seemed to be smarter than all of us!
o *DanRollins* sipped a brew while generating some benchmarking code.
o *Jonin's* sprintf-based code illustrated an important point: The 'obvious' way is at least 100 times slower than an optimized, targeted solution.
o AvonWyss's second solution used a clever non-lookup technique in both the hex and ASCII part. He even explained how that darn thing worked!
o *absong* showed an interesting hex-calc technique that used the upper nibble (after a subtraction) as a mask. Clever, but failed in certain situations.
o We opened a new thread: http:Q.20280946.html
o AvonWyss and DanRollins worked on minor tweaks to Avon's non-lookup code, trying to squeeze a few cycles out.
o abel magically reappeared, bearing an MMX-based solution that knocked our sox off! (At least twice as fast as the fastest, but initially flawed in that it was not spacing out the hex-digit pairs.) Abel's code forsakes the lookup tables in favor of the MMX ability to process 8 bytes in parallel.
o The spacing-out of the hex digits has ended up being complicated and time-consuming, but we have worked out several promising techniques. The resulting code remains fastest to date: about 50% faster than Pavlik's pure lookup.
o We continued here, with these thoughts in mind:
? EMMX: worth delving into?
? Can Prefetch give new life to lookup tables?
? Can interleaving code to avoid pipeline stalls speed things up?
? Loading MMX mask values from 32-bit regs -- faster?
Come on in!
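
For anyone just joining, here is a rough C sketch of the three general approaches mentioned in the recap: the 'obvious' sprintf way, a table lookup, and a calculated (non-lookup) nibble conversion. None of this is anyone's actual submission. The function names (hex_sprintf, hex_lookup, hex_calc) are made up for illustration, and the calculated version uses a simple add-and-shift trick that is only assumed to be in the same spirit as the non-lookup code discussed above. Each function writes space-separated uppercase hex pairs and expects a destination buffer of at least 3*n + 1 bytes.

```c
#include <stdio.h>
#include <stddef.h>

/* The 'obvious' way: one sprintf call per input byte.
   dst must hold at least 3*n + 1 bytes. */
void hex_sprintf(const unsigned char *src, size_t n, char *dst)
{
    size_t i;
    dst[0] = '\0';
    for (i = 0; i < n; i++)
        dst += sprintf(dst, "%02X ", (unsigned)src[i]);
    if (n)
        dst[-1] = '\0';            /* overwrite the trailing space */
}

/* Lookup approach: a 16-entry table indexed by each nibble. */
void hex_lookup(const unsigned char *src, size_t n, char *dst)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;
    for (i = 0; i < n; i++) {
        *dst++ = digits[src[i] >> 4];
        *dst++ = digits[src[i] & 0x0F];
        *dst++ = ' ';
    }
    if (n)
        dst--;                     /* back up over the trailing space */
    *dst = '\0';
}

/* Calculated approach: no table and no branch.
   For a nibble v in 0..15, (v + 6) >> 4 is 1 only when v >= 10,
   so 7 is added exactly when we must jump from '9' to 'A'. */
static char nibble_to_hex(unsigned v)
{
    return (char)('0' + v + 7 * ((v + 6) >> 4));
}

void hex_calc(const unsigned char *src, size_t n, char *dst)
{
    size_t i;
    for (i = 0; i < n; i++) {
        *dst++ = nibble_to_hex(src[i] >> 4);
        *dst++ = nibble_to_hex(src[i] & 0x0Fu);
        *dst++ = ' ';
    }
    if (n)
        dst--;                     /* back up over the trailing space */
    *dst = '\0';
}
```

All three should produce identical text for the same input, so any one of them can serve as a reference when checking a faster version. Timing them against each other is left to the benchmarking code already posted in the thread.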