
Parallel XLAT in SSE2?

Dear all,

The XLAT instruction is:

AL = BYTE PTR[EBX+AL]

Is there a parallel equivalent in SSE2 that does the following?

xmm1[0] = DWORD PTR[EBX+xmm1[0]]
xmm1[1] = DWORD PTR[EBX+xmm1[1]]
xmm1[2] = DWORD PTR[EBX+xmm1[2]]
xmm1[3] = DWORD PTR[EBX+xmm1[3]]
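For reference: SSE2 has no gather instruction, so there is no single-instruction answer to this; a one-instruction form (VPGATHERDD, intrinsic `_mm_i32gather_epi32`) only arrived with AVX2. A common SSE2 workaround is to pull each lane out into a general-purpose register, do an ordinary scalar load, and pack the four results back into a vector. The sketch below assumes the lanes hold element indices into a `uint32_t` table (that is, offsets already divided by 4), and `gather_epi32_sse2` and its `table_len` bounds parameter are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */

/* Emulates the "parallel XLAT" above: each 32-bit lane of idx is replaced by
 * table[lane]. SSE2 has no gather, so every lane is moved to a scalar
 * register, looked up with a normal load, and the results are repacked.
 * Assumption: lanes are element indices, not byte offsets. */
static __m128i gather_epi32_sse2(const uint32_t *table, size_t table_len,
                                 __m128i idx)
{
    /* Broadcast lane k into lane 0, then read lane 0 as a scalar. */
    uint32_t i0 = (uint32_t)_mm_cvtsi128_si32(idx);
    uint32_t i1 = (uint32_t)_mm_cvtsi128_si32(
        _mm_shuffle_epi32(idx, _MM_SHUFFLE(1, 1, 1, 1)));
    uint32_t i2 = (uint32_t)_mm_cvtsi128_si32(
        _mm_shuffle_epi32(idx, _MM_SHUFFLE(2, 2, 2, 2)));
    uint32_t i3 = (uint32_t)_mm_cvtsi128_si32(
        _mm_shuffle_epi32(idx, _MM_SHUFFLE(3, 3, 3, 3)));

    /* Hypothetical safety check: every index must lie inside the table. */
    assert(i0 < table_len && i1 < table_len &&
           i2 < table_len && i3 < table_len);

    /* _mm_set_epi32 takes lanes from highest (3) down to lowest (0). */
    return _mm_set_epi32((int)table[i3], (int)table[i2],
                         (int)table[i1], (int)table[i0]);
}
```

The four scalar loads are independent, so an out-of-order CPU can overlap them, but the extract/repack overhead means this is often no faster than a plain scalar loop; it mainly helps when the surrounding code already works on XMM registers.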

Thank you.
Solution:
XLAT is one of those instructions that got left behind in slow microcode: it takes FOUR cycles to translate ONE byte.

On a Pentium you can go at least eight times as fast by building a 64K-entry lookup table of 16-bit words (128 KB), one entry for every possible pair of input bytes, and doing:

mov  ax,[ebx+esi*2]   ; ebx = table base address, esi = the two bytes to convert (zero-extended to 32 bits)
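The same two-bytes-per-lookup idea can be sketched in portable C. This is not the author's code: the names `build_pair_table` and `translate_pairs` are hypothetical, it assumes you already have the ordinary 256-byte table an XLAT loop would use, and whether a compiler emits exactly `mov ax,[ebx+esi*2]` for the lookup is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Builds the 64K-entry pair table from a 256-byte XLAT table:
 * entry (hi << 8 | lo) holds (xlat[hi] << 8 | xlat[lo]), so one 16-bit
 * load translates two bytes. The result occupies 128 KB; caller frees it. */
static uint16_t *build_pair_table(const uint8_t xlat[256])
{
    uint16_t *pair = malloc(65536 * sizeof *pair);
    if (pair == NULL)
        return NULL;
    for (uint32_t v = 0; v < 65536; v++)
        pair[v] = (uint16_t)((xlat[v >> 8] << 8) | xlat[v & 0xFF]);
    return pair;
}

/* Translates buf in place, two bytes per table lookup. The bytes are
 * combined explicitly, so the result does not depend on endianness.
 * A trailing odd byte falls back to the 256-byte table. */
static void translate_pairs(uint8_t *buf, size_t len,
                            const uint16_t *pair, const uint8_t xlat[256])
{
    assert(pair != NULL);
    size_t i = 0;
    for (; i + 1 < len; i += 2) {
        uint16_t out = pair[(uint16_t)(buf[i + 1] << 8 | buf[i])];
        buf[i] = (uint8_t)(out & 0xFF);     /* xlat[buf[i]]     */
        buf[i + 1] = (uint8_t)(out >> 8);   /* xlat[buf[i + 1]] */
    }
    if (i < len)
        buf[i] = xlat[buf[i]];
}
```

Building the table costs 65,536 writes, so the approach only pays off when the same translation is applied to a large amount of data.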

In fact you can ideally do FOUR bytes per cycle: the Pentium's two pipes (U and V) can each execute one of these loads per cycle when the two instructions pair, so that's SIXTEEN times faster than XLAT.
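Under the answer's own assumptions (XLAT at four cycles per byte, one table load per pipe per cycle), the speed-up arithmetic works out as below. Treat these as best-case figures: pairing rules and memory behavior matter, and the 128 KB table is much larger than the original Pentium's 8 KB L1 data cache, so lookups on varied data will often miss the cache.

```latex
\begin{aligned}
\text{XLAT:} &\quad \frac{1\ \text{byte}}{4\ \text{cycles}} = 0.25\ \text{bytes/cycle} \\
\text{pair table, one pipe:} &\quad \frac{2\ \text{bytes}}{1\ \text{cycle}} = 2\ \text{bytes/cycle} = 8 \times 0.25 \\
\text{pair table, U + V pipes:} &\quad 2 \times 2 = 4\ \text{bytes/cycle} = 16 \times 0.25
\end{aligned}
```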

Is that good enough?

