Fast Hexadecimal Encode and Decode

In Easy String Encryption Using CryptoAPI in C++, I described how to encrypt text and recommended that the encrypted text be stored as a series of hexadecimal digits -- because ciphertext may contain embedded NULLs or other characters that can cause processing problems.

Since then I've worked on an application in which large blocks of binary data must be encoded into hexadecimal rapidly, and when needed, decoded back to binary just as rapidly.  In the earlier article, I provided a sort of "throw-away" hex encoder/decoder, but I felt that I needed something that was more efficient for production work.

In this article, I'll explore some hex encode/decode options, including the one supported by the CryptoAPI, and at the end of the article, I'll present the ultra-fast functions that ended up in my production code.

Observe the ordering of characters in an ASCII table.  The only real issue with hex encoding is that there is a gap between the '9' and the 'A'.  Most of the complications come from dealing with that gap.
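To see the gap concretely: '9' is 0x39 and 'A' is 0x41, so after subtracting '0' a letter digit lands 7 too high. A minimal sketch (my illustration, not the article's code) of a digit-value helper:

```cpp
#include <cassert>

// Convert one hex digit ('0'-'9' or 'A'-'F') to its value 0-15.
// After subtracting '0', letters come out at 17-22 because of the
// seven characters ':' through '@' that sit between '9' and 'A'.
int hexDigitValue(char c) {
    int v = c - '0';
    if (v > 9) v -= 7;   // close the gap
    return v;
}
```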

How to Encode to Hexadecimal
Your binary source byte is 8 bits.  Take the data 4 bits at a time (values 0-15).  If that value is 0-9, then just add '0' (0x30).  If the value is 10-15, then add 0x37 instead.  That gives you a hex digit ('0', '1',...,'9','A',..., 'F').  Output the resulting high hex digit followed by the low hex digit.
``````void ToHexManual( BYTE* pSrc, int nSrcLen, CString& sDest )
{
    sDest = "";
    for ( int j = 0; j < nSrcLen; j++ ) {
        BYTE b1 = *pSrc >> 4;    // hi nybble
        BYTE b2 = *pSrc & 0x0f;  // lo nybble
        b1 += '0'; if ( b1 > '9' ) b1 += 7;  // gap between '9' and 'A'
        b2 += '0'; if ( b2 > '9' ) b2 += 7;
        sDest += b1;
        sDest += b2;
        pSrc++;
    }
}
``````
Here's a more concise (but slower) version that uses a printf-type tool:
``````void ToHexSlow( BYTE* pSrc, int nSrcLen, CString& sDest )
{
    sDest = "";
    for ( int j = 0; j < nSrcLen; j++ ) {
        sDest.AppendFormat( "%02x", int(*pSrc) );
        pSrc++;
    }
}
``````
Both of these are usable, but not as efficient as I'd like.  The former requires calculations and conditional tests twice for each byte.  The latter makes a function call for each byte.  Both versions append the output to an ever-growing CString object, and that can be a significant problem when the output is large.  Let me explain...

Significant Problem with Long Strings
Each time you concatenate to an existing CString (or an STL string object), the code must check to see if that current allocation is large enough.  If not, it must allocate a new chunk that is larger, and copy the existing contents to that location before appending the new text.  Imagine how clumsy that gets when the string is about 1MB long!
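As an illustration (portable standard C++, not the article's MFC code), reserving the full output size up front turns the entire encode into a single allocation:

```cpp
#include <string>
#include <cstddef>

// Sketch: pre-sizing the destination string avoids the repeated
// grow-and-copy cycle that plagues naive concatenation.
std::string encodeHexReserved(const unsigned char* src, std::size_t len) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);              // one allocation, no regrowth
    for (std::size_t i = 0; i < len; ++i) {
        out += digits[src[i] >> 4];    // hi nybble
        out += digits[src[i] & 0x0f];  // lo nybble
    }
    return out;
}
```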

We'll look at the efficient version that avoids these problems in a minute.

How to Decode from Hexadecimal
Your textual source data is composed of pairs of hex digits, for instance "72" or "0F" or "D3".  Take the first digit and subtract '0' (0x30).  If the result is greater than 9 (it's 'A','B',...,'F'), then subtract 7, giving a value between 0 and 15.  Do the same for the second digit.  Multiply the first by 16 and add it to the second.  Output the resulting 8-bit value.
``````void FromHexManual( LPCSTR pSrc, int nSrcLen, BYTE* pDest )
{
    for ( int j = 0; j < nSrcLen; j += 2 ) {
        BYTE b1 = pSrc[j]   - '0'; if ( b1 > 9 ) b1 -= 7;
        BYTE b2 = pSrc[j+1] - '0'; if ( b2 > 9 ) b2 -= 7;
        *pDest++ = (b1 << 4) + b2;  // <<4 multiplies by 16
    }
}
``````
``````
Here's another try that uses sscanf from the C runtime library:
``````void FromHexSlow( LPCSTR pSrc, int nSrcLen, BYTE* pDest )
{
    char sTmp[3] = "xx";
    int n;
    for ( int j = 0; j < nSrcLen; j += 2 ) {
        sTmp[0] = pSrc[0];
        sTmp[1] = pSrc[1];
        sscanf( sTmp, "%x", &n );
        pSrc += 2;
        *pDest++ = (BYTE)n;
    }
}
``````
``````
The CryptoAPI Hex Conversion Functions
None of the above routines provided the kind of speed that I was hoping to achieve.   After some research, I found a couple of functions in the CryptoAPI toolkit that convert to and from hex:
CryptBinaryToString  and   CryptStringToBinary
and I tried them:
``````#pragma comment( lib, "Crypt32.lib" )
#include <Wincrypt.h>
void ToHexCryptoAPI( BYTE* pSrc, int nSrcLen, CString& sDest )
{
    DWORD nOutLen = (nSrcLen * 3) + 4;  // "xx " per byte, plus terminator, etc.
    char* pBuf = new char[nOutLen];
    BOOL fRet = CryptBinaryToString( pSrc, nSrcLen, CRYPT_STRING_HEX, pBuf, &nOutLen );
    sDest = pBuf;
    delete[] pBuf;  // delete[] to match new[]
}
int FromHexCryptoAPI( LPCSTR pSrc, int nSrcLen, BYTE* pDest )
{
    DWORD nOutLen = nSrcLen / 2;
    BOOL fRet = CryptStringToBinary( pSrc, nSrcLen, CRYPT_STRING_HEX, pDest, &nOutLen, 0, 0 );
    return (int)nOutLen;
}
``````
Oddly, the only useful option is CRYPT_STRING_HEX -- the CRYPT_STRING_HEXRAW option is no longer supported.  The output from CryptBinaryToString appears to be formatted for screen output -- spaces between each pair of hex digits, embedded tab characters, CRLF at the end, etc.  The two functions above do work, and there certainly is value in using standard API functions, but the performance was lackluster, and I didn't want the overhead of the extra embedded characters.

Finally -- The Ultra-Fast Code I Promised!
Lookup Tables Rather Than Calculations
If you look at the assembly-language output from any of the above routines, you will see conditional jumps and a significant number of calculations.   Modern processors are fast, using multiple pipelines to handle conditionals, but there is nothing faster than using a lookup table -- especially when the table is in L1 processor cache... the table access is nearly as fast as register access!

So...
For binary-to-hex, I used a table that contains all possible output values for any byte of data.  This is very straightforward -- the data itself is the index into the table.  I use a WORD pointer to grab 16 bits (two hex digits) at a time.

For hex-to-binary, I realized that I could use a lookup table that contains all possible hexadecimal digit values.   Rather than compensate for the "gap" between '9' and 'A', I just left empty spots in the table.   As a bonus, I left an extra gap between 'F' and 'a' -- so that the same routine can convert hexadecimal data regardless of whether it contains A,B,C,D,E,F or a,b,c,d,e,f.
``````BYTE HexLookup[513] = {
    "000102030405060708090a0b0c0d0e0f"
    "101112131415161718191a1b1c1d1e1f"
    "202122232425262728292a2b2c2d2e2f"
    "303132333435363738393a3b3c3d3e3f"
    "404142434445464748494a4b4c4d4e4f"
    "505152535455565758595a5b5c5d5e5f"
    "606162636465666768696a6b6c6d6e6f"
    "707172737475767778797a7b7c7d7e7f"
    "808182838485868788898a8b8c8d8e8f"
    "909192939495969798999a9b9c9d9e9f"
    "a0a1a2a3a4a5a6a7a8a9aaabacadaeaf"
    "b0b1b2b3b4b5b6b7b8b9babbbcbdbebf"
    "c0c1c2c3c4c5c6c7c8c9cacbcccdcecf"
    "d0d1d2d3d4d5d6d7d8d9dadbdcdddedf"
    "e0e1e2e3e4e5e6e7e8e9eaebecedeeef"
    "f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff"
};
``````
void ToHex( BYTE* pSrc, int nSrcLen, CString& sDest )
{
    WORD* pwHex  = (WORD*)HexLookup;
    WORD* pwDest = (WORD*)sDest.GetBuffer( (nSrcLen * 2) + 1 );

    for ( int j = 0; j < nSrcLen; j++ ) {
        *pwDest = pwHex[*pSrc];  // two hex digits at once
        pwDest++; pSrc++;
    }
    *((BYTE*)pwDest) = 0;  // terminate the string
    sDest.ReleaseBuffer( nSrcLen * 2 );  // length excludes the terminator
}
``````
``````
``````BYTE DecLookup[] = {
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,  // gap before first hex digit
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,1,2,3,4,5,6,7,8,9,        // 0123456789
    0,0,0,0,0,0,0,              // :;<=>?@ (gap)
    10,11,12,13,14,15,          // ABCDEF
    0,0,0,0,0,0,0,0,0,0,0,0,0,  // GHIJKLMNOPQRS (gap)
    0,0,0,0,0,0,0,0,0,0,0,0,0,  // TUVWXYZ[\]^_` (gap)
    10,11,12,13,14,15           // abcdef
};
void FromHex( BYTE* pSrc, int nSrcLen, BYTE* pDest )
{
    for ( int j = 0; j < nSrcLen; j += 2 ) {
        int d = DecLookup[*pSrc++] << 4;  // hi nybble
        d |= DecLookup[*pSrc++];          // lo nybble
        *pDest++ = (BYTE)d;
    }
}
``````
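For readers outside MFC, the same two-digits-at-once idea can be sketched in portable standard C++ (my code, not the article's); std::memcpy sidesteps the alignment and strict-aliasing questions raised by the WORD* cast:

```cpp
#include <cstring>
#include <string>
#include <cstddef>

// Build the 512-byte pair table once: entry i holds the two hex
// digits of byte value i, e.g. entry 0xd3 holds "d3".
static const char* PairTable() {
    static char table[512];
    static const char digits[] = "0123456789abcdef";
    for (int i = 0; i < 256; ++i) {
        table[i * 2]     = digits[i >> 4];
        table[i * 2 + 1] = digits[i & 0x0f];
    }
    return table;
}

std::string ToHexPortable(const unsigned char* src, std::size_t len) {
    const char* pairs = PairTable();
    std::string out(len * 2, '\0');  // allocate once, like GetBuffer
    for (std::size_t i = 0; i < len; ++i)
        std::memcpy(&out[i * 2], &pairs[src[i] * 2], 2);  // both digits at once
    return out;
}
```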

Notes:
In the final (ultra-fast) ToHex function, note the usage of s.GetBuffer() and s.ReleaseBuffer().  The function allocates the entire output buffer before it starts.  This avoids that Significant Problem with Long Strings that I mentioned earlier.  I could have allocated a separate buffer and set the result string to it as a final step, but that's one (possibly large) memory copy that I wanted to avoid.  By using GetBuffer, I allocate and "freeze" the CString buffer while I'm filling it using high-speed pointer operations.
All of the above functions would be safer if they had some error handling.  For instance, the FromHex functions all assume that the output buffer is large enough to hold the resulting binary data.  I omitted that but you might want to be more careful.  Note the technique used in the CryptoAPI functions, and emulate what Microsoft has done.
Also, none of the FromHexXxxx routines take any special action when encountering an invalid hex digit; in fact, the fastest one may fail with an access violation if it indexes beyond the end of the lookup table.  In my case (and to avoid complicating this article), I assumed that I'd always have valid hexadecimal data.  You may not want to make that assumption.
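One way to add that safety -- a hedged sketch in portable C++ with hypothetical names, not the article's code -- is a full 256-entry table with 0xFF marking invalid characters, so bad input is reported instead of being read past the table:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical validating table: 0xFF means "not a hex digit".
static uint8_t SafeDecLookup[256];

static bool InitSafeDecLookup() {
    std::memset(SafeDecLookup, 0xFF, sizeof SafeDecLookup);
    for (int c = '0'; c <= '9'; ++c) SafeDecLookup[c] = uint8_t(c - '0');
    for (int c = 'A'; c <= 'F'; ++c) SafeDecLookup[c] = uint8_t(c - 'A' + 10);
    for (int c = 'a'; c <= 'f'; ++c) SafeDecLookup[c] = uint8_t(c - 'a' + 10);
    return true;
}
static bool g_lookupReady = InitSafeDecLookup();

// Returns false on any non-hex character instead of decoding garbage.
bool FromHexChecked(const char* pSrc, int nSrcLen, uint8_t* pDest) {
    for (int j = 0; j < nSrcLen; j += 2) {
        uint8_t hi = SafeDecLookup[uint8_t(pSrc[j])];
        uint8_t lo = SafeDecLookup[uint8_t(pSrc[j + 1])];
        if (hi == 0xFF || lo == 0xFF) return false;
        *pDest++ = uint8_t((hi << 4) | lo);
    }
    return true;
}
```

The validity check costs one compare per nybble, so most of the lookup-table speed advantage is retained.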
ToHexManual generates uppercase letter digits (ABCDEF), while ToHexSlow's "%02x" format generates lowercase (use "%02X" for uppercase).  FromHexManual assumes uppercase and will produce wrong results if lowercase hex digits are found; FromHexSlow's sscanf accepts either case.  The CryptoAPI functions use lowercase.

The ToHex (fast) function generates lowercase letter digits (abcdef), and FromHex (fast) works with either uppercase or lowercase.  If you want uppercase output from ToHex, you can select the entire table in Visual Studio and use Edit/Advanced/Make Uppercase (if you want the option of using either, just create two tables).
The timing test results are interesting.   I read in a large (1MB) binary file (a ZIP file), then started the timer and ran 10 iterations of each function.

                        Debug    Release   Approx Ratio (Release)
    ToHexManual        22,297     20,985        446 : 1
    ToHexSlow          34,016     37,891        806 : 1
    ToHexCryptoAPI      3,981      4,056         86 : 1
    ToHex (fast)          172         47          1 : 1
    FromHexManual         110         70        1.6 : 1
    FromHexSlow        11,457      2,360         52 : 1
    FromHexCryptoAPI      500        500         11 : 1
    FromHex (fast)         47         45          1 : 1

All times are in milliseconds, based on the less-than-precise GetTickCount() function.   I also ran a 100-iteration test on each to verify, and got similar performance ratios.  The really big numbers for ToHexManual and ToHexSlow are almost certainly caused, at least in part, by the Significant Problem with Long Strings issue (both append to an ever-growing CString).

The FromHexManual speed was faster than I thought it would be.  Upon examining the compiler-generated ASM source code, I found that the compiler had "inlined" the function.  Also, it seems that the calculate-and-jump-conditionally sequence is handled very efficiently on my AMD-based PC -- it's only about 50% slower than the lookup-table technique.

In any case, the lookup-table versions proved to be significantly faster than the others.

References:

CryptBinaryToString Function
http://msdn.microsoft.com/en-us/library/aa379887(VS.85).aspx

CryptStringToBinary Function
http://msdn.microsoft.com/en-us/library/aa380285(VS.85).aspx

How fast is your ASM hex converter?
https://www.experts-exchange.com/questions/20272901/How-fast-is-your-ASM-hex-converter.html
This Experts-Exchange "question" was really a challenge to create a particular kind of hexadecimal output in Intel ASM.  The lookup-table technique won out easily... until some smarty pants figured out how to use MMX opcodes to handle larger chunks of data.   Interesting reading, and a lot of fun for the participants!


"Batchelor", Developer and EE Topic Advisor
Top Expert 2015

Commented:
Astonishing result. I had expected the string versions to be slow, for obvious reasons.
Coming from the old days of 3.5KB "PCs" and programmable calculators, I never would have considered the lookup table approach to be that speedy and hence useful. I reckon I should start thinking in bigger figures (i.e. GB) :-)
Author of the Year 2009

Commented:
I'm glad you enjoyed the article.
Re: Lookup tables --  I used a 512-byte lookup table, but in that "hex challenge" link, one of the participants had a go at pre-calculating a 390KB table in order to look up the output of two bytes at one time.  I called it "...either ridiculous or sublime or both."  As I recall, the performance gain was less than stellar... once the table gets that large, you lose the L1 cache advantage.  But it can be fun to try that kind of thing -- just to find out.

Commented:
What is the license of this code?
MIT? GNU?
Author of the Year 2009

Commented:

I know of no prohibition or limitation on use of any source code provided in an EE article or Q/A response.  Frankly, the whole point of this site is to provide usable solutions, so it would be strange if there were prohibitions on usage.   As a practical matter, the above code can be used in any way you want.  I'd expect that if you use the code in an article or publication, that you would properly attribute the author and the site.

-- Dan

Commented:
Don't you think applying Duff's device would make it somewhat faster? This part

for (int j=0; j<nSrcLen; j++ ) {
    *pwDest= pwHex[*pSrc];
    pwDest++; pSrc++;
}

can be unrolled, and rumor has it that Duff's device achieves, for example, a 30% speedup with gcc. I believe it's definitely worth a try.

``````#define SafeMacro(statement) \
    do { statement; } while (0)

#define DuffLoop(aCount, statement) \
    SafeMacro( \
        auto count_ = (aCount); \
        auto times_ = (count_ + 7) >> 3; \
        switch (count_ & 7) { \
        case 0: do { statement; \
        case 7:      statement; \
        case 6:      statement; \
        case 5:      statement; \
        case 4:      statement; \
        case 3:      statement; \
        case 2:      statement; \
        case 1:      statement; \
                } while (--times_ > 0); \
        } \
    )

#define DuffLoopChecked(aCount, statement) \
    SafeMacro( if (aCount) { DuffLoop((aCount), statement); } )

// usage:
DuffLoopChecked(
    nSrcLen,
    *pwDest++ = pwHex[*pSrc++];
);
``````
