Content-Type: application/x-www-form-urlencoded
Authorization: Basic TXlVc2VySUQ6TXlQYXNzd29yZA==
The text after the word, Basic is a base64-encoding of a username and password (specifically, that string encodes MyUserName:MyPassword).
#include <Wincrypt.h>
#pragma comment (lib, "Crypt32.lib")
int ToBase64Crypto( const BYTE* pSrc, int nLenSrc, char* pDst, int nLenDst )
{
DWORD nLenOut= nLenDst;
BOOL fRet= CryptBinaryToString(
(const BYTE*)pSrc, nLenSrc,
CRYPT_STRING_BASE64 | CRYPT_STRING_NOCRLF,
pDst, &nLenOut
);
if (!fRet) nLenOut=0; // failed
return( nLenOut );
}
int FromBase64Crypto( const BYTE* pSrc, int nLenSrc, char* pDst, int nLenDst )
{
DWORD nLenOut= nLenDst;
BOOL fRet= CryptStringToBinary(
(LPCSTR)pSrc, nLenSrc,
CRYPT_STRING_BASE64,
(BYTE*)pDst, &nLenOut,
NULL, // pdwSkip (not needed)
NULL // pdwFlags (not needed)
);
if (!fRet) nLenOut=0; // failed
return( nLenOut );
}
I used these CryptoAPI functions to validate my own functions and for performance testing. These were, as I expected, poor performers. But I really can't fault you if you use these functions. They are reliable and bullet-proof.
#include <atlenc.h>
int ToBase64ATL( const BYTE* pSrc, int nLenSrc, char* pDst, int nLenDst )
{
int nDestLen= nLenDst;
BOOL fRet= Base64Encode( pSrc, nLenSrc, pDst, &nDestLen, ATL_BASE64_FLAG_NOCRLF );
if (!fRet) nDestLen=0;
return( nDestLen );
}
int FromBase64ATL( const BYTE* pSrc, int nLenSrc, char* pDst, int nLenDst )
{
int nDestLen= nLenDst;
BOOL fRet= Base64Decode( (LPCSTR)pSrc, nLenSrc, (BYTE*)pDst, &nDestLen );
if (!fRet) nDestLen=0;
return( nDestLen );
}
The encoder turned out to be quite speedy, but I was not surprised at how poorly the decoder performed — ATL (and STL) code is generally written with flexibility as a priority (various CPUs, different character widths, special options such as handling embedded CRLFs, and so forth.) Even so, these are a good choice for your production code — you can be certain that these functions are in use at many thousands of sites that depend upon it for mission-critical operations.
// simple version illustrates the basic encode algorithm
//
static char Base64Digits[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
//
int ToBase64Simple( const BYTE* pSrc, int nLenSrc, char* pDst, int nLenDst )
{
int nLenOut= 0;
while ( nLenSrc > 0 ) {
if (nLenOut+4 > nLenDst) return(0); // error
// read three source bytes (24 bits)
BYTE s1= pSrc[0]; // (but avoid reading past the end)
BYTE s2= 0; if (nLenSrc>1) s2=pSrc[1]; //------ corrected, thanks to jprichey
BYTE s3= 0; if (nLenSrc>2) s3=pSrc[2];
DWORD n;
n = s1; // xxx1
n <<= 8; // xx1x
n |= s2; // xx12
n <<= 8; // x12x
n |= s3; // x123
//-------------- get four 6-bit values for lookups
BYTE m4= n & 0x3f; n >>= 6;
BYTE m3= n & 0x3f; n >>= 6;
BYTE m2= n & 0x3f; n >>= 6;
BYTE m1= n & 0x3f;
//------------------ lookup the right digits for output
BYTE b1 = Base64Digits[m1];
BYTE b2 = Base64Digits[m2];
BYTE b3 = Base64Digits[m3];
BYTE b4 = Base64Digits[m4];
//--------- end of input handling
*pDst++ = b1;
*pDst++ = b2;
if ( nLenSrc >= 3 ) { // 24 src bits left to encode, output xxxx
*pDst++ = b3;
*pDst++ = b4;
}
if ( nLenSrc == 2 ) { // 16 src bits left to encode, output xxx=
*pDst++ = b3;
*pDst++ = '=';
}
if ( nLenSrc == 1 ) { // 8 src bits left to encode, output xx==
*pDst++ = '=';
*pDst++ = '=';
}
pSrc += 3;
nLenSrc -= 3;
nLenOut += 4;
}
// Could optionally append a NULL byte like so:
// *pDst++= 0; nLenOut++;
return( nLenOut );
}
Once you have worked out the bit-handling (I tried a number of methods; I think that the one above is the easiest to follow), there is still the issue of what to do at the end of the input. There might NOT be a multiple of three bytes in the final pass through the loop.
static BYTE LookupDigits[] = {
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, //gap: ctrl chars
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, //gap: ctrl chars
0,0,0,0,0,0,0,0,0,0,0, //gap: spc,!"#$%'()*
62, // +
0, 0, 0, // gap ,-.
63, // /
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // 0-9
0, 0, 0, // gap: :;<
99, // = (end padding)
0, 0, 0, // gap: >?@
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,
17,18,19,20,21,22,23,24,25, // A-Z
0, 0, 0, 0, 0, 0, // gap: [\]^_`
26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
43,44,45,46,47,48,49,50,51, // a-z
0, 0, 0, 0, // gap: {|}~ (and the rest...)
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
};
For instance, if we see a z in the input, we can use ...
// simple version illustrates the basic decode algorithm
//
int FromBase64Simple( const BYTE* pSrc, int nLenSrc, char* pDst, int nLenDst )
{
int nLenOut= 0;
for( int j=0; j<nLenSrc; j+=4 ) {
if ( nLenOut > nLenDst ) {
return( 0 ); // error, buffer too small
}
BYTE s1= LookupDigits[ *pSrc++ ];
BYTE s2= LookupDigits[ *pSrc++ ];
BYTE s3= LookupDigits[ *pSrc++ ];
BYTE s4= LookupDigits[ *pSrc++ ];
BYTE d1= ((s1 & 0x3f) << 2) | ((s2 & 0x30) >> 4);
BYTE d2= ((s2 & 0x0f) << 4) | ((s3 & 0x3c) >> 2);
BYTE d3= ((s3 & 0x03) << 6) | ((s4 & 0x3f) >> 0);
*pDst++ = d1; nLenOut++;
if (s3==99) break; // end padding found
*pDst++ = d2; nLenOut++;
if (s4==99) break; // end padding found
*pDst++ = d3; nLenOut++;
}
return( nLenOut );
}
// d1 d2 d3 s1 s2 s3 s4
// 00000000 00000000 00000000 00aaaaaa 00bbbbbb 00cccccc 00dddddd
// 00aaaaaa 00000000 00000000 s1 << 2
// aaaaaa00 00000000 00000000 000000bb s2 >> 4
// aaaaaabb 00000000 00000000
// aaaaaabb 00000000 00000000 bbbb???? s2 << 4
// aaaaaabb bbbb0000 00000000 0000cccc s3 >> 2
// aaaaaabb bbbbcccc 00000000 cc?????? s3 << 6
// aaaaaabb bbbbcccc cc000000 00dddddd
// aaaaaabb bbbbcccc ccdddddd
Now we're getting into my favorite part of this type of "recreational programming":
DWORD n32;
n32 = s1; n32 <<= 6;
n32 |= s2; n32 <<= 6;
n32 |= s3; n32 <<= 6;
n32 |= s4;
BYTE d3= ( n32 & 0x00ff ); n32 >>= 8;
BYTE d2= ( n32 & 0x00ff ); n32 >>= 8;
BYTE d1= ( n32 & 0x00ff );
This variation rolls the bits into a temporary DWORD variable and back out — all in a shorter series of steps, and that turns out to be very efficient.
// DWORD temp variable four 6-bit values;
// 00000000 00000000 00000000 00000000 00aaaaaa 00bbbbbb 00cccccc 00dddddd
// 00000000 00000000 00000000 00aaaaaa -------- 00bbbbbb 00cccccc 00dddddd
// 00000000 00000000 0000aaaa aa??????
// 00000000 00000000 0000aaaa aabbbbbb -------- 00cccccc 00dddddd
// 00000000 000000aa aaaabbbb bb??????
// 00000000 000000aa aaaabbbb bbcccccc -------- 00dddddd
// 00000000 aaaaaabb bbbbcccc cc??????
// 00000000 aaaaaabb bbbbcccc ccdddddd --------
This has fewer shifting and masking operations, but perhaps more importantly, the C++ compiler optimizes some of the steps into simple 8-bit register loads and other fast operations.
// fastest encoding variation I've come up with (Not really, see comments after the article!)
//
int ToBase64Fast( const BYTE* pSrc, int nLenSrc, char* pDst, int nLenDst )
{
int nLenOut= ((nLenSrc+2)/3)*4; // 4 out for every 3 in, rounded up
if ( nLenOut+1 > nLenDst ) {
return( 0 ); // fail!
}
DWORD* pdwDst= (DWORD*)pDst;
while( nLenSrc > 2 ) {
DWORD n= pSrc[0]; // xxx1 // roll the data in...
n <<= 8; // xx1x
n |= pSrc[1]; // xx12
n <<= 8; // x12x
n |= pSrc[2]; // x123
BYTE m4= n & 0x3f; n >>= 6; // roll it out, 6 bits at a time
BYTE m3= n & 0x3f; n >>= 6;
BYTE m2= n & 0x3f; n >>= 6;
BYTE m1= n & 0x3f;
n = Base64Digits[m4]; n <<= 8;
n |= Base64Digits[m3]; n <<= 8;
n |= Base64Digits[m2]; n <<= 8;
n |= Base64Digits[m1];
*pdwDst++ = n; // write four bytes in one operation
nLenSrc -= 3;
pSrc += 3;
}
// -------------- end of buffer special handling (see text)
pDst= (char*)pdwDst;
if ( nLenSrc > 0 ) { // some left after last triple
int n1= (*pSrc & 0xfc) >> 2;
int n2= (*pSrc & 0x03) << 4;
if (nLenSrc > 1 ) { // corrected. Thanks to jprichey
pSrc++;
n2 |= (*pSrc & 0xf0) >> 4;
}
*pDst++ = Base64Digits[n1]; // encode at least 2 outputs
*pDst++ = Base64Digits[n2];
if (nLenSrc == 2) { // 2 src bytes left to encode, output xxx=
int n3= (*pSrc & 0x0f) << 2;
pSrc++;
n3 |= (*pSrc & 0xc0) >> 6;
*pDst++ = Base64Digits[n3];
}
if (nLenSrc == 1) { // 1 src byte left to encode, output xx==
*pDst++ = '=';
}
*pDst++ = '=';
}
// *pDst= 0; nLenOut++ // could terminate with NULL, here
return( nLenOut );
}
The trick in lines 22-25 is to get the bytes in the right order in the DWORD so that the "little endian" output is in the right order. It's easy once you've seen it done :-)
// Utra-Fast base64-decoding, using 128KB lookup table
int FromBase64Fast( const BYTE* pSrc, int nLenSrc, char* pDst, int nLenDst )
{
if ( NULL==gpLookup16 ) SetupLookup16(); // see below
int nLenOut= 0;
if ( nLenDst < ((nLenSrc / 4)-1) * 3 ) {
return (0); // (buffer too small)
}
int nLoopMax= (nLenSrc/4)-1;
WORD* pwSrc= (WORD*)pSrc;
for( int j=0; j<nLoopMax; j++ ) {
WORD s1= gpLookup16[ pwSrc[0] ]; // translate two "digits" at once
WORD s2= gpLookup16[ pwSrc[1] ]; // ... and two more
DWORD n32;
n32 = s1; // xxxxxxxx xxxxxxxx xx111111 222222xx
n32 <<= 10; // xxxxxxxx 11111122 2222xxxx xxxxxxxx
n32 |= s2 >> 2; // xxxxxxxx 11111122 22223333 33444444
BYTE b3= ( n32 & 0x00ff ); n32 >>= 8; // in reverse (WORD order)
BYTE b2= ( n32 & 0x00ff ); n32 >>= 8;
BYTE b1= ( n32 & 0x00ff );
// *pDst++ = b1; *pDst++ = b2; *pDst++ = b3; //slighly slower
pDst[0]= b1; // slightly faster
pDst[1]= b2;
pDst[2]= b3;
pwSrc += 2;
pDst += 3;
}
nLenOut= ((nLenSrc/4)-1) * 3;
//-------------------- special handling outside of loop for end
WORD s1= gpLookup16[ pwSrc[0] ];
WORD s2= gpLookup16[ pwSrc[1] ];
DWORD n32;
n32 = s1;
n32 <<= 10;
n32 |= s2 >> 2;
BYTE b3= ( n32 & 0x00ff ); n32 >>= 8;
BYTE b2= ( n32 & 0x00ff ); n32 >>= 8;
BYTE b1= ( n32 & 0x00ff );
if (nLenOut >= nLenDst) return(0); // error
*pDst++ = b1; nLenOut++;
if (b2 != 99) {
if (nLenOut >= nLenDst) return(0); // error
*pDst++ = b2; nLenOut++;
}
if (b3 != 99) {
if (nLenOut >= nLenDst) return(0); // error
*pDst++ = b3; nLenOut++;
}
return( nLenOut );
}
//----------------------------------------------------
// Using two-byte lookup table
// must call here before calling the above
//----------------------------------------------------
WORD* gpLookup16= 0;
void SetupLookup16()
{
int nLenTbl= 256*256; // yes, the table is 128Kb!
if (NULL==gpLookup16) {
gpLookup16= new WORD[nLenTbl];
}
WORD* p= gpLookup16;
for (int j= 0; j<256; j++ ) {
for (int k= 0; k<256; k++ ) {
WORD w;
w = LookupDigits[k] << 8;
w |= LookupDigits[j] << 2 ; // pre-shifted! See notes
*p++ = w;
}
}
}
Note the simplicity of the inner loop (lines 12-31). This is partially thanks to an optimization I discovered in creating the lookup table. I pre-shifted the data values in the table to avoid a series of mask-and-shift operations in the loop. Also, the one-byte variables (b1,b2,b3) end up being optimized into 8-bit register values by the compiler. The entire inner loop is just 56 opcodes long!
Rate Approximate
Version: (Kb/ms) Ratio
----------------- ----- -----------
ToBase64Crypto 133 3.4 : 1
ToBase64ATL 420 1.1 : 1
ToBase64Simple 224 1.9 : 1
ToBase64Fast 446 1 : 1
FromBase64Crypto 92 8.7 : 1
FromBase64ATL 84 9.5 : 1
FromBase64Simple 260 3.1 : 1
FromBase64Fast 805 1 : 1
Notes:
Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.
Comments (12)
Commented:
Once you've calculated the table (8k or 128k), could you turn it into a static resource?
Author
Commented:--- Dan
Commented:
Awesome article.
I'd like to use your 'fast-b64' routine in my app --- could you clarify the license?
Is it public domain / MIT / LGPL / other?
Thanks!
Ignac
Author
Commented:You have my permission to use the code in any way you want.
-- Dan
Commented:
Love the article.
I'm doing some MSVC++ coding and thought I'd try your Base64Fast functions out.
The first string I tried to encode was "encode me!" (10 bytes) which yielded "ZW5jb2RlIG1lIQ==". But then when I passed that back into the decode function it returns 12 bytes "encode me!¿p".
I tried "hello world!" which encoded to "aGVsbG8gd29ybGQh" and decoded back to "hello world!" perfectly.
I tried a four letter word and it came back with 6 after the decode that looked similar to other extra non-alpha characters. So, 4 character and 10 character sources seem to be causing a problem.
I tested the "ZW5jb2RlIG1lIQ==" string via the web based decoder and it decoded correctly. There must be a gotcha in the FromBase64Fast code. I'll update if/when I figure it out.
Update:
The condition of b2 or b3 being 99 never triggered when the "=" pad came up. My solution is a total hack but it seems to work for now, I'll update again with something sexier later.
Until then, consider these changes to the last section of code for "FromBase64Fast":
Open in new window
I ran some generated data through it and it matched the results correctly. More testing is needed, but so far so good.Cheers,
Chris
View More