- Experts Exchange Approved
What is Base64?
Base64 is a way to represent binary data in a way that avoids some of the problems associated with binary data. When you handle binary data, you need to take special care about embedded nulls, commas, quotes, apostrophes, the less-than, ampersand, and space characters, and so forth. Base64 data, on the other hand, is simple text; you can assign it to a string data type, copy it, scan, and transfer it without worries.
You may see base64-encoded data in emails and in XML CDATA sections. If you have ever worked with HTTP headers, you have probably seen:
The text after the word, Basic is a base64-encoding of a username and password (specifically, that string encodes MyUserName:MyPassword).
You may see base64 used to transport encrypted data, which often contains such otherwise "awkward" characters. In an earlier article, Easy String Encryption Using CryptoAPI in C++, I recommended converting encrypted data to hexadecimal to avoid special-character handling problems. Base64 can be used similarly, and it has one advantage over hexadecimal encoding: It is more concise. While hex-encoded data is twice the size of the original binary data, base64-encoded data is only 1.33 times the size. For instance, if a hexadecimal encoding is 5,000 bytes long, a base64 encoding of the same source data will be 3,336 bytes long (saving 1664 bytes of transmission bandwidth).
Binary uses two digits (01), base 10 — decimal — uses 10 digits (0123456789), base 16 — hexadecimal — uses 16 (0123456789ABCDEF). So, as you might expect, there are 64 "digits" used in base64 encoding:
ABCDEFGHIJKLMNOPQRSTUVWXY
Each digit encodes six bits of source data. So it takes four digits to encode three bytes of source data (4*6 = 3*8 = 24 bits). This makes encoding and decoding much trickier than hex encoding. Also, there is the issue that any base64-encoded data must end up as an exact multiple of four digits. Since the source data is often not an exact multiple of three bytes in length, you need special end-of-string logic to pad the base64 output when encoding and to ignore that padding when decoding.
Encoding and Decoding Options
We will look at four options available to C++ programmers who need to encode and decode base64:
- Windows CryptoAPI functions
- The ATL (Active Template Library) functions
- A pair of straightforward functions that I wrote
- My ultimate high-speed functions
For each option, I wrote a small wrapper so that I could do testing with uniform parameters. At the end of this article, I'll show a timing/performance comparison chart, and I'll provide links to documentation of the standard library functions.
For years, I've done my base64 work using a C++ class object written by PJ Naughter. I wanted to include his code in this article, but when I surfed around to find it, I learned that his programs no longer contain the base64-handling functions. He switched to using the ATL functions a while back. So I'll be showing that option.
The last one I'll show was a result of a "recreational programming" challenge to myself: Can I write code that will out-perform Microsoft? You might consider using this high-speed code if, for instance, your website must encode many large PDF or image files for transmission.
Windows CryptoAPI: CryptBinaryToString and CryptStringToBinary
The Windows CryptoAPI provides a set of general-purpose functions (CryptBinaryToString and CryptStringToBinary) that support base64 encoding and decoding. The following is a pair of functions that wrap that API:
I used these CryptoAPI functions to validate my own functions and for performance testing. These were, as I expected, poor performers. But I really can't fault you if you use these functions. They are reliable and bullet-proof.
ATL functions: Base64Encode and Base64Decode
The ATL header files that have come with VC++ since VS2005 provide a nice pair of base64 tools (though the decoder is rather slow). The following is my wrapper for those functions.
The encoder turned out to be quite speedy, but I was not surprised at how poorly the decoder performed — ATL (and STL) code is generally written with flexibility as a priority (various CPUs, different character widths, special options such as handling embedded CRLFs, and so forth.) Even so, these are a good choice for your production code — you can be certain that these functions are in use at many thousands of sites that depend upon it for mission-critical operations.
An advantage of these functions is that the source code is on your disk. If you want to see how it's done, this is some high-quality code for you to review.
My own straightforward functions
These functions illustrate the bit manipulation needed for encoding and decoding.
Simple Version: Encoding
The base64 encoding algorithm can be described simply:
Once you have worked out the bit-handling (I tried a number of methods; I think that the one above is the easiest to follow), there is still the issue of what to do at the end of the input. There might NOT be a multiple of three bytes in the final pass through the loop.
This creates two problems: firstly, it would be improper (even fatal) to read even one byte beyond the end of the input buffer, so lines 14-15 simulate an input value of 0 when that would happen; secondly, when only one or two source bytes remain, the coded string must end with one or two padding characters. Both exceptions require special handling.
If you have an eye for performance optimization, you will see that the above implementation is very inefficient. Lines 39-50 check for the end-of-buffer cases in every pass through loop. Imagine encoding a 3 Mb buffer of data — the many "if" statements must be executed one million times. The above code also checks for overrun of the caller's output buffer each time through the loop. In the ultra-fast version, we'll move all of the special handling to the outside of the loop (and we'll use a few other optimization tricks, to boot).
Simple Version: Decoding
The decoding algorithm can also be stated simply:
And once again, we have end-of-buffer issues: If we get an equal sign, we know not to write a byte of output. More specifically, instead of three bytes, we must output just one or two.
The table below is used to convert base64 digits and padding (A-Z, a-z, 0-9, +, /, or =) into a binary value. The table is 256 bytes long so that the 8-bit character code from the input can be used directly as an index into the table. Note that UNICODE or UTF or character translation issues don't apply here... the table just holds an ordered sequence of 8-bit binary values.
For instance, if we see a z in the input, we can use ...
n= LookupDigits[ 'z' ];
... to obtain the 6-bit value, 51 (0x33, 00110011b), that z encodes. If we see an equal sign (=), the looked-up value will be 99 which we can use as a flag during end-of-buffer handling.
So here's the relatively simple, though pretty fast, code:
As with the simple encoder, this simple decoder could can also be improved by moving the end-of-buffer handling outside of the main loop. I tried other tricks...
In an earlier article on hexadecimal conversion, I was able to squeeze out a bit more speed by reading and writing in 32-bit DWORD blocks. I tried DWORD reads here, but perhaps because of CPU and L1 caching differences or perhaps because of the overhead of rearranging the "little-endian" bytes, I could not get a performance increase. I went with the above since it is quite straightforward.
The incoming digits must be handled in a very specific way. Lines 10-13 convert the input into a set of four 6-bit values. Lines 15-17 pack the total of 24 bits into three bytes. The lines mean: (15) the bits of the first digit are moved all the way to the left of the first byte, to which two bits of the second digit are added; (16) the remaining four bits are assembled with four bits of the third digit; (17) the two remaining bits and the entire last digit make up the third byte. This is illustrated below.
Now we're getting into my favorite part of this type of "recreational programming":
— Is there a better way to do that last part?
— There is!
This variation rolls the bits into a temporary DWORD variable and back out — all in a shorter series of steps, and that turns out to be very efficient.
This has fewer shifting and masking operations, but perhaps more importantly, the C++ compiler optimizes some of the steps into simple 8-bit register loads and other fast operations.
My High-Performance Functions
The following functions use optimization tricks and techniques to improve performance.
High-Speed Version: Encoding
This is similar to the "simple" version earlier, but I've taken the end-of-buffer handling out of the inner loop. I also improved upon the "DWORD roll-in" technique, by writing the DWORD out as a single operation.
The trick in lines 22-25 is to get the bytes in the right order in the DWORD so that the "little endian" output is in the right order. It's easy once you've seen it done :-)
Also, note that the function is overall much longer. But all of the additional code is repeated logic from the inner loop code... removed from the inner loop to avoid if/then conditional jumps. The CPU can spin through the inner loop without pre-fetch flushing or other unwanted interruptions.
I wrote some assembly-language code to replace the inner loop, but I could not get a performance boost. The compiler is much smarter than me... it shuffles the code sequence in strange ways (to, I presume, avoid pipeline stalls) and delving into that would be a topic for a different day.
High-Speed Version: Decoding
Some will say that with this function, I've edged over from the sublime and stepped into a steaming stinky pile of the ridiculous. That's because the following base64 decoding function uses a 128Kb lookup table.
Yes, you heard me right. Before running the first time, this function allocates and builds a 131,072-byte lookup table. The result is that the inner loop can lookup values of two base-64 digits at the same time. You can make the table smaller (as little as about 30K) and still do a 16-bit lookup, but that would involve doing some calculations on each byte of the input data — likely costing a significant fraction of the performance gain.
Here's the code:
Note the simplicity of the inner loop (lines 12-31). This is partially thanks to an optimization I discovered in creating the lookup table. I pre-shifted the data values in the table to avoid a series of mask-and-shift operations in the loop. Also, the one-byte variables (b1,b2,b3) end up being optimized into 8-bit register values by the compiler. The entire inner loop is just 56 opcodes long!
The function calls SetupLookup16() the first time it is used. If you were to wrap this into a C++ object, you would persist that table as an object member and initialize it in the constructor. The time spent building the table is spread out across all invocations of the decoding function.
Is this at all practical? Probably not.
While I can imagine a web server needing an ultra-fast encoder to process Gigabytes of outgoing PDF files, it is difficult to come up with a scenario where high-speed decoding was worth the price in memory usage. On the other hand, with 6GB of RAM coming standard on many home computers, 128K is only a tiny fraction of 1% of the system memory. So, as Tom Cruise says in "Risky Business" — What the heck!
Summary
This article reviewed various options for encoding and decoding base64 data. We looked at the algorithms and at C++ code to implement them.
I did time trials on the various options and the results are shown in the following table:
Notes:
- Added post-publication:
A significantly faster encoding algorithm has been posted in the comments, below. See
http://www.experts-exchange.com/A_32 16.html#c1 5564
By using an 8Kb lookup table, the encoder can process 688 Kb/ms (50% faster!)
- The CryptoAPI encoder is much slower even than an unoptimized "roll your own" encoder.
- The ATL encoder is very fast, within the margin of error of the speed of my best effort in this article (Added: But NOT as good as the one in the article comments, here).
- For decoding, even the "simple" version that I wrote is considerably faster than either the ATL version or the CryptoAPI version.
- My ultra-fast decoder is nearly ten times as fast as other available options, but it has the drawback of using a 128Kb lookup table -- which is probably impractical. Note, too that timing results do not include time taken to create that lookup table.
Testing methodology:
I read-in a 1MB JPG file and ran 1000 iterations on it. I timed only the outer loop, and I used the relatively low-resolution GetTickCount() function. However, the duration of each test was at least several seconds (minimizing resolution issues) and results were repeatable, with only small variations.
All code versions were 32-bit executables built in Release mode, with full optimization enabled.
I ran the tests on a 2.60 GHz AMD Athlon II QuadCore 620 CPU. While running, my CPU meter went to 28%, and indicated that one core was spiking at 100%.
Writing and testing this code was fun and challenging. If you are interested in trying your hand at writing faster versions, then I'd be delighted to look at your code. I'm quite certain that hand-optimized ASM code, and especially code that uses 64-bit registers and advanced MMX/SSE/AVX operations, can out-perform anything I've tried here. But I'd especially like to see standard 32-bit code in which you have implemented an improved algorithm. Perhaps a multi-threaded version that could take advantage of those idle cycles on a secondary CPU core...
Resources:
Formal description of base64 (RFC 2045)
http://www.ietf.org/rfc/rf
Wikipedia article, Base64 (excellent coverage and quirk description)
http://en.wikipedia.org/wi
MSDN Documentation:
CryptBinaryToString http://msdn.microsoft.com/
CryptStringToBinary http://msdn.microsoft.com/
ATL Base64Encode http://msdn.microsoft.com/
ATL Base64Decode http://msdn.microsoft.com/
A website where you can encode or decode base64:
http://www.opinionatedgeek
Script-accessible way to encode and decode base64 (uses MSXML)
http://www.nonhostile.com/
http://www.nonhostile.com/
Related Articles Written by Dan Rollins:
Easy String Encryption Using CryptoAPI in C++
Fast Hexadecimal Encode and Decode
Convert String to int / Convert int to String in C++
Binary Bit Flags: Tutorial and Usage Tips
=-=-=-=-=-=-=-=-=-=-=-=-=-
If you liked this article and want to see more from this author, please click the Yes button near the:
Was this article helpful?
label that is just below and to the right of this text. Thanks!
=-=-=-=-=-=-=-=-=-=-=-=-=-
by: Infinity08 on 2010-06-09 at 01:07:33ID: 15538