Fast base-62 encode/decode

Marcus Bointon
Marcus Bointon used Ask the Experts™
on
I use base-62 encoding quite a lot as it's URL safe and reasonably obfuscated, however, my current implementation is not very fast and I'm wondering if there's a better route, perhaps using an iterative method, perhaps using pack() and friends. As an example: base62_encode('http://www.php.net') gives 'ScoQhfdZc5xJxw6hwvPQMo3W'.

My current implementation use bc_math to treat strings as numbers in base-256, converts them to base-10, then to base-62 (or the reverse). I do this with these functions which do arbitrary base conversions:

function dec2base($dec, $base, $digits = FALSE) {
      if($base < 2 or $base > 256) {
            die("Invalid Base: .$base\n");
      }
      bcscale(0);
      $value = '';
      if(!$digits) {
            $digits = digits($base);
      }
      while($dec > $base - 1) {
            $rest = bcmod($dec,$base);
            $dec = bcdiv($dec,$base);
            $value = $digits[$rest].$value;
      }
      $value=$digits[intval($dec)].$value;
      return (string) $value;
}

function base2dec($value, $base, $digits = FALSE) {
      if($base < 2 or $base > 256) {
            die("Invalid Base: .$base\n");
      }
      bcscale(0);
      if($base < 37) {
            $value = strtolower($value);
      }
      if(!$digits) {
            $digits = digits($base);
      }
      $size = strlen($value);
      $dec = '0';
      for($loop=0; $loop < $size; $loop++) {
            $element = strpos($digits, $value[$loop]);
            $power = bcpow($base, $size-$loop-1);
            $dec = bcadd($dec, bcmul($element, $power));
      }
      return (string)$dec;
}

function digits($base) {
      if($base < 64) {
            return substr('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_', 0, $base);
      } else {
            return substr("\x0\x1\x2\x3\x4\x5\x6\x7\x8\x9\xa\xb\xc\xd\xe\xf\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\x22#\x24%&'()*+,-./0123456789:;<=>\x3f@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", 0, $base);
      }
}

function base62_encode($value) {
      return dec2base(base2dec($value, 256), 62);
}

function base62_decode($value) {
      return dec2base(base2dec($value, 62), 256);
}

So, they all work ok, but it's a bit roundabout. It would be nice to go directly from 256->62, but I can't seem to think that through. There are some base conversion functions in bc_math, but they only go up to base-36. The same goes for the big_int extension in pecl, which despite it's claims, seems slower than bc_math.

I think I need to ask someone who's more of a mathematician than a coder!
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Sorry about that long line...
BTW, some of these functions were adapted from PHP user notes.

Commented:
Are the numbers too large to use base_convert() ?
http://php.net/base_convert

OWASP: Avoiding Hacker Tricks

Learn to build secure applications from the mindset of the hacker and avoid being exploited.

Commented:
Sorry, disregard my comment, I had forgotten it only goes to base-36 (like you already stated in your question :)
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
   $dec = 0;
     for($loop=0; $loop < $size; $loop++) {
          $dec = $dec * $base + strpos($digits, $value[$loop]);
     }
That's no use - the numbers involved are well outside what's feasible to handle using ints.
For example, the 256-bit number 'http://www.experts-exchange.com/' converts to this in base-10:

47246294140391775270645689091620257283684500560481702162465929546238692125999
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
Could you do base 64 encoding instead of base 62?
Would you be willing to waste a litte space in order to gain some speed?
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
If you could use 5 base 62 digits to encode 29 bits at a time, it would be feasible to do it with 32 bit integer arithmetic
or if you use 21 base 62 digits to encode 125 bits at a time, you can limit the size of your multiprecision arithmetic
          $dec = bcadd($element, bcmul($dec, $base)); would at least cut out a bcpow($base, $size-$loop-1); operation
I would use base-64, but it's not is not safe to use in all of URLs, domain-names, pathnames and email addresses.
Also, all the usual implementations add padding characters.

You second comment is more like the kind of thing I was after! Needs a bit of thinking about.

As it happens, since I originally posted this question my need for this function has somewhat diminished, but I'm still curious for a solution.
Most Valuable Expert 2014
Top Expert 2015
Commented:
Since 62^n != 256^m unless m ==0 and n == 0, base62 will have padding too
multi-precision arithmetic usually gets slower as the numbers get larger, so if you can do in chunks you may speed things up.
e.g. instead of base 10, if you work in base 10^9, you may be able to speed things by a factor of 9,
and its easy to convert from base 10 to base 10^9 and vice versa
Instead of treating the string as a huge multiprecision number, you could treat it as a series of fixed-precision smaller numbers.  This would cost you about 20% more space but it should be a lot faster to calculate.

ln(62)/ln(2) = 5.954 bits.  You could represent 2 url characters with 3 base-62 characters, 4 url characters with up to 6 (5.37) base-62 characters (no point in doing this because it's the same expansion as for 2 url characters), 8 url characters with up to 11 (10.75) base-62 characters.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial