Link to home
Start Free TrialLog in
Avatar of Marcus Bointon
Marcus BointonFlag for France

asked on

Fast base-62 encode/decode

I use base-62 encoding quite a lot as it's URL safe and reasonably obfuscated, however, my current implementation is not very fast and I'm wondering if there's a better route, perhaps using an iterative method, perhaps using pack() and friends. As an example: base62_encode('http://www.php.net') gives 'ScoQhfdZc5xJxw6hwvPQMo3W'.

My current implementation use bc_math to treat strings as numbers in base-256, converts them to base-10, then to base-62 (or the reverse). I do this with these functions which do arbitrary base conversions:

function dec2base($dec, $base, $digits = FALSE) {
      if($base < 2 or $base > 256) {
            die("Invalid Base: .$base\n");
      }
      bcscale(0);
      $value = '';
      if(!$digits) {
            $digits = digits($base);
      }
      while($dec > $base - 1) {
            $rest = bcmod($dec,$base);
            $dec = bcdiv($dec,$base);
            $value = $digits[$rest].$value;
      }
      $value=$digits[intval($dec)].$value;
      return (string) $value;
}

function base2dec($value, $base, $digits = FALSE) {
      if($base < 2 or $base > 256) {
            die("Invalid Base: .$base\n");
      }
      bcscale(0);
      if($base < 37) {
            $value = strtolower($value);
      }
      if(!$digits) {
            $digits = digits($base);
      }
      $size = strlen($value);
      $dec = '0';
      for($loop=0; $loop < $size; $loop++) {
            $element = strpos($digits, $value[$loop]);
            $power = bcpow($base, $size-$loop-1);
            $dec = bcadd($dec, bcmul($element, $power));
      }
      return (string)$dec;
}

function digits($base) {
      if($base < 64) {
            return substr('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_', 0, $base);
      } else {
            return substr("\x0\x1\x2\x3\x4\x5\x6\x7\x8\x9\xa\xb\xc\xd\xe\xf\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\x22#\x24%&'()*+,-./0123456789:;<=>\x3f@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", 0, $base);
      }
}

function base62_encode($value) {
      return dec2base(base2dec($value, 256), 62);
}

function base62_decode($value) {
      return dec2base(base2dec($value, 62), 256);
}

So, they all work ok, but it's a bit roundabout. It would be nice to go directly from 256->62, but I can't seem to think that through. There are some base conversion functions in bc_math, but they only go up to base-36. The same goes for the big_int extension in pecl, which despite it's claims, seems slower than bc_math.

I think I need to ask someone who's more of a mathematician than a coder!
Avatar of Marcus Bointon
Marcus Bointon
Flag of France image

ASKER

Sorry about that long line...
BTW, some of these functions were adapted from PHP user notes.
Are the numbers too large to use base_convert() ?
http://php.net/base_convert

Sorry, disregard my comment, I had forgotten it only goes to base-36 (like you already stated in your question :)
   $dec = 0;
     for($loop=0; $loop < $size; $loop++) {
          $dec = $dec * $base + strpos($digits, $value[$loop]);
     }
That's no use - the numbers involved are well outside what's feasible to handle using ints.
For example, the 256-bit number 'https://www.experts-exchange.com/' converts to this in base-10:

47246294140391775270645689091620257283684500560481702162465929546238692125999
Could you do base 64 encoding instead of base 62?
Would you be willing to waste a litte space in order to gain some speed?
If you could use 5 base 62 digits to encode 29 bits at a time, it would be feasible to do it with 32 bit integer arithmetic
or if you use 21 base 62 digits to encode 125 bits at a time, you can limit the size of your multiprecision arithmetic
          $dec = bcadd($element, bcmul($dec, $base)); would at least cut out a bcpow($base, $size-$loop-1); operation
I would use base-64, but it's not is not safe to use in all of URLs, domain-names, pathnames and email addresses.
Also, all the usual implementations add padding characters.

You second comment is more like the kind of thing I was after! Needs a bit of thinking about.

As it happens, since I originally posted this question my need for this function has somewhat diminished, but I'm still curious for a solution.
ASKER CERTIFIED SOLUTION
Avatar of ozo
ozo
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial