Fast base-62 encode/decode

I use base-62 encoding quite a lot as it's URL safe and reasonably obfuscated, however, my current implementation is not very fast and I'm wondering if there's a better route, perhaps using an iterative method, perhaps using pack() and friends. As an example: base62_encode('http://www.php.net') gives 'ScoQhfdZc5xJxw6hwvPQMo3W'.

My current implementation use bc_math to treat strings as numbers in base-256, converts them to base-10, then to base-62 (or the reverse). I do this with these functions which do arbitrary base conversions:

function dec2base($dec, $base, $digits = FALSE) {
      if($base < 2 or $base > 256) {
            die("Invalid Base: .$base\n");
      }
      bcscale(0);
      $value = '';
      if(!$digits) {
            $digits = digits($base);
      }
      while($dec > $base - 1) {
            $rest = bcmod($dec,$base);
            $dec = bcdiv($dec,$base);
            $value = $digits[$rest].$value;
      }
      $value=$digits[intval($dec)].$value;
      return (string) $value;
}

function base2dec($value, $base, $digits = FALSE) {
      if($base < 2 or $base > 256) {
            die("Invalid Base: .$base\n");
      }
      bcscale(0);
      if($base < 37) {
            $value = strtolower($value);
      }
      if(!$digits) {
            $digits = digits($base);
      }
      $size = strlen($value);
      $dec = '0';
      for($loop=0; $loop < $size; $loop++) {
            $element = strpos($digits, $value[$loop]);
            $power = bcpow($base, $size-$loop-1);
            $dec = bcadd($dec, bcmul($element, $power));
      }
      return (string)$dec;
}

function digits($base) {
      if($base < 64) {
            return substr('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_', 0, $base);
      } else {
            return substr("\x0\x1\x2\x3\x4\x5\x6\x7\x8\x9\xa\xb\xc\xd\xe\xf\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\x22#\x24%&'()*+,-./0123456789:;<=>\x3f@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", 0, $base);
      }
}

function base62_encode($value) {
      return dec2base(base2dec($value, 256), 62);
}

function base62_decode($value) {
      return dec2base(base2dec($value, 62), 256);
}

So, they all work ok, but it's a bit roundabout. It would be nice to go directly from 256->62, but I can't seem to think that through. There are some base conversion functions in bc_math, but they only go up to base-36. The same goes for the big_int extension in pecl, which despite it's claims, seems slower than bc_math.

I think I need to ask someone who's more of a mathematician than a coder!
LVL 25
Marcus BointonAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Marcus BointonAuthor Commented:
Sorry about that long line...
0
Marcus BointonAuthor Commented:
BTW, some of these functions were adapted from PHP user notes.
0
TeRReFCommented:
Are the numbers too large to use base_convert() ?
http://php.net/base_convert

0
Rowby Goren Makes an Impact on Screen and Online

Learn about longtime user Rowby Goren and his great contributions to the site. We explore his method for posing questions that are likely to yield a solution, and take a look at how his career transformed from a Hollywood writer to a website entrepreneur.

TeRReFCommented:
Sorry, disregard my comment, I had forgotten it only goes to base-36 (like you already stated in your question :)
0
ozoCommented:
   $dec = 0;
     for($loop=0; $loop < $size; $loop++) {
          $dec = $dec * $base + strpos($digits, $value[$loop]);
     }
0
Marcus BointonAuthor Commented:
That's no use - the numbers involved are well outside what's feasible to handle using ints.
For example, the 256-bit number 'http://www.experts-exchange.com/' converts to this in base-10:

47246294140391775270645689091620257283684500560481702162465929546238692125999
0
ozoCommented:
Could you do base 64 encoding instead of base 62?
Would you be willing to waste a litte space in order to gain some speed?
0
ozoCommented:
If you could use 5 base 62 digits to encode 29 bits at a time, it would be feasible to do it with 32 bit integer arithmetic
or if you use 21 base 62 digits to encode 125 bits at a time, you can limit the size of your multiprecision arithmetic
          $dec = bcadd($element, bcmul($dec, $base)); would at least cut out a bcpow($base, $size-$loop-1); operation
0
Marcus BointonAuthor Commented:
I would use base-64, but it's not is not safe to use in all of URLs, domain-names, pathnames and email addresses.
Also, all the usual implementations add padding characters.

You second comment is more like the kind of thing I was after! Needs a bit of thinking about.

As it happens, since I originally posted this question my need for this function has somewhat diminished, but I'm still curious for a solution.
0
ozoCommented:
Since 62^n != 256^m unless m ==0 and n == 0, base62 will have padding too
multi-precision arithmetic usually gets slower as the numbers get larger, so if you can do in chunks you may speed things up.
e.g. instead of base 10, if you work in base 10^9, you may be able to speed things by a factor of 9,
and its easy to convert from base 10 to base 10^9 and vice versa
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
NovaDenizenCommented:
Instead of treating the string as a huge multiprecision number, you could treat it as a series of fixed-precision smaller numbers.  This would cost you about 20% more space but it should be a lot faster to calculate.

ln(62)/ln(2) = 5.954 bits.  You could represent 2 url characters with 3 base-62 characters, 4 url characters with up to 6 (5.37) base-62 characters (no point in doing this because it's the same expansion as for 2 url characters), 8 url characters with up to 11 (10.75) base-62 characters.

0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
PHP

From novice to tech pro — start learning today.