Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

I use base-62 encoding quite a lot as it's URL safe and reasonably obfuscated, however, my current implementation is not very fast and I'm wondering if there's a better route, perhaps using an iterative method, perhaps using pack() and friends. As an example: base62_encode('http://www.php.net') gives 'ScoQhfdZc5xJxw6hwvPQMo3W'.

My current implementation use bc_math to treat strings as numbers in base-256, converts them to base-10, then to base-62 (or the reverse). I do this with these functions which do arbitrary base conversions:

function dec2base($dec, $base, $digits = FALSE) {

if($base < 2 or $base > 256) {

die("Invalid Base: .$base\n");

}

bcscale(0);

$value = '';

if(!$digits) {

$digits = digits($base);

}

while($dec > $base - 1) {

$rest = bcmod($dec,$base);

$dec = bcdiv($dec,$base);

$value = $digits[$rest].$value;

}

$value=$digits[intval($dec)].$value;

return (string) $value;

}

function base2dec($value, $base, $digits = FALSE) {

if($base < 2 or $base > 256) {

die("Invalid Base: .$base\n");

}

bcscale(0);

if($base < 37) {

$value = strtolower($value);

}

if(!$digits) {

$digits = digits($base);

}

$size = strlen($value);

$dec = '0';

for($loop=0; $loop < $size; $loop++) {

$element = strpos($digits, $value[$loop]);

$power = bcpow($base, $size-$loop-1);

$dec = bcadd($dec, bcmul($element, $power));

}

return (string)$dec;

}

function digits($base) {

if($base < 64) {

return substr('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_', 0, $base);

} else {

return substr("\x0\x1\x2\x3\x4\x5\x6\x7\x8\x9\xa\xb\xc\xd\xe\xf\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\x22#\x24%&'()*+,-./0123456789:;<=>\x3f@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", 0, $base);

}

}

function base62_encode($value) {

return dec2base(base2dec($value, 256), 62);

}

function base62_decode($value) {

return dec2base(base2dec($value, 62), 256);

}

So, they all work ok, but it's a bit roundabout. It would be nice to go directly from 256->62, but I can't seem to think that through. There are some base conversion functions in bc_math, but they only go up to base-36. The same goes for the big_int extension in pecl, which despite it's claims, seems slower than bc_math.

I think I need to ask someone who's more of a mathematician than a coder!

My current implementation use bc_math to treat strings as numbers in base-256, converts them to base-10, then to base-62 (or the reverse). I do this with these functions which do arbitrary base conversions:

function dec2base($dec, $base, $digits = FALSE) {

if($base < 2 or $base > 256) {

die("Invalid Base: .$base\n");

}

bcscale(0);

$value = '';

if(!$digits) {

$digits = digits($base);

}

while($dec > $base - 1) {

$rest = bcmod($dec,$base);

$dec = bcdiv($dec,$base);

$value = $digits[$rest].$value;

}

$value=$digits[intval($dec

return (string) $value;

}

function base2dec($value, $base, $digits = FALSE) {

if($base < 2 or $base > 256) {

die("Invalid Base: .$base\n");

}

bcscale(0);

if($base < 37) {

$value = strtolower($value);

}

if(!$digits) {

$digits = digits($base);

}

$size = strlen($value);

$dec = '0';

for($loop=0; $loop < $size; $loop++) {

$element = strpos($digits, $value[$loop]);

$power = bcpow($base, $size-$loop-1);

$dec = bcadd($dec, bcmul($element, $power));

}

return (string)$dec;

}

function digits($base) {

if($base < 64) {

return substr('0123456789abcdefgh

} else {

return substr("\x0\x1\x2\x3\x4\x5

}

}

function base62_encode($value) {

return dec2base(base2dec($value, 256), 62);

}

function base62_decode($value) {

return dec2base(base2dec($value, 62), 256);

}

So, they all work ok, but it's a bit roundabout. It would be nice to go directly from 256->62, but I can't seem to think that through. There are some base conversion functions in bc_math, but they only go up to base-36. The same goes for the big_int extension in pecl, which despite it's claims, seems slower than bc_math.

I think I need to ask someone who's more of a mathematician than a coder!

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get every solution instantly with premium.
Start your 7-day free trial.

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

for($loop=0; $loop < $size; $loop++) {

$dec = $dec * $base + strpos($digits, $value[$loop]);

}

For example, the 256-bit number 'http://www.experts-exchange.com/' converts to this in base-10:

47246294140391775270645689

Would you be willing to waste a litte space in order to gain some speed?

or if you use 21 base 62 digits to encode 125 bits at a time, you can limit the size of your multiprecision arithmetic

$dec = bcadd($element, bcmul($dec, $base)); would at least cut out a bcpow($base, $size-$loop-1); operation

Also, all the usual implementations add padding characters.

You second comment is more like the kind of thing I was after! Needs a bit of thinking about.

As it happens, since I originally posted this question my need for this function has somewhat diminished, but I'm still curious for a solution.

multi-precision arithmetic usually gets slower as the numbers get larger, so if you can do in chunks you may speed things up.

e.g. instead of base 10, if you work in base 10^9, you may be able to speed things by a factor of 9,

and its easy to convert from base 10 to base 10^9 and vice versa

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trialln(62)/ln(2) = 5.954 bits. You could represent 2 url characters with 3 base-62 characters, 4 url characters with up to 6 (5.37) base-62 characters (no point in doing this because it's the same expansion as for 2 url characters), 8 url characters with up to 11 (10.75) base-62 characters.

PHP

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get every solution instantly with premium.
Start your 7-day free trial.