Learn to build secure applications from the mindset of the hacker and avoid being exploited.

I use base-62 encoding quite a lot as it's URL safe and reasonably obfuscated, however, my current implementation is not very fast and I'm wondering if there's a better route, perhaps using an iterative method, perhaps using pack() and friends. As an example: base62_encode('http://www.php.net') gives 'ScoQhfdZc5xJxw6hwvPQMo3W'.

My current implementation use bc_math to treat strings as numbers in base-256, converts them to base-10, then to base-62 (or the reverse). I do this with these functions which do arbitrary base conversions:

function dec2base($dec, $base, $digits = FALSE) {

if($base < 2 or $base > 256) {

die("Invalid Base: .$base\n");

}

bcscale(0);

$value = '';

if(!$digits) {

$digits = digits($base);

}

while($dec > $base - 1) {

$rest = bcmod($dec,$base);

$dec = bcdiv($dec,$base);

$value = $digits[$rest].$value;

}

$value=$digits[intval($dec)].$value;

return (string) $value;

}

function base2dec($value, $base, $digits = FALSE) {

if($base < 2 or $base > 256) {

die("Invalid Base: .$base\n");

}

bcscale(0);

if($base < 37) {

$value = strtolower($value);

}

if(!$digits) {

$digits = digits($base);

}

$size = strlen($value);

$dec = '0';

for($loop=0; $loop < $size; $loop++) {

$element = strpos($digits, $value[$loop]);

$power = bcpow($base, $size-$loop-1);

$dec = bcadd($dec, bcmul($element, $power));

}

return (string)$dec;

}

function digits($base) {

if($base < 64) {

return substr('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_', 0, $base);

} else {

return substr("\x0\x1\x2\x3\x4\x5\x6\x7\x8\x9\xa\xb\xc\xd\xe\xf\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\x22#\x24%&'()*+,-./0123456789:;<=>\x3f@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", 0, $base);

}

}

function base62_encode($value) {

return dec2base(base2dec($value, 256), 62);

}

function base62_decode($value) {

return dec2base(base2dec($value, 62), 256);

}

So, they all work ok, but it's a bit roundabout. It would be nice to go directly from 256->62, but I can't seem to think that through. There are some base conversion functions in bc_math, but they only go up to base-36. The same goes for the big_int extension in pecl, which despite it's claims, seems slower than bc_math.

I think I need to ask someone who's more of a mathematician than a coder!

My current implementation use bc_math to treat strings as numbers in base-256, converts them to base-10, then to base-62 (or the reverse). I do this with these functions which do arbitrary base conversions:

function dec2base($dec, $base, $digits = FALSE) {

if($base < 2 or $base > 256) {

die("Invalid Base: .$base\n");

}

bcscale(0);

$value = '';

if(!$digits) {

$digits = digits($base);

}

while($dec > $base - 1) {

$rest = bcmod($dec,$base);

$dec = bcdiv($dec,$base);

$value = $digits[$rest].$value;

}

$value=$digits[intval($dec

return (string) $value;

}

function base2dec($value, $base, $digits = FALSE) {

if($base < 2 or $base > 256) {

die("Invalid Base: .$base\n");

}

bcscale(0);

if($base < 37) {

$value = strtolower($value);

}

if(!$digits) {

$digits = digits($base);

}

$size = strlen($value);

$dec = '0';

for($loop=0; $loop < $size; $loop++) {

$element = strpos($digits, $value[$loop]);

$power = bcpow($base, $size-$loop-1);

$dec = bcadd($dec, bcmul($element, $power));

}

return (string)$dec;

}

function digits($base) {

if($base < 64) {

return substr('0123456789abcdefgh

} else {

return substr("\x0\x1\x2\x3\x4\x5

}

}

function base62_encode($value) {

return dec2base(base2dec($value, 256), 62);

}

function base62_decode($value) {

return dec2base(base2dec($value, 62), 256);

}

So, they all work ok, but it's a bit roundabout. It would be nice to go directly from 256->62, but I can't seem to think that through. There are some base conversion functions in bc_math, but they only go up to base-36. The same goes for the big_int extension in pecl, which despite it's claims, seems slower than bc_math.

I think I need to ask someone who's more of a mathematician than a coder!

Do more with

EXPERT OFFICE^{®} is a registered trademark of EXPERTS EXCHANGE^{®}

for($loop=0; $loop < $size; $loop++) {

$dec = $dec * $base + strpos($digits, $value[$loop]);

}

Would you be willing to waste a litte space in order to gain some speed?

or if you use 21 base 62 digits to encode 125 bits at a time, you can limit the size of your multiprecision arithmetic

$dec = bcadd($element, bcmul($dec, $base)); would at least cut out a bcpow($base, $size-$loop-1); operation

multi-precision arithmetic usually gets slower as the numbers get larger, so if you can do in chunks you may speed things up.

e.g. instead of base 10, if you work in base 10^9, you may be able to speed things by a factor of 9,

and its easy to convert from base 10 to base 10^9 and vice versa

ln(62)/ln(2) = 5.954 bits. You could represent 2 url characters with 3 base-62 characters, 4 url characters with up to 6 (5.37) base-62 characters (no point in doing this because it's the same expansion as for 2 url characters), 8 url characters with up to 11 (10.75) base-62 characters.

## Premium Content

You need an Expert Office subscription to comment.Start Free Trial