Comparing Base64 and Base62

hankknight
I need to pass data through a URL query and will first Base64 or Base62 encode it.

I have three questions:

1. What is the difference between Base64 and Base62?
2. Do both Base64 and Base62 encode data in a way that is URL query safe?
3. Which is more space efficient (takes fewer bytes)? Base64 or Base62?

Base-64 uses / (which is probably okay) and + (which is normally decoded as a space in a query string, but since the encoded data never contains a space you could convert it back on the other end).

I don't know what base-62 is, but I would guess it's upper and lower case letters and digits. Base-64 would be more efficient then, because it can hold more information per character.

You could also urlencode the data first, then you wouldn't have to worry what was URL query safe.
Commented:
Yeah - Base62 is digits + uppercase letters + lowercase letters.

The bigger the base, the shorter the string - that's the main rule. So yes - Base64 would need less space for the same data. The exception is when standard Base64 output contains / and +: these get converted to %2F and %2B to make them URL safe, and that makes the string longer. The usual approach is not to go for Base62 but to use the slightly modified Base64 that has - and _ instead :)
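
To make the difference between the two Base64 alphabets concrete, here is a minimal Python sketch using only the standard library. The helper names `encode_for_query` and `decode_from_query` are made up for illustration, and stripping the `=` padding is an assumption about what you want in the URL, not a requirement.

```python
import base64
from urllib.parse import quote

# Three bytes chosen so that the standard alphabet produces '+' and '/'.
data = bytes([0xFB, 0xEF, 0xFF])

standard = base64.b64encode(data).decode("ascii")          # "++//"
url_safe = base64.urlsafe_b64encode(data).decode("ascii")  # "--__"

# Percent-encoding the standard form triples every '+' and '/'.
escaped = quote(standard, safe="")                          # "%2B%2B%2F%2F"


def encode_for_query(raw: bytes) -> str:
    """URL-safe Base64 with the '=' padding stripped (hypothetical helper)."""
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_from_query(text: str) -> bytes:
    """Restore padding to a multiple of 4 characters, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))
```

The URL-safe form stays at 4 characters here, while the percent-encoded standard form grows to 12.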

But I agree with Superdave, go for a URLEncoder instead - that guarantees you that it is safe.
Base-62 is slightly less space-efficient than base-64 because it doesn't fit exactly into a given number of bits (64 is a power of 2, 62 isn't), but has some advantages that no format of base-64 can match. In particular it can be used in domain/host names and email addresses without fear of breaking the native encodings that each use. This is useful for temporary domains for security purposes and disposable email addresses.
Base-62 is usually some variant of [a-zA-Z0-9]. There are two common encodings for base-64: [a-zA-Z0-9\/\+] and [a-zA-Z0-9_-]. The latter is preferred for URLs as it results in URL-safe strings that don't require further URL-encoding. Both encodings often use an '=' to pad the output to a multiple of 4 characters, but it's possible to live without that. Base-62 avoids all this ambiguity at a cost of slight space overhead.
One other downside of base-62 is that I've never found a decent implementation of it...
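
For what it's worth, a rough base-62 codec can be sketched in Python by treating the data as one big integer. This is only an illustration under my own assumptions: the alphabet order (0-9, A-Z, a-z) is one of several conventions in use, the function names are invented, and the 0x01 sentinel byte is my way of preserving leading zero bytes, not part of any standard. Each base-62 character carries about 5.95 bits against exactly 6 for base-64, which is where the slight overhead comes from.

```python
import string

# One possible alphabet ordering; other implementations order it differently.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(raw: bytes) -> str:
    """Encode bytes as base-62 text using big-integer division."""
    # Leading zero bytes would vanish in the integer conversion,
    # so a 0x01 sentinel byte is prefixed (an assumption of this sketch).
    number = int.from_bytes(b"\x01" + raw, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text: str) -> bytes:
    """Reverse base62_encode; raises ValueError on characters outside ALPHABET."""
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte
```

The big-integer approach is fine for short tokens in a URL; for large payloads the repeated division gets slow, so it is not something I would use on bulk data.
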
Forgot to mention - depending on what data you're using, URL encoding can be horribly inefficient. Base-64 adds a fixed and consistent 33% overhead, but URL encoding can add anywhere between 0 and 200% overhead (every escaped byte becomes three characters, so the result can be up to three times the original size), depending on your data. Whether this is relevant also depends on whether URL length is a problem - about 2k characters is the safe limit.
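
A quick way to check the overhead on your own data, as a minimal Python sketch. The `overhead_percent` helper and the two sample inputs are made up for illustration; overhead here means added length relative to the original size.

```python
import base64
from urllib.parse import quote


def overhead_percent(original: bytes, encoded: str) -> float:
    """Added length as a percentage of the original size."""
    return (len(encoded) - len(original)) / len(original) * 100


plain = b"abcdefghijkl"      # 12 bytes that need no escaping
binary = bytes([0xFF] * 12)  # 12 bytes that all need escaping

b64_plain = base64.urlsafe_b64encode(plain).decode("ascii")
b64_binary = base64.urlsafe_b64encode(binary).decode("ascii")

# Base-64 grows both inputs by the same amount: 12 bytes become 16 characters.
# Percent-encoding leaves `plain` untouched and triples `binary`.
url_plain = quote(plain, safe="")
url_binary = quote(binary, safe="")
```

For mostly-text data URL encoding can win; for binary data base-64 is the safer bet.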

Commented:
If anything is above 2k, it should not be used in any URL-related activities if you ask me. Good point about the overhead.

I have a few implementations of Base62 somewhere - none in Java though. The only one I ever found in Java that worked well enough did it by going through Base64 and then to Base62 (and on the way back, from 62 to 64 to decoded).
