Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17


Resolve / Convert string or MD5 hash into a number within a specific range

Posted on 2006-06-21
Medium Priority
Last Modified: 2007-12-19

I want to convert a string into a unique integer value that lies within a range.  What I will be doing is feeding a given URL, or MD5 hash of that URL as input to a formula that returns a value not less than the bottom of my range, and not greater than my range.  

For example, lets say I have as a string, and a range from one (1) to sixty (60).
All I need is to feed that URL into the formula, and it evaluates it somehow into a number that falls between 1 and 60. But if I have a large set of very similar, but unique URLS, I need to get a fairly even distribution between that range. That is to say, if I had 60 unique strings, I could probably get numbers anywhere in that range.

As another example, two similar strings such as:

might return what seems like random values between 1 and 60, as oppose to two integers that are very close to each other (because the strings are the same)

So I was thinking, maybe perform an md5 hash on the string, and then add up the ASCII values of each character in the MD5 hash. Then somehow use this get my number betwwen 1 and 60.

This "number" will act sort of as a serial number for the string, always the same unique evaluation between those ranges.  The upper value in my range with be the last record in a database table. I'll use the "serial number" to always retreive the same record from the table.

At some point in the future, I'll want each url to have a new unique serial number, so all I do is extend my range (for example, as records are added to the database table).
Question by:resonantwa
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 22

Accepted Solution

cookre earned 1500 total points
ID: 16957545
Just take a 32-bit hash mod the range: index=hash32(str) mod n

Since collisions are always possible, the only way to guarantee uniqueness is to save all generated values and, say, add 1 in case a collision occurs.


Expert Comment

ID: 16958910
>>Since collisions are always possible:

Virtually guaranteed ...

I do not think you can achieve what you are asking in this way.

Hashing systems have always worked on what is known as a "bathtub" distribuion and you will naturally come across collisions when your "bathtub" is aproximately 60% full.

with a "bathtub" of 60 you are almost guarnteed to hit your first collision by the time you have added just 36 hash entries to your table of approx 60.

I beleive your requirement is unrealastic, I would advise that you rethink it!


Author Comment

ID: 16962110
HI cookre,

I knew there was an easy way of getting a number within a range, I just couldn't remember my math (mod). How would the hash string be evaluated into a numerical value on which MOD 'N' would be performed?


The bathub metaphor gives me an idea how effective this would be. I am not advanced enough to know, but given what you've said I think it is reasonable to assume that the uniqueness of the hash will dictate how good the distribution is. Correct?  Uniqueness is actually not critical in this situation, but a distribution covering better than 36 of 60 values would be nice. Do you have any constructive advice to move forward?
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

LVL 22

Expert Comment

ID: 16962401
Sample code:

I would suspect most would provide reasonably well distributed results.

Author Comment

ID: 16962544
I guess what I mean is, how is a string represented numerically when math is performed on it?

Given a set of hashes, with an acceptable amount of collisions, each hash is going to be the same length, and use a similar character set. Because whatever numerical form the hash translates to mathmatically would also be a factor in the distribution throughout that range.
LVL 22

Expert Comment

ID: 16965112
A quick and dirty way that ought to give reasonable results:

Pad the string with its first n bytes to make it a multiple of 4 bytes.
Treat each 4-byte climp as an unsigned int.
Add them up.
Use the result as the seed to a random generator and mod the result.

To get the int representing a given 4-byte clump, use a union or, if c#, a [StructLayout(LayoutKind.Explicit)]

Expert Comment

ID: 16966196
>>I guess what I mean is, how is a string represented numerically when math is performed on it?
normally the char are converted to ascii codes and it's the ascii code value that math is being performed on it
LVL 22

Expert Comment

ID: 16968970
union {
         s char[4];
         i int;
         } u;
Plop four chars from the target string into u.s, then use u.i.

Or, for c#:
struct FourBytesToInt
[FieldOffset(0)] public byte b0;
[FieldOffset(1)] public byte b1;
[FieldOffset(2)] public byte b2;
[FieldOffset(3)] public byte b3;
[FieldOffset(0)] public IntOut;

Author Comment

ID: 16972425
well, I didn't do a very great job of weeding out the ancillary, but essentially  my question was a math question.

And the MOD gets it!

LVL 22

Expert Comment

ID: 16973148
<I guess what I mean is, how is a string represented numerically when math is performed on it?
normally the char are converted to ascii codes and it's the ascii code value that math is being performed on it>

That's what the union does for you.  You copy 4 characters into u.s and use it numerically as u.i

For example, let's say your url is  The corresponding 4-byte clumps are

Looking at the first one in RAM you would see the bytes  68 74 74 70.  Then, by referring to u.i you'ld have the integer 1886680168.  Adding up all the u.i integers for each 4-byte clump would give you the final integer you'ld eather take the mod of. or, optionally, use it as a seed for a random generator.

Unless I'm having an extended senior moment and have missed the whole point.

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This is about my first experience with programming Arduino.
The SignAloud Glove is capable of translating American Sign Language signs into text and audio.
Six Sigma Control Plans
Starting up a Project

721 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question