Solved

Resolve / Convert string or MD5 hash into a number within a specific range

Posted on 2006-06-21
10
1,209 Views
Last Modified: 2007-12-19
Hello,

I want to convert a string into a unique integer value that lies within a range.  What I will be doing is feeding a given URL, or MD5 hash of that URL as input to a formula that returns a value not less than the bottom of my range, and not greater than my range.  

For example, lets say I have http://www.yourdomain.com/cats/ as a string, and a range from one (1) to sixty (60).
All I need is to feed that URL into the formula, and it evaluates it somehow into a number that falls between 1 and 60. But if I have a large set of very similar, but unique URLS, I need to get a fairly even distribution between that range. That is to say, if I had 60 unique strings, I could probably get numbers anywhere in that range.

As another example, two similar strings such as:
http://www.your...../cats/
and
http://www.your...../cats1/

might return what seems like random values between 1 and 60, as oppose to two integers that are very close to each other (because the strings are the same)

So I was thinking, maybe perform an md5 hash on the string, and then add up the ASCII values of each character in the MD5 hash. Then somehow use this get my number betwwen 1 and 60.


This "number" will act sort of as a serial number for the string, always the same unique evaluation between those ranges.  The upper value in my range with be the last record in a database table. I'll use the "serial number" to always retreive the same record from the table.

At some point in the future, I'll want each url to have a new unique serial number, so all I do is extend my range (for example, as records are added to the database table).
0
Comment
Question by:resonantwa
10 Comments
 
LVL 22

Accepted Solution

by:
cookre earned 500 total points
ID: 16957545
Just take a 32-bit hash mod the range: index=hash32(str) mod n

Since collisions are always possible, the only way to guarantee uniqueness is to save all generated values and, say, add 1 in case a collision occurs.

0
 
LVL 4

Expert Comment

by:David_Ward
ID: 16958910
>>Since collisions are always possible:

Virtually guaranteed ...

I do not think you can achieve what you are asking in this way.

Hashing systems have always worked on what is known as a "bathtub" distribuion and you will naturally come across collisions when your "bathtub" is aproximately 60% full.

with a "bathtub" of 60 you are almost guarnteed to hit your first collision by the time you have added just 36 hash entries to your table of approx 60.


I beleive your requirement is unrealastic, I would advise that you rethink it!

0
 
LVL 1

Author Comment

by:resonantwa
ID: 16962110
HI cookre,

I knew there was an easy way of getting a number within a range, I just couldn't remember my math (mod). How would the hash string be evaluated into a numerical value on which MOD 'N' would be performed?

David,

The bathub metaphor gives me an idea how effective this would be. I am not advanced enough to know, but given what you've said I think it is reasonable to assume that the uniqueness of the hash will dictate how good the distribution is. Correct?  Uniqueness is actually not critical in this situation, but a distribution covering better than 36 of 60 values would be nice. Do you have any constructive advice to move forward?
0
 
LVL 22

Expert Comment

by:cookre
ID: 16962401
Sample code:
http://www.obviex.com/samples/hash.aspx

I would suspect most would provide reasonably well distributed results.
0
 
LVL 1

Author Comment

by:resonantwa
ID: 16962544
I guess what I mean is, how is a string represented numerically when math is performed on it?

Given a set of hashes, with an acceptable amount of collisions, each hash is going to be the same length, and use a similar character set. Because whatever numerical form the hash translates to mathmatically would also be a factor in the distribution throughout that range.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 22

Expert Comment

by:cookre
ID: 16965112
A quick and dirty way that ought to give reasonable results:

Pad the string with its first n bytes to make it a multiple of 4 bytes.
Treat each 4-byte climp as an unsigned int.
Add them up.
Use the result as the seed to a random generator and mod the result.

To get the int representing a given 4-byte clump, use a union or, if c#, a [StructLayout(LayoutKind.Explicit)]
0
 
LVL 2

Expert Comment

by:seet82
ID: 16966196
>>I guess what I mean is, how is a string represented numerically when math is performed on it?
normally the char are converted to ascii codes and it's the ascii code value that math is being performed on it
0
 
LVL 22

Expert Comment

by:cookre
ID: 16968970
union {
         s char[4];
         i int;
         } u;
Plop four chars from the target string into u.s, then use u.i.


Or, for c#:
[StructLayout(LayoutKind.Explicit)]
struct FourBytesToInt
{
[FieldOffset(0)] public byte b0;
[FieldOffset(1)] public byte b1;
[FieldOffset(2)] public byte b2;
[FieldOffset(3)] public byte b3;
[FieldOffset(0)] public IntOut;
}
0
 
LVL 1

Author Comment

by:resonantwa
ID: 16972425
well, I didn't do a very great job of weeding out the ancillary, but essentially  my question was a math question.

And the MOD gets it!

Thanks.
0
 
LVL 22

Expert Comment

by:cookre
ID: 16973148
<I guess what I mean is, how is a string represented numerically when math is performed on it?
normally the char are converted to ascii codes and it's the ascii code value that math is being performed on it>

That's what the union does for you.  You copy 4 characters into u.s and use it numerically as u.i

For example, let's say your url is http://www.there.com.  The corresponding 4-byte clumps are
http
://w
ww.t
here
.com

Looking at the first one in RAM you would see the bytes  68 74 74 70.  Then, by referring to u.i you'ld have the integer 1886680168.  Adding up all the u.i integers for each 4-byte clump would give you the final integer you'ld eather take the mod of. or, optionally, use it as a seed for a random generator.

Unless I'm having an extended senior moment and have missed the whole point.
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Suggested Solutions

This is an explanation of a simple data model to help parse a JSON feed
Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
An introduction to basic programming syntax in Java by creating a simple program. Viewers can follow the tutorial as they create their first class in Java. Definitions and explanations about each element are given to help prepare viewers for future …
Viewers will learn how to properly install Eclipse with the necessary JDK, and will take a look at an introductory Java program. Download Eclipse installation zip file: Extract files from zip file: Download and install JDK 8: Open Eclipse and …

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

23 Experts available now in Live!

Get 1:1 Help Now