• C

# STRINGS, KEYS, HASHING...

I would like to obtain from a string a unique key code to that string.

What function can build it avoiding collision?

f(string1) = key1
f(string2) = key2

string1 <> string2 --> key1 <> key2

len(string): 1..14.000 chars
###### Who is Participating?

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Commented:
you cannot make it UNIQUE unless you have the same amount of storage (14000 bytes).

There are many hasing algorithms.  The best one depends on the distribution of your string data.

0

Experts Exchange Solution brought to you by

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Commented:
Yes, You may not get a unique key as said above.
Look at the hashing, CRC or checksum routines in
http://www.snippets.org/
0
Author Commented:

Those functions become from a key and then apply hash function to that key, but what I don't know is the step before how to get the code.

Step 1:
f(string1) = key1
f(string2) = key2

Usually key1 and key2 are numbers.

Step2:
hash(key1) = position1
hash(key2) = position2

I only need the first step.

0
Commented:
MFC uses the following hash code (slightly edited)

unsigned HashKey(char* key)
{
unsigned nHash = 0;
while (*key) nHash = (nHash<<5) + nHash + *key++;
return nHash;
}

0
Commented:
all you do is say

and you havea hash.

What is this "first step" you are talking about?

BTW: snippets is a good idea to check out .. I often recommend people to look there for code.

0
Commented:
Anyway , as I said, you cannot do what you asked for (have a unique id that yu can compare).

Instead you would code (something) like this.

int StringsEqual (char* string1, char* string2) {
unsigned hash1 = HashKey(string1);
unsigned hash2 = HashKey(string2);
if (hash1 != hash2) return FALSE;
return 0 == strcmp(string1,string2);
}

Only to be useful, you wouldn't recalc the hash values every time you want to compare .. hashing may well be more time consuming than just doing a compare.  But if you can hash a test string ONCE, and then reuse that hash value for comparing with other strings, then you can safe some time.

eg. you might have a table of keywords.  so you could write something like this

typedef struct {
char* key;
unsigned has;
} KeyAndHash;

KeyAndHash keywords[NUMKEY];

...

for (i = 0; i < NUMKEY; i++) {
keywords[i].hash = HashKey(keywords[i].key);
}
...
char* token;
...
unsigned hash = HashKey(token);
...
for (i = 0; i < NUMKEY; i++) {
if (hash == keywords[i].hash) {
if (0 == strcmp(token,HashKey(keywords[i].key)) {
break;
}
}
}

This sort of code can speed your comparisons greatly because a lot of cases are dealt with with a simple unsigned int comparison.

0
Commented:
Given a fixed set of keywords, you can construct a minimal perfect hash function that works for those keywords.  (probably in linear time)
Given a fixed hash function, there will exist a set of keywords that collide.  (although some hash functions probably make it computationaly infeasible to find them)
0
Commented:
Of course, rather than try to (mathematically) create a hashing function that is unique for a given set of keywords, one can just try one to see.  If you have a hash that does give unique hash entries for each keyword, then use it.  If it doesn't try another.  If still no luck, then it may be time to construct a perfect hash function.

Also the perfect hash will not tell you if a given string is a keyword or not on its own .. just whether it is POSSIBLY a keyword or DEFINITELY not one.  You still need to test a possible keyword to make sure.

Alternatives are to create a tree based on each letter of the keywords .. so all the ones starting with 'A' are in one branch etc.  Then finding a mathc or not is just tree traversal.  This tree can be in 'code' as an if.
eg. (pseudo code .. don't try to compile this)
found = NOT_FOUND;
switch (mystring[0])
..
case 'C': // test keywords starting with 'C'
switch (mystring[1]) {
case 'O': // test for "const"
if (0 == strcmp(&mystring[2],"NST")) found = CONST_KEYWORD;
break;
case 'A': // test for "case"
if (0 == strcmp(&mystring[2],"SE")) found = CASE_KEYWORD;
break;
..
}
break;
..
}

ie. use switch to get to a single keyword to check, then just compare.

0
###### It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
C

From novice to tech pro — start learning today.