Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
Solved

# STRINGS, KEYS, HASHING...

Posted on 1998-05-12
Medium Priority
227 Views
I would like to obtain from a string a unique key code to that string.

What function can build it avoiding collision?

f(string1) = key1
f(string2) = key2

string1 <> string2 --> key1 <> key2

len(string): 1..14.000 chars
0
Question by:carlos0007
[X]
###### Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points

LVL 10

Accepted Solution

RONSLOW earned 800 total points
ID: 1250664
you cannot make it UNIQUE unless you have the same amount of storage (14000 bytes).

There are many hasing algorithms.  The best one depends on the distribution of your string data.

0

LVL 2

Expert Comment

ID: 1250665
Yes, You may not get a unique key as said above.
Look at the hashing, CRC or checksum routines in
http://www.snippets.org/
0

Author Comment

ID: 1250666

Those functions become from a key and then apply hash function to that key, but what I don't know is the step before how to get the code.

Step 1:
f(string1) = key1
f(string2) = key2

Usually key1 and key2 are numbers.

Step2:
hash(key1) = position1
hash(key2) = position2

I only need the first step.

0

LVL 10

Expert Comment

ID: 1250667
MFC uses the following hash code (slightly edited)

unsigned HashKey(char* key)
{
unsigned nHash = 0;
while (*key) nHash = (nHash<<5) + nHash + *key++;
return nHash;
}

0

LVL 10

Expert Comment

ID: 1250668
all you do is say

and you havea hash.

What is this "first step" you are talking about?

BTW: snippets is a good idea to check out .. I often recommend people to look there for code.

0

LVL 10

Expert Comment

ID: 1250669
Anyway , as I said, you cannot do what you asked for (have a unique id that yu can compare).

Instead you would code (something) like this.

int StringsEqual (char* string1, char* string2) {
unsigned hash1 = HashKey(string1);
unsigned hash2 = HashKey(string2);
if (hash1 != hash2) return FALSE;
return 0 == strcmp(string1,string2);
}

Only to be useful, you wouldn't recalc the hash values every time you want to compare .. hashing may well be more time consuming than just doing a compare.  But if you can hash a test string ONCE, and then reuse that hash value for comparing with other strings, then you can safe some time.

eg. you might have a table of keywords.  so you could write something like this

typedef struct {
char* key;
unsigned has;
} KeyAndHash;

KeyAndHash keywords[NUMKEY];

...

for (i = 0; i < NUMKEY; i++) {
keywords[i].hash = HashKey(keywords[i].key);
}
...
char* token;
...
unsigned hash = HashKey(token);
...
for (i = 0; i < NUMKEY; i++) {
if (hash == keywords[i].hash) {
if (0 == strcmp(token,HashKey(keywords[i].key)) {
break;
}
}
}

This sort of code can speed your comparisons greatly because a lot of cases are dealt with with a simple unsigned int comparison.

0

LVL 84

Expert Comment

ID: 1250670
Given a fixed set of keywords, you can construct a minimal perfect hash function that works for those keywords.  (probably in linear time)
Given a fixed hash function, there will exist a set of keywords that collide.  (although some hash functions probably make it computationaly infeasible to find them)
0

LVL 10

Expert Comment

ID: 1250671
Of course, rather than try to (mathematically) create a hashing function that is unique for a given set of keywords, one can just try one to see.  If you have a hash that does give unique hash entries for each keyword, then use it.  If it doesn't try another.  If still no luck, then it may be time to construct a perfect hash function.

Also the perfect hash will not tell you if a given string is a keyword or not on its own .. just whether it is POSSIBLY a keyword or DEFINITELY not one.  You still need to test a possible keyword to make sure.

Alternatives are to create a tree based on each letter of the keywords .. so all the ones starting with 'A' are in one branch etc.  Then finding a mathc or not is just tree traversal.  This tree can be in 'code' as an if.
eg. (pseudo code .. don't try to compile this)
found = NOT_FOUND;
switch (mystring[0])
..
case 'C': // test keywords starting with 'C'
switch (mystring[1]) {
case 'O': // test for "const"
if (0 == strcmp(&mystring[2],"NST")) found = CONST_KEYWORD;
break;
case 'A': // test for "case"
if (0 == strcmp(&mystring[2],"SE")) found = CASE_KEYWORD;
break;
..
}
break;
..
}

ie. use switch to get to a single keyword to check, then just compare.

0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see soâ€¦
Examines three attack vectors, specifically, the different types of malware used in malicious attacks, web application attacks, and finally, network based attacks.  Concludes by examining the means of securing and protecting critical systems and infâ€¦
The goal of this video is to provide viewers with basic examples to understand recursion in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use for-loops in the C programming language.
###### Suggested Courses
Course of the Month10 days, 23 hours left to enroll