Solved

STRINGS, KEYS, HASHING...

Posted on 1998-05-12
8
217 Views
Last Modified: 2010-04-15
I would like to obtain from a string a unique key code to that string.

What function can build it avoiding collision?

f(string1) = key1
f(string2) = key2

string1 <> string2 --> key1 <> key2

len(string): 1..14.000 chars
0
Comment
Question by:carlos0007
8 Comments
 
LVL 10

Accepted Solution

by:
RONSLOW earned 200 total points
ID: 1250664
you cannot make it UNIQUE unless you have the same amount of storage (14000 bytes).

There are many hasing algorithms.  The best one depends on the distribution of your string data.



0
 
LVL 2

Expert Comment

by:seedy
ID: 1250665
Yes, You may not get a unique key as said above.
Look at the hashing, CRC or checksum routines in
     http://www.snippets.org/
They might help you.
0
 

Author Comment

by:carlos0007
ID: 1250666
I already know hash functions.

Those functions become from a key and then apply hash function to that key, but what I don't know is the step before how to get the code.

Step 1:
f(string1) = key1
f(string2) = key2

Usually key1 and key2 are numbers.

Step2:
hash(key1) = position1
hash(key2) = position2

I only need the first step.

0
 
LVL 10

Expert Comment

by:RONSLOW
ID: 1250667
MFC uses the following hash code (slightly edited)

unsigned HashKey(char* key)
{
  unsigned nHash = 0;
  while (*key) nHash = (nHash<<5) + nHash + *key++;
  return nHash;
}

0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 10

Expert Comment

by:RONSLOW
ID: 1250668
all you do is say

unsigned hash = HasKey("my string");

and you havea hash.

What is this "first step" you are talking about?

BTW: snippets is a good idea to check out .. I often recommend people to look there for code.

0
 
LVL 10

Expert Comment

by:RONSLOW
ID: 1250669
Anyway , as I said, you cannot do what you asked for (have a unique id that yu can compare).

Instead you would code (something) like this.

int StringsEqual (char* string1, char* string2) {
  unsigned hash1 = HashKey(string1);
  unsigned hash2 = HashKey(string2);
  if (hash1 != hash2) return FALSE;
  return 0 == strcmp(string1,string2);
}

Only to be useful, you wouldn't recalc the hash values every time you want to compare .. hashing may well be more time consuming than just doing a compare.  But if you can hash a test string ONCE, and then reuse that hash value for comparing with other strings, then you can safe some time.

eg. you might have a table of keywords.  so you could write something like this

typedef struct {
  char* key;
  unsigned has;
} KeyAndHash;

KeyAndHash keywords[NUMKEY];

...

for (i = 0; i < NUMKEY; i++) {
  keywords[i].hash = HashKey(keywords[i].key);
}
...
char* token;
...
unsigned hash = HashKey(token);
...
for (i = 0; i < NUMKEY; i++) {
  if (hash == keywords[i].hash) {
    if (0 == strcmp(token,HashKey(keywords[i].key)) {
       break;
    }
  }
}

This sort of code can speed your comparisons greatly because a lot of cases are dealt with with a simple unsigned int comparison.

0
 
LVL 84

Expert Comment

by:ozo
ID: 1250670
Given a fixed set of keywords, you can construct a minimal perfect hash function that works for those keywords.  (probably in linear time)
Given a fixed hash function, there will exist a set of keywords that collide.  (although some hash functions probably make it computationaly infeasible to find them)
0
 
LVL 10

Expert Comment

by:RONSLOW
ID: 1250671
Of course, rather than try to (mathematically) create a hashing function that is unique for a given set of keywords, one can just try one to see.  If you have a hash that does give unique hash entries for each keyword, then use it.  If it doesn't try another.  If still no luck, then it may be time to construct a perfect hash function.

Also the perfect hash will not tell you if a given string is a keyword or not on its own .. just whether it is POSSIBLY a keyword or DEFINITELY not one.  You still need to test a possible keyword to make sure.

Alternatives are to create a tree based on each letter of the keywords .. so all the ones starting with 'A' are in one branch etc.  Then finding a mathc or not is just tree traversal.  This tree can be in 'code' as an if.
eg. (pseudo code .. don't try to compile this)
found = NOT_FOUND;
switch (mystring[0])
..
case 'C': // test keywords starting with 'C'
  switch (mystring[1]) {
  case 'O': // test for "const"
    if (0 == strcmp(&mystring[2],"NST")) found = CONST_KEYWORD;
    break;
  case 'A': // test for "case"
    if (0 == strcmp(&mystring[2],"SE")) found = CASE_KEYWORD;
    break;
  ..
  }
  break;
..
}

ie. use switch to get to a single keyword to check, then just compare.

0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

This tutorial is posted by Aaron Wojnowski, administrator at SDKExpert.net.  To view more iPhone tutorials, visit www.sdkexpert.net. This is a very simple tutorial on finding the user's current location easily. In this tutorial, you will learn ho…
This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so…
The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.
The goal of this video is to provide viewers with basic examples to understand how to create, access, and change arrays in the C programming language.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now