Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

STRINGS, KEYS, HASHING...

Posted on 1998-05-12
8
Medium Priority
?
227 Views
Last Modified: 2010-04-15
I would like to obtain from a string a unique key code to that string.

What function can build it avoiding collision?

f(string1) = key1
f(string2) = key2

string1 <> string2 --> key1 <> key2

len(string): 1..14.000 chars
0
Comment
Question by:carlos0007
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 10

Accepted Solution

by:
RONSLOW earned 800 total points
ID: 1250664
you cannot make it UNIQUE unless you have the same amount of storage (14000 bytes).

There are many hasing algorithms.  The best one depends on the distribution of your string data.



0
 
LVL 2

Expert Comment

by:seedy
ID: 1250665
Yes, You may not get a unique key as said above.
Look at the hashing, CRC or checksum routines in
     http://www.snippets.org/
They might help you.
0
 

Author Comment

by:carlos0007
ID: 1250666
I already know hash functions.

Those functions become from a key and then apply hash function to that key, but what I don't know is the step before how to get the code.

Step 1:
f(string1) = key1
f(string2) = key2

Usually key1 and key2 are numbers.

Step2:
hash(key1) = position1
hash(key2) = position2

I only need the first step.

0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 10

Expert Comment

by:RONSLOW
ID: 1250667
MFC uses the following hash code (slightly edited)

unsigned HashKey(char* key)
{
  unsigned nHash = 0;
  while (*key) nHash = (nHash<<5) + nHash + *key++;
  return nHash;
}

0
 
LVL 10

Expert Comment

by:RONSLOW
ID: 1250668
all you do is say

unsigned hash = HasKey("my string");

and you havea hash.

What is this "first step" you are talking about?

BTW: snippets is a good idea to check out .. I often recommend people to look there for code.

0
 
LVL 10

Expert Comment

by:RONSLOW
ID: 1250669
Anyway , as I said, you cannot do what you asked for (have a unique id that yu can compare).

Instead you would code (something) like this.

int StringsEqual (char* string1, char* string2) {
  unsigned hash1 = HashKey(string1);
  unsigned hash2 = HashKey(string2);
  if (hash1 != hash2) return FALSE;
  return 0 == strcmp(string1,string2);
}

Only to be useful, you wouldn't recalc the hash values every time you want to compare .. hashing may well be more time consuming than just doing a compare.  But if you can hash a test string ONCE, and then reuse that hash value for comparing with other strings, then you can safe some time.

eg. you might have a table of keywords.  so you could write something like this

typedef struct {
  char* key;
  unsigned has;
} KeyAndHash;

KeyAndHash keywords[NUMKEY];

...

for (i = 0; i < NUMKEY; i++) {
  keywords[i].hash = HashKey(keywords[i].key);
}
...
char* token;
...
unsigned hash = HashKey(token);
...
for (i = 0; i < NUMKEY; i++) {
  if (hash == keywords[i].hash) {
    if (0 == strcmp(token,HashKey(keywords[i].key)) {
       break;
    }
  }
}

This sort of code can speed your comparisons greatly because a lot of cases are dealt with with a simple unsigned int comparison.

0
 
LVL 84

Expert Comment

by:ozo
ID: 1250670
Given a fixed set of keywords, you can construct a minimal perfect hash function that works for those keywords.  (probably in linear time)
Given a fixed hash function, there will exist a set of keywords that collide.  (although some hash functions probably make it computationaly infeasible to find them)
0
 
LVL 10

Expert Comment

by:RONSLOW
ID: 1250671
Of course, rather than try to (mathematically) create a hashing function that is unique for a given set of keywords, one can just try one to see.  If you have a hash that does give unique hash entries for each keyword, then use it.  If it doesn't try another.  If still no luck, then it may be time to construct a perfect hash function.

Also the perfect hash will not tell you if a given string is a keyword or not on its own .. just whether it is POSSIBLY a keyword or DEFINITELY not one.  You still need to test a possible keyword to make sure.

Alternatives are to create a tree based on each letter of the keywords .. so all the ones starting with 'A' are in one branch etc.  Then finding a mathc or not is just tree traversal.  This tree can be in 'code' as an if.
eg. (pseudo code .. don't try to compile this)
found = NOT_FOUND;
switch (mystring[0])
..
case 'C': // test keywords starting with 'C'
  switch (mystring[1]) {
  case 'O': // test for "const"
    if (0 == strcmp(&mystring[2],"NST")) found = CONST_KEYWORD;
    break;
  case 'A': // test for "case"
    if (0 == strcmp(&mystring[2],"SE")) found = CASE_KEYWORD;
    break;
  ..
  }
  break;
..
}

ie. use switch to get to a single keyword to check, then just compare.

0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so…
Examines three attack vectors, specifically, the different types of malware used in malicious attacks, web application attacks, and finally, network based attacks.  Concludes by examining the means of securing and protecting critical systems and inf…
The goal of this video is to provide viewers with basic examples to understand recursion in the C programming language.
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use for-loops in the C programming language.
Suggested Courses

618 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question