Solved

# name matching algorithm

Posted on 2002-07-10
500 Views
Hi !!!

Does anyone know about the matching algorithm which can handle this situation:

for example..we saecrh for "smi" (surname)
and when we type "smi" and press enter, the program looks into the datfile (surname.dat) and returns
the list which begined with "smi***" or in between ***smi*** or end uped with "***smi"
nysmi
nismithy
smi
smid
smith
smithe
smithy
smithye
wysmi

etc..

or we type  "smith"
the matching algorithm generates the list of matches as following:

smith
smithey
smithy
smithye
willsmith
willsmithyoung
youngsmith

or  type "will"

youngwillsmith
will
willsmith
----------------------

if this algorithm doesn't exist , could anyone implement the simple code which overcome this problem above

many thanks,

Korsila
0
Question by:korsila
• 16
• 7
• 2
• +3

LVL 30

Expert Comment

ID: 7143829
Is this homework?
0

LVL 1

Expert Comment

ID: 7143845
/*
** How's this?
*/
#include <stdio.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int matches(char *src, char *target) {
int i;

for(i = 0; i < strlen(target); i++)
if(!strncmp(&target[i], src, strlen(src))) return(1);

return(0);
}

int main(int argc, char *argv[]) {

char *surname = "smi";
char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" };
int i;

/*
** Loop through the array printing matches
*/
for(i = 0; i < 4; i++)
if(matches(surname, data[i]))
printf("%s\n", data[i]);

return(1);
}
0

Author Comment

ID: 7143881
aXTER,
Not a homeowrk, it's something I am interested in most of my work..!!!but as you might not know am just a beginer of C or other programming language apart from HTML...

cheers,
KORSILA
0

LVL 1

Expert Comment

ID: 7143890
Korsila,

Yeah I looked around and you asked some questions similar to this (in SQL, etc.) so I figured it wasn't homework.  Try to read through and understand the code I gave.  Also, the loop in "matches" does too many iterations.  See if you can figure out how to change it to speed things up (and avoid possible memory problems).

Thanks,
Steve
0

LVL 5

Expert Comment

ID: 7143891
Use standard strstr function:

NAME
strstr - locate a substring

SYNOPSIS
#include <string.h>

char *strstr(const char *haystack, const char *needle);

DESCRIPTION
The  strstr()  function  finds the first occurrence of the
substring needle in the string haystack.  The  terminating
`\0' characters are not compared.

RETURN VALUE
The  strstr()  function returns a pointer to the beginning

0

Author Comment

ID: 7143901
Smisk,

can you code and test it with datafile...(surname.dat)

surname.dat

barra
baramy
nabarra
nybarramy
nysmi
nismithy
smi
smid
smith
smithe
smithy
smithye
will
willsmith
willsmithyoung
wysmi
youngsmith
youngwillsmith

-------------------------------

so the output should look like this:

enter name to macth: smith ---(example)---chick eneter

return name mathed with above requirement..!!!!

can you do this....

cheers,
Korsila

0

Expert Comment

ID: 7143912
hi,
u can not get more easy than this.....
i strongly belive, it _must_ help u.
sooooooooooo, i may expect 100 pts.

i will work, perfect.

# include <stdio.h>
# include <stdlib.h>
# include <string.h>

# define TRUE 1
# define FALSE 0
main (int c, char **argv)
{
char t[128] = { 0x00, };
char one[10] = { 0x00, }, two[10] =
{
0x00,};
if (c != 3)

{
printf ("\nUsage: expression matcher\n\n");
return 0;
}
/* call like */

/* fp = fopen ("./surname.dat","r")
*  fgets (...) ;  read each name
*  then
*    on success 1 on fail 0
*    */
printf ("\n[%d]\n", mask_match (argv[1], argv[2], 1));
return 0;
}
int
do_match (char *str, char *regexp)
{
char *p;
for (p = regexp; *p && *str;)

{
switch (*p)

{
case '?':
str++;
p++;
break;
case '*':
p++;
if (!*p)
return TRUE;
while (*str)

{
while (*str && (toupper (*p) != toupper (*str)))
str++;
if (do_match (str, p))
return TRUE;
if (!*str)
return FALSE;
str++;
}
return FALSE;
default:
if (toupper (*str) != toupper (*p))

{
return FALSE;
}
str++;
p++;
break;
}
}
if (!*p && !*str)
return TRUE;
if (!*p && (str[0] == '.') && (str[1] == 0))
return (TRUE);
if (!*str && *p == '?')

{
while (*p == '?')
p++;
return (!*p);
}
if (!*str && (*p == '*') && (p[1] == '\0'))
return TRUE;
return FALSE;
}
int
mask_match (char *str, char *regexp, int trans2)
{
char *p = NULL;
char pattern[1024] = { 0x00, }, string[1024] =
{
0x00,};
int matched;
if (str == NULL || regexp == NULL)
return TRUE;
strncpy (pattern, regexp, sizeof (pattern) - 1);
for (p = pattern; *p; p++)

{
while (*p == '*' && (p[1] == '?' || p[1] == '*'))

{
memmove (p + 1, p + 2, strlen (p + 2));
p[strlen (p + 2) + 1] = 0;
}
}
if (!strcasecmp (pattern, "*"))
return (TRUE);
strncpy (string, str, sizeof (string) - 1);
matched = do_match (string, pattern);
return matched;
}

0

LVL 30

Expert Comment

ID: 7143932
Hi (dhanaspace1), welcome to EE.

All of the experts here, for the most part have learn from other experts as to the proper etiquette for posting answer.

If you read the following link, you'll see why this is the preferred method for many of our valued experts, including myself.

Hi (korsila):
Feel free to click the [Reject Answer] button near (Answer-poster's) response, even if it seems like a good answer.
Doing so will increase your chance of obtaining additional input from other experts.  Later, you can click the [Select Comment as Answer] button on any response.
0

Author Comment

ID: 7143960
First of all, I think Axter is right...
secondly,  the code that dhanaspace1 has posted it here didn't work...I got something instead of correct outpput of name macthes after running the code:
"Usage: expression matcher"

so if you could figure out what it was and fix it for me..that would be ok though..!!
am glad load of experts are paying attendtion with my question and trying to help me out..!!!

Korsil;a
0

Author Comment

ID: 7143964
First of all, I think Axter is right...
secondly,  the code that dhanaspace1 has posted it here didn't work...I got something instead of correct outpput of name macthes after running the code:
"Usage: expression matcher"

so if you could figure out what it was and fix it for me..that would be ok though..!!
am glad load of experts are paying attendtion with my question and trying to help me out..!!!

Korsil;a
0

Author Comment

ID: 7143972
make sure you got subroutine called "position()" which will handle the job above...just wanna make it tidy..!!!
Korsila

0

Author Comment

ID: 7144015
so 4 steps in "posiiton ()" subroutine -if you could :
1. exact match  , smit--->smit
2. beginning match, smit--> smity, smith, smithy
3. middle match, smit-->willsmither, yungsmitty
4. ending match, smit-->willsmit, yungsmit

thanks,
Korsila
0

LVL 1

Expert Comment

ID: 7144111
/* Revisited... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int matches(char *src, char *target) {
int i;

for(i = 0; i < strlen(target); i++)
if(!strncmp(&target[i], src, strlen(src))) return(1);

return(0);
}

int main(int argc, char *argv[]) {
char surname[12];
/* char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" }; */
char data[80];
int i;
FILE *fp;

if(argc == 2) {

/*
** Read the surname to test
*/
printf("Enter a surname [max 12 chars] (example : smi) : ");
fgets(surname, 12, stdin);
surname[strlen(surname) - 1] = '\0';

/*
** Loop through the datafile printing matches
*/
fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(matches(surname, data))
printf("%s", data);

return(0);
}
else {
printf("Error accessing %s\n", argv[1]);
return(1);
}
fclose(fp);

}
else {
printf("Usage : %s <datafile>\n", argv[0]);
return(1);
}
}
0

LVL 6

Expert Comment

ID: 7144422
what was wrong with using Sapa's answer and using strstr?

if(strstr(szLookIn, szFind) != NULL)
{
/* we found it */
}
0

Author Comment

ID: 7144824
Ebosscher,
I need some help with the code , not some comments without it otherwise I CANN'T TEST how it works...!!!!as I am not that expert, but I would love to learn...provide me the code and I WILL run it if it works you got the poitns and I got help that's fair is n't it...?

Korsila

0

Author Comment

ID: 7144825
smisk,
I WIL HAVE A GO WITH YOUR CODE AND WILL let you know as soon as I get it run/done...

cheers,
Korsila
0

Author Comment

ID: 7144862
Is it ok for anyone if I would accept Smisk as the best answer here..as I got it working with my datafile and I reckon i would accept his asnwer as the good one..!!!

anybody is happy with that....? I Wwill wait for "dhanaspace1" for updating new codes and "Sapa" for implementing the code not commenting algorithm..!!!

Korsila
0

LVL 1

Expert Comment

ID: 7144956
Korsila,

You can basically swap out my "matches()" function with a call to "strstr()" which I had no idea existed (you learn something new every day! :)).

Try doing this switch :

fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(matches(surname, data))
printf("%s", data);

fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(strstr(data, surname))
printf("%s", data);

If that works for you (it does for me), use it and remove the function declaration of "matches()" from the program.  Note that the parameters are reversed in the call to strstr()...

Thanks,
Steve
0

Author Comment

ID: 7145660
Dear Smis or Steve,

here is what I have done to swap your matches() to strstr()

/* Revisited... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int strstr(char *src, char *target) {
int i;

for(i = 0; i < strlen(target); i++)
if(!strncmp(&target[i], src, strlen(src))) return(1);

return(0);
}

int main(int argc, char *argv[]) {
char surname[12];
/* char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" }; */
char data[80];
int i;
FILE *fp;

if(argc == 2) {

/*
** Read the surname to test
*/
printf("Enter a surname [max 12 chars] (example : smi) : ");
fgets(surname, 12, stdin);
surname[strlen(surname) - 1] = '\0';

/*
** Loop through the datafile printing matches
*/
fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(strstr(data, surname))
printf("%s", data);

return(0);
}
else {
printf("Error accessing %s\n", argv[1]);
return(1);
}
fclose(fp);

}
else {
printf("Usage : %s <datafile>\n", argv[0]);
return(1);
}
}

however, it got error when I compiled the code:
line 10: Inconsistent type declaration: "strstr"
line 10: Inconsistent parameter list declaration for "strstr"

can you provide your new update code (whole program)
0

LVL 1

Accepted Solution

smisk earned 100 total points
ID: 7146058
Korsila,

Nononono...  The whole point of using strstr() is that it is already a function defined in the library so you don't need to define it.  Try this :

/* Revisited again... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

int main(int argc, char *argv[]) {
char surname[12];
char data[80];
int i;
FILE *fp;

if(argc == 2) {

/*
** Read the surname to test
*/
printf("Enter a surname [max 12 chars] (example : smi) : ");
fgets(surname, 12, stdin);
surname[strlen(surname) - 1] = '\0';

/*
** Loop through the datafile printing matches
*/
fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(strstr(data, surname))
printf("%s", data);

return(0);
}
else {
printf("Error accessing %s\n", argv[1]);
return(1);
}
fclose(fp);

}
else {
printf("Usage : %s <datafile>\n", argv[0]);
return(1);
}
}
0

Author Comment

ID: 7146129
Smisk,
that's better...but whenn I test with a large datfile which copntained aprroximately 5000 names..the program returns nothing...

or this is the cause of problem:
while(fgets(data, 80, fp))
you set up only 80 names or something..????
can you make it unlimited...both input name and name in datafile...
can I change these 2 line or even get rid of them:
char surname[12];
char data[80];

many thanks,
Korsila

0

Author Comment

ID: 7146160
Smisk,
i have tried to change these 2 lines:
char surname[12];
char data[80];

to
char surname[120];
char data[500];

and while(fgets(data, 500, fp))
fgets(surname, 120, stdin);

however, the program still return nothing after i enter name like "smith" or "smi" from a large datfile whichj contained more than 5000 names...

what we should do next?
I thuink to get rid of the unlimited length of input name and names in datafile is sa solution..any suggestion ..we are almost there

korsila
0

Author Comment

ID: 7146171
aha ha I have just found the real problem...

becus the input name is "lower case" but the names in datfile are upper case...!!!that's why the program couldn't generate any output...!!!

but is it going to work with a large datfile..??? please answer me..what i should do to set up an unlimited length of input name and datafile...(or maximum)
last question i hope..!!
many thanks,
Korsila
0

LVL 1

Expert Comment

ID: 7146177
korsila,

Hmm...  It works fine with a 75,000 line recordset for me.  Does the program print an error message or anything?  Keep in mind that this search IS case sensitive so "SMI" will not match "smi"!

Maybe you're on to something though.  Here's an explanation of those variables :

/*
** This is a character array of 12 characters that will
** hold the input.  This means that the input can only
** be 11 characters long at a maximum (one for the null).
** If you want to hold longer names, extend this as well
** as the "fgets(surname, 12, stdin);" to reflect the new
** length.
*/
char surname[12];

/*
** This is the maximum length of a line to read from
** the file.  It really shouldn't be a problem unless
** you have really long lines in the file...
*/
char line[80];

0

Author Comment

ID: 7146192
aha ha I have just found the real problem...

becus the input name is "lower case" but the names in datfile are upper case...!!!that's why the program couldn't generate any output...!!!

but is it going to work with a large datfile..??? please answer me..what i should do to set up an unlimited length of input name and datafile...(or maximum)
last question i hope..!!
many thanks,
Korsila
0

Author Comment

ID: 7146207
Smisk,
yeah , you are absolutely right , just a case sensitive though, I wil accept your answer anyway...!!!
weldone and a million thank for your great help...

Korsila
0

Author Comment

ID: 7146222
0

LVL 1

Expert Comment

ID: 7146223
No problem...
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Preface I don't like visual development tools that are supposed to write a program for me. Even if it is Xcode and I can use Interface Builder. Yes, it is a perfect tool and has helped me a lot, mainly, in the beginning, when my programs were smallâ€¦
Windows programmers of the C/C++ variety, how many of you realise that since Window 9x Microsoft has been lying to you about what constitutes Unicode (http://en.wikipedia.org/wiki/Unicode)? They will have you believe that Unicode requires you to useâ€¦
The goal of this video is to provide viewers with basic examples to understand how to use strings and some functions related to them in the C programming language.
The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.