• C

name matching algorithm

Hi !!!

Does anyone know about the matching algorithm which can handle this situation:

for example..we saecrh for "smi" (surname)
and when we type "smi" and press enter, the program looks into the datfile (surname.dat) and returns
the list which begined with "smi***" or in between ***smi*** or end uped with "***smi"
nysmi
nismithy
smi
smid
smith
smithe
smithy
smithye
wysmi

etc..

or we type  "smith"
the matching algorithm generates the list of matches as following:

smith
smithey
smithy
smithye
willsmith
willsmithyoung
youngsmith

or  type "will"

youngwillsmith
will
willsmith
----------------------

if this algorithm doesn't exist , could anyone implement the simple code which overcome this problem above

many thanks,

Korsila
Who is Participating?

Commented:
Korsila,

Nononono...  The whole point of using strstr() is that it is already a function defined in the library so you don't need to define it.  Try this :

/* Revisited again... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

int main(int argc, char *argv[]) {
char surname[12];
char data[80];
int i;
FILE *fp;

if(argc == 2) {

/*
** Read the surname to test
*/
printf("Enter a surname [max 12 chars] (example : smi) : ");
fgets(surname, 12, stdin);
surname[strlen(surname) - 1] = '\0';

/*
** Loop through the datafile printing matches
*/
fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(strstr(data, surname))
printf("%s", data);

return(0);
}
else {
printf("Error accessing %s\n", argv[1]);
return(1);
}
fclose(fp);

}
else {
printf("Usage : %s <datafile>\n", argv[0]);
return(1);
}
}
0

Commented:
Is this homework?
0

Commented:
/*
** How's this?
*/
#include <stdio.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int matches(char *src, char *target) {
int i;

for(i = 0; i < strlen(target); i++)
if(!strncmp(&target[i], src, strlen(src))) return(1);

return(0);
}

int main(int argc, char *argv[]) {

char *surname = "smi";
char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" };
int i;

/*
** Loop through the array printing matches
*/
for(i = 0; i < 4; i++)
if(matches(surname, data[i]))
printf("%s\n", data[i]);

return(1);
}
0

Author Commented:
aXTER,
Not a homeowrk, it's something I am interested in most of my work..!!!but as you might not know am just a beginer of C or other programming language apart from HTML...

cheers,
KORSILA
0

Commented:
Korsila,

Yeah I looked around and you asked some questions similar to this (in SQL, etc.) so I figured it wasn't homework.  Try to read through and understand the code I gave.  Also, the loop in "matches" does too many iterations.  See if you can figure out how to change it to speed things up (and avoid possible memory problems).

Thanks,
Steve
0

Commented:
Use standard strstr function:

NAME
strstr - locate a substring

SYNOPSIS
#include <string.h>

char *strstr(const char *haystack, const char *needle);

DESCRIPTION
The  strstr()  function  finds the first occurrence of the
substring needle in the string haystack.  The  terminating
`\0' characters are not compared.

RETURN VALUE
The  strstr()  function returns a pointer to the beginning

0

Author Commented:
Smisk,

can you code and test it with datafile...(surname.dat)

surname.dat

barra
baramy
nabarra
nybarramy
nysmi
nismithy
smi
smid
smith
smithe
smithy
smithye
will
willsmith
willsmithyoung
wysmi
youngsmith
youngwillsmith

-------------------------------

so the output should look like this:

enter name to macth: smith ---(example)---chick eneter

return name mathed with above requirement..!!!!

can you do this....

cheers,
Korsila

0

Commented:
hi,
u can not get more easy than this.....
i strongly belive, it _must_ help u.
sooooooooooo, i may expect 100 pts.

i will work, perfect.

# include <stdio.h>
# include <stdlib.h>
# include <string.h>

# define TRUE 1
# define FALSE 0
main (int c, char **argv)
{
char t[128] = { 0x00, };
char one[10] = { 0x00, }, two[10] =
{
0x00,};
if (c != 3)

{
printf ("\nUsage: expression matcher\n\n");
return 0;
}
/* call like */

/* fp = fopen ("./surname.dat","r")
*  fgets (...) ;  read each name
*  then
*    on success 1 on fail 0
*    */
printf ("\n[%d]\n", mask_match (argv[1], argv[2], 1));
return 0;
}
int
do_match (char *str, char *regexp)
{
char *p;
for (p = regexp; *p && *str;)

{
switch (*p)

{
case '?':
str++;
p++;
break;
case '*':
p++;
if (!*p)
return TRUE;
while (*str)

{
while (*str && (toupper (*p) != toupper (*str)))
str++;
if (do_match (str, p))
return TRUE;
if (!*str)
return FALSE;
str++;
}
return FALSE;
default:
if (toupper (*str) != toupper (*p))

{
return FALSE;
}
str++;
p++;
break;
}
}
if (!*p && !*str)
return TRUE;
if (!*p && (str[0] == '.') && (str[1] == 0))
return (TRUE);
if (!*str && *p == '?')

{
while (*p == '?')
p++;
return (!*p);
}
if (!*str && (*p == '*') && (p[1] == '\0'))
return TRUE;
return FALSE;
}
int
mask_match (char *str, char *regexp, int trans2)
{
char *p = NULL;
char pattern[1024] = { 0x00, }, string[1024] =
{
0x00,};
int matched;
if (str == NULL || regexp == NULL)
return TRUE;
strncpy (pattern, regexp, sizeof (pattern) - 1);
for (p = pattern; *p; p++)

{
while (*p == '*' && (p[1] == '?' || p[1] == '*'))

{
memmove (p + 1, p + 2, strlen (p + 2));
p[strlen (p + 2) + 1] = 0;
}
}
if (!strcasecmp (pattern, "*"))
return (TRUE);
strncpy (string, str, sizeof (string) - 1);
matched = do_match (string, pattern);
return matched;
}

0

Commented:
Hi (dhanaspace1), welcome to EE.

All of the experts here, for the most part have learn from other experts as to the proper etiquette for posting answer.

If you read the following link, you'll see why this is the preferred method for many of our valued experts, including myself.

Hi (korsila):
Feel free to click the [Reject Answer] button near (Answer-poster's) response, even if it seems like a good answer.
Doing so will increase your chance of obtaining additional input from other experts.  Later, you can click the [Select Comment as Answer] button on any response.
0

Author Commented:
First of all, I think Axter is right...
secondly,  the code that dhanaspace1 has posted it here didn't work...I got something instead of correct outpput of name macthes after running the code:
"Usage: expression matcher"

so if you could figure out what it was and fix it for me..that would be ok though..!!
am glad load of experts are paying attendtion with my question and trying to help me out..!!!

Korsil;a
0

Author Commented:
First of all, I think Axter is right...
secondly,  the code that dhanaspace1 has posted it here didn't work...I got something instead of correct outpput of name macthes after running the code:
"Usage: expression matcher"

so if you could figure out what it was and fix it for me..that would be ok though..!!
am glad load of experts are paying attendtion with my question and trying to help me out..!!!

Korsil;a
0

Author Commented:
make sure you got subroutine called "position()" which will handle the job above...just wanna make it tidy..!!!
Korsila

0

Author Commented:
so 4 steps in "posiiton ()" subroutine -if you could :
1. exact match  , smit--->smit
2. beginning match, smit--> smity, smith, smithy
3. middle match, smit-->willsmither, yungsmitty
4. ending match, smit-->willsmit, yungsmit

thanks,
Korsila
0

Commented:
/* Revisited... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int matches(char *src, char *target) {
int i;

for(i = 0; i < strlen(target); i++)
if(!strncmp(&target[i], src, strlen(src))) return(1);

return(0);
}

int main(int argc, char *argv[]) {
char surname[12];
/* char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" }; */
char data[80];
int i;
FILE *fp;

if(argc == 2) {

/*
** Read the surname to test
*/
printf("Enter a surname [max 12 chars] (example : smi) : ");
fgets(surname, 12, stdin);
surname[strlen(surname) - 1] = '\0';

/*
** Loop through the datafile printing matches
*/
fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(matches(surname, data))
printf("%s", data);

return(0);
}
else {
printf("Error accessing %s\n", argv[1]);
return(1);
}
fclose(fp);

}
else {
printf("Usage : %s <datafile>\n", argv[0]);
return(1);
}
}
0

Commented:
what was wrong with using Sapa's answer and using strstr?

if(strstr(szLookIn, szFind) != NULL)
{
/* we found it */
}
0

Author Commented:
Ebosscher,
I need some help with the code , not some comments without it otherwise I CANN'T TEST how it works...!!!!as I am not that expert, but I would love to learn...provide me the code and I WILL run it if it works you got the poitns and I got help that's fair is n't it...?

Korsila

0

Author Commented:
smisk,
I WIL HAVE A GO WITH YOUR CODE AND WILL let you know as soon as I get it run/done...

cheers,
Korsila
0

Author Commented:
Is it ok for anyone if I would accept Smisk as the best answer here..as I got it working with my datafile and I reckon i would accept his asnwer as the good one..!!!

anybody is happy with that....? I Wwill wait for "dhanaspace1" for updating new codes and "Sapa" for implementing the code not commenting algorithm..!!!

Korsila
0

Commented:
Korsila,

You can basically swap out my "matches()" function with a call to "strstr()" which I had no idea existed (you learn something new every day! :)).

Try doing this switch :

fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(matches(surname, data))
printf("%s", data);

fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(strstr(data, surname))
printf("%s", data);

If that works for you (it does for me), use it and remove the function declaration of "matches()" from the program.  Note that the parameters are reversed in the call to strstr()...

Thanks,
Steve
0

Author Commented:
Dear Smis or Steve,

here is what I have done to swap your matches() to strstr()

/* Revisited... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int strstr(char *src, char *target) {
int i;

for(i = 0; i < strlen(target); i++)
if(!strncmp(&target[i], src, strlen(src))) return(1);

return(0);
}

int main(int argc, char *argv[]) {
char surname[12];
/* char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" }; */
char data[80];
int i;
FILE *fp;

if(argc == 2) {

/*
** Read the surname to test
*/
printf("Enter a surname [max 12 chars] (example : smi) : ");
fgets(surname, 12, stdin);
surname[strlen(surname) - 1] = '\0';

/*
** Loop through the datafile printing matches
*/
fp = fopen(argv[1], "r");
if(fp != NULL) {
while(fgets(data, 80, fp))
if(strstr(data, surname))
printf("%s", data);

return(0);
}
else {
printf("Error accessing %s\n", argv[1]);
return(1);
}
fclose(fp);

}
else {
printf("Usage : %s <datafile>\n", argv[0]);
return(1);
}
}

however, it got error when I compiled the code:
line 10: Inconsistent type declaration: "strstr"
line 10: Inconsistent parameter list declaration for "strstr"

can you provide your new update code (whole program)
0

Author Commented:
Smisk,
that's better...but whenn I test with a large datfile which copntained aprroximately 5000 names..the program returns nothing...

or this is the cause of problem:
while(fgets(data, 80, fp))
you set up only 80 names or something..????
can you make it unlimited...both input name and name in datafile...
can I change these 2 line or even get rid of them:
char surname[12];
char data[80];

many thanks,
Korsila

0

Author Commented:
Smisk,
i have tried to change these 2 lines:
char surname[12];
char data[80];

to
char surname[120];
char data[500];

and while(fgets(data, 500, fp))
fgets(surname, 120, stdin);

however, the program still return nothing after i enter name like "smith" or "smi" from a large datfile whichj contained more than 5000 names...

what we should do next?
I thuink to get rid of the unlimited length of input name and names in datafile is sa solution..any suggestion ..we are almost there

korsila
0

Author Commented:
aha ha I have just found the real problem...

becus the input name is "lower case" but the names in datfile are upper case...!!!that's why the program couldn't generate any output...!!!

but is it going to work with a large datfile..??? please answer me..what i should do to set up an unlimited length of input name and datafile...(or maximum)
last question i hope..!!
many thanks,
Korsila
0

Commented:
korsila,

Hmm...  It works fine with a 75,000 line recordset for me.  Does the program print an error message or anything?  Keep in mind that this search IS case sensitive so "SMI" will not match "smi"!

Maybe you're on to something though.  Here's an explanation of those variables :

/*
** This is a character array of 12 characters that will
** hold the input.  This means that the input can only
** be 11 characters long at a maximum (one for the null).
** If you want to hold longer names, extend this as well
** as the "fgets(surname, 12, stdin);" to reflect the new
** length.
*/
char surname[12];

/*
** This is the maximum length of a line to read from
** the file.  It really shouldn't be a problem unless
** you have really long lines in the file...
*/
char line[80];

0

Author Commented:
aha ha I have just found the real problem...

becus the input name is "lower case" but the names in datfile are upper case...!!!that's why the program couldn't generate any output...!!!

but is it going to work with a large datfile..??? please answer me..what i should do to set up an unlimited length of input name and datafile...(or maximum)
last question i hope..!!
many thanks,
Korsila
0

Author Commented:
Smisk,
yeah , you are absolutely right , just a case sensitive though, I wil accept your answer anyway...!!!
weldone and a million thank for your great help...

Korsila
0

Author Commented:
0

Commented:
No problem...
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.