Solved

name matching algorithm

Posted on 2002-07-10
28
495 Views
Last Modified: 2010-04-15
Hi !!!

Does anyone know about the matching algorithm which can handle this situation:

for example..we saecrh for "smi" (surname)
and when we type "smi" and press enter, the program looks into the datfile (surname.dat) and returns
the list which begined with "smi***" or in between ***smi*** or end uped with "***smi"
nysmi
nismithy
smi
smid
smith
smithe
smithy
smithye
wysmi

etc..

or we type  "smith"
the matching algorithm generates the list of matches as following:

smith
smithey
smithy
smithye
willsmith
willsmithyoung
youngsmith


or  type "will"

youngwillsmith
will
willsmith
----------------------

if this algorithm doesn't exist , could anyone implement the simple code which overcome this problem above

many thanks,

Korsila
0
Comment
Question by:korsila
  • 16
  • 7
  • 2
  • +3
28 Comments
 
LVL 30

Expert Comment

by:Axter
ID: 7143829
Is this homework?
0
 
LVL 1

Expert Comment

by:smisk
ID: 7143845
/*
** How's this?
*/
#include <stdio.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int matches(char *src, char *target) {
    int i;

    for(i = 0; i < strlen(target); i++)
        if(!strncmp(&target[i], src, strlen(src))) return(1);

    return(0);
}

int main(int argc, char *argv[]) {

    char *surname = "smi";
    char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" };
    int i;

    /*
    ** Loop through the array printing matches
    */
    for(i = 0; i < 4; i++)
        if(matches(surname, data[i]))
            printf("%s\n", data[i]);

    return(1);
}
0
 

Author Comment

by:korsila
ID: 7143881
aXTER,
Not a homeowrk, it's something I am interested in most of my work..!!!but as you might not know am just a beginer of C or other programming language apart from HTML...

cheers,
KORSILA
0
 
LVL 1

Expert Comment

by:smisk
ID: 7143890
Korsila,

  Yeah I looked around and you asked some questions similar to this (in SQL, etc.) so I figured it wasn't homework.  Try to read through and understand the code I gave.  Also, the loop in "matches" does too many iterations.  See if you can figure out how to change it to speed things up (and avoid possible memory problems).

Thanks,
Steve
0
 
LVL 5

Expert Comment

by:Sapa
ID: 7143891
Use standard strstr function:

NAME
       strstr - locate a substring

SYNOPSIS
       #include <string.h>

       char *strstr(const char *haystack, const char *needle);

DESCRIPTION
       The  strstr()  function  finds the first occurrence of the
       substring needle in the string haystack.  The  terminating
       `\0' characters are not compared.

RETURN VALUE
       The  strstr()  function returns a pointer to the beginning
       of the substring, or NULL if the substring is not found.

0
 

Author Comment

by:korsila
ID: 7143901
Smisk,

can you code and test it with datafile...(surname.dat)

surname.dat
             
              barra
              baramy
              nabarra
              nybarramy
               nysmi
               nismithy
               smi
               smid
               smith
               smithe
               smithy
               smithye
               will
               willsmith
               willsmithyoung
               wysmi
               youngsmith
               youngwillsmith
   
-------------------------------

so the output should look like this:

enter name to macth: smith ---(example)---chick eneter

return name mathed with above requirement..!!!!

can you do this....

cheers,
Korsila

0
 

Expert Comment

by:dhanaspace1
ID: 7143912
hi,
   u can not get more easy than this.....
 i strongly belive, it _must_ help u.
sooooooooooo, i may expect 100 pts.

i will work, perfect.



# include <stdio.h>  
# include <stdlib.h>    
# include <string.h>
 
# define TRUE 1
# define FALSE 0
  main (int c, char **argv)
{
  char t[128] = { 0x00, };
  char one[10] = { 0x00, }, two[10] =
  {
  0x00,};
  if (c != 3)
   
    {
      printf ("\nUsage: expression matcher\n\n");
      return 0;
    }
  /* call like */

  /* fp = fopen ("./surname.dat","r")
  *  fgets (...) ;  read each name
  *  then
  *    mask_match (name,"smi**" or "**smi**")...
  *    on success 1 on fail 0
  *    */
  printf ("\n[%d]\n", mask_match (argv[1], argv[2], 1));
  return 0;
}
int
do_match (char *str, char *regexp)
{
  char *p;
  for (p = regexp; *p && *str;)
   
    {
      switch (*p)
     
     {
     case '?':
       str++;
       p++;
       break;
     case '*':
       p++;
       if (!*p)
         return TRUE;
       while (*str)
         
         {
           while (*str && (toupper (*p) != toupper (*str)))
          str++;
           if (do_match (str, p))
          return TRUE;
           if (!*str)
          return FALSE;
           str++;
         }
       return FALSE;
     default:
       if (toupper (*str) != toupper (*p))
         
         {
           return FALSE;
         }
       str++;
       p++;
       break;
     }
    }
  if (!*p && !*str)
    return TRUE;
  if (!*p && (str[0] == '.') && (str[1] == 0))
    return (TRUE);
  if (!*str && *p == '?')
   
    {
      while (*p == '?')
     p++;
      return (!*p);
    }
  if (!*str && (*p == '*') && (p[1] == '\0'))
    return TRUE;
  return FALSE;
}
int
mask_match (char *str, char *regexp, int trans2)
{
  char *p = NULL;
  char pattern[1024] = { 0x00, }, string[1024] =
  {
  0x00,};
  int matched;
  if (str == NULL || regexp == NULL)
    return TRUE;
  strncpy (pattern, regexp, sizeof (pattern) - 1);
  for (p = pattern; *p; p++)
   
    {
      while (*p == '*' && (p[1] == '?' || p[1] == '*'))
     
     {
       memmove (p + 1, p + 2, strlen (p + 2));
       p[strlen (p + 2) + 1] = 0;
     }
    }
  if (!strcasecmp (pattern, "*"))
    return (TRUE);
  strncpy (string, str, sizeof (string) - 1);
  matched = do_match (string, pattern);
  return matched;
}


0
 
LVL 30

Expert Comment

by:Axter
ID: 7143932
Hi (dhanaspace1), welcome to EE.

All of the experts here, for the most part have learn from other experts as to the proper etiquette for posting answer.

An answer should not be posted as an answer, if other experts have previously posted possible answers as comments, and/or have already made contributions to the question.

There are many experts who never post answers as answer.  Instead, they post their answers as comments.

If you read the following link, you'll see why this is the preferred method for many of our valued experts, including myself.

http://www.experts-exchange.com/jsp/cmtyQuestAnswer.jsp

Hi (korsila):
Feel free to click the [Reject Answer] button near (Answer-poster's) response, even if it seems like a good answer.
Doing so will increase your chance of obtaining additional input from other experts.  Later, you can click the [Select Comment as Answer] button on any response.
0
 

Author Comment

by:korsila
ID: 7143960
First of all, I think Axter is right...
secondly,  the code that dhanaspace1 has posted it here didn't work...I got something instead of correct outpput of name macthes after running the code:
 "Usage: expression matcher"

so if you could figure out what it was and fix it for me..that would be ok though..!!
am glad load of experts are paying attendtion with my question and trying to help me out..!!!
thanks load..!!

Korsil;a
0
 

Author Comment

by:korsila
ID: 7143964
First of all, I think Axter is right...
secondly,  the code that dhanaspace1 has posted it here didn't work...I got something instead of correct outpput of name macthes after running the code:
 "Usage: expression matcher"

so if you could figure out what it was and fix it for me..that would be ok though..!!
am glad load of experts are paying attendtion with my question and trying to help me out..!!!
thanks load..!!

Korsil;a
0
 

Author Comment

by:korsila
ID: 7143972
make sure you got subroutine called "position()" which will handle the job above...just wanna make it tidy..!!!
Korsila

0
 

Author Comment

by:korsila
ID: 7144015
so 4 steps in "posiiton ()" subroutine -if you could :
1. exact match  , smit--->smit
2. beginning match, smit--> smity, smith, smithy
3. middle match, smit-->willsmither, yungsmitty
4. ending match, smit-->willsmit, yungsmit
 

hope that's not bad..!!!

thanks,
Korsila
0
 
LVL 1

Expert Comment

by:smisk
ID: 7144111
/* Revisited... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int matches(char *src, char *target) {
    int i;

    for(i = 0; i < strlen(target); i++)
        if(!strncmp(&target[i], src, strlen(src))) return(1);

    return(0);
}

int main(int argc, char *argv[]) {
    char surname[12];
    /* char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" }; */
    char data[80];
    int i;
    FILE *fp;

    if(argc == 2) {

        /*
        ** Read the surname to test
        */
        printf("Enter a surname [max 12 chars] (example : smi) : ");
        fgets(surname, 12, stdin);
        surname[strlen(surname) - 1] = '\0';

        /*
        ** Loop through the datafile printing matches
        */
        fp = fopen(argv[1], "r");
        if(fp != NULL) {
            while(fgets(data, 80, fp))
                if(matches(surname, data))
                    printf("%s", data);

            return(0);
        }
        else {
            printf("Error accessing %s\n", argv[1]);
            return(1);
        }
        fclose(fp);

    }
    else {
        printf("Usage : %s <datafile>\n", argv[0]);
        return(1);
    }
}
0
 
LVL 6

Expert Comment

by:ebosscher
ID: 7144422
what was wrong with using Sapa's answer and using strstr?

if(strstr(szLookIn, szFind) != NULL)
{
  /* we found it */
}
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 

Author Comment

by:korsila
ID: 7144824
Ebosscher,
I need some help with the code , not some comments without it otherwise I CANN'T TEST how it works...!!!!as I am not that expert, but I would love to learn...provide me the code and I WILL run it if it works you got the poitns and I got help that's fair is n't it...?

Korsila

0
 

Author Comment

by:korsila
ID: 7144825
smisk,
I WIL HAVE A GO WITH YOUR CODE AND WILL let you know as soon as I get it run/done...

cheers,
Korsila
0
 

Author Comment

by:korsila
ID: 7144862
Is it ok for anyone if I would accept Smisk as the best answer here..as I got it working with my datafile and I reckon i would accept his asnwer as the good one..!!!

anybody is happy with that....? I Wwill wait for "dhanaspace1" for updating new codes and "Sapa" for implementing the code not commenting algorithm..!!!

Korsila
0
 
LVL 1

Expert Comment

by:smisk
ID: 7144956
Korsila,

  You can basically swap out my "matches()" function with a call to "strstr()" which I had no idea existed (you learn something new every day! :)).

Try doing this switch :

fp = fopen(argv[1], "r");
       if(fp != NULL) {
           while(fgets(data, 80, fp))
               if(matches(surname, data))
                   printf("%s", data);

Instead becomes :

fp = fopen(argv[1], "r");
       if(fp != NULL) {
           while(fgets(data, 80, fp))
               if(strstr(data, surname))
                   printf("%s", data);

If that works for you (it does for me), use it and remove the function declaration of "matches()" from the program.  Note that the parameters are reversed in the call to strstr()...

Thanks,
Steve
0
 

Author Comment

by:korsila
ID: 7145660
Dear Smis or Steve,

here is what I have done to swap your matches() to strstr()

/* Revisited... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

/*
** Returns 1 if the *target contains *src
*/
int strstr(char *src, char *target) {
   int i;

   for(i = 0; i < strlen(target); i++)
       if(!strncmp(&target[i], src, strlen(src))) return(1);

   return(0);
}

int main(int argc, char *argv[]) {
   char surname[12];
   /* char data[4][15] = { "smith", "willsmith", "bobby", "willsmithson" }; */
   char data[80];
   int i;
   FILE *fp;

   if(argc == 2) {

       /*
       ** Read the surname to test
       */
       printf("Enter a surname [max 12 chars] (example : smi) : ");
       fgets(surname, 12, stdin);
       surname[strlen(surname) - 1] = '\0';

       /*
       ** Loop through the datafile printing matches
       */
       fp = fopen(argv[1], "r");
       if(fp != NULL) {
           while(fgets(data, 80, fp))
               if(strstr(data, surname))
                   printf("%s", data);

           return(0);
       }
       else {
           printf("Error accessing %s\n", argv[1]);
           return(1);
       }
       fclose(fp);

   }
   else {
       printf("Usage : %s <datafile>\n", argv[0]);
       return(1);
   }
}


however, it got error when I compiled the code:
line 10: Inconsistent type declaration: "strstr"
line 10: Inconsistent parameter list declaration for "strstr"


can you provide your new update code (whole program)
0
 
LVL 1

Accepted Solution

by:
smisk earned 100 total points
ID: 7146058
Korsila,

Nononono...  The whole point of using strstr() is that it is already a function defined in the library so you don't need to define it.  Try this :

/* Revisited again... :) */
#include <stdio.h>
#include <fcntl.h>
#include <strings.h>

int main(int argc, char *argv[]) {
  char surname[12];
  char data[80];
  int i;
  FILE *fp;

  if(argc == 2) {

      /*
      ** Read the surname to test
      */
      printf("Enter a surname [max 12 chars] (example : smi) : ");
      fgets(surname, 12, stdin);
      surname[strlen(surname) - 1] = '\0';

      /*
      ** Loop through the datafile printing matches
      */
      fp = fopen(argv[1], "r");
      if(fp != NULL) {
          while(fgets(data, 80, fp))
              if(strstr(data, surname))
                  printf("%s", data);

          return(0);
      }
      else {
          printf("Error accessing %s\n", argv[1]);
          return(1);
      }
      fclose(fp);

  }
  else {
      printf("Usage : %s <datafile>\n", argv[0]);
      return(1);
  }
}
0
 

Author Comment

by:korsila
ID: 7146129
Smisk,
that's better...but whenn I test with a large datfile which copntained aprroximately 5000 names..the program returns nothing...

or this is the cause of problem:
while(fgets(data, 80, fp))
you set up only 80 names or something..????
can you make it unlimited...both input name and name in datafile...
can I change these 2 line or even get rid of them:
char surname[12];
 char data[80];

many thanks,
Korsila
 
0
 

Author Comment

by:korsila
ID: 7146160
Smisk,
i have tried to change these 2 lines:
char surname[12];
char data[80];

to
char surname[120];
char data[500];

and while(fgets(data, 500, fp))
fgets(surname, 120, stdin);

however, the program still return nothing after i enter name like "smith" or "smi" from a large datfile whichj contained more than 5000 names...

what we should do next?
I thuink to get rid of the unlimited length of input name and names in datafile is sa solution..any suggestion ..we are almost there

korsila
0
 

Author Comment

by:korsila
ID: 7146171
aha ha I have just found the real problem...

becus the input name is "lower case" but the names in datfile are upper case...!!!that's why the program couldn't generate any output...!!!

but is it going to work with a large datfile..??? please answer me..what i should do to set up an unlimited length of input name and datafile...(or maximum)
last question i hope..!!
many thanks,
Korsila
0
 
LVL 1

Expert Comment

by:smisk
ID: 7146177
korsila,

  Hmm...  It works fine with a 75,000 line recordset for me.  Does the program print an error message or anything?  Keep in mind that this search IS case sensitive so "SMI" will not match "smi"!

Maybe you're on to something though.  Here's an explanation of those variables :

/*
** This is a character array of 12 characters that will
** hold the input.  This means that the input can only
** be 11 characters long at a maximum (one for the null).
** If you want to hold longer names, extend this as well
** as the "fgets(surname, 12, stdin);" to reflect the new
** length.
*/
char surname[12];

/*
** This is the maximum length of a line to read from
** the file.  It really shouldn't be a problem unless
** you have really long lines in the file...
*/
char line[80];

0
 

Author Comment

by:korsila
ID: 7146192
aha ha I have just found the real problem...

becus the input name is "lower case" but the names in datfile are upper case...!!!that's why the program couldn't generate any output...!!!

but is it going to work with a large datfile..??? please answer me..what i should do to set up an unlimited length of input name and datafile...(or maximum)
last question i hope..!!
many thanks,
Korsila
0
 

Author Comment

by:korsila
ID: 7146207
Smisk,
yeah , you are absolutely right , just a case sensitive though, I wil accept your answer anyway...!!!
weldone and a million thank for your great help...

Korsila
0
 

Author Comment

by:korsila
ID: 7146222
glad to get your brilliant help..!!!
0
 
LVL 1

Expert Comment

by:smisk
ID: 7146223
No problem...
0

Featured Post

What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

Join & Write a Comment

An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
This tutorial is posted by Aaron Wojnowski, administrator at SDKExpert.net.  To view more iPhone tutorials, visit www.sdkexpert.net. This is a very simple tutorial on finding the user's current location easily. In this tutorial, you will learn ho…
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use while-loops in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use switch statements in the C programming language.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now