  • Status: Solved
  • Priority: Medium
  • Security: Public

improved Simplex algorithm -Imladris

Dear Imladris,

from:
http://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20132118
I would like to improve the name matching in the Simplex code you implemented.
Here are my improved requirements:

-if A, E, I, O or U is not the first letter, change it to A; if it is the first letter, keep it unchanged
-if Y is not the first or last character, change Y to A
-change PH --> F
-change M --> N
-remove all duplicated letters: "smitt" should become "smat" (no second "t"), "smmit" should become "smat" (not "smmat"), and "smiss" should become "smas"

-change TH --> T
-change KN --> N, K --> C
-change DG --> G
-change PF --> F
-change RH --> R
-change WR --> R
-if Z is not the first character, change Z --> S

change these suffixes:
IX--> IC
EX-->EC
YE-->Y
EE-->Y
IE-->Y
DT-->D
RT-->D
RD-->D
NT-->N
ND-->N
EV-->EF

and that's it, I can't think of any more :)

You can have a look at some similar example code at:
 http://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20116615
maybe you have an idea...

thanks

Korsila

P.S. I hope this time it will work with my 100-name data file...
 
imladris commented:
I have a preliminary implementation of this. I have made the assumption that "PPPPPPH" should reduce to "F" (do multi P reduction to one P, followed by PH reduction to F (as opposed to the other way around: first PH reduction to F, then multi P reduction which will yield PF)).
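
The ordering assumed here can be shown in isolation. The following is only a minimal sketch, not the code from the program in this thread: `collapse_then_ph` is a hypothetical helper that drops repeated letters while copying and applies the PH rule against the already collapsed output, so "PPPPPPH" comes out as "F" rather than "PF".

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration only).
   Copies `in` to `out` in upper case, dropping repeated letters,
   and turns PH into F after the repeats have been collapsed.
   `out` must have room for strlen(in)+1 characters. */
void collapse_then_ph(const char *in, char *out)
{
    size_t n = 0;

    for (; *in != '\0'; ++in) {
        char c = (char)toupper((unsigned char)*in);

        if (n > 0 && out[n - 1] == c)
            continue;                 /* same letter again: drop it */
        if (n > 0 && out[n - 1] == 'P' && c == 'H')
            out[n - 1] = 'F';         /* PH -> F on the collapsed output */
        else
            out[n++] = c;
    }
    out[n] = '\0';
}
```

Doing it the other way around (PH first, then the duplicate reduction) would leave the earlier P's in place, which is the "PF" result mentioned above.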

Secondly there is a question. If the suffix tests are at the end of the processing (which seemed logical to me) many of them become redundant. For instance, a suffix of "IX" will never be encountered because the I will have already been changed to an "A". Ditto for "EX" and "EV". "YE", "EE" and "IE" all disappear because trailing vowels are already eliminated. So the question is, should these suffix tests be dropped? Or do you want to process the suffixes first?

Similarly if K is transformed into C, the test for KN will never be tripped. Should it be dropped? Or should the KN test be processed "before" the K test.
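
To make the ordering question concrete, here is a minimal sketch of the "KN before K" variant. `map_k` is a hypothetical helper written for this illustration only; it looks one letter ahead so that the K of "KN" is dropped before the general K to C rule gets a chance to fire.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration only).
   Copies `in` to `out` in upper case, applying KN -> N first and
   K -> C to every other K. `out` needs strlen(in)+1 characters. */
void map_k(const char *in, char *out)
{
    size_t i, n = 0;

    for (i = 0; in[i] != '\0'; ++i) {
        char c = (char)toupper((unsigned char)in[i]);
        char next = (char)toupper((unsigned char)in[i + 1]);

        if (c == 'K' && next == 'N')
            continue;                     /* KN -> N: skip the K, the N is copied next */
        out[n++] = (c == 'K') ? 'C' : c;  /* any other K -> C */
    }
    out[n] = '\0';
}
```

If the K to C rule ran first, "KN" would already have become "CN" and the KN test would never match.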

Lastly, again, for brand new algorithm work, something around 100 points is more appropriate.

korsila (author) commented:
Imladris,
that's a useful comment.
My answers are marked with ***.

I have a preliminary implementation of this. I have made the assumption that "PPPPPPH" should reduce
to "F" (do multi P reduction to one P, followed by PH reduction to F

*****Yes, do the multi P reduction to one P, followed by the PH reduction to F.

****Change these suffix :
                 X--> C --get rid of E
                 V-->F
and so on...

Check or process the following suffixes first:
                 YE-->Y
                 EE-->Y
                 IE-->Y

***Yes, the KN test should be processed "before" the K test.

Hope it's clearer...
Many thanks...

korsila
P.S. Let's see if you can come up with good code; if it works with my data file I would consider more points for you.

imladris commented:
I'm not quite clear on this bit:

****Change these suffix :
                X--> C --get rid of E
                V-->F
and so on...

X and V were not in the original suffix list. Do you mean that you want suffix "EX" changed to "C" and suffix "EV" changed to "F" and for them to be processed beforehand like "YE", "EE" and "IE"?

Or should X's be changed to C's in general, along with V's to F's (what does the get rid of E mean in that case)?

korsila (author) commented:
Sorry I didn't make it clearer.

Here are the suffixes which should be changed first:

                            X--> C
                            YE-->Y
                            EE-->Y
                            IE-->Y
                            DT-->D
                            RT-->D
                            RD-->D
                            NT-->N
                            ND-->N
                            V-->F


****Change these suffix :
                                           X--> C --get rid of E
                                           V-->F

means that I GET RID OF the E from EX-->EC
and EV-->EF, so as you can see above, the rule is just to change X-->C and V-->F.

korsila
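
For reference, the final suffix list above can be written as a small table. This is only a sketch under some assumptions: `suffix_rules` and `apply_suffix_rule` are hypothetical names, the name is upper-cased in place first, and at most one rule is applied per name.

```c
#include <ctype.h>
#include <string.h>

/* One suffix rule: if the name ends in `from`, that ending becomes `to`. */
struct suffix_rule { const char *from; const char *to; };

/* The rules from the list above. No replacement is longer than the
   suffix it replaces, so the name can be rewritten in place. */
static const struct suffix_rule suffix_rules[] = {
    {"YE", "Y"}, {"EE", "Y"}, {"IE", "Y"},
    {"DT", "D"}, {"RT", "D"}, {"RD", "D"},
    {"NT", "N"}, {"ND", "N"},
    {"X", "C"},  {"V", "F"}
};

/* Hypothetical helper (an assumption for illustration only):
   upper-cases `name` in place, then applies the first matching rule. */
void apply_suffix_rule(char *name)
{
    size_t len = strlen(name);
    size_t i;

    for (i = 0; i < len; ++i)
        name[i] = (char)toupper((unsigned char)name[i]);

    for (i = 0; i < sizeof suffix_rules / sizeof suffix_rules[0]; ++i) {
        size_t flen = strlen(suffix_rules[i].from);

        if (len >= flen && strcmp(name + len - flen, suffix_rules[i].from) == 0) {
            strcpy(name + len - flen, suffix_rules[i].to);
            return;                       /* only one suffix rule per name */
        }
    }
}
```

A table like this keeps the rules in one place, so adding or reordering a suffix does not mean touching the matching logic.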


korsila (author) commented:
Hope it's a fair point

korsila

imladris commented:
Points are fine.

Here is the new simplex program:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>


int FullFlag;
char *Simplex(char *,char *);
void getinput(char *prompt,char n[]);

int main(int argc,char *argv[])
{   int matched,msave,usave,mc;
  char name[150],line[500],buf1[150],buf2[150];
  FILE *namefile,*mfile,*ufile;

  printf("Save matched names in matched.dat (y or n)?\n");
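  if(fgets(line,sizeof line,stdin)==NULL)line[0]='\0'; //fgets limits the input length, gets cannot
  msave=(toupper(line[0])=='Y');
  printf("Save unmatched names in unmatched.dat (y or n)?\n");
  if(fgets(line,sizeof line,stdin)==NULL)line[0]='\0';
  usave=(toupper(line[0])=='Y');
  namefile=fopen("surnames.dat","r"); //open file for reading
  if(namefile==NULL)
  {   perror("surnames.dat");
      exit(1);
  }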
  if(msave)mfile=fopen("matched.dat","w");
  if(usave)ufile=fopen("unmatched.dat","w");
  FullFlag=0;
  do
  {   printf("Please enter name to match:\n");
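      if(fgets(name,sizeof name,stdin)==NULL)break;
      name[strcspn(name,"\r\n")]='\0'; //strip the line ending
        printf("Simplex: %s\n",Simplex(name,buf1));
      fseek(namefile,0L,SEEK_SET); //seek to start of file
      matched=0;
      mc=0;
      while(fgets(line,500,namefile)!=NULL)
      {   line[strcspn(line,"\r\n")]='\0'; //strip the line ending, if any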
          if(strcmp(Simplex(name,buf1),Simplex(line,buf2))==0)
          {    printf("%s\n",line);
               ++mc;
               if(msave)
               {   if(matched==0)fprintf(mfile,"Match name %s\n",name);
                   fprintf(mfile,"%s\n",line);
               }
               matched=1;
          }
      }
      if(matched==1)
       {   if(msave)fprintf(mfile,"----------------------------------------\n");
           printf("Number of Matches: %d\n",mc);
       }
      else
      {   printf("No Matches Found.\n");
          if(usave)
          {   fprintf(ufile,"Match name %s\n",name);
              fprintf(ufile,"No Matches found\n");
              fprintf(ufile,"--------------------\n");
          }
      }
      printf("Compare again (y or n)?\n");
      if(fgets(line,sizeof line,stdin)==NULL)line[0]='\0';
  } while(toupper(line[0])=='Y');
  fclose(namefile);
  if(msave)fclose(mfile);
  if(usave)fclose(ufile);
}


// prompt user for input
// get input
// copy it safely into provided variable

void getinput(char *prompt,char n[])
{   char ipc[150];

  printf("%s:\n",prompt);
  if(fgets(ipc,sizeof ipc,stdin)==NULL)ipc[0]='\0';
  ipc[strcspn(ipc,"\r\n")]='\0'; //strip the line ending
  strncpy(n,ipc,19);
  n[19]='\0';
  return;
}


char vowel[]="aeiouyAEIOUY";

char *Simplex(char *name,char *buf)
{   int i,j,len,procvwl;
    char nc,*sfx,*temp;

    j=strlen(name);
    temp=(char *)malloc(strlen(name)+1);
    strcpy(temp,name);
    if(j>=2) // the suffix rules need at least two characters
    {   sfx=temp+(j-2);
        if(toupper(*(sfx+1))=='X')*(sfx+1)='C';
        else if(toupper(*(sfx+1))=='V')*(sfx+1)='F';
        else if(stricmp(sfx,"YE")==0)temp[j-1]='\0';
        else if(stricmp(sfx,"EE")==0 || stricmp(sfx,"IE")==0)
        {   temp[j-2]='Y';
            temp[j-1]='\0';
        }
        else if(stricmp(sfx,"DT")==0)temp[j-1]='\0';
        else if(stricmp(sfx,"RT")==0 || stricmp(sfx,"RD")==0)
        {   temp[j-2]='D';
            temp[j-1]='\0';
        }
        else if(stricmp(sfx,"NT")==0 || stricmp(sfx,"ND")==0)temp[j-1]='\0';
    }
    procvwl=0;
      j=-1;
      len=strlen(temp);
    for(i=0; i<len; ++i)
      {   nc=toupper(temp[i]);
        if(strchr(vowel,nc)==NULL)
            {   char prev=(j>=0)?buf[j]:'\0'; // last letter written; never read buf[-1]
            procvwl=0;
            if(nc!=prev)
                  {   if(prev=='P' && nc=='H')buf[j]='F';
                        else if(prev=='T' && nc=='H')buf[j]='T';
                        else if(prev=='K' && nc=='N')buf[j]='N';
                        else if(prev=='D' && nc=='G')buf[j]='G';
                        else if(prev=='P' && nc=='F')buf[j]='F';
                        else if(prev=='R' && nc=='H')buf[j]='R';
                        else if(prev=='W' && nc=='R')buf[j]='R';
                        else if(nc=='K' && toupper(temp[i+1])!='N')
                        {      if(prev!='C')buf[++j]='C';
                        }
                        else if(nc=='M')
                        {   if(prev!='N')buf[++j]='N';
                        }
                        else if(nc=='Z' && i!=0)
                        {      if(prev!='S')buf[++j]='S';
                        }
                else buf[++j]=nc;
                  }
            }
        else
            {   if(i==0 || (nc=='Y' && i==len-1))buf[++j]=nc;
            else if(!procvwl)
                  {   procvwl=1;
                buf[++j]='A';
                  }
            }
    }
      if(j<0 || buf[j]!='A')++j; // drop a trailing 'A'; j<0 means nothing was written
      buf[j]='\0';
      free(temp);
    return(buf);
}


If this doesn't work on the large file, we will have to investigate that. I'm guessing that it is not the size itself that is causing the problem, but something about the structure of the file. The program, for instance, makes no allowances for leading blanks, and things like that.

korsila (author) commented:
Dear Imladris,

I wasn't able to run it, because after I compiled it I got this message:
"Unsatified Symbols: stricmp (code)"

and compiling simplex.c only gave me another file, "simplex.o".

Then when I tried to run it, a message appeared saying "cannot execute".

------
Do I need to change "stricmp" to "strncmp", or what else should I do?


many thanks,
Korsila
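
A note on the error: `stricmp` is not part of standard C, so some compilers and libraries do not provide it (POSIX systems usually offer `strcasecmp` in `<strings.h>` instead). One portable option is a small hand-written comparison. The following is a minimal sketch; `ci_compare` is a hypothetical name, not a library function.

```c
#include <ctype.h>

/* Hypothetical portable stand-in for stricmp (an assumption for
   illustration only). Compares two strings ignoring case and returns
   <0, 0 or >0 in the same way strcmp does. */
int ci_compare(const char *a, const char *b)
{
    while (*a != '\0' &&
           toupper((unsigned char)*a) == toupper((unsigned char)*b)) {
        ++a;
        ++b;
    }
    return toupper((unsigned char)*a) - toupper((unsigned char)*b);
}
```

With a helper like this, a test such as `ci_compare(sfx,"YE")==0` would behave like the `stricmp` call it replaces.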

 
imladris commented:
OK. Here is a version that does not rely on stricmp:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>


int FullFlag;
char *Simplex(char *,char *);
void getinput(char *prompt,char n[]);

int main(int argc,char *argv[])
{   int matched,msave,usave,mc;
  char name[150],line[500],buf1[150],buf2[150];
  FILE *namefile,*mfile,*ufile;

  printf("Save matched names in matched.dat (y or n)?\n");
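  if(fgets(line,sizeof line,stdin)==NULL)line[0]='\0'; //fgets limits the input length, gets cannot
  msave=(toupper(line[0])=='Y');
  printf("Save unmatched names in unmatched.dat (y or n)?\n");
  if(fgets(line,sizeof line,stdin)==NULL)line[0]='\0';
  usave=(toupper(line[0])=='Y');
  namefile=fopen("surnames.dat","r"); //open file for reading
  if(namefile==NULL)
  {   perror("surnames.dat");
      exit(1);
  }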
  if(msave)mfile=fopen("matched.dat","w");
  if(usave)ufile=fopen("unmatched.dat","w");
  FullFlag=0;
  do
  {   printf("Please enter name to match:\n");
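      if(fgets(name,sizeof name,stdin)==NULL)break;
      name[strcspn(name,"\r\n")]='\0'; //strip the line ending
        printf("Simplex: %s\n",Simplex(name,buf1));
      fseek(namefile,0L,SEEK_SET); //seek to start of file
      matched=0;
      mc=0;
      while(fgets(line,500,namefile)!=NULL)
      {   line[strcspn(line,"\r\n")]='\0'; //strip the line ending, if any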
          if(strcmp(Simplex(name,buf1),Simplex(line,buf2))==0)
          {    printf("%s\n",line);
               ++mc;
               if(msave)
               {   if(matched==0)fprintf(mfile,"Match name %s\n",name);
                   fprintf(mfile,"%s\n",line);
               }
               matched=1;
          }
      }
      if(matched==1)
       {   if(msave)fprintf(mfile,"----------------------------------------\n");
           printf("Number of Matches: %d\n",mc);
       }
      else
      {   printf("No Matches Found.\n");
          if(usave)
          {   fprintf(ufile,"Match name %s\n",name);
              fprintf(ufile,"No Matches found\n");
              fprintf(ufile,"--------------------\n");
          }
      }
      printf("Compare again (y or n)?\n");
      if(fgets(line,sizeof line,stdin)==NULL)line[0]='\0';
  } while(toupper(line[0])=='Y');
  fclose(namefile);
  if(msave)fclose(mfile);
  if(usave)fclose(ufile);
}


// prompt user for input
// get input
// copy it safely into provided variable

void getinput(char *prompt,char n[])
{   char ipc[150];

  printf("%s:\n",prompt);
  if(fgets(ipc,sizeof ipc,stdin)==NULL)ipc[0]='\0';
  ipc[strcspn(ipc,"\r\n")]='\0'; //strip the line ending
  strncpy(n,ipc,19);
  n[19]='\0';
  return;
}


char vowel[]="aeiouyAEIOUY";

char *Simplex(char *name,char *buf)
{   int i,j,len,procvwl;
    char nc,*sfx,*temp;

    j=strlen(name);
    temp=(char *)malloc(strlen(name)+1);
    strcpy(temp,name);
    if(j>=2) // the suffix rules need at least two characters
    {   sfx=temp+(j-2);
        *sfx=toupper(*sfx);
        *(sfx+1)=toupper(*(sfx+1));
        if(*(sfx+1)=='X')*(sfx+1)='C';
        else if(*(sfx+1)=='V')*(sfx+1)='F';
        else if(strcmp(sfx,"YE")==0)temp[j-1]='\0';
        else if(strcmp(sfx,"EE")==0 || strcmp(sfx,"IE")==0)
        {   temp[j-2]='Y';
            temp[j-1]='\0';
        }
        else if(strcmp(sfx,"DT")==0)temp[j-1]='\0';
        else if(strcmp(sfx,"RT")==0 || strcmp(sfx,"RD")==0)
        {   temp[j-2]='D';
            temp[j-1]='\0';
        }
        else if(strcmp(sfx,"NT")==0 || strcmp(sfx,"ND")==0)temp[j-1]='\0';
    }
    procvwl=0;
      j=-1;
      len=strlen(temp);
    for(i=0; i<len; ++i)
      {   nc=toupper(temp[i]);
        if(strchr(vowel,nc)==NULL)
            {   char prev=(j>=0)?buf[j]:'\0'; // last letter written; never read buf[-1]
            procvwl=0;
            if(nc!=prev)
                  {   if(prev=='P' && nc=='H')buf[j]='F';
                        else if(prev=='T' && nc=='H')buf[j]='T';
                        else if(prev=='K' && nc=='N')buf[j]='N';
                        else if(prev=='D' && nc=='G')buf[j]='G';
                        else if(prev=='P' && nc=='F')buf[j]='F';
                        else if(prev=='R' && nc=='H')buf[j]='R';
                        else if(prev=='W' && nc=='R')buf[j]='R';
                        else if(nc=='K' && toupper(temp[i+1])!='N')
                        {      if(prev!='C')buf[++j]='C';
                        }
                        else if(nc=='M')
                        {   if(prev!='N')buf[++j]='N';
                        }
                        else if(nc=='Z' && i!=0)
                        {      if(prev!='S')buf[++j]='S';
                        }
                else buf[++j]=nc;
                  }
            }
        else
            {   if(i==0 || (nc=='Y' && i==len-1))buf[++j]=nc;
            else if(!procvwl)
                  {   procvwl=1;
                buf[++j]='A';
                  }
            }
    }
      if(j<0 || buf[j]!='A')++j; // drop a trailing 'A'; j<0 means nothing was written
      buf[j]='\0';
      free(temp);
    return(buf);
}

korsila (author) commented:
Dear imladris,
That's better,
but it doesn't work with the 100 names, as I mentioned before...

Again it lets the code down: it returns only "no match found",

but it works with my 10-name test file.

What should we do now? I managed to get the data file into the same format you suggested...

imladris commented:
How big is the "100 name" file? I think the simplest path forward will be if it is less than 200K, or if you can make one that is less, and e-mail that to me. E-Mail to imladris@infoserve.net

korsila (author) commented:
The 100-name file is just 1K
and the other one is 3K.

I will email you now.

many thanks...

kORSILA

imladris commented:
Curiouser and curiouser,

I received base.dat and match.dat. I renamed base.dat to surnames.dat, and did a compare against "smith", and the program returned 16 matches.

What precisely did you try?

korsila (author) commented:
Imladris,
I tried again and again.
How come it doesn't work for me?
I had better have another try...

korsila (author) commented:
I have tested it again and it still didn't work,
so I am sending you the test file called "a.out".
I hope you can test with the file I send to you.

Let me know how it goes.
In the meantime I will try to find out why it didn't work.

many thanks,
Korsila

korsila (author) commented:
Imladris... whoopee!
Finally I've just found the mistake:
it was the space after each name in the data file.
I deleted the space after each name and now the program works :)
Hurray, you're absolutely right: there was nothing wrong with the program, only with the data file format!

cheers...

Korsila...

imladris commented:
Excellent! Thanx.

korsila (author) commented:
Do you have any suggestion for ignoring the space after each name in the data file? For example, if I have 1000 names and I don't want to delete the space after each name, what should I do? Could you add a bit of code to the program to ignore the spaces before and after each name in the data file? I have tested the program with a huge data file but it didn't work again,
so it is annoying to delete the space after each name more than 1000 times...

Any suggestions? If it's hard I will post a new question, but if it's not, could you add a bit of code here?

many thanks,

Korsila
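
One possible way to handle this, sketched here as an assumption rather than as the answer given in the thread: strip the blanks from each line in code before it is matched. `trim` is a hypothetical helper; it cuts trailing whitespace (including the newline) in place and returns a pointer past any leading whitespace.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration only).
   Removes trailing whitespace from `s` in place and returns a pointer
   to the first non-blank character inside `s`. */
char *trim(char *s)
{
    size_t len;

    while (isspace((unsigned char)*s))
        ++s;                              /* skip leading blanks */

    len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        --len;                            /* back up over trailing blanks */
    s[len] = '\0';

    return s;
}
```

In the read loop of the program above, the idea would be to pass `trim(line)` to `Simplex` instead of `line`, so that names such as "smith " and " smith" are treated the same as "smith" without editing the data file.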