Solved

improved Simplex algorithm -Imladris

Posted on 2001-06-21
Last Modified: 2010-05-18
Dear Imladris,

from:
http://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20132118
I would like to improve the name matching in the Simplex code that you implemented.
Here are my revised requirements:

-if A, E, I, O or U is not the first letter, change it to A; if it is the first letter, keep it as it is
-if Y is not the first or last character, change Y to A
-change PH --> F
-change M --> N
-remove all duplicated letters: "smitt" should be "smat" (no second "t"); similarly, "smmit" should be "smat" (not "smmat"), and "smiss" should be "smas"

-change TH --> T
-change KN --> N, K --> C
-change DG --> G
-change PF --> F
-change RH --> R
-change WR --> R
-if Z is not the first character, change Z --> S

Change these suffixes:
IX--> IC
EX-->EC
YE-->Y
EE-->Y
IE-->Y
DT-->D
RT-->D
RD-->D
NT-->N
ND-->N
EV-->EF

and that's it, I can't think of any more :)

You can have a look at some similar example code at:
 http://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20116615
maybe you have an idea...

thanks

Korsila

P.S. I hope this time it works with my 100-name data file...
 
0
Comment
Question by:korsila
17 Comments
 
LVL 16

Expert Comment

by:imladris
ID: 6216364
I have a preliminary implementation of this. I have made the assumption that "PPPPPPH" should reduce to "F" (do multi P reduction to one P, followed by PH reduction to F (as opposed to the other way around: first PH reduction to F, then multi P reduction which will yield PF)).

Secondly there is a question. If the suffix tests are at the end of the processing (which seemed logical to me) many of them become redundant. For instance, a suffix of "IX" will never be encountered because the I will have already been changed to an "A". Ditto for "EX" and "EV". "YE", "EE" and "IE" all disappear because trailing vowels are already eliminated. So the question is, should these suffix tests be dropped? Or do you want to process the suffixes first?

Similarly, if K is transformed into C, the test for KN will never be tripped. Should it be dropped? Or should the KN test be processed "before" the K test?

Lastly, again, for brand new algorithm work, something around 100 points is more appropriate.
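
The ordering question above can be made concrete with a small sketch. The helper names collapse_runs and ph_to_f are hypothetical and are not part of the posted program; they only illustrate why collapsing repeated letters before applying the PH rule yields "F", while the reverse order yields "PF".

```c
#include <stddef.h>

/* Hypothetical helper: collapse runs of the same character in place,
   e.g. "PPPH" becomes "PH". */
void collapse_runs(char *s)
{
    size_t r, w = 0;
    for (r = 0; s[r] != '\0'; ++r)
        if (w == 0 || s[r] != s[w - 1])
            s[w++] = s[r];
    s[w] = '\0';
}

/* Hypothetical helper: replace every "PH" with "F" in place.
   Assumes the input is already upper case. */
void ph_to_f(char *s)
{
    size_t r = 0, w = 0;
    while (s[r] != '\0') {
        if (s[r] == 'P' && s[r + 1] == 'H') {
            s[w++] = 'F';
            r += 2;          /* consume both letters of the pair */
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
}
```

Calling collapse_runs and then ph_to_f on a writable copy of "PPPPPPH" leaves "F"; calling them in the opposite order leaves "PF", which matches the two outcomes described in the comment.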
0
 

Author Comment

by:korsila
ID: 6218777
Imladris,
that's a useful comment.
My answers are marked with ***.

I have a preliminary implementation of this. I have made the assumption that "PPPPPPH" should reduce
to "F" (do multi P reduction to one P, followed by PH reduction to F

***Do multi P reduction to one P, followed by PH reduction to F.

***Change these suffixes:
                 X--> C --get rid of E
                 V-->F
and so on...

Check or process the following suffixes first:
                 YE-->Y
                 EE-->Y
                 IE-->Y

***The KN test should be processed "before" the K test.

I hope it's clearer...
Many thanks...

korsila
P.S. If you can come up with good code that works with my data file, I will consider more points for you.
0
 
LVL 16

Expert Comment

by:imladris
ID: 6219108
I'm not quite clear on this bit:

****Change these suffix :
                X--> C --get rid of E
                V-->F
and so on...

X and V were not in the original suffix list. Do you mean that you want suffix "EX" changed to "C" and suffix "EV" changed to "F" and for them to be processed beforehand like "YE", "EE" and "IE"?

Or should X's be changed to C's in general, along with V's to F's (what does the get rid of E mean in that case)?
0
Author Comment

by:korsila
ID: 6222565
Sorry I didn't make it clearer.

Here are the suffixes which should be changed first:

                            change these suffixes first:
                            X--> C
                            YE-->Y
                            EE-->Y
                            IE-->Y
                            DT-->D
                            RT-->D
                            RD-->D
                            NT-->N
                            ND-->N
                            V-->F


****Change these suffix :
                                           X--> C --get rid of E
                                           V-->F

means that I get rid of the E from EX-->EC
and EV-->EF, so, as you can see above, it just changes X-->C and V-->F.

korsila
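
The final suffix list above lends itself to a table-driven sketch. This is only an illustration under stated assumptions: the names suffix_rule, rules and apply_suffix_rule are hypothetical, the name is assumed to be upper case already, and at most one rule is applied (the first that matches).

```c
#include <string.h>

/* Hypothetical rule table built from the suffix list in the comment. */
struct suffix_rule { const char *from; const char *to; };

static const struct suffix_rule rules[] = {
    {"YE", "Y"}, {"EE", "Y"}, {"IE", "Y"},
    {"DT", "D"}, {"RT", "D"}, {"RD", "D"},
    {"NT", "N"}, {"ND", "N"},
    {"X",  "C"}, {"V",  "F"}
};

/* Rewrite the suffix of an upper-case name in place.
   Returns 1 if a rule was applied, 0 otherwise. A replacement is never
   longer than the suffix it replaces, so the buffer cannot overflow. */
int apply_suffix_rule(char *name)
{
    size_t len = strlen(name), i;
    for (i = 0; i < sizeof rules / sizeof rules[0]; ++i) {
        size_t flen = strlen(rules[i].from);
        if (len >= flen && strcmp(name + len - flen, rules[i].from) == 0) {
            strcpy(name + len - flen, rules[i].to);
            return 1;
        }
    }
    return 0;
}
```

With this table, "SCHMIDT" becomes "SCHMID", "MARIE" becomes "MARY" and "ALEX" becomes "ALEC", while "SMITH" is left unchanged. Keeping the rules in one array makes it easy to add or reorder suffixes without touching the matching loop.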


0
 

Author Comment

by:korsila
ID: 6227079
I hope the points are fair.

korsila
0
 
LVL 16

Accepted Solution

by:
imladris earned 400 total points
ID: 6239689
The points are fine.

Here is the new simplex program:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>


int FullFlag;
char *Simplex(char *,char *);
void getinput(char *prompt,char n[]);

int main(int argc,char *argv[])
{   int matched,msave,usave,mc;
  // buf2 is as large as line: a code is never longer than its input
  char name[150],line[500],buf1[150],buf2[500];
  FILE *namefile,*mfile=NULL,*ufile=NULL;

  // fgets replaces gets, which cannot limit how much it reads
  printf("Save matched names in matched.dat (y or n)?\n");
  if(fgets(line,sizeof line,stdin)==NULL)return 1;
  msave=(toupper((unsigned char)line[0])=='Y');
  printf("Save unmatched names in unmatched.dat (y or n)?\n");
  if(fgets(line,sizeof line,stdin)==NULL)return 1;
  usave=(toupper((unsigned char)line[0])=='Y');
  namefile=fopen("surnames.dat","r"); //the name file is only read
  if(namefile==NULL)
  {   perror("surnames.dat");
      return 1;
  }
  //if an output file cannot be opened, carry on without saving to it
  if(msave && (mfile=fopen("matched.dat","w"))==NULL)msave=0;
  if(usave && (ufile=fopen("unmatched.dat","w"))==NULL)usave=0;
  FullFlag=0;
  do
  {   printf("Please enter name to match:\n");
      if(fgets(name,sizeof name,stdin)==NULL)break;
      name[strcspn(name,"\r\n")]='\0'; //drop the line ending
      printf("Simplex: %s\n",Simplex(name,buf1));
      fseek(namefile,0L,SEEK_SET); //seek to start of file
      matched=0;
      mc=0;
      while(fgets(line,sizeof line,namefile)!=NULL)
      {   line[strcspn(line,"\r\n")]='\0'; //drop the line ending only
          if(strcmp(Simplex(name,buf1),Simplex(line,buf2))==0)
          {    printf("%s\n",line);
               ++mc;
               if(msave)
               {   if(matched==0)fprintf(mfile,"Match name %s\n",name);
                   fprintf(mfile,"%s\n",line);
               }
               matched=1;
          }
      }
      if(matched==1)
      {   if(msave)fprintf(mfile,"----------------------------------------\n");
          printf("Number of Matches: %d\n",mc);
      }
      else
      {   printf("No Matches Found.\n");
          if(usave)
          {   fprintf(ufile,"Match name %s\n",name);
              fprintf(ufile,"No Matches found\n");
              fprintf(ufile,"--------------------\n");
          }
      }
      printf("Compare again (y or n)?\n");
      if(fgets(line,sizeof line,stdin)==NULL)break;
  } while(toupper((unsigned char)line[0])=='Y');
  fclose(namefile);
  if(msave)fclose(mfile);
  if(usave)fclose(ufile);
  return 0;
}


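// prompt user for input
// get input
// copy it safely into provided variable (n must hold at least 20 chars)

void getinput(char *prompt,char n[])
{   char ipc[150];

  printf("%s:\n",prompt);
  if(fgets(ipc,sizeof ipc,stdin)==NULL)ipc[0]='\0'; //fgets instead of gets
  ipc[strcspn(ipc,"\r\n")]='\0'; //drop the line ending
  strncpy(n,ipc,19);
  n[19]='\0';
  return;
}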


char vowel[]="aeiouyAEIOUY";

char *Simplex(char *name,char *buf)
{   int i,j,len,procvwl;
    char nc,last,*sfx,*temp;

    j=strlen(name);
    temp=(char *)malloc(j+1);
    if(temp==NULL)
    {   buf[0]='\0';
        return(buf);
    }
    strcpy(temp,name);
    // suffix rules run first; only one of them is applied
    if(j>=1 && toupper((unsigned char)temp[j-1])=='X')temp[j-1]='C';
    else if(j>=1 && toupper((unsigned char)temp[j-1])=='V')temp[j-1]='F';
    else if(j>=2) // two-letter suffixes need at least two characters
    {   sfx=temp+(j-2);
        if(stricmp(sfx,"YE")==0)temp[j-1]='\0';
        else if(stricmp(sfx,"EE")==0 || stricmp(sfx,"IE")==0)
        {   temp[j-2]='Y';
            temp[j-1]='\0';
        }
        else if(stricmp(sfx,"DT")==0)temp[j-1]='\0';
        else if(stricmp(sfx,"RT")==0 || stricmp(sfx,"RD")==0)
        {   temp[j-2]='D';
            temp[j-1]='\0';
        }
        else if(stricmp(sfx,"NT")==0 || stricmp(sfx,"ND")==0)temp[j-1]='\0';
    }
    procvwl=0;
    j=-1; // index of the last character written to buf; -1 while buf is empty
    len=strlen(temp);
    for(i=0; i<len; ++i)
    {   nc=toupper((unsigned char)temp[i]);
        last=(j>=0)?buf[j]:'\0'; // never read buf[-1]
        if(strchr(vowel,nc)==NULL)
        {   procvwl=0;
            if(nc!=last) // skip repeated letters
            {   if(last=='P' && nc=='H')buf[j]='F';
                else if(last=='T' && nc=='H')buf[j]='T';
                else if(last=='K' && nc=='N')buf[j]='N';
                else if(last=='D' && nc=='G')buf[j]='G';
                else if(last=='P' && nc=='F')buf[j]='F';
                else if(last=='R' && nc=='H')buf[j]='R';
                else if(last=='W' && nc=='R')buf[j]='R';
                else if(nc=='K' && toupper((unsigned char)temp[i+1])!='N')
                {   if(last!='C')buf[++j]='C';
                }
                else if(nc=='M')
                {   if(last!='N')buf[++j]='N';
                }
                else if(nc=='Z' && i!=0)
                {   if(last!='S')buf[++j]='S';
                }
                else buf[++j]=nc;
            }
        }
        else
        {   if(i==0 || (nc=='Y' && i==len-1))buf[++j]=nc; // keep a leading vowel and a final Y
            else if(!procvwl) // a run of vowels becomes a single A
            {   procvwl=1;
                buf[++j]='A';
            }
        }
    }
    if(j<0 || buf[j]!='A')++j; // a trailing A is dropped
    buf[j]='\0';
    free(temp);
    return(buf);
}


If this doesn't work on the large file, we will have to investigate that. I'm guessing that it is not the size itself that is causing the problem, but something about the structure of the file. The program, for instance, makes no allowances for leading blanks, and things like that.

0
 

Author Comment

by:korsila
ID: 6242429
Dear Imladris,

I wasn't able to run it, because after I compiled it I got this message:
"Unsatified Symbols: stricmp (code)"

Compiling simplex.c also gave me another file, "simplex.o",

and when I tried to run that, a message appeared saying "cannot execute".

------
Do I need to change "stricmp" to "strncmp", or what else should I do?


many thanks,
Korsila
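
For context: stricmp is not part of standard C, so some compilers and C libraries do not provide it, and strncmp is not a substitute because it limits the comparison length rather than ignoring case. The "cannot execute" message is most likely because simplex.o is an object file left over from the failed link, not a finished program. One portable option is a small hand-written comparison; the sketch below uses the hypothetical name ci_compare and relies only on tolower from <ctype.h>.

```c
#include <ctype.h>

/* Hypothetical helper: compare two strings ignoring case.
   Returns 0 when they match, and a negative or positive value
   otherwise, in the same spirit as strcmp. */
int ci_compare(const char *a, const char *b)
{
    while (*a != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        ++a;
        ++b;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}
```

Each stricmp(sfx,"YE")==0 test in the program could then be written as ci_compare(sfx,"YE")==0. On POSIX systems strcasecmp from <strings.h> does the same job, where it is available.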

 
0
 
LVL 16

Expert Comment

by:imladris
ID: 6249585
OK. Here is a version that does not rely on stricmp:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>


int FullFlag;
char *Simplex(char *,char *);
void getinput(char *prompt,char n[]);

int main(int argc,char *argv[])
{   int matched,msave,usave,mc;
  // buf2 is as large as line: a code is never longer than its input
  char name[150],line[500],buf1[150],buf2[500];
  FILE *namefile,*mfile=NULL,*ufile=NULL;

  // fgets replaces gets, which cannot limit how much it reads
  printf("Save matched names in matched.dat (y or n)?\n");
  if(fgets(line,sizeof line,stdin)==NULL)return 1;
  msave=(toupper((unsigned char)line[0])=='Y');
  printf("Save unmatched names in unmatched.dat (y or n)?\n");
  if(fgets(line,sizeof line,stdin)==NULL)return 1;
  usave=(toupper((unsigned char)line[0])=='Y');
  namefile=fopen("surnames.dat","r"); //the name file is only read
  if(namefile==NULL)
  {   perror("surnames.dat");
      return 1;
  }
  //if an output file cannot be opened, carry on without saving to it
  if(msave && (mfile=fopen("matched.dat","w"))==NULL)msave=0;
  if(usave && (ufile=fopen("unmatched.dat","w"))==NULL)usave=0;
  FullFlag=0;
  do
  {   printf("Please enter name to match:\n");
      if(fgets(name,sizeof name,stdin)==NULL)break;
      name[strcspn(name,"\r\n")]='\0'; //drop the line ending
      printf("Simplex: %s\n",Simplex(name,buf1));
      fseek(namefile,0L,SEEK_SET); //seek to start of file
      matched=0;
      mc=0;
      while(fgets(line,sizeof line,namefile)!=NULL)
      {   line[strcspn(line,"\r\n")]='\0'; //drop the line ending only
          if(strcmp(Simplex(name,buf1),Simplex(line,buf2))==0)
          {    printf("%s\n",line);
               ++mc;
               if(msave)
               {   if(matched==0)fprintf(mfile,"Match name %s\n",name);
                   fprintf(mfile,"%s\n",line);
               }
               matched=1;
          }
      }
      if(matched==1)
      {   if(msave)fprintf(mfile,"----------------------------------------\n");
          printf("Number of Matches: %d\n",mc);
      }
      else
      {   printf("No Matches Found.\n");
          if(usave)
          {   fprintf(ufile,"Match name %s\n",name);
              fprintf(ufile,"No Matches found\n");
              fprintf(ufile,"--------------------\n");
          }
      }
      printf("Compare again (y or n)?\n");
      if(fgets(line,sizeof line,stdin)==NULL)break;
  } while(toupper((unsigned char)line[0])=='Y');
  fclose(namefile);
  if(msave)fclose(mfile);
  if(usave)fclose(ufile);
  return 0;
}


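// prompt user for input
// get input
// copy it safely into provided variable (n must hold at least 20 chars)

void getinput(char *prompt,char n[])
{   char ipc[150];

  printf("%s:\n",prompt);
  if(fgets(ipc,sizeof ipc,stdin)==NULL)ipc[0]='\0'; //fgets instead of gets
  ipc[strcspn(ipc,"\r\n")]='\0'; //drop the line ending
  strncpy(n,ipc,19);
  n[19]='\0';
  return;
}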


char vowel[]="aeiouyAEIOUY";

char *Simplex(char *name,char *buf)
{   int i,j,len,procvwl;
    char nc,last,*sfx,*temp;

    j=strlen(name);
    temp=(char *)malloc(j+1);
    if(temp==NULL)
    {   buf[0]='\0';
        return(buf);
    }
    strcpy(temp,name);
    // upper-case the last two characters so plain strcmp can test the suffix
    if(j>=1)temp[j-1]=toupper((unsigned char)temp[j-1]);
    if(j>=2)temp[j-2]=toupper((unsigned char)temp[j-2]);
    // suffix rules run first; only one of them is applied
    if(j>=1 && temp[j-1]=='X')temp[j-1]='C';
    else if(j>=1 && temp[j-1]=='V')temp[j-1]='F';
    else if(j>=2) // two-letter suffixes need at least two characters
    {   sfx=temp+(j-2);
        if(strcmp(sfx,"YE")==0)temp[j-1]='\0';
        else if(strcmp(sfx,"EE")==0 || strcmp(sfx,"IE")==0)
        {   temp[j-2]='Y';
            temp[j-1]='\0';
        }
        else if(strcmp(sfx,"DT")==0)temp[j-1]='\0';
        else if(strcmp(sfx,"RT")==0 || strcmp(sfx,"RD")==0)
        {   temp[j-2]='D';
            temp[j-1]='\0';
        }
        else if(strcmp(sfx,"NT")==0 || strcmp(sfx,"ND")==0)temp[j-1]='\0';
    }
    procvwl=0;
    j=-1; // index of the last character written to buf; -1 while buf is empty
    len=strlen(temp);
    for(i=0; i<len; ++i)
    {   nc=toupper((unsigned char)temp[i]);
        last=(j>=0)?buf[j]:'\0'; // never read buf[-1]
        if(strchr(vowel,nc)==NULL)
        {   procvwl=0;
            if(nc!=last) // skip repeated letters
            {   if(last=='P' && nc=='H')buf[j]='F';
                else if(last=='T' && nc=='H')buf[j]='T';
                else if(last=='K' && nc=='N')buf[j]='N';
                else if(last=='D' && nc=='G')buf[j]='G';
                else if(last=='P' && nc=='F')buf[j]='F';
                else if(last=='R' && nc=='H')buf[j]='R';
                else if(last=='W' && nc=='R')buf[j]='R';
                else if(nc=='K' && toupper((unsigned char)temp[i+1])!='N')
                {   if(last!='C')buf[++j]='C';
                }
                else if(nc=='M')
                {   if(last!='N')buf[++j]='N';
                }
                else if(nc=='Z' && i!=0)
                {   if(last!='S')buf[++j]='S';
                }
                else buf[++j]=nc;
            }
        }
        else
        {   if(i==0 || (nc=='Y' && i==len-1))buf[++j]=nc; // keep a leading vowel and a final Y
            else if(!procvwl) // a run of vowels becomes a single A
            {   procvwl=1;
                buf[++j]='A';
            }
        }
    }
    if(j<0 || buf[j]!='A')++j; // a trailing A is dropped
    buf[j]='\0';
    free(temp);
    return(buf);
}
0
 

Author Comment

by:korsila
ID: 6249750
Dear imladris,
That's better,
but it doesn't work with the 100 names I mentioned before...

Again the data file lets the code down: it returns only "no match found",

although it works with my 10-name test file.

What should we do now? I have managed to get the data file into the same format as you suggested...

0
 
LVL 16

Expert Comment

by:imladris
ID: 6250436
How big is the "100 name" file? I think the simplest path forward will be if it is less than 200K, or if you can make one that is less, and e-mail that to me. E-Mail to imladris@infoserve.net

0
 

Author Comment

by:korsila
ID: 6252083
100 name file is just 1K ....
and another one is 3K

I will email you now..

many thanks...

Korsila
0
 
LVL 16

Expert Comment

by:imladris
ID: 6256721
Curiouser and curiouser,

I received base.dat and match.dat. I renamed base.dat to surnames.dat, and did a compare against "smith", and the program returned 16 matches.

What precisely did you try?

0
 

Author Comment

by:korsila
ID: 6259669
Imladris,
I tried again and again.
How come it doesn't work for me?
I had better have another try...

0
 

Author Comment

by:korsila
ID: 6259742
I have tested it again and it still didn't work,
so I am sending you the test file called "a.out".
I hope you can test with the file I send to you.

Let me know how it goes.
In the meantime I will try to find out why it didn't work.

many thanks,
Korsila
0
 

Author Comment

by:korsila
ID: 6259817
Imladris... whoopee!
Finally I've just found the mistake...
It was the space after each name in the data file.
I have deleted the space after each name and now the program works :)
Hurray, you're absolutely right: there was nothing wrong with the program, only with the data file format!

cheers...

Korsila...

0
 
LVL 16

Expert Comment

by:imladris
ID: 6260009
Excellent! Thanx.
0
 

Author Comment

by:korsila
ID: 6261830
Do you have any suggestion for ignoring the space after each name in the data file? For example, if I have 1000 names and I don't want to delete the space after each name, what should I do? Could you add a bit of code to the program to ignore the space before and after each name in the data file? I have tested the program with a huge data file but it didn't work again,
so it is annoying to delete the space after each name more than 1000 times...

Any suggestions? If it's hard I will post a new question, but if it's not, could you add a bit of code here...

many thanks,

Korsila
0
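
One possible way to handle the spaces asked about above is to trim each line after it is read, before it is passed to Simplex. The sketch below is only a suggestion and is not from the thread: the function name trim is hypothetical, and it treats anything that isspace reports (spaces, tabs, line endings) as removable.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strip leading and trailing whitespace in place.
   Returns s so the call can be used inside an expression. */
char *trim(char *s)
{
    size_t len = strlen(s), start = 0;

    /* find the end of the text, ignoring trailing whitespace */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        --len;
    /* find the start of the text, ignoring leading whitespace */
    while (start < len && isspace((unsigned char)s[start]))
        ++start;
    /* slide the kept part to the front and terminate it */
    memmove(s, s + start, len - start);
    s[len - start] = '\0';
    return s;
}
```

In the posted program, trim(line) could be called right after fgets reads each line from the name file, in place of the statement that cuts off the line ending, and trim(name) could be applied to the name typed by the user. Since the line ending counts as whitespace, one call covers both jobs.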
