improved Simplex algorithm -Imladris

Dear Imladris,

from:
https://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20132118
I would like to improve the name matching in the Simplex code you implemented.
Here are my improved requirements:

-if A, E, I, O or U is not the first letter, change it to A; if it is the first letter, leave it unchanged
-if Y is not the first or last character, change Y to A
-change PH-->F
-change M-->N
-remove all duplicated letters: "smitt" should be "smat" (no second "t"); similarly, "smmit" should be "smat" (not "smmat"), and "smiss" should be "smas"

-change TH --> T
-change KN-->N, K-->C
-change DG-->G
-change PF-->F
-change RH-->R
-change WR-->R
-if Z is not the first character, change Z-->S

change these suffixes:
IX--> IC
EX-->EC
YE-->Y
EE-->Y
IE-->Y
DT-->D
RT-->D
RD-->D
NT-->N
ND-->N
EV-->EF

And that's it, I can't think of any more :)

You can have a look at some similar example code from:
https://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20116615
Maybe you have an idea...

thanks

Korsila

P.S. I hope this time it will work with my 100-name data file...

imladris

I have a preliminary implementation of this. I have made the assumption that "PPPPPPH" should reduce to "F" (do multi P reduction to one P, followed by PH reduction to F (as opposed to the other way around: first PH reduction to F, then multi P reduction which will yield PF)).

Secondly, there is a question. If the suffix tests are at the end of the processing (which seemed logical to me), many of them become redundant. For instance, a suffix of "IX" will never be encountered because the I will already have been changed to an "A". Ditto for "EX" and "EV". "YE", "EE" and "IE" all disappear because trailing vowels are already eliminated. So the question is, should these suffix tests be dropped? Or do you want to process the suffixes first?

Similarly, if K is transformed into C, the test for KN will never be tripped. Should it be dropped? Or should the KN test be processed "before" the K test?

Lastly, again, for brand new algorithm work, something around 100 points is more appropriate.
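
To make the ordering question above concrete, here is a minimal sketch of the two reductions as separate passes. The helper names collapse_runs and ph_to_f are hypothetical and are not taken from the Simplex code in this thread; the sketch only illustrates why the order matters.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: upper-case the input and collapse each run of a
   repeated letter into a single letter. out must hold strlen(in)+1 bytes. */
static void collapse_runs(const char *in, char *out)
{
    size_t j = 0;
    for (size_t i = 0; in[i] != '\0'; ++i) {
        char c = (char)toupper((unsigned char)in[i]);
        if (j == 0 || out[j - 1] != c)
            out[j++] = c;
    }
    out[j] = '\0';
}

/* Hypothetical helper: rewrite every "PH" as "F", in place.
   Assumes s is already upper case. */
static void ph_to_f(char *s)
{
    size_t j = 0;
    for (size_t i = 0; s[i] != '\0'; ++i) {
        if (s[i] == 'P' && s[i + 1] == 'H') {
            s[j++] = 'F';
            ++i;                /* skip the H */
        } else {
            s[j++] = s[i];
        }
    }
    s[j] = '\0';
}
```

Calling collapse_runs and then ph_to_f on "PPPPPPH" gives "F". Calling them in the opposite order first produces "PPPPPF", which then collapses to "PF".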

korsila

ASKER

Imladris,
That's a useful comment.
Lines marked *** are my answers.

I have a preliminary implementation of this. I have made the assumption that "PPPPPPH" should reduce
to "F" (do multi P reduction to one P, followed by PH reduction to F

***Yes: do the multi-P reduction to one P, followed by the PH reduction to F

****Change these suffix :
X--> C --get rid of E
V-->F
and so on...

check or process the following suffixes first:
YE-->Y
EE-->Y
IE-->Y

***Yes, process the KN test "before" the K test.

Hope it's clearer...
Many thanks...

korsila
P.S. Let's see if you can come up with good code. If it works with my data file, I will consider more points for you.

imladris

I'm not quite clear on this bit:

****Change these suffix :
X--> C --get rid of E
V-->F
and so on...

X and V were not in the original suffix list. Do you mean that you want suffix "EX" changed to "C" and suffix "EV" changed to "F" and for them to be processed beforehand like "YE", "EE" and "IE"?

Or should X's be changed to C's in general, along with V's to F's (what does the get rid of E mean in that case)?

korsila

ASKER

Sorry I didn't make it clearer.

Here are the suffixes that should be changed first:

change these suffixes first:
X--> C
YE-->Y
EE-->Y
IE-->Y
DT-->D
RT-->D
RD-->D
NT-->N
ND-->N
V-->F

****Change these suffix :
X--> C --get rid of E
V-->F

means that I get rid of the E from EX-->EC
and EV-->EF, so, as you can see above, just change X-->C and V-->F

korsila
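
The suffix list above can be sketched as a small rule table. This is only an illustration under stated assumptions: the names suffix_rule, suffix_rules and apply_suffix_rule are hypothetical, the name is assumed to be upper case already, and at most one rule is applied per name.

```c
#include <string.h>

/* Hypothetical rule table built from the suffix list in this thread. */
struct suffix_rule { const char *from; const char *to; };

static const struct suffix_rule suffix_rules[] = {
    {"X",  "C"},
    {"YE", "Y"}, {"EE", "Y"}, {"IE", "Y"},
    {"DT", "D"}, {"RT", "D"}, {"RD", "D"},
    {"NT", "N"}, {"ND", "N"},
    {"V",  "F"},
};

/* Apply the first rule whose "from" text ends the string, in place.
   Every replacement is no longer than the suffix it replaces. */
static void apply_suffix_rule(char *s)
{
    size_t len = strlen(s);
    size_t n = sizeof suffix_rules / sizeof suffix_rules[0];

    for (size_t k = 0; k < n; ++k) {
        size_t flen = strlen(suffix_rules[k].from);
        if (len >= flen && strcmp(s + len - flen, suffix_rules[k].from) == 0) {
            strcpy(s + len - flen, suffix_rules[k].to);
            return;
        }
    }
}
```

With this table, "ALEX" becomes "ALEC", "LEE" becomes "LY", and "BRANDT" becomes "BRAND" (only the DT rule fires, the ND rule is not applied afterwards).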

korsila

ASKER

Hope it's a fair point

korsila

ASKER CERTIFIED SOLUTION

imladris

This solution is only available to members.

korsila

ASKER

Dear Imladris,

I wasn't able to run it. After I compiled, I got this message:
"Unsatisfied Symbols: stricmp (code)"

Compiling simplex.c also produced another file, "simplex.o",

and when I tried to run that, a message appeared saying "cannot execute".

------
Do I need to change "stricmp" to "strncmp", or what else should I do?

many thanks,
Korsila
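
For context on the error above: stricmp is not part of standard C, so some compilers and libraries do not provide it, and strncmp is not a substitute because it is case sensitive. POSIX systems offer strcasecmp in <strings.h>. Where neither is available, a small hand-written comparison is one option. The sketch below is an assumption about how such a stand-in might look; the name compare_nocase is hypothetical.

```c
#include <ctype.h>

/* Hypothetical stand-in for stricmp: compare two strings ignoring case.
   Returns a negative value, 0, or a positive value, like strcmp. */
static int compare_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca - cb;
        ++a;
        ++b;
    }
    /* At least one string has ended; the shorter one sorts first. */
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}
```

For example, compare_nocase("Smith", "SMITH") returns 0, while strcmp on the same pair does not.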

imladris

OK. Here is a version that does not rely on stricmp:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

int FullFlag;
char *Simplex(char *,char *);
void getinput(char *prompt,char n[]);

int main(void)
{   int matched,msave,usave,mc;
    char name[150],line[500],buf1[150],buf2[500];
    FILE *namefile,*mfile=NULL,*ufile=NULL;

    // fgets is used for all input because gets cannot limit how much it reads
    printf("Save matched names in matched.dat (y or n)?\n");
    if(fgets(line,sizeof line,stdin)==NULL)return 1;
    msave=(toupper((unsigned char)line[0])=='Y');
    printf("Save unmatched names in unmatched.dat (y or n)?\n");
    if(fgets(line,sizeof line,stdin)==NULL)return 1;
    usave=(toupper((unsigned char)line[0])=='Y');
    namefile=fopen("surnames.dat","r+"); //open file in read/write mode
    if(namefile==NULL)
    {   perror("surnames.dat");
        return 1;
    }
    if(msave)mfile=fopen("matched.dat","w");
    if(usave)ufile=fopen("unmatched.dat","w");
    if((msave && mfile==NULL) || (usave && ufile==NULL))
    {   perror("cannot open output file");
        return 1;
    }
    FullFlag=0;
    do
    {   printf("Please enter name to match:\n");
        if(fgets(name,sizeof name,stdin)==NULL)break;
        name[strcspn(name,"\n")]='\0'; //fgets keeps the newline, so strip it
        Simplex(name,buf1); //key for the entered name, computed once per search
        printf("Simplex: %s\n",buf1);
        fseek(namefile,0L,SEEK_SET); //seek to start of file
        matched=0;
        mc=0;
        while(fgets(line,sizeof line,namefile)!=NULL)
        {   line[strcspn(line,"\n")]='\0'; //strip the newline only if one is present
            if(strcmp(buf1,Simplex(line,buf2))==0)
            {   printf("%s\n",line);
                ++mc;
                if(msave)
                {   if(matched==0)fprintf(mfile,"Match name %s\n",name);
                    fprintf(mfile,"%s\n",line);
                }
                matched=1;
            }
        }
        if(matched==1)
        {   if(msave)fprintf(mfile,"----------------------------------------\n");
            printf("Number of Matches: %d\n",mc);
        }
        else
        {   printf("No Matches Found.\n");
            if(usave)
            {   fprintf(ufile,"Match name %s\n",name);
                fprintf(ufile,"No Matches found\n");
                fprintf(ufile,"--------------------\n");
            }
        }
        printf("Compare again (y or n)?\n");
        if(fgets(line,sizeof line,stdin)==NULL)break;
    } while(toupper((unsigned char)line[0])=='Y');
    fclose(namefile);
    if(msave)fclose(mfile);
    if(usave)fclose(ufile);
    return 0;
}

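// prompt user for input
// get input
// copy it safely into provided variable (at most 19 characters)

void getinput(char *prompt,char n[])
{   char ipc[150];

    printf("%s:\n",prompt);
    if(fgets(ipc,sizeof ipc,stdin)==NULL)ipc[0]='\0';
    ipc[strcspn(ipc,"\n")]='\0'; //fgets keeps the newline, so strip it
    strncpy(n,ipc,19);
    n[19]='\0';
    return;
}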

char vowel[]="aeiouyAEIOUY";

// build the Simplex key for name in buf (buf must hold strlen(name)+1 chars)
char *Simplex(char *name,char *buf)
{   int i,j,len,procvwl;
    char nc,prev,*sfx,*temp;

    j=strlen(name);
    temp=(char *)malloc(j+1);
    if(temp==NULL)
    {   buf[0]='\0';
        return(buf);
    }
    strcpy(temp,name);
    if(j>=2) //the suffix rules need at least two characters
    {   sfx=temp+(j-2);
        *sfx=toupper((unsigned char)*sfx);
        *(sfx+1)=toupper((unsigned char)*(sfx+1));
        if(*(sfx+1)=='X')*(sfx+1)='C';
        else if(*(sfx+1)=='V')*(sfx+1)='F';
        else if(strcmp(sfx,"YE")==0)temp[j-1]='\0';
        else if(strcmp(sfx,"EE")==0 || strcmp(sfx,"IE")==0)
        {   temp[j-2]='Y';
            temp[j-1]='\0';
        }
        else if(strcmp(sfx,"DT")==0)temp[j-1]='\0';
        else if(strcmp(sfx,"RT")==0 || strcmp(sfx,"RD")==0)
        {   temp[j-2]='D';
            temp[j-1]='\0';
        }
        else if(strcmp(sfx,"NT")==0 || strcmp(sfx,"ND")==0)temp[j-1]='\0';
    }
    procvwl=0;
    j=-1; //index of the last character written to buf
    len=strlen(temp);
    for(i=0; i<len; ++i)
    {   nc=toupper((unsigned char)temp[i]);
        prev=(j>=0)?buf[j]:'\0'; //last output character; avoids reading buf[-1]
        if(strchr(vowel,nc)==NULL)
        {   procvwl=0;
            if(nc!=prev) //skip duplicated letters
            {   if(prev=='P' && nc=='H')buf[j]='F';
                else if(prev=='T' && nc=='H')buf[j]='T';
                else if(prev=='K' && nc=='N')buf[j]='N';
                else if(prev=='D' && nc=='G')buf[j]='G';
                else if(prev=='P' && nc=='F')buf[j]='F';
                else if(prev=='R' && nc=='H')buf[j]='R';
                else if(prev=='W' && nc=='R')buf[j]='R';
                else if(nc=='K' && toupper((unsigned char)temp[i+1])!='N')
                {   if(prev!='C')buf[++j]='C';
                }
                else if(nc=='M')
                {   if(prev!='N')buf[++j]='N';
                }
                else if(nc=='Z' && i!=0)
                {   if(prev!='S')buf[++j]='S';
                }
                else buf[++j]=nc;
            }
        }
        else
        {   if(i==0 || (nc=='Y' && i==len-1))buf[++j]=nc;
            else if(!procvwl) //a run of vowels becomes a single 'A'
            {   procvwl=1;
                buf[++j]='A';
            }
        }
    }
    if(j<0 || buf[j]!='A')++j; //a trailing 'A' is dropped
    buf[j]='\0';
    free(temp);
    return(buf);
}

korsila

ASKER

Dear imladris,
That's better,
but it doesn't work with the 100 names I mentioned before...

Again the data file lets the code down: it returns only "no match found",

but it works with my 10-name test file.

What should we do now? I managed to get the data file into the same format you suggested...

imladris

How big is the "100 name" file? I think the simplest path forward will be if it is less than 200K, or if you can make one that is less, and e-mail that to me. E-Mail to imladris@infoserve.net

korsila

ASKER

The 100-name file is just 1K,
and another one is 3K.

I will email you now.

Many thanks...

Korsila

imladris

Curiouser and curiouser,

I received base.dat and match.dat. I renamed base.dat to surnames.dat, and did a compare against "smith", and the program returned 16 matches.

What precisely did you try?

korsila

ASKER

Imladris,
I tried again and again.
How come it didn't work for me?
I had better have another try...

korsila

ASKER

I have tested it again and it still didn't work,
so I am sending you the test file called "a.out".
I hope you can test with the file I send to you.

Let me know how it goes.
In the meantime I will try to find out why it didn't work.

many thanks,
Korsila

korsila

ASKER

Imladris... whoopee!
I've finally found the mistake...
It was the space after each name in the data file.
I deleted the space after each name and now the program works :)
Hooray! You're absolutely right: nothing was wrong with the program, only the data file format!

cheers...

Korsila...

imladris

Excellent! Thanx.

korsila

ASKER

Do you have any suggestion for ignoring the space after each name in the data file? For example, if I have 1000 names and I don't want to delete the space after each name, what should I do? Could you add a bit of code to the program to ignore the spaces before and after each name in the data file? I have tested the program with a huge data file, but it didn't work again,
so it is annoying to delete the space after each name more than 1000 times...

Any suggestions? If it's hard I will post a new question, but if it's not, could you add a bit of code here...

many thanks,

Korsila
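
One possible way to handle the spaces asked about above is to trim each line right after it is read from the data file, before it is passed to Simplex. The helper below is a minimal sketch, not part of the posted program; the name trim is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove leading and trailing whitespace in place.
   This covers spaces, tabs, and the newline or carriage return that
   fgets leaves at the end of a line. Returns s for convenience. */
static char *trim(char *s)
{
    size_t len = strlen(s);
    size_t start = 0;

    while (len > 0 && isspace((unsigned char)s[len - 1]))
        --len;                          /* drop trailing whitespace */
    while (start < len && isspace((unsigned char)s[start]))
        ++start;                        /* skip leading whitespace */
    memmove(s, s + start, len - start); /* shift the kept part to the front */
    s[len - start] = '\0';
    return s;
}
```

In the loop that reads surnames.dat, calling trim(line) after each fgets, in place of the statement that strips the newline, would make "Smith" and "Smith " produce the same key. Applying trim to the entered name as well keeps both sides consistent.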