korsila
asked on
improved Simplex algorithm -Imladris
Dear Imladris,
Following up on:
https://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20132118
I would like to improve the name matching in the Simplex code you implemented.
Here are my improved requirements:
-if A, I, E, O or U is not the first letter, change it to A; if it is the first letter, keep it as it is
-if Y is not the first or last character, change Y to A
-change PH --> F
-change M --> N
-remove all duplicated letters: for example, "smitt" should become "smat" (no second "t"); similarly, "smmit" should become "smat" (not "smmat"), and "smiss" should become "smas"
-change TH --> T
-change KN --> N, K --> C
-change DG --> G
-change PF --> F
-change RH --> R
-change WR --> R
-if Z is not the first character, change Z --> S
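The two-letter rules and the duplicate-letter rule in the list above can be sketched as one left-to-right pass over a rule table. This is a minimal sketch under two assumptions: a repeated letter is dropped before any pair is checked (so "PPPPPPH" reduces to "F"), and only the pairs PH, TH, KN, DG, PF, RH and WR are covered, not the vowel, M, K or Z rules. The names reduce_pairs and PAIR_RULES are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical rule table: the pair {first, second} collapses to 'result'. */
struct pair_rule { char first, second, result; };

static const struct pair_rule PAIR_RULES[] = {
    {'P', 'H', 'F'}, {'T', 'H', 'T'}, {'K', 'N', 'N'}, {'D', 'G', 'G'},
    {'P', 'F', 'F'}, {'R', 'H', 'R'}, {'W', 'R', 'R'},
};

/* reduce_pairs: writes an uppercased copy of 'in' to 'out' (which must hold
   at least strlen(in) + 1 chars), skipping any letter that repeats the
   previous output letter and merging any pair listed in PAIR_RULES. */
void reduce_pairs(const char *in, char *out)
{
    size_t n = 0;                                  /* letters written so far */
    for (; *in != '\0'; ++in) {
        char c = (char)toupper((unsigned char)*in);
        size_t r;
        int merged = 0;
        if (n > 0 && out[n - 1] == c)
            continue;                              /* duplicated letter */
        for (r = 0; n > 0 && r < sizeof PAIR_RULES / sizeof PAIR_RULES[0]; ++r) {
            if (out[n - 1] == PAIR_RULES[r].first && c == PAIR_RULES[r].second) {
                out[n - 1] = PAIR_RULES[r].result; /* e.g. P + H --> F */
                merged = 1;
                break;
            }
        }
        if (!merged)
            out[n++] = c;
    }
    out[n] = '\0';
}
```

With this table, "smmitt" reduces to "SMIT" and "Wright" to "RIGHT". Keeping the rules in a table means adding a new pair is a one-line change rather than another else-if branch.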
Change these suffixes:
IX --> IC
EX --> EC
YE --> Y
EE --> Y
IE --> Y
DT --> D
RT --> D
RD --> D
NT --> N
ND --> N
EV --> EF
That's all I can think of for now :)
You can have a look at some similar example code at:
https://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20116615
Maybe it will give you an idea...
Thanks,
Korsila
P.S. I hope this time it will work with my 100-name data file...
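The suffix list above maps naturally onto a lookup table checked against the end of the name. A minimal sketch, assuming the name is uppercased first and that at most one suffix rule fires per name; apply_suffix and SUFFIX_RULES are hypothetical names.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical table of the suffix rules: a name ending in 'from' gets
   that ending replaced by 'to'. */
struct suffix_rule { const char *from, *to; };

static const struct suffix_rule SUFFIX_RULES[] = {
    {"IX", "IC"}, {"EX", "EC"}, {"YE", "Y"}, {"EE", "Y"}, {"IE", "Y"},
    {"DT", "D"},  {"RT", "D"},  {"RD", "D"}, {"NT", "N"}, {"ND", "N"},
    {"EV", "EF"},
};

/* apply_suffix: uppercases 'name' in place, then replaces the first
   matching suffix. Every 'to' is no longer than its 'from', so the string
   never grows. Returns 1 if a rule fired, 0 otherwise. */
int apply_suffix(char *name)
{
    size_t len = strlen(name), i, r;
    for (i = 0; i < len; ++i)
        name[i] = (char)toupper((unsigned char)name[i]);
    for (r = 0; r < sizeof SUFFIX_RULES / sizeof SUFFIX_RULES[0]; ++r) {
        size_t flen = strlen(SUFFIX_RULES[r].from);
        if (len >= flen && strcmp(name + len - flen, SUFFIX_RULES[r].from) == 0) {
            strcpy(name + len - flen, SUFFIX_RULES[r].to);
            return 1;
        }
    }
    return 0;
}
```

For example, "felix" becomes "FELIC", "Smithee" becomes "SMITHY" and "grant" becomes "GRAN".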
ASKER
Imladris,
That's a useful comment. My answers are marked with ***:
I have a preliminary implementation of this. I have made the assumption that "PPPPPPH" should reduce
to "F" (do multi P reduction to one P, followed by PH reduction to F).
*** Yes: reduce multiple Ps to one P, then reduce PH to F.
*** Change these suffixes:
X --> C (get rid of the E)
V --> F
and so on...
Check or process the following suffixes first:
YE --> Y
EE --> Y
IE --> Y
*** Yes, process the KN test before the K test.
I hope that's clearer...
Many thanks,
korsila
P.S. If you can come up with good code that works with my data file, I will consider giving you more points.
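Processing KN before K can be sketched by looking one letter ahead before applying K --> C. This is a minimal illustration only; replace_k is a hypothetical name and the function handles no other rules.

```c
#include <ctype.h>
#include <string.h>

/* replace_k: hypothetical helper showing the KN-before-K ordering. Writes an
   uppercased copy of 'in' to 'out' (at least strlen(in) + 1 chars): KN
   becomes N and every other K becomes C. The letter after a K is checked
   before K --> C is applied, so the KN rule cannot be hidden. */
void replace_k(const char *in, char *out)
{
    size_t n = 0;
    for (; *in != '\0'; ++in) {
        char c = (char)toupper((unsigned char)*in);
        if (c == 'K') {
            if (toupper((unsigned char)in[1]) == 'N')
                continue;        /* KN --> N: drop the K, the N is copied next */
            c = 'C';             /* any other K --> C */
        }
        out[n++] = c;
    }
    out[n] = '\0';
}
```

If K --> C ran first instead, "knight" would start with "CN" and a later KN test could never match it.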
I'm not quite clear on this bit:
****Change these suffix :
X--> C --get rid of E
V-->F
and so on...
X and V were not in the original suffix list. Do you mean that you want suffix "EX" changed to "C" and suffix "EV" changed to "F", and for them to be processed beforehand like "YE", "EE" and "IE"?
Or should X's be changed to C's in general, along with V's to F's (and in that case, what does "get rid of E" mean)?
ASKER
Sorry, I didn't make it clear enough.
Here are the suffixes that should be changed first:
X --> C
YE --> Y
EE --> Y
IE --> Y
DT --> D
RT --> D
RD --> D
NT --> N
ND --> N
V --> F
By "Change these suffixes: X --> C (get rid of E), V --> F" I meant that I drop the E from EX --> EC and EV --> EF; as you can see above, the rules are simply X --> C and V --> F.
korsila
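Under this clarification, X and V become single-letter rules on the last character, and the remaining entries stay two-letter suffixes. A sketch under that reading (apply_revised_suffix is a hypothetical name), checking a final X or V first and then the pairs in the order listed:

```c
#include <ctype.h>
#include <string.h>

/* apply_revised_suffix: hypothetical sketch of the revised suffix list.
   Uppercases 'name' in place; a final X or V is mapped on its own
   (X --> C, V --> F, whatever precedes it), otherwise the two-letter
   suffixes are tried. Returns 1 if a rule fired, 0 otherwise. */
int apply_revised_suffix(char *name)
{
    static const char *pairs[][2] = {
        {"YE", "Y"}, {"EE", "Y"}, {"IE", "Y"}, {"DT", "D"},
        {"RT", "D"}, {"RD", "D"}, {"NT", "N"}, {"ND", "N"},
    };
    size_t len = strlen(name), i;
    for (i = 0; i < len; ++i)
        name[i] = (char)toupper((unsigned char)name[i]);
    if (len == 0)
        return 0;
    if (name[len - 1] == 'X') { name[len - 1] = 'C'; return 1; }
    if (name[len - 1] == 'V') { name[len - 1] = 'F'; return 1; }
    for (i = 0; len >= 2 && i < sizeof pairs / sizeof pairs[0]; ++i) {
        if (strcmp(name + len - 2, pairs[i][0]) == 0) {
            strcpy(name + len - 2, pairs[i][1]);  /* replacement is 1 char */
            return 1;
        }
    }
    return 0;
}
```

So "Lev" becomes "LEF" and "Fox" becomes "FOC", while "Bell" matches no rule and is only uppercased.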
ASKER
I hope that's a fair number of points.
korsila
ASKER CERTIFIED SOLUTION
ASKER
Dear Imladris,
I couldn't run it. When I compiled it, I got this message:
"Unsatisfied symbols: stricmp (code)"
Compiling simplex.c also produced a file called "simplex.o", and when I tried to run that, I got "cannot execute".
------
Do I need to change "stricmp" to "strncmp", or what else should I do?
Many thanks,
Korsila
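stricmp is not part of standard C, which is why the linker reported it as unsatisfied, and strncmp is not a drop-in replacement because it is case-sensitive. On POSIX systems, strcasecmp from <strings.h> does the same job. Where neither is available, a small helper like the one below is enough; the name my_stricmp is hypothetical. Separately, "simplex.o" is most likely an object file (the output of compiling with -c), which cannot be run on its own until it is linked into an executable.

```c
#include <ctype.h>

/* my_stricmp: hypothetical portable stand-in for the non-standard stricmp().
   Compares two strings ignoring letter case; returns <0, 0 or >0 like strcmp. */
int my_stricmp(const char *a, const char *b)
{
    for (;; ++a, ++b) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == '\0')
            return ca - cb;      /* first difference, or both strings ended */
    }
}
```

The casts to unsigned char matter: passing a negative char value to tolower() is undefined behavior.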
OK. Here is a version that does not rely on stricmp:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

int FullFlag;
char *Simplex(char *,char *);
void getinput(char *prompt,char n[]);

// read one line from stdin into s, without the trailing newline
// (fgets replaces gets, which cannot limit the input length)
// returns 0 at end of input
static int read_line(char *s,int size)
{ if(fgets(s,size,stdin)==NULL)
  { s[0]='\0';
    return 0;
  }
  s[strcspn(s,"\n")]='\0';
  return 1;
}

int main(void)
{ int matched,msave,usave,mc;
  char name[150],line[500],buf1[150],buf2[500]; // buf2 must be as large as line
  FILE *namefile,*mfile=NULL,*ufile=NULL;
  printf("Save matched names in matched.dat (y or n)?\n");
  read_line(line,sizeof line);
  msave=(toupper((unsigned char)line[0])=='Y');
  printf("Save unmatched names in unmatched.dat (y or n)?\n");
  read_line(line,sizeof line);
  usave=(toupper((unsigned char)line[0])=='Y');
  namefile=fopen("surnames.dat","r"); //open name file for reading
  if(namefile==NULL)
  { perror("surnames.dat");
    return 1;
  }
  if(msave && (mfile=fopen("matched.dat","w"))==NULL)
  { perror("matched.dat");
    msave=0;
  }
  if(usave && (ufile=fopen("unmatched.dat","w"))==NULL)
  { perror("unmatched.dat");
    usave=0;
  }
  FullFlag=0;
  do
  { printf("Please enter name to match:\n");
    if(!read_line(name,sizeof name))break;
    printf("Simplex: %s\n",Simplex(name,buf1));
    fseek(namefile,0L,SEEK_SET); //seek to start of file
    matched=0;
    mc=0;
    while(fgets(line,sizeof line,namefile)!=NULL)
    { line[strcspn(line,"\n")]='\0'; //strip newline; safe even if the last line has none
      if(strcmp(Simplex(name,buf1),Simplex(line,buf2))==0)
      { printf("%s\n",line);
        ++mc;
        if(msave)
        { if(matched==0)fprintf(mfile,"Match name %s\n",name);
          fprintf(mfile,"%s\n",line);
        }
        matched=1;
      }
    }
    if(matched==1)
    { if(msave)fprintf(mfile,"----------------------------------------\n");
      printf("Number of Matches: %d\n",mc);
    }
    else
    { printf("No Matches Found.\n");
      if(usave)
      { fprintf(ufile,"Match name %s\n",name);
        fprintf(ufile,"No Matches found\n");
        fprintf(ufile,"--------------------\n");
      }
    }
    printf("Compare again (y or n)?\n");
    read_line(line,sizeof line);
  } while(toupper((unsigned char)line[0])=='Y');
  fclose(namefile);
  if(msave)fclose(mfile);
  if(usave)fclose(ufile);
  return 0;
}

// prompt user for input
// get input
// copy it safely into provided variable (at most 19 characters)
void getinput(char *prompt,char n[])
{ char ipc[150];
  printf("%s:\n",prompt);
  read_line(ipc,sizeof ipc);
  strncpy(n,ipc,19);
  n[19]='\0';
  return;
}

char vowel[]="aeiouyAEIOUY";

// Simplex: writes the Simplex code of name into buf and returns buf
// (buf must hold at least strlen(name)+1 characters)
char *Simplex(char *name,char *buf)
{ int i,j,len,procvwl;
  char nc,prev,*sfx,*temp;
  j=strlen(name);
  temp=(char *)malloc(j+1);
  if(temp==NULL)
  { buf[0]='\0';
    return(buf);
  }
  strcpy(temp,name);
  if(j>=2) //suffix rules need at least two characters
  { sfx=temp+(j-2);
    sfx[0]=toupper((unsigned char)sfx[0]);
    sfx[1]=toupper((unsigned char)sfx[1]);
    if(sfx[1]=='X')sfx[1]='C';        //suffix X --> C
    else if(sfx[1]=='V')sfx[1]='F';   //suffix V --> F
    else if(strcmp(sfx,"YE")==0)temp[j-1]='\0';
    else if(strcmp(sfx,"EE")==0 || strcmp(sfx,"IE")==0)
    { temp[j-2]='Y';
      temp[j-1]='\0';
    }
    else if(strcmp(sfx,"DT")==0)temp[j-1]='\0';
    else if(strcmp(sfx,"RT")==0 || strcmp(sfx,"RD")==0)
    { temp[j-2]='D';
      temp[j-1]='\0';
    }
    else if(strcmp(sfx,"NT")==0 || strcmp(sfx,"ND")==0)temp[j-1]='\0';
  }
  procvwl=0;
  j=-1; //index of the last character written to buf (-1: none yet)
  len=strlen(temp);
  for(i=0; i<len; ++i)
  { nc=toupper((unsigned char)temp[i]);
    if(strchr(vowel,nc)==NULL) //consonant
    { procvwl=0;
      prev=(j>=0)?buf[j]:'\0'; //never read buf[-1]
      if(nc!=prev) //skip duplicated letters
      { if(prev=='P' && nc=='H')buf[j]='F';
        else if(prev=='T' && nc=='H')buf[j]='T';
        else if(prev=='K' && nc=='N')buf[j]='N';
        else if(prev=='D' && nc=='G')buf[j]='G';
        else if(prev=='P' && nc=='F')buf[j]='F';
        else if(prev=='R' && nc=='H')buf[j]='R';
        else if(prev=='W' && nc=='R')buf[j]='R';
        else if(nc=='K' && toupper((unsigned char)temp[i+1])!='N')
        { if(prev!='C')buf[++j]='C';
        }
        else if(nc=='M')
        { if(prev!='N')buf[++j]='N';
        }
        else if(nc=='Z' && i!=0)
        { if(prev!='S')buf[++j]='S';
        }
        else buf[++j]=nc;
      }
    }
    else //vowel: keep it if first (or a final Y), else one A per run
    { if(i==0 || (nc=='Y' && i==len-1))buf[++j]=nc;
      else if(!procvwl)
      { procvwl=1;
        buf[++j]='A';
      }
    }
  }
  if(j<0 || buf[j]!='A')++j; //a trailing A is overwritten, i.e. dropped
  buf[j]='\0';
  free(temp);
  return(buf);
}
ASKER
Dear Imladris,
That's better, but it doesn't work with the 100 names I mentioned before.
Again it lets the code down: it only returns "no match found".
But it works with my 10-name test file, huh.
What should we do now? I managed to get the data file into the same format you suggested...
How big is the "100 name" file? If it is under 200K (or you can make a smaller one that still shows the problem), the simplest way forward is to e-mail it to me at imladris@infoserve.net.
ASKER
The 100-name file is just 1K,
and the other one is 3K.
I will e-mail them to you now.
Many thanks,
Korsila
Curiouser and curiouser,
I received base.dat and match.dat. I renamed base.dat to surnames.dat, and did a compare against "smith", and the program returned 16 matches.
What precisely did you try?
ASKER
Imladris,
I tried it again and again.
How come it didn't work for me?
I'd better have another try...
ASKER
I have tested it again and it still didn't work,
so I am sending you my test file, called "a.out".
I hope you can test it with the file I sent you.
Let me know how it goes.
In the meantime I will try to find out why it didn't work.
Many thanks,
Korsila
ASKER
Imladris... woohoo!
I've finally found the mistake:
it was the space after each name in the data file.
I deleted the space after each name and now the program works :)
Hurray! You're absolutely right: nothing was wrong with the program, only the data file format!
Cheers...
Korsila...
Excellent! Thanx.
ASKER
Do you have any suggestion for ignoring the space after each name in the data file? For example, if I have 1000 names, I don't want to delete the space after each one by hand. Could you add a bit of code to the program to ignore spaces before and after each name in the data file? I tested the program with a huge data file and again it didn't work,
so it is annoying to delete the space after each name more than 1000 times...
Any suggestions? If it's hard, I will post a new question, but if it's not, could you add a bit of code here?
Many thanks,
Korsila
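One way to ignore the spaces is to trim each line after reading it, instead of chopping off only the last character. A minimal sketch, assuming leading and trailing blanks are never significant in a name; trim is a hypothetical helper name. It also strips a trailing carriage return, which files edited on DOS/Windows often carry at the end of each line.

```c
#include <ctype.h>
#include <string.h>

/* trim: hypothetical helper that removes leading and trailing white
   space (spaces, tabs, CR, LF) from 's' in place and returns 's'. */
char *trim(char *s)
{
    size_t start = 0, end = strlen(s);
    while (end > 0 && isspace((unsigned char)s[end - 1]))
        --end;                          /* drop trailing blanks and newline */
    while (start < end && isspace((unsigned char)s[start]))
        ++start;                        /* skip leading blanks */
    memmove(s, s + start, end - start); /* shift the name to the front */
    s[end - start] = '\0';
    return s;
}
```

In the matching loop, the statement that strips the newline from each line read from surnames.dat would become trim(line);, and the typed name can be trimmed the same way, so names with stray blanks compare equal to clean ones.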
Secondly, there is a question. If the suffix tests are done at the end of the processing (which seemed logical to me), many of them become redundant. For instance, a suffix of "IX" will never be encountered, because the I will already have been changed to an "A". Ditto for "EX" and "EV". "YE", "EE" and "IE" all disappear because trailing vowels have already been eliminated. So the question is: should these suffix tests be dropped, or do you want to process the suffixes first?
Similarly, if K is transformed into C, the test for KN will never be tripped. Should it be dropped, or should the KN test be processed before the K test?
Lastly, again, for brand new algorithm work, something around 100 points is more appropriate.
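The ordering point about suffixes can be checked with the vowel rule on its own. A minimal sketch (collapse_vowels is a hypothetical name; it models only the vowel rule, including the dropping of a trailing vowel described in the question about suffix order, and no consonant rules):

```c
#include <ctype.h>
#include <string.h>

/* collapse_vowels: hypothetical sketch of the vowel rule alone. Writes an
   uppercased copy of 'in' to 'out' (at least strlen(in) + 1 chars): the
   first letter is kept as is, each later run of vowels (A, E, I, O, U, and
   Y unless it is the last letter) becomes one A, and a vowel run that ends
   the name is dropped. */
void collapse_vowels(const char *in, char *out)
{
    size_t i, n = 0, len = strlen(in);
    int in_run = 0;                     /* 1 right after writing a run's A */
    for (i = 0; i < len; ++i) {
        char c = (char)toupper((unsigned char)in[i]);
        int is_vowel = strchr("AEIOU", c) != NULL || (c == 'Y' && i != len - 1);
        if (i == 0 || !is_vowel) {
            out[n++] = c;               /* first letter or consonant */
            in_run = 0;
        } else if (!in_run) {
            out[n++] = 'A';             /* start of a vowel run */
            in_run = 1;
        }
    }
    if (in_run)
        --n;                            /* drop a trailing vowel run */
    out[n] = '\0';
}
```

With it, "Felix" becomes "FALAX" and "Smithee" becomes "SMATH", so an IX or EE suffix test applied after this step can never fire; the suffix rules only have an effect if they run first.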