
Improved Simplex algorithm - Imladris

Posted on 2001-06-21
Last Modified: 2010-05-18
Dear Imladris,

Following up on:
http://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20132118
I would like to improve the name matching in the Simplex code you implemented.
Here are my improved requirements:

-If A, E, I, O or U is not the first letter, change it to A; if it is the first letter, keep it as it is
-If Y is not the first or last character, change Y to A
-Change PH --> F
-Change M --> N
-Remove all duplicated letters: for example, "smitt" should become "smat" (no second "t"); similarly, "smmit" should become "smat" (not "smmat"), and "smiss" should become "smas"

-Change TH --> T
-Change KN --> N, K --> C
-Change DG --> G
-Change PF --> F
-Change RH --> R
-Change WR --> R
-If Z is not the first character, change Z --> S
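One way to keep the two-letter consonant rules in the list above manageable is a lookup table. The sketch below is a hypothetical illustration, not the code from this thread: the names `digraph`, `rules` and `simplex_consonants` are assumptions, and it covers only the duplicate-letter rule and the pair rules (PH, TH, KN, DG, PF, RH, WR), leaving out the vowel, M, K, Z and suffix rules.

```c
#include <ctype.h>
#include <stddef.h>

/* One two-letter rule: when 'first' is followed by 'second', both become 'result'. */
struct digraph { char first, second, result; };

/* Pair rules taken from the requirements list. */
static const struct digraph rules[] = {
    {'P','H','F'}, {'T','H','T'}, {'K','N','N'}, {'D','G','G'},
    {'P','F','F'}, {'R','H','R'}, {'W','R','R'}
};

/* Hypothetical helper: uppercase 'in', drop repeated letters and apply the
   pair rules, writing to 'out' (which needs at least strlen(in)+1 bytes). */
char *simplex_consonants(const char *in, char *out)
{
    size_t j = 0;                               /* characters written to out */
    for (; *in != '\0'; ++in) {
        char c = (char)toupper((unsigned char)*in);
        int replaced = 0;
        if (j > 0 && out[j - 1] == c)
            continue;                           /* duplicate letter: skip it */
        for (size_t r = 0; j > 0 && r < sizeof rules / sizeof rules[0]; ++r) {
            if (out[j - 1] == rules[r].first && c == rules[r].second) {
                out[j - 1] = rules[r].result;   /* e.g. ...P + H -> ...F */
                replaced = 1;
                break;
            }
        }
        if (!replaced)
            out[j++] = c;
    }
    out[j] = '\0';
    return out;
}
```

With this table, "Phillip" becomes "FILIP" and "PPPPPPH" becomes "F"; adding a rule means adding one table row rather than another `else if`.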

Change these suffixes:
IX--> IC
EX-->EC
YE-->Y
EE-->Y
IE-->Y
DT-->D
RT-->D
RD-->D
NT-->N
ND-->N
EV-->EF

And that's it; I can't think of any more :)

You can have a look at some similar example code at:
http://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cprog&qid=20116615
Maybe it will give you an idea...

Thanks,

Korsila

P.S. I hope this time it will work with my 100-name data file...
 
Question by:korsila

Expert Comment

by:imladris
I have a preliminary implementation of this. I have assumed that "PPPPPPH" should reduce to "F": first reduce the run of P's to a single P, then apply PH --> F (rather than the other way around, where PH --> F is applied first and reducing the remaining P's then yields "PF").
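The order of the two steps matters, which a small sketch can show. The helpers `collapse_runs` and `ph_to_f` below are hypothetical, written only to illustrate the assumption above: collapsing first turns "PPPPPPH" into "PH" and then "F", while rewriting PH first gives "PPPPPF", which collapses to "PF".

```c
#include <string.h>

/* Remove repeated adjacent characters in place: "PPPH" -> "PH". */
void collapse_runs(char *s)
{
    size_t j = 0;
    for (size_t i = 0; s[i] != '\0'; ++i)
        if (j == 0 || s[j - 1] != s[i])
            s[j++] = s[i];
    s[j] = '\0';
}

/* Replace every "PH" with "F" in place: "PPH" -> "PF". */
void ph_to_f(char *s)
{
    size_t j = 0;
    for (size_t i = 0; s[i] != '\0'; ++i) {
        if (s[i] == 'P' && s[i + 1] == 'H') {
            s[j++] = 'F';
            ++i;                        /* skip the H as well */
        } else {
            s[j++] = s[i];
        }
    }
    s[j] = '\0';
}
```

Calling `collapse_runs` and then `ph_to_f` matches the assumption stated above; the reverse order leaves the extra P behind.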

Secondly, there is a question. If the suffix tests come at the end of the processing (which seemed logical to me), many of them become redundant. For instance, a suffix of "IX" will never be encountered, because the I will already have been changed to an "A". Ditto for "EX" and "EV". "YE", "EE" and "IE" all disappear because trailing vowels are already eliminated. So should these suffix tests be dropped, or do you want to process the suffixes first?

Similarly, if K is transformed into C, the test for KN will never be tripped. Should it be dropped, or should the KN test be processed before the K test?
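The KN question is the same kind of ordering problem. In this hypothetical sketch (`kn_to_n` and `k_to_c` are made-up names, and the input is assumed to be uppercase already), running KN --> N before K --> C turns "KNOX" into "NOX", whereas running K --> C first yields "CNOX" and the KN rule never fires.

```c
#include <stddef.h>

/* Replace every "KN" with "N" in place (input assumed uppercase). */
void kn_to_n(char *s)
{
    size_t j = 0;
    for (size_t i = 0; s[i] != '\0'; ++i) {
        if (s[i] == 'K' && s[i + 1] == 'N')
            continue;                   /* drop the K; the N is copied next */
        s[j++] = s[i];
    }
    s[j] = '\0';
}

/* Replace every 'K' with 'C' in place. */
void k_to_c(char *s)
{
    for (; *s != '\0'; ++s)
        if (*s == 'K')
            *s = 'C';
}
```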

Lastly, again, for brand new algorithm work, something around 100 points is more appropriate.

Author Comment

by:korsila
Imladris,
that's a useful comment.
My answers are marked with ***:

I have a preliminary implementation of this. I have made the assumption that "PPPPPPH" should reduce
to "F" (do multi P reduction to one P, followed by PH reduction to F

*****do multi P reduction to one P, followed by PH reduction to F

****Change these suffixes:
                 X--> C --get rid of E
                 V-->F
and so on...

Check or process the following suffixes first:
                YE-->Y
                 EE-->Y
                 IE-->Y

***Yes, process the KN test before the K test.

Hope it's clearer...
Many thanks...

korsila
P.S. Let's see if you can come up with good code that works with my data file; if it does, I would consider more points for you.

Expert Comment

by:imladris
I'm not quite clear on this bit:

****Change these suffix :
                X--> C --get rid of E
                V-->F
and so on...

X and V were not in the original suffix list. Do you mean that you want suffix "EX" changed to "C" and suffix "EV" changed to "F" and for them to be processed beforehand like "YE", "EE" and "IE"?

Or should X's be changed to C's in general, along with V's to F's (and in that case, what does "get rid of E" mean)?

Author Comment

by:korsila
Sorry, I didn't make it clear.

Here are the suffixes that should be changed first:

                            Change these suffixes first:
                            X--> C
                            YE-->Y
                            EE-->Y
                            IE-->Y
                            DT-->D
                            RT-->D
                            RD-->D
                            NT-->N
                            ND-->N
                            V-->F
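Since these suffix rules are applied before anything else, they could also live in a single table checked against the end of the name. A minimal sketch, assuming a hypothetical `apply_suffix` function that applies only the first matching rule and compares case-insensitively (the code in this thread uses a chain of `if`/`else if` tests instead):

```c
#include <ctype.h>
#include <string.h>

struct suffix_rule { const char *from, *to; };

/* Suffix rules from the list above, in the order they are tried. */
static const struct suffix_rule suffixes[] = {
    {"X","C"}, {"YE","Y"}, {"EE","Y"}, {"IE","Y"}, {"DT","D"},
    {"RT","D"}, {"RD","D"}, {"NT","N"}, {"ND","N"}, {"V","F"}
};

/* Rewrite the first matching suffix of name in place (case-insensitive).
   Every replacement is no longer than the suffix it replaces, so the
   string never grows. Returns name. */
char *apply_suffix(char *name)
{
    size_t len = strlen(name);
    for (size_t r = 0; r < sizeof suffixes / sizeof suffixes[0]; ++r) {
        size_t flen = strlen(suffixes[r].from);
        if (flen > len)
            continue;
        char *tail = name + (len - flen);
        size_t k;
        for (k = 0; k < flen; ++k)
            if (toupper((unsigned char)tail[k]) != suffixes[r].from[k])
                break;
        if (k == flen) {
            strcpy(tail, suffixes[r].to);   /* shorter or equal, so it fits */
            break;
        }
    }
    return name;
}
```

For example, "Knox" becomes "KnoC" and "Grant" becomes "GraN"; the replacement letters are uppercase, which does not matter if the later processing uppercases everything anyway.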


****Change these suffixes:
                                           X--> C --get rid of E
                                           V-->F

means that I get rid of the E in EX-->EC
and EV-->EF; as you can see above, just change the suffix X-->C and V-->F.

korsila



Author Comment

by:korsila
Hope the points are fair.

korsila

Accepted Solution

by: imladris (earned 100 total points)
The points are fine.

Here is the new simplex program:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>


int FullFlag;
char *Simplex(char *,char *);
void getinput(char *prompt,char n[]);

int main(int argc,char *argv[])
{   int matched,msave,usave,mc;
    char name[150],line[500],buf1[150],buf2[500]; //buf2 must hold a code for a full line
    FILE *namefile,*mfile=NULL,*ufile=NULL;

    printf("Save matched names in matched.dat (y or n)?\n");
    if(fgets(line,sizeof line,stdin)==NULL)return 1; //fgets instead of the unsafe gets
    msave=(toupper((unsigned char)line[0])=='Y');
    printf("Save unmatched names in unmatched.dat (y or n)?\n");
    if(fgets(line,sizeof line,stdin)==NULL)return 1;
    usave=(toupper((unsigned char)line[0])=='Y');
    namefile=fopen("surnames.dat","r"); //open file for reading
    if(namefile==NULL)
    {   perror("surnames.dat");
        return 1;
    }
    if(msave && (mfile=fopen("matched.dat","w"))==NULL)msave=0;
    if(usave && (ufile=fopen("unmatched.dat","w"))==NULL)usave=0;
    FullFlag=0;
    do
    {   printf("Please enter name to match:\n");
        if(fgets(name,sizeof name,stdin)==NULL)break;
        name[strcspn(name,"\r\n")]='\0'; //strip the line ending kept by fgets
        printf("Simplex: %s\n",Simplex(name,buf1));
        fseek(namefile,0L,SEEK_SET); //seek to start of file
        matched=0;
        mc=0;
        while(fgets(line,sizeof line,namefile)!=NULL)
        {   line[strcspn(line,"\r\n")]='\0'; //safe even if the last line has no newline
            if(strcmp(buf1,Simplex(line,buf2))==0) //buf1 already holds the code for name
            {   printf("%s\n",line);
                ++mc;
                if(msave)
                {   if(matched==0)fprintf(mfile,"Match name %s\n",name);
                    fprintf(mfile,"%s\n",line);
                }
                matched=1;
            }
        }
        if(matched==1)
        {   if(msave)fprintf(mfile,"----------------------------------------\n");
            printf("Number of Matches: %d\n",mc);
        }
        else
        {   printf("No Matches Found.\n");
            if(usave)
            {   fprintf(ufile,"Match name %s\n",name);
                fprintf(ufile,"No Matches found\n");
                fprintf(ufile,"--------------------\n");
            }
        }
        printf("Compare again (y or n)?\n");
        if(fgets(line,sizeof line,stdin)==NULL)break;
    } while(toupper((unsigned char)line[0])=='Y');
    fclose(namefile);
    if(msave)fclose(mfile);
    if(usave)fclose(ufile);
    return 0;
}


// prompt user for input
// get input
// copy it safely into provided variable

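void getinput(char *prompt,char n[])
{   char ipc[150];

    printf("%s:\n",prompt);
    if(fgets(ipc,sizeof ipc,stdin)==NULL)ipc[0]='\0'; //fgets instead of the unsafe gets
    ipc[strcspn(ipc,"\r\n")]='\0';
    strncpy(n,ipc,19); //n must have room for at least 20 characters
    n[19]='\0';
    return;
}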


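char vowel[]="aeiouyAEIOUY";

// Compute the Simplex code for name into buf and return buf.
// buf must have room for at least strlen(name)+1 characters.
// Note: stricmp is not standard C; only some compilers (e.g. on DOS/Windows) provide it.
char *Simplex(char *name,char *buf)
{   int i,j,len,procvwl;
    char nc,prev,*sfx,*temp;

    j=strlen(name);
    temp=(char *)malloc(j+1);
    if(temp==NULL)
    {   buf[0]='\0';
        return(buf);
    }
    strcpy(temp,name);

    // Suffix rules are applied first; names shorter than 2 characters are skipped
    if(j>=2)
    {   sfx=temp+(j-2);
        if(toupper((unsigned char)sfx[1])=='X')sfx[1]='C';       // ...X  -> ...C
        else if(toupper((unsigned char)sfx[1])=='V')sfx[1]='F';  // ...V  -> ...F
        else if(stricmp(sfx,"YE")==0)temp[j-1]='\0';             // ...YE -> ...Y
        else if(stricmp(sfx,"EE")==0 || stricmp(sfx,"IE")==0)
        {   temp[j-2]='Y';                                       // ...EE, ...IE -> ...Y
            temp[j-1]='\0';
        }
        else if(stricmp(sfx,"DT")==0)temp[j-1]='\0';             // ...DT -> ...D
        else if(stricmp(sfx,"RT")==0 || stricmp(sfx,"RD")==0)
        {   temp[j-2]='D';                                       // ...RT, ...RD -> ...D
            temp[j-1]='\0';
        }
        else if(stricmp(sfx,"NT")==0 || stricmp(sfx,"ND")==0)temp[j-1]='\0'; // -> ...N
    }

    procvwl=0;   // set while inside a run of non-initial vowels
    j=-1;        // index of the last character written to buf
    len=strlen(temp);
    for(i=0; i<len; ++i)
    {   nc=toupper((unsigned char)temp[i]);
        prev=(j>=0) ? buf[j] : '\0';   // never read buf[-1]
        if(strchr(vowel,nc)==NULL)
        {   procvwl=0;
            if(nc!=prev)                // skip doubled consonants
            {   if(prev=='P' && nc=='H')buf[j]='F';
                else if(prev=='T' && nc=='H')buf[j]='T';
                else if(prev=='K' && nc=='N')buf[j]='N';
                else if(prev=='D' && nc=='G')buf[j]='G';
                else if(prev=='P' && nc=='F')buf[j]='F';
                else if(prev=='R' && nc=='H')buf[j]='R';
                else if(prev=='W' && nc=='R')buf[j]='R';
                else if(nc=='K' && toupper((unsigned char)temp[i+1])!='N')
                {   if(prev!='C')buf[++j]='C';    // K -> C (KN is handled above)
                }
                else if(nc=='M')
                {   if(prev!='N')buf[++j]='N';    // M -> N
                }
                else if(nc=='Z' && i!=0)
                {   if(prev!='S')buf[++j]='S';    // non-initial Z -> S
                }
                else buf[++j]=nc;
            }
        }
        else
        {   if(i==0 || (nc=='Y' && i==len-1))buf[++j]=nc;  // keep initial vowel or final Y
            else if(!procvwl)
            {   procvwl=1;
                buf[++j]='A';           // a run of other vowels becomes one A
            }
        }
    }
    if(j<0 || buf[j]!='A')++j;   // drop a trailing A produced by the vowel rule
    buf[j]='\0';
    free(temp);
    return(buf);
}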


If this doesn't work on the large file, we will have to investigate that. I'm guessing that it is not the size itself that is causing the problem, but something about the structure of the file. The program, for instance, makes no allowances for leading blanks, and things like that.


Author Comment

by:korsila
Dear Imladris,

I couldn't run it. After I compiled it I got this message:
"Unsatisfied symbols: stricmp (code)"

Compiling simplex.c also produced another file, "simplex.o",

and when I tried to run it, the message "cannot execute" appeared.

------
Do I need to change "stricmp" to "strncmp", or what else should I do?


many thanks,
Korsila

 

Expert Comment

by:imladris
OK. Here is a version that does not rely on stricmp:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>


int FullFlag;
char *Simplex(char *,char *);
void getinput(char *prompt,char n[]);

int main(int argc,char *argv[])
{   int matched,msave,usave,mc;
    char name[150],line[500],buf1[150],buf2[500]; //buf2 must hold a code for a full line
    FILE *namefile,*mfile=NULL,*ufile=NULL;

    printf("Save matched names in matched.dat (y or n)?\n");
    if(fgets(line,sizeof line,stdin)==NULL)return 1; //fgets instead of the unsafe gets
    msave=(toupper((unsigned char)line[0])=='Y');
    printf("Save unmatched names in unmatched.dat (y or n)?\n");
    if(fgets(line,sizeof line,stdin)==NULL)return 1;
    usave=(toupper((unsigned char)line[0])=='Y');
    namefile=fopen("surnames.dat","r"); //open file for reading
    if(namefile==NULL)
    {   perror("surnames.dat");
        return 1;
    }
    if(msave && (mfile=fopen("matched.dat","w"))==NULL)msave=0;
    if(usave && (ufile=fopen("unmatched.dat","w"))==NULL)usave=0;
    FullFlag=0;
    do
    {   printf("Please enter name to match:\n");
        if(fgets(name,sizeof name,stdin)==NULL)break;
        name[strcspn(name,"\r\n")]='\0'; //strip the line ending kept by fgets
        printf("Simplex: %s\n",Simplex(name,buf1));
        fseek(namefile,0L,SEEK_SET); //seek to start of file
        matched=0;
        mc=0;
        while(fgets(line,sizeof line,namefile)!=NULL)
        {   line[strcspn(line,"\r\n")]='\0'; //safe even if the last line has no newline
            if(strcmp(buf1,Simplex(line,buf2))==0) //buf1 already holds the code for name
            {   printf("%s\n",line);
                ++mc;
                if(msave)
                {   if(matched==0)fprintf(mfile,"Match name %s\n",name);
                    fprintf(mfile,"%s\n",line);
                }
                matched=1;
            }
        }
        if(matched==1)
        {   if(msave)fprintf(mfile,"----------------------------------------\n");
            printf("Number of Matches: %d\n",mc);
        }
        else
        {   printf("No Matches Found.\n");
            if(usave)
            {   fprintf(ufile,"Match name %s\n",name);
                fprintf(ufile,"No Matches found\n");
                fprintf(ufile,"--------------------\n");
            }
        }
        printf("Compare again (y or n)?\n");
        if(fgets(line,sizeof line,stdin)==NULL)break;
    } while(toupper((unsigned char)line[0])=='Y');
    fclose(namefile);
    if(msave)fclose(mfile);
    if(usave)fclose(ufile);
    return 0;
}


// prompt user for input
// get input
// copy it safely into provided variable

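void getinput(char *prompt,char n[])
{   char ipc[150];

    printf("%s:\n",prompt);
    if(fgets(ipc,sizeof ipc,stdin)==NULL)ipc[0]='\0'; //fgets instead of the unsafe gets
    ipc[strcspn(ipc,"\r\n")]='\0';
    strncpy(n,ipc,19); //n must have room for at least 20 characters
    n[19]='\0';
    return;
}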


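char vowel[]="aeiouyAEIOUY";

// Compute the Simplex code for name into buf and return buf.
// buf must have room for at least strlen(name)+1 characters.
char *Simplex(char *name,char *buf)
{   int i,j,len,procvwl;
    char nc,prev,*sfx,*temp;

    j=strlen(name);
    temp=(char *)malloc(j+1);
    if(temp==NULL)
    {   buf[0]='\0';
        return(buf);
    }
    strcpy(temp,name);

    // Suffix rules are applied first; names shorter than 2 characters are skipped
    if(j>=2)
    {   sfx=temp+(j-2);
        sfx[0]=toupper((unsigned char)sfx[0]);   // uppercase the last two letters
        sfx[1]=toupper((unsigned char)sfx[1]);   // so plain strcmp can be used
        if(sfx[1]=='X')sfx[1]='C';                            // ...X  -> ...C
        else if(sfx[1]=='V')sfx[1]='F';                       // ...V  -> ...F
        else if(strcmp(sfx,"YE")==0)temp[j-1]='\0';           // ...YE -> ...Y
        else if(strcmp(sfx,"EE")==0 || strcmp(sfx,"IE")==0)
        {   temp[j-2]='Y';                                    // ...EE, ...IE -> ...Y
            temp[j-1]='\0';
        }
        else if(strcmp(sfx,"DT")==0)temp[j-1]='\0';           // ...DT -> ...D
        else if(strcmp(sfx,"RT")==0 || strcmp(sfx,"RD")==0)
        {   temp[j-2]='D';                                    // ...RT, ...RD -> ...D
            temp[j-1]='\0';
        }
        else if(strcmp(sfx,"NT")==0 || strcmp(sfx,"ND")==0)temp[j-1]='\0'; // -> ...N
    }

    procvwl=0;   // set while inside a run of non-initial vowels
    j=-1;        // index of the last character written to buf
    len=strlen(temp);
    for(i=0; i<len; ++i)
    {   nc=toupper((unsigned char)temp[i]);
        prev=(j>=0) ? buf[j] : '\0';   // never read buf[-1]
        if(strchr(vowel,nc)==NULL)
        {   procvwl=0;
            if(nc!=prev)                // skip doubled consonants
            {   if(prev=='P' && nc=='H')buf[j]='F';
                else if(prev=='T' && nc=='H')buf[j]='T';
                else if(prev=='K' && nc=='N')buf[j]='N';
                else if(prev=='D' && nc=='G')buf[j]='G';
                else if(prev=='P' && nc=='F')buf[j]='F';
                else if(prev=='R' && nc=='H')buf[j]='R';
                else if(prev=='W' && nc=='R')buf[j]='R';
                else if(nc=='K' && toupper((unsigned char)temp[i+1])!='N')
                {   if(prev!='C')buf[++j]='C';    // K -> C (KN is handled above)
                }
                else if(nc=='M')
                {   if(prev!='N')buf[++j]='N';    // M -> N
                }
                else if(nc=='Z' && i!=0)
                {   if(prev!='S')buf[++j]='S';    // non-initial Z -> S
                }
                else buf[++j]=nc;
            }
        }
        else
        {   if(i==0 || (nc=='Y' && i==len-1))buf[++j]=nc;  // keep initial vowel or final Y
            else if(!procvwl)
            {   procvwl=1;
                buf[++j]='A';           // a run of other vowels becomes one A
            }
        }
    }
    if(j<0 || buf[j]!='A')++j;   // drop a trailing A produced by the vowel rule
    buf[j]='\0';
    free(temp);
    return(buf);
}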
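`stricmp` is not part of standard C, which is why the linker reported it as an unsatisfied symbol. Uppercasing the last two letters before calling `strcmp`, as this version does, is one workaround; POSIX systems also provide `strcasecmp` in `<strings.h>`. Another option is a small portable helper; `ci_strcmp` below is a hypothetical name, not a library function:

```c
#include <ctype.h>

/* Case-insensitive string comparison using only standard C.
   Returns a negative value, 0 or a positive value, like strcmp. */
int ci_strcmp(const char *a, const char *b)
{
    for (;;) {
        int ca = toupper((unsigned char)*a);
        int cb = toupper((unsigned char)*b);
        if (ca != cb || ca == '\0')
            return ca - cb;
        ++a;
        ++b;
    }
}
```

Each `stricmp(sfx,"YE")` call in the first version could then become `ci_strcmp(sfx,"YE")`.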

Author Comment

by:korsila
Dear Imladris,
That's better,
but it still doesn't work with the 100-name file I mentioned before.

Again it returns only "No Matches Found."

But it works with my 10-name test file, huh.

What should we do now? I made sure the data file has the same format you suggested.


Expert Comment

by:imladris
How big is the "100 name" file? I think the simplest path forward is for you to e-mail it to me, if it is less than 200K (or if you can make a version that is). E-mail it to imladris@infoserve.net


Author Comment

by:korsila
The 100-name file is just 1K,
and the other one is 3K.

I will email them to you now.

Many thanks...

Korsila

Expert Comment

by:imladris
Curiouser and curiouser,

I received base.dat and match.dat. I renamed base.dat to surnames.dat, and did a compare against "smith", and the program returned 16 matches.

What precisely did you try?


Author Comment

by:korsila
Imladris,
I tried again and again.
How come it doesn't work for me?
I'd better have another try...


Author Comment

by:korsila
I have tested it again and it still didn't work,
so I am sending you my test executable, "a.out".
I hope you can test it with the files I sent you.

Let me know how it goes.
In the meantime I will try to find out why it didn't work.

Many thanks,
Korsila

Author Comment

by:korsila
Imladris... whoopee!
I've finally found the mistake:
it was the space after each name in the data file.
I deleted the space after each name and then the program worked :)
Hurray, you're absolutely right: nothing was wrong with the program, only the data file format!

cheers...

Korsila...


Expert Comment

by:imladris
Excellent! Thanks.

Author Comment

by:korsila
Do you have any suggestion for ignoring the space after each name in the data file? For example, if I have 1000 names, I don't want to delete the space after every name. Could you add a bit of code to the program to ignore spaces before and after each name in the data file? I tested the program with a huge data file and it didn't work again,
so it is annoying to delete the space after each name more than 1000 times...

Any suggestions? If it's hard I will post a new question, but if it's not, could you add a bit of code here?

Many thanks,

Korsila
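The trailing blank breaks matching because `Simplex` treats a space like any other consonant and copies it into the code, so "smith " and "smith" produce different codes. One fix is to trim each name as it is read. A minimal sketch, assuming a hypothetical helper called `trim` that removes leading and trailing whitespace (including a stray `\r` from DOS-style files) in place:

```c
#include <ctype.h>
#include <string.h>

/* Remove leading and trailing whitespace (blanks, tabs, \r, \n) from s in place. */
char *trim(char *s)
{
    size_t start = 0, end = strlen(s);
    while (start < end && isspace((unsigned char)s[start]))
        ++start;                        /* skip leading whitespace */
    while (end > start && isspace((unsigned char)s[end - 1]))
        --end;                          /* back up over trailing whitespace */
    memmove(s, s + start, end - start); /* shift the name to the front */
    s[end - start] = '\0';
    return s;
}
```

In `main`, calling `trim(line);` right after each `fgets` in the read loop, and `trim(name);` after the name is entered, would let the data file keep its extra blanks.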