
Solved

A Parser for the C Language

Posted on 1999-08-02
Medium Priority
Last Modified: 2010-04-15
I am in the process of developing a parser for the C language, along with some customizations of my own. Is there a ready-made parser available that I could use as the starting point for those customizations?

Thanks in advance
Karthik
Question by:karthikr
5 Comments
 

Accepted Solution

by: yairy (earned 40 total points)
ID: 1264152
Runs well under BC5.00 (Borland C++ 5.00):


#include <cctype>
#include <string>
using namespace std;

class Porter {
private:
  // stop-word support (isStopWord below) is declared but not implemented

  string Clean(string);
  int hasSuffix(string, string, string&);
  int vowel(char , char);
  int measure(string);
  int containsVowel(string);
  int cvc(string);
  string step1(string);
  string step2(string);
  string step3(string);
  string step4(string);
  string step5(string);
  string stripPrefixes (string);
  string stripSuffixes(string);

  // stop word check
  bool isStopWord(string);

public:
  // constructor (the stop-word list is not implemented, so it does nothing)
  Porter();
  string stripAffixes( string);
};



Porter::Porter() {

}

string Porter::Clean(string str) {
  // keep only letters and digits
  int last = str.length();
  string temp = "";

  for ( int i = 0; i < last; i++ ) {
    if ( isalnum(static_cast<unsigned char>(str[i])) )
      temp += str[i];
  }

  return temp;
} //clean


int Porter::hasSuffix(string word, string suffix, string& stem) {

  string tmp = "";

  if (word.length() <= suffix.length())
    return 0;
  if (suffix.length() > 1)
    if (word[word.length()-2] != suffix[suffix.length()-2])
      return 0;

  stem = "";

  for ( int i = 0; i < word.length()-suffix.length(); i++ )
    stem += word[i];
  tmp = stem;

  for ( int i = 0; i < suffix.length(); i++ )
    tmp += suffix[i];

  if ( tmp.compare(word) == 0 )
    return 1;
  else
    return 0;
}

int Porter::vowel( char ch, char prev ) {
  switch ( ch ) {
  case 'a':
  case 'e':
  case 'i':
  case 'o':
  case 'u':
    return 1;
  case 'y':
    switch ( prev ) {
    case 'a':
    case 'e':
    case 'i':
    case 'o':
    case 'u':
      return 0;
    default:
      return 1;
    }
  default:
    return 0;
  }
}

int Porter::measure(string stem) {
  // Porter's measure m: the number of VC sequences in [C](VC)^m[V]
  int i = 0;
  int count = 0;
  int length = stem.length();

  while ( i < length ) {
    // skip consonants up to the next vowel
    for ( ; i < length ; i++ ) {
      if ( i > 0 ) {
        if ( vowel(stem[i], stem[i-1]) )
          break;
      }
      else {
        if ( vowel(stem[i], 'a') )
          break;
      }
    }

    // skip vowels up to the next consonant
    for ( i++ ; i < length ; i++ ) {
      if ( i > 0 ) {
        if ( !vowel(stem[i], stem[i-1]) )
          break;
      }
      else {
        if ( !vowel(stem[i], '?') )
          break;
      }
    }

    // a consonant after the vowel run completes one VC sequence
    if ( i < length ) {
      count++;
      i++;
    }
  } //while

  return count;
}

int Porter::containsVowel(string word) {

  for ( int i = 0; i < word.length(); i++ ) {
    if ( i > 0 ) {
      if ( vowel(word[i], word[i-1]) )
        return 1;
    }
    else {
      if ( vowel(word[0], 'a') )
        return 1;
    }
  }

  return 0;
}


int Porter::cvc( string str ) {
  int length=str.length();
 
  if ( length < 3 )
    return 0;
 
  if ( (!vowel(str[length-1],str[length-2]) )
       && (str[length-1] != 'w') && (str[length-1] != 'x') && (str[length-1] != 'y')
       && (vowel(str[length-2],str[length-3])) ) {
   
    if (length == 3) {
      if (!vowel(str[0], '?'))
        return 1;
      else
        return 0;
    }
    else {
      if (!vowel(str[length-3], str[length-4]))
        return 1;
      else
        return 0;
    }
  }  
 
  return 0;
}

string Porter::step1(string str) {

  string stem;

  // step 1a: plurals
  if ( str[str.length()-1] == 's' ) {
    if ( (hasSuffix( str, "sses", stem )) || (hasSuffix( str, "ies", stem )) ) {
      // sses -> ss, ies -> i
      string tmp = "";
      for ( int i = 0; i < (str.length()-2); i++ )
        tmp += str[i];
      str = tmp;
    }
    else {
      if ( str.length() == 1 ) {
        str = "";
        return str;
      }
      // drop a final s unless the word ends in ss
      if ( str[str.length()-2] != 's' ) {
        string tmp = "";
        for ( int i = 0; i < str.length()-1; i++ )
          tmp += str[i];
        str = tmp;
      }
    }
  }

  // step 1b: eed, ed, ing
  if ( hasSuffix( str, "eed", stem ) ) {
    if ( measure( stem ) > 0 ) {
      // eed -> ee
      string tmp = "";
      for ( int i = 0; i < str.length()-1; i++ )
        tmp += str[i];
      str = tmp;
    }
  }
  else {
    if ( (hasSuffix( str, "ed", stem )) || (hasSuffix( str, "ing", stem )) ) {
      if ( containsVowel( stem ) ) {
        str = stem;
        if ( str.length() == 1 )
          return str;

        if ( (hasSuffix( str, "at", stem )) || (hasSuffix( str, "bl", stem )) || (hasSuffix( str, "iz", stem )) ) {
          str += "e";
        }
        else {
          int length = str.length();
          // a double consonant other than l, s or z loses its last letter
          if ( (str[length-1] == str[length-2])
               && !vowel(str[length-1], str[length-2])
               && (str[length-1] != 'l') && (str[length-1] != 's') && (str[length-1] != 'z') ) {
            str = str.substr(0, length-1);
          }
          else if ( measure( str ) == 1 ) {
            if ( cvc( str ) )
              str += "e";
          }
        }
      }
    }
  }

  // step 1c: y -> i when the stem contains a vowel
  if ( hasSuffix( str, "y", stem ) )
    if ( containsVowel( stem ) )
      str = stem + "i";

  return str;
}



string Porter::step2( string str ) {

     string suffixes[22][2] = { { "ational", "ate" },
                                    { "tional",  "tion" },
                                    { "enci",    "ence" },
                                    { "anci",    "ance" },
                                    { "izer",    "ize" },
                                    { "iser",    "ize" },
                                    { "abli",    "able" },
                                    { "alli",    "al" },
                                    { "entli",   "ent" },
                                    { "eli",     "e" },
                                    { "ousli",   "ous" },
                                    { "ization", "ize" },
                                    { "isation", "ize" },
                                    { "ation",   "ate" },
                                    { "ator",    "ate" },
                                    { "alism",   "al" },
                                    { "iveness", "ive" },
                                    { "fulness", "ful" },
                                    { "ousness", "ous" },
                                    { "aliti",   "al" },
                                    { "iviti",   "ive" },
                                    { "biliti",  "ble" }};
     string stem;

     for ( int index = 0 ; index < 22; index++ ) {
         if ( hasSuffix ( str, suffixes[index][0], stem ) ) {
            if ( measure ( stem ) > 0 ) {
               str = stem + suffixes[index][1];
               return str;
            }
         }
     }

     return str;
  }




string Porter::step3( string str ) {

        string suffixes[8][2] = { { "icate", "ic" },
                                       { "ative", "" },
                                       { "alize", "al" },
                                       { "alise", "al" },
                                       { "iciti", "ic" },
                                       { "ical",  "ic" },
                                       { "ful",   "" },
                                       { "ness",  "" }};
        string stem;

        for ( int index = 0 ; index<8; index++ ) {
            if ( hasSuffix ( str, suffixes[index][0], stem ))
               if ( measure ( stem ) > 0 ) {
                  str = stem + (suffixes[index][1]);
                  return str;
               }
        }
        return str;
  }



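string Porter::step4( string str ) {

  // "ion" is removed only after s or t (Porter's *S or *T condition),
  // so "adoption" becomes "adopt" rather than "adop"
  string suffixes[20] = { "al", "ance", "ence", "er", "ic", "able", "ible", "ant", "ement", "ment", "ent", "ion", "ou", "ism", "ate", "iti", "ous", "ive", "ize", "ise"};

  string stem;

  for ( int index = 0 ; index < 20; index++ ) {
    if ( hasSuffix ( str, suffixes[index], stem ) ) {
      if ( suffixes[index] == "ion" ) {
        char last = stem[stem.length()-1];
        if ( last != 's' && last != 't' )
          continue;
      }
      if ( measure ( stem ) > 1 ) {
        str = stem;
        return str;
      }
    }
  }
  return str;
}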



string Porter::step5( string str ) {

  // step 5a: remove a final e
  if ( str[str.length()-1] == 'e' ) {
    if ( measure(str) > 1 ) { /* measure(str)==measure(stem) if ends in vowel */
      str = str.substr(0, str.length()-1);
    }
    else if ( measure(str) == 1 ) {
      string stem = str.substr(0, str.length()-1);
      if ( !cvc(stem) )
        str = stem;
    }
  }

  if ( str.length() == 1 )
    return str;

  // step 5b: ll -> l when m > 1
  if ( (str[str.length()-1] == 'l') && (str[str.length()-2] == 'l') && (measure(str) > 1) )
    str = str.substr(0, str.length()-1);

  return str;
}
 

string Porter::stripPrefixes ( string str) {

    string prefixes[9] = { "kilo", "micro", "milli", "intra", "ultra", "mega", "nano", "pico", "pseudo"};
    int pos;
    int last = 9;
    for ( int i=0 ; i<last; i++ ) {
      pos = str.find(prefixes[i]);
      if (pos == 0) {
      string temp = "";
      for ( int j=0 ; j<(str.length()-prefixes[i].length()); j++ )
        temp += str[j+ (prefixes[i].length()) ];
      return temp;
      }
    }
   
    return str;
  }
 
 

string Porter::stripSuffixes(string str) {
    str = step1( str );
    if ( str.length() >= 1 )
      str = step2( str );
    if ( str.length() >= 1 )
      str = step3( str );
    if ( str.length() >= 1 )
      str = step4( str );
    if ( str.length() >= 1 )
      str = step5( str );
    return str;
  }


string Porter::stripAffixes( string str ) {

  // lower-case the word, then keep only letters and digits
  for ( int i = 0; i < str.length(); ++i )
    str[i] = tolower(static_cast<unsigned char>(str[i]));

  str = Clean(str);

  if (( str != "" ) && (str.length() > 2)) {
    str = stripPrefixes(str);

    if ( str != "" )
      str = stripSuffixes(str);
  }

  return str;
} //stripAffixes
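Two of the building blocks above are easier to check in isolation. The standalone sketch below is a restatement, not the posted code: `porterMeasure` is intended to compute the same measure m as `Porter::measure`, but does it by counting the places where a vowel run is followed by a consonant, and `porterStep1a` covers only the plural rules at the start of `Porter::step1`. The names `isVowelAt`, `porterMeasure`, `endsWith` and `porterStep1a` are hypothetical, and the sketch assumes lower-case ASCII input such as `stripAffixes` produces.

```cpp
#include <cstddef>
#include <string>

// Porter's vowel test: a, e, i, o, u are vowels; 'y' is a vowel only when
// the letter before it is a consonant (so a leading 'y' is a consonant).
bool isVowelAt(const std::string& w, std::size_t i) {
    switch (w[i]) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            return true;
        case 'y':
            return i > 0 && !isVowelAt(w, i - 1);
        default:
            return false;
    }
}

// Measure m of a word written as [C](VC)^m[V]: count each place where a
// run of vowels is followed by a consonant.
int porterMeasure(const std::string& w) {
    int m = 0;
    bool prevVowel = false;
    for (std::size_t i = 0; i < w.size(); ++i) {
        bool v = isVowelAt(w, i);
        if (prevVowel && !v)
            ++m;
        prevVowel = v;
    }
    return m;
}

// True if w ends with suffix.
bool endsWith(const std::string& w, const std::string& suffix) {
    return w.size() >= suffix.size() &&
           w.compare(w.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Step 1a (plurals): sses -> ss, ies -> i, ss -> ss, s -> (dropped).
std::string porterStep1a(const std::string& w) {
    if (endsWith(w, "sses") || endsWith(w, "ies"))
        return w.substr(0, w.size() - 2);  // drop the final "es"
    if (endsWith(w, "ss"))
        return w;                           // e.g. "caress" is unchanged
    if (endsWith(w, "s"))
        return w.substr(0, w.size() - 1);   // e.g. "cats" -> "cat"
    return w;
}
```

With this definition "tree" and "by" have m = 0, "trouble" and "ivy" have m = 1, and "private" and "orrery" have m = 2; step 1a turns "caresses" into "caress" and "ponies" into "poni".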



Expert Comment

by:shlomoy
ID: 1264153
You can use the good old, well-known yacc (together with lex, which generates the lexical analyzer that complements it) to build your own parser.

yacc - yet another compiler-compiler

From the Unix man pages:
The yacc command converts a context-free grammar into a set of tables for a simple automaton that executes an LALR(1) parsing algorithm. The grammar may be ambiguous; specified precedence rules are used to break ambiguities.
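To illustrate what "precedence rules are used to break ambiguities" means: in the ambiguous grammar `expr : expr '+' expr | expr '*' expr | DIGIT`, the input 1+2*3 has two parse trees. Declaring `%left '+'` followed by `%left '*'` in the yacc file gives `*` the higher precedence, so the parse that evaluates to 7 (not 9) is chosen. yacc resolves this inside its generated LALR(1) tables; the hand-written C++ sketch below uses a different technique (precedence climbing) that makes the same choice, and is shown only to illustrate the effect. The `ExprParser` class and its single-digit, whitespace-free, well-formed input are assumptions for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy parser for: expr : expr '+' expr | expr '*' expr | DIGIT
// Precedence climbing resolves the ambiguity: '*' binds tighter than '+',
// and both operators are treated as left-associative.
class ExprParser {
public:
    explicit ExprParser(const std::string& text) : s_(text), pos_(0) {}

    // Parse and evaluate the whole input (assumed well-formed).
    long parse() { return parseExpr(1); }

private:
    std::string s_;
    std::size_t pos_;

    static int precedence(char op) {
        switch (op) {
            case '+': return 1;
            case '*': return 2;
            default:  return 0;   // not a binary operator
        }
    }

    // A primary is a single decimal digit in this toy grammar.
    long parsePrimary() {
        char c = s_[pos_++];
        return c - '0';
    }

    // Parse a sub-expression containing only operators of precedence >= minPrec.
    long parseExpr(int minPrec) {
        long lhs = parsePrimary();
        while (pos_ < s_.size() && precedence(s_[pos_]) >= minPrec) {
            char op = s_[pos_++];
            // Left associativity: the right operand may only contain
            // operators that bind strictly tighter than op.
            long rhs = parseExpr(precedence(op) + 1);
            lhs = (op == '+') ? lhs + rhs : lhs * rhs;
        }
        return lhs;
    }
};
```

For example, `ExprParser("1+2*3").parse()` and `ExprParser("2*3+1").parse()` both give 7.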

Expert Comment

by:shlomoy
ID: 1264154
Bison is also a nice tool.


Expert Comment

by:curri
ID: 1264155
There are free lex and yacc specifications for ANSI C. You can find them at:

http://www.lysator.liu.se/c/ANSI-C-grammar-y.html   (yacc)
http://www.lysator.liu.se/c/ANSI-C-grammar-l.html   (lex)

Then use lex/yacc to generate your parser.

BTW, Bison is essentially GNU's version of yacc (with some improvements, I think), and flex is a free reimplementation of lex. Both are free!
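The lex specification linked above splits source text into tokens (keywords, identifiers, constants, operators) that the yacc-generated parser then consumes. As a rough, hand-written illustration of that first stage (not code generated by lex or flex), the sketch below tokenizes a tiny subset of C. The `Kind` enum, the `Token` struct, the `tokenize` function and the short keyword and operator lists are assumptions for illustration; comments, string and character literals, floating constants and most operators are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class Kind { Keyword, Identifier, Number, Punct };

struct Token {
    Kind kind;
    std::string text;
};

// Hypothetical tokenizer for a small subset of C.
std::vector<Token> tokenize(const std::string& src) {
    static const std::set<std::string> keywords = {"int", "return", "if", "else", "while"};
    static const std::set<std::string> twoCharOps = {"==", "!=", "<=", ">=", "&&", "||", "++", "--"};
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }              // skip white space
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {                    // identifier or keyword
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            std::string word = src.substr(start, i - start);
            out.push_back({keywords.count(word) ? Kind::Keyword : Kind::Identifier, word});
        } else if (std::isdigit(c)) {                         // decimal integer constant
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            out.push_back({Kind::Number, src.substr(start, i - start)});
        } else if (i + 1 < src.size() && twoCharOps.count(src.substr(i, 2))) {
            out.push_back({Kind::Punct, src.substr(i, 2)});   // prefer the longer operator
            i += 2;
        } else {
            out.push_back({Kind::Punct, std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}
```

For example, `tokenize("if (a<=b) return 0;")` yields nine tokens, with `<=` kept as a single punctuator, which is the longest-match behaviour a lex specification also gives.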


Expert Comment

by:shlomoy
ID: 1264156
Yep, curri supports my case. :-)
