Asked by kuntilanak

parsing errors in yacc

Say I have the following code (attached in the code snippet), I did not include everything but it's not all needed:

I want to report an error whenever a 'dcl' doesn't end with a semicolon (';') and then continue parsing with the next rule, but the code below does not do that. As an example, if I give the input:

int abc
int ab;

I want this code to report a syntax error on line 1, because the dcl doesn't end with a semicolon, and then continue with the next line, int ab;, which should not produce an error.

Why doesn't it work? Where do I need to fix this? I've been debugging this for hours but can't find the answer, and there are no shift/reduce conflicts in my grammar.

UPDATE:

Another configuration I set up, which still doesn't work

prog : prog prog_dcl
     | prog func
     | /* empty */
;

prog_dcl : dcl ';'
               | dcl error '\n' {yyerrok; yyclearin; fprintf(stderr, "No semicolon after declarations\n");}




prog : prog dcl ';'
     | prog error {yyerrok; yyclearin; fprintf(stderr, "No semicolon at end of dcl\n");}
     | prog func
     | /* empty */
;
        
 
dcl  : type var_decl comma_vardcl
     | type ID '(' parm_types ')' id_parmtypes
     | EXTERN type ID '(' parm_types ')' id_parmtypes
     | VOID ID '(' parm_types ')' id_parmtypes
     | EXTERN VOID ID '(' parm_types ')' id_parmtypes
;
 
 
comma_vardcl : comma_vardcl ',' var_decl
             | /*epsilon*/ 
 
type : CHAR
     | INT


Mysidia:

Shouldn't that be

| prog dcl error

?
What's it doing that you don't expect it to do?

And is there any way that "int abc"  is expanding as a func,  possibly part of a func followed by an error "token" ?
kuntilanak (asker):

I already tried prog dcl error as well and it doesn't work.

I don't think int abc can be recognized as a function. As you can see from the definition of dcl, int abc matches type var_decl comma_vardcl with comma_vardcl being epsilon, so this is really weird.
I think this goes back to the parser wanting to read 3 tokens successfully after an error before it reports errors again, but as far as I know yyerrok already takes care of that.

So the way it parses (I think) is INT ID(abc) INT, then it generates a syntax error and goes back to the normal state, then it sees another ID, which is the ab, and raises another error because that doesn't match any rule.
Infinity08:
Below you'll find some simple example code built to match your case. It is a stand-alone application, so you just need to run yacc on it, and then compile it into an executable.

A correct input file would contain for example :

        d;d;fd;

(2 declarations, one function and another declaration)

An incorrect input file would contain for example :

        d;dd;fd;

(3 declarations of which the second is missing a ;, one function and another declaration)

Run this, and play with it to get an idea of how error handling works (modify it, and add functionalities etc.). Once you fully understand this simple example, you can implement the same in your actual code.
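%{
#include <stdio.h>

int yylex(void);
void yyerror(const char *s);
%}

%start prog

%%

prog  : prog dcl ';'   { printf("declaration found\n"); }
      | prog dcl error { printf("ERROR : missing ; !\n"); }
      | prog func      { printf("function found\n"); }
      | /* empty */    { printf("empty rule matched\n"); }
      ;

dcl   : 'd'
      ;

func  : 'f'
      ;

%%

/* Called by the generated parser when it detects a syntax error. */
void yyerror(const char *s) {
  (void)s;
  printf("ERROR : parse error !\n");
}

int yywrap(void) {
  return 1;
}

FILE *yyin;

/* Minimal lexer : every character of the input is a token.
   fgetc() returns EOF (a negative value) at the end of the file,
   which the parser treats as end of input.
   A newline in test.in is a token too, so keep the input on one line. */
int yylex(void) {
  return fgetc(yyin);
}

int main(void) {
  yyin = fopen("test.in", "r");
  if (yyin == NULL) {
    perror("test.in");
    return 1;
  }
  yyparse();
  fclose(yyin);
  return 0;
}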


I just found where the conflict occurs: the parser can't tell whether to use the dcl rule or the func rule. My func rule is shown below.

When I give the input :

int abc

it reads it as the start of a function and not as a dcl without a ';'. The dcl rule it conflicts with is:

dcl  : type var_decl comma_vardcl ';'

where var_decl can be just an ID, so both rules start with type ID. How can I tell the parser which rule to use? in this case I want the parser to interpret int abc as a dcl not a function

func   :  type ID '(' parm_types ')' '{' type_vardcl inside_stmt '}'
	|  VOID ID '(' parm_types ')' '{' type_vardcl inside_stmt '}'
;


>> in this case I want the parser to interpret int abc as a dcl not a function

How do you want to make that distinction ? If the input is erroneous, one of the most difficult things is to find out what the real error is. Maybe the user forgot a ; or maybe he forgot to add the function body, or maybe there are a few extra tokens that aren't supposed to be there or ... All of these are possible, and determining which is probable is very difficult ...

Furthermore, note that you're not dealing with a simple grammar (like the one I used in my example), so there are lots of things that add complications, especially for error handling.

I would really suggest to get a good grasp of error handling using a simple grammar before trying to apply that knowledge on your actual grammar.
I see, so there's no way that we can tell the parser which one to choose. I know that even a human (me) can't determine which rule applies, as it can be both. The most logical thing people normally do is to interpret int abc as an error in a dcl rule, not a function, so it's a missing semicolon. How do I do that though?

Here's my modified grammar, which still doesn't work: the parser still likes to go to the func production rule instead of the dcl. I don't know why, or what preference it is applying :(
prog : prog dcl
     | prog func
     | /* empty */
;
        
 
dcl  : type ID comma_vardcl ';'
     | type ID  '[' INTCON  ']' comma_vardcl ';'
     | type ID comma_vardcl error {yyerrok; yyclearin; fprintf(stderr, "Missing semicolon at the end of declaration\n");}
     | type ID '(' parm_types ')' id_parmtypes ';'
     | EXTERN type ID '(' parm_types ')' id_parmtypes ';'
     | VOID ID '(' parm_types ')' id_parmtypes ';'
     | EXTERN VOID ID '(' parm_types ')' id_parmtypes ';'
;
 
 
comma_vardcl : comma_vardcl ',' var_decl
             | /*epsilon*/ 


>> I see, so there's no way that we can tell the parser which one to choose.

Yes you can, by restructuring the grammar and/or adding appropriate actions.
With restructuring the grammar, I mean in such a way that it better fits your error handling methodology. In other words, you decide which cause is the most likely for a given error, and then find out the best way to construct the grammar rules for that.


>> the parser still likes to go to the func production rule

Well, you have to realize that the parser generated by Yacc does not try the rules one by one. It is a table-driven (LALR) parser that follows every rule that still matches the input read so far. So the identical leading parts of the dcl and func rules are handled together, and the parser only commits to one of them at the point where they become different.


If you want a missing ; after "type ID" to be reported, then one way of doing that is to have "type ID" as a separate rule, and use that everywhere else (both in the dcl and func rules).

For example, the first set of rules below doesn't work as required, while the second does. Here are some sample correct and incorrect inputs :

        correct : dc;dc;dcfdc;
        incorrect : dc;dcdc;dcfdc;

------ doesn't work : ------
 
prog  : prog dcl ';'   { printf("declaration found\n"); }
      | prog dcl error { printf("ERROR : missing ; !\n"); }
      | prog func      { printf("function found\n"); }
      | /* empty */    { printf("empty rule matched\n"); }
      ;
 
dcl   : 'd' 'c'
      ;
 
func  : 'd' 'c' 'f'
      ;
 
 
 
------ works : ------
 
prog  : prog dcl ';'   { printf("declaration found\n"); }
      | prog func      { printf("function found\n"); }
      | /* empty */    { printf("empty rule matched\n"); }
      ;
 
dcl   : dcl_sub
      ;
 
func  : dcl_sub 'f'
      ;
 
dcl_sub : 'd' 'c'
        | 'd' 'c' error { printf("ERROR : missing ; !\n"); }
        ;


The problem arises here: a function has type ID, but in a declaration it can be type ID or type ID '[' INTCON ']'. You can see it in my original code again:

If a dcl can only have a type ID then it would be easy to do as what you say...



prog : prog dcl
     | prog func
     | /* empty */
;
        
 
dcl  : type var_decl comma_vardcl ';'
     | type ID '(' parm_types ')' id_parmtypes ';'
     | EXTERN type ID '(' parm_types ')' id_parmtypes ';'
     | VOID ID '(' parm_types ')' id_parmtypes ';'
     | EXTERN VOID ID '(' parm_types ')' id_parmtypes ';'
;
 
 
comma_vardcl : comma_vardcl ',' var_decl
             | /*epsilon*/ 
 
 
id_parmtypes : ',' ID '(' parm_types ')' id_parmtypes
             | /*epsilon*/
 
var_decl  :  ID  '[' INTCON  ']' 
          |  ID
;
 
 
func   :  type ID '(' parm_types ')' '{' type_vardcl inside_stmt '}'
       |  VOID ID '(' parm_types ')' '{' type_vardcl inside_stmt '}'
;


>> If a dcl can only have a type ID then it would be easy to do as what you say...

As I said : you need to simplify things first to get a better understanding before taking on the complicated stuff.

My example might seem simple, but it's perfectly applicable in your situation. You just have to extend it to all possible uses of "type ID" in that specific case. That includes the ones that are followed by '[' or whatever else.

Note that in my simple example, <'d' 'c'> stands for <type ID> and <'f'> stands for anything that might come after it to make it a valid rule other than a ';'.
In this code, which I believe is the main important thing here, why don't you put error '\n' ? and you're not using the yyclearin or yyerrok?
dcl_sub : 'd' 'c'
        | 'd' 'c' error { printf("ERROR : missing ; !\n"); }
        ;


Here's my latest updated code, which seems not to work as it doesn't continue to parse at the right place
prog : prog dcl ';'
     | prog func
     | /* empty */
;
        
 
dcl  : type_id comma_vardcl 
     | type_id '[' INTCON  ']'comma_vardcl 
     | type_id '(' parm_types ')' id_parmtypes 
     | EXTERN type_id '(' parm_types ')' id_parmtypes 
     | VOID ID '(' parm_types ')' id_parmtypes 
     | EXTERN VOID ID '(' parm_types ')' id_parmtypes 
;
 
comma_vardcl : comma_vardcl ',' var_decl
             | /*epsilon*/ 
 
var_decl  :  ID  '[' INTCON  ']' 
          |  ID
;
 
func   :  type_id '(' parm_types ')' '{' type_vardcl inside_stmt '}'
       |  VOID ID '(' parm_types ')' '{' type_vardcl inside_stmt '}'
;
 
type_id : type ID
	 | type ID error { printf("ERROR : missing ; !\n"); }


>> why don't you put error '\n' ?

As I said earlier : '\n' isn't part of any rules, so why would you add it there ? I bet whitespace is even ignored by the lexer, right ?


>> and you're not using the yyclearin or yyerrok?

Because it's not needed here.


>> as it doesn't continue to parse at the right place

That's normal, isn't it ? You decided that the mistake was a missing ;. When you catch that error, you have to make sure that the parser gets back to the right state afterwards. ie. it has to be able to "fall through" or otherwise get to a stable state (it will try to match the same rule again). In this case, it's quite straightforward, since prog is that stable state. You just have to make sure that it gets back there, as well as recognize the valid input.

Error recovery is very complicated, and I would really suggest to start with simple examples, and then expand them. It's difficult enough like that - you don't want to get lost in details when starting out.
well yes, I understand that, for example if this is the input as a test

(1)int abc
(2)
(3)int ab


it should print out a failure in the first line and then continue to parse and go back to the normal state in the third line.. however it's not the case here.. here's what I got :

Line 3: syntax error : Missing a semicolon!

So it's saying that int ab is also an error? Therefore I make a conclusion that the parser is not recovering at the right state... that's my problem here..
I think it's parsing as it's supposed to; it continues to parse the next thing in the rule. However, it turns out that it reports the wrong line number. In the example above:

(1)int abc
(2)
(3)int ab


Line 3: syntax error : Missing a semicolon!

it should have printed:

Line 1: syntax error : Missing a semicolon!
Line 3: syntax error : Missing a semicolon!

but it seems like it just combines it to 1 statement. I am sure this has something to do with the parser calling yyerror only once, and I am not really sure why. So how do I fix this now? My intuition is that after reading (1)int abc the parser figures out that it's an error, so it should call yyerror at that point and print the message. Then it should return to its normal state and process line 2, which is just blank. Then, after seeing int ab, it should call yyerror again and print the error message.

What seems to be happening is that it's not going back to the normal state after reading the first line, so it only calls yyerror once, resulting in only 1 error message. Please help me find a solution to this. I have no idea why it's only calling yyerror once.
in your code

you said that dcl: dcl_sub, the fact is that.. it's not just that

the grammar rule says that it is: dcl_sub something else

so I bet we can't use your approach
 
------ works : ------
 
prog  : prog dcl ';'   { printf("declaration found\n"); }
      | prog func      { printf("function found\n"); }
      | /* empty */    { printf("empty rule matched\n"); }
      ;
 
dcl   : dcl_sub
      ;
 
func  : dcl_sub 'f'
      ;
 
dcl_sub : 'd' 'c'
        | 'd' 'c' error { printf("ERROR : missing ; !\n"); }
        ;


>> So it's saying that int ab is also an error?

Isn't it ? There's no semicolon at the end of the line ...


>> but it seems like it just combines it to 1 statement

That is because of the way the error recovery rules are structured ... after encountering an error, they make the parser skip over several tokens, until it can match a rule again. Since there are two errors in a row here, it skips over both, until it can resume normal parsing after that.
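
The single error message can be illustrated with a small C model of the parser's error status. It assumes the usual yacc/bison behaviour: after a syntax error the parser stays in recovery mode until three tokens have been shifted successfully, and errors detected during that time are not passed to yyerror. The name count_reported_errors and the 'S'/'E' event string are made up for this sketch, and treating yyerrok as taking effect immediately after every error is a simplification.

```c
#include <stddef.h>

/* events: 'S' = a token was shifted successfully,
 *         'E' = the parser detected a syntax error at the current token.
 * call_yyerrok: nonzero models an error rule whose action calls yyerrok
 *               (simplified: recovery mode ends right after each error).
 * Returns how many times yyerror() would be called in this model. */
int count_reported_errors(const char *events, int call_yyerrok)
{
    int errstatus = 0;  /* successful shifts still needed before errors are reported again */
    int reported = 0;

    for (size_t i = 0; events[i] != '\0'; i++) {
        if (events[i] == 'E') {
            if (errstatus == 0)
                reported++;                  /* yyerror() is called only outside recovery mode */
            errstatus = call_yyerrok ? 0 : 3; /* (re)enter recovery mode unless yyerrok resets it */
        } else if (events[i] == 'S' && errstatus > 0) {
            errstatus--;                     /* one more token shifted since the error */
        }
    }
    return reported;
}
```

For the event string "SSESSE" (two tokens, an error, two more tokens, another error) this model reports one error without yyerrok and two with it, which matches the "only 1 error message" symptom when two missing semicolons are close together.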

Again, how do you want the parser to realize that there are two errors here ? Or just one, or three, or ... ?
You can almost certainly not use a newline to help in that decision as explained in your other question.

Just a note : try giving this input to your favorite C compiler :

        int main(void) {
            int a
            int b
            int c
            return 0;
        }

To a human, it's clear that there are 3 errors here : the semicolon was forgotten three times.
Let's see what your compiler thinks of it, and what errors it reports.

Mine (gcc) has this to say :

   test.c In function `main':
3 test.c syntax error before "int"

And that's it. Just one error message, stating that something is wrong just before the third line (ie. at the end of the second line, ie. a missing semicolon).
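
The message names the later line because the error is only noticed when the next token is read. A lexer can remember the line of the previous token, so that a missing semicolon can be reported on the line where the declaration ended. A minimal sketch in C, assuming whitespace-separated tokens; struct lexer, lexer_init and lexer_next are hypothetical names and not part of lex or yacc:

```c
#include <ctype.h>

struct lexer {
    const char *src;    /* remaining input */
    int line;           /* line of the character about to be read */
    int tok_line;       /* line on which the current token started */
    int prev_tok_line;  /* line on which the previous token started */
};

void lexer_init(struct lexer *lx, const char *src)
{
    lx->src = src;
    lx->line = 1;
    lx->tok_line = 0;
    lx->prev_tok_line = 0;
}

/* Skips whitespace (counting newlines), then consumes one run of
 * non-space characters. Returns 1 if a token was read, 0 at end of input. */
int lexer_next(struct lexer *lx)
{
    while (*lx->src != '\0' && isspace((unsigned char)*lx->src)) {
        if (*lx->src == '\n')
            lx->line++;
        lx->src++;
    }
    if (*lx->src == '\0')
        return 0;

    lx->prev_tok_line = lx->tok_line;  /* remember where the last token was */
    lx->tok_line = lx->line;
    while (*lx->src != '\0' && !isspace((unsigned char)*lx->src))
        lx->src++;
    return 1;
}
```

With the input "int abc\n\nint ab", after the third call the current token (the second int) is on line 3 while prev_tok_line is 1, so an error handler could print line 1 for the missing semicolon instead of line 3.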
okay, so I think there's nothing I can do about it then.. I've modified the code to be something like this, but it still doesn't report that, and it's acting weird again
prog : prog dcl ';'
     | prog func
     | /* empty */
;
        
dcl  : type ID comma_vardcl
     | type ID '[' INTCON ']' comma_vardcl
     | dcl_func id_parmtypes 
     | EXTERN dcl_func id_parmtypes 
;
 
dcl_func : type ID '(' parm_types ')'
	   | VOID ID '(' parm_types ')'
	   | VOID ID '(' parm_types error {fprintf(stderr, "Expected ')' \n");}
	   | type ID '(' parm_types error {fprintf(stderr, "Expected ')' \n");}
	   | type ID '(' error {fprintf(stderr, "Something wrong with parameter type, can't be empty\n");}
	   | VOID ID '(' error {fprintf(stderr, "Something wrong with parameter type, can't be empty\n");}
	   | type ID error {fprintf(stderr, "Missing semicolon\n");}
	   | VOID ID error {fprintf(stderr, "Missing semicolon\n");}
	  
;
 
comma_vardcl : comma_vardcl ',' var_decl
             | /*epsilon*/ 
;
 
id_parmtypes : ',' ID '(' parm_types ')' id_parmtypes
             | /*epsilon*/
;
 
var_decl  :  ID  '[' INTCON  ']' 
          |  ID
;
 
 
type : CHAR
     | INT
;
 
parm_types : VOID
           | type ID  '[' ']' inside_parmtypes
           | type ID  inside_parmtypes
;
 
inside_parmtypes : ',' type ID '[' ']' inside_parmtypes
                 | ',' type ID inside_parmtypes
                 | /*epsilon*/
;
 
func   : dcl_func '{' type_vardcl inside_stmt '}'
;
 
type_vardcl : type_vardcl type var_decl comma_vardcl1 ';'
	      | type_vardcl type var_decl comma_vardcl1 error {fprintf(stderr, "Missing semicolon\n");}
            | /*epsilon*/
;
 
comma_vardcl1 : comma_vardcl ',' var_decl
		 | comma_vardcl ',' error {fprintf(stderr, "Comma without declaration\n");}
              | /*epsilon*/


when given an input

int a;
int b;
int c;

it only reports:

Line 2: syntax error: Missing semicolon

I don't know what's happening to the third line
and is that normal?
>> so I think there's nothing I can do about it then

You can always do something about it (anything is possible), but it will mean a lot of effort - most likely more than it's worth.


>> Line 2: syntax error: Missing semicolon

line 2 does have a semicolon, doesn't it ? Is this wrong error message your problem ? Or is something else the problem ?
from my code above, when given an input:

int main(void) {
            int a
            int b
            int c
            return 0;
        }

it gives me error at line 2 missing semicolon, please take a look at the code and let me know what you think
lets look at this part of the code first:

func   : dcl_func '{' type_vardcl inside_stmt '}'
;

type_vardcl : type_vardcl type var_decl comma_vardcl1 ';'
           | type_vardcl type var_decl comma_vardcl1 error {fprintf(stderr, "Missing semicolon\n");}
            | /*epsilon*/
;

comma_vardcl1 : comma_vardcl ',' var_decl
              | /*epsilon*/
;

var_decl  :  ID  '[' INTCON  ']'
          |  ID
;


if given this input:

    int main(void) {
            int a
            int b
            int c
            return 0;
        }


why doesn't it go to this error:

| type_vardcl type var_decl comma_vardcl1 error {fprintf(stderr, "Missing semicolon\n");}

One reason might be that comma_vardcl1 has an epsilon alternative, and the error is already raised while the parser is handling that, so how do I fix this?

okay, sorry to post again, here's my problem.. just please look at this one and discard all the above.

Problem, when given input:

int main(void) {
            int a
            int b
            int c
            return 0;
        }

It does not give me the missing semicolon error in the rule type_vardcl. Can anyone please tell me why and how to fix this?
func   : dcl_func '{' inside_type_vardcl inside_stmt '}'
;
 
dcl_func : type ID '(' parm_types ')'
	  | VOID ID '(' parm_types ')'  
;
 
parm_types : VOID
           | type ID  '[' ']' inside_parmtypes
           | type ID  inside_parmtypes
;
 
inside_type_vardcl : inside_type_vardcl type_vardcl
		     | /*epsilon*/
;
 
type_vardcl : type var_decl comma_vardcl ';'
	     | type var_decl comma_vardcl error {fprintf(stderr, "Missing semicolon\n");}
;
 
comma_vardcl : comma_vardcl ',' var_decl
             | /*epsilon*/ 
;
 
var_decl  :  ID  '[' INTCON  ']' 
          |  ID
;


one other weird thing, when given an input:

extern int test(void)

it can report a missing semicolon error

but if it's given

int test(void)

it doesn't report an error..

weird...

prog : prog dcl
     | prog func
     | /* empty */
;
        
dcl  : type_vardcl
     | dcl_func id_parmtypes ';'
     | EXTERN dcl_func id_parmtypes ';'
     | dcl_func id_parmtypes error {fprintf(stderr, "Missing semicolon\n");}
     | EXTERN dcl_func id_parmtypes error {fprintf(stderr, "Missing semicolon\n");}
;


Infinity08 (accepted solution, text not available in this copy)

so what should I do in order to fix this?
Re-structuring the grammar rules sounds like it would do. Make sure that the missing semicolon error can only occur in one rule (as opposed to 3 currently), and catch the error there.
Yes I know that, the only way is to restructure the grammar, no other way. But still, how do I restructure the grammar in this case? I tried like a gazillion of different ways and still can't find it
The idea is that the moment a semicolon is expected, only one rule can be active (and that's the rule where you'll add the error handling). The easiest way to achieve that is to make sure that the token(s) that are valid right before the semicolon are all read in the same rule. In this case, right before the semicolon, an identifier token should be read. And after that, it has the choice between a ',' ';' or '['. The point where it splits between these three has to be within the same rule.
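
The "one place that expects the semicolon" idea can be sketched as a hand-written C parser. This is not what yacc generates and not the asker's grammar; it only covers type var_decl {',' var_decl} ';' and uses hypothetical names (enum tok, parse_var_decl, count_missing_semicolons). The token array must end with TK_EOF.

```c
#include <stdio.h>

enum tok { TK_EOF = 0, TK_TYPE, TK_ID, TK_INTCON,
           TK_LBRACK, TK_RBRACK, TK_COMMA, TK_SEMI };

/* var_decl : ID | ID '[' INTCON ']'
 * Returns 1 on success (and advances *pos), 0 on a syntax error. */
static int parse_var_decl(const enum tok *t, int *pos)
{
    if (t[*pos] != TK_ID)
        return 0;
    (*pos)++;
    if (t[*pos] == TK_LBRACK) {
        if (t[*pos + 1] != TK_INTCON || t[*pos + 2] != TK_RBRACK)
            return 0;
        *pos += 3;
    }
    return 1;
}

/* Parses a sequence of declarations and returns how many of them
 * were missing their terminating semicolon. */
int count_missing_semicolons(const enum tok *t)
{
    int pos = 0, missing = 0;

    while (t[pos] != TK_EOF) {
        int ok = 0;
        if (t[pos] == TK_TYPE) {
            pos++;
            ok = parse_var_decl(t, &pos);
            while (ok && t[pos] == TK_COMMA) {
                pos++;
                ok = parse_var_decl(t, &pos);
            }
        }
        if (!ok) {
            /* Some other error: skip to just past the next ';' and retry. */
            while (t[pos] != TK_EOF && t[pos] != TK_SEMI)
                pos++;
            if (t[pos] == TK_SEMI)
                pos++;
            continue;
        }
        /* Choke point: every well-formed declaration ends up here,
         * whether it ended in ID, ID[INTCON], or a comma list. */
        if (t[pos] == TK_SEMI) {
            pos++;
        } else {
            missing++;
            fprintf(stderr, "Missing semicolon before token %d\n", pos);
            /* The unexpected token is not consumed: it is treated as the
             * start of the next declaration. */
        }
    }
    return missing;
}
```

Because the check for ';' happens in exactly one place, the tokens for int a int b int c give three separate reports here. Getting the same effect out of a yacc grammar means arranging the rules so that all declaration forms converge before the ';' is expected.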
>> right before the semicolon, an identifier token should be read

It's not just an identifier, take a look at the rule below.

So an error can be expected after any complete declaration, therefore I should put the error in the prog rule, as in prog error {fprintf(stderr, "missing a semicolon\n");}

right? I tried this in the first place and it doesn't work




prog : prog dcl
     | prog func
     | /*empty*/
 
dcl : type var_decl { ',' var_decl }
    | [ extern ] type id '(' parm_types ')' { ',' id '(' parm_types ')' }
    | [ extern ] void id '(' parm_types ')' { ',' id '(' parm_types ')' }


Put it like this : the error handling should be in a choke point - a point where several branches of the parse rules converge again. If it's not possible to have such a "choke point", then you can have more than one location where the same error is handled. That again makes things more complicated though.

You should have a mental picture of how the parse rules interact. Mostly it starts out as a tree, with merging branches, making it more into a graph. Having this mental picture before you at all times will help greatly in finding the right ways to structure the rules, and the right locations to add error handling.

Note that this becomes increasingly difficult the larger the grammar is that the parser has to handle (obviously).

Apologies that I keep things generic, but I don't have the time to really get into your specific case as I explained earlier. Maybe later tonight, but I can't promise anything.
I get all of what you're saying. However, I am really clueless about how this specific thing should be fixed. It's not that I am too lazy to do it, but I've tried everything I could and it still doesn't work. So please can you work this one out for me? An example is worth a million to me, as I mostly learn through examples.
I even tried to simplify my grammar down to this point. The below code even doesn't work.. why??
prog : prog dcl ';'
     | prog dcl error {fprintf(stderr, "Missing semicolon\n");}
     | /* empty */
;
        
dcl  : type ID
     | type ID '[' INTCON  ']' 
     
;
 
type : CHAR
     | INT
;


>> The below code even doesn't work.. why??

For the same reason. After reading an 'ID', the parser expects either a ';' (as per the first prog rule) or a '[' (as per the second dcl rule).

For this simplified grammar, the parser will do something like below (pseudo code) :
token = getNextToken();
if ((token == CHAR) || (token == INT)) {
  token = getNextToken();
  if (token == ID) {
    token = getNextToken();
    if (token == ';') {
      // ALL DONE : successfully parsed !
    }
    else if (token == '[') {
      token = getNextToken();
      if (token == INTCON) {
        token = getNextToken();
        if (token == ']') {
          token = getNextToken();
          if (token == ';') {
            // ALL DONE : successfully parsed !
          }
          else {
            // parse error !!
          }
        }
        else {
          // parse error !!
        }
      }
      else {
        // parse error !!
      }
    }
    else {
      // parse error !!
    }
  }
  else {
    // parse error !!
  }
}
else {
  // parse error !!
}


wow..I don't expect to use that huge of code.. anyway I think I solved the problem by doing a :

prog : prog dcl ';'
     | prog func
     | prog error
     | /*epsilon*/


Regarding this part of the rules:

what kind of syntax error could happen there?
expr   : '-' expr %prec UNARY
       | '!' expr %prec UNARY
       | expr '+' expr 
       | expr '-' expr
       | expr '*' expr
       | expr '/' expr
       | expr EQ expr
       | expr NE expr
       | expr LE expr
       | expr '<' expr
       | expr GE expr
       | expr '>' expr
       | expr AND_OP expr
       | expr OR_OP expr
       | ID opt_idexpr
       | '(' expr ')'
       | INTCON
       | CHARCON
       | STRINGCON
;


So, it first checks if the token is a ';', then if it's a '[', and if it's neither, it generates a parser error. This parser error will occur in the dcl rule, not in the prog rule (where you have your error handling code).
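
The decision tree from the pseudo code can be written as a small runnable C function to see where the error is detected. This is only a sketch of the idea and not the code yacc generates (yacc emits a table-driven parser); enum token and parse_dcl are hypothetical names, and the token array must end with T_EOF.

```c
enum token { T_EOF = 0, T_CHAR, T_INT, T_ID, T_INTCON,
             T_LBRACK, T_RBRACK, T_SEMI };

/* Checks one declaration of the simplified grammar:
 *     type ID ';'   or   type ID '[' INTCON ']' ';'
 * Returns -1 if it parses, otherwise the index of the first
 * unexpected token. T_EOF never matches, so the function never
 * reads past the terminator. */
int parse_dcl(const enum token *tokens)
{
    int i = 0;

    if (tokens[i] != T_CHAR && tokens[i] != T_INT)
        return i;
    i++;
    if (tokens[i] != T_ID)
        return i;
    i++;
    if (tokens[i] == T_SEMI)
        return -1;              /* type ID ';' */
    if (tokens[i] != T_LBRACK)
        return i;               /* neither ';' nor '[' right after ID */
    i++;
    if (tokens[i] != T_INTCON)
        return i;
    i++;
    if (tokens[i] != T_RBRACK)
        return i;
    i++;
    if (tokens[i] != T_SEMI)
        return i;
    return -1;                  /* type ID '[' INTCON ']' ';' */
}
```

For the tokens of int abc int ab; (T_INT T_ID T_INT T_ID T_SEMI T_EOF) it returns 2: the error is detected at the token right after the first ID, at the point where both ';' and '[' were still possible.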

You can do something like the following to resolve it - a few options (not necessarily all optimal, but just to give you some ideas) :
prog : prog dcl ';'
     | prog dcl '[' INTCON  ']' ';'
     | prog dcl error { fprintf(stderr, "Missing semicolon\n"); }
     | /* empty */
     ;
        
dcl  : type ID
     ;
 
type : CHAR
     | INT
     ;
 
===========================================================================
 
prog : prog dcl_sim ';'
     | prog dcl_sim dcl_arr ';'
     | prog dcl_sim error { fprintf(stderr, "Missing semicolon\n"); }
     | prog dcl_sim dcl_arr error { fprintf(stderr, "Missing semicolon\n"); }
     | /* empty */
     ;
 
dcl_sim : type ID
        ;
 
dcl_arr : '[' INTCON ']'
        ;
 
type : CHAR
     | INT
     ;


>> wow..I don't expect to use that huge of code..

That was just for illustrating what the yacc generated code would do behind the scenes.


>> what kind of syntax error could happen there?

Anything that is unexpected ;) An unexpected token. A missing token, premature end of input, ...
If I do something like this, it generates a lot of reduce/reduce conflicts:
expr   : '-' expr %prec UNARY
       | '!' expr %prec UNARY
       | expr '+' expr 
       | expr '-' expr
       | expr '*' expr
       | expr '/' expr
       | expr EQ expr
       | expr NE expr
       | expr LE expr
       | expr '<' expr
       | expr GE expr
       | expr '>' expr
       | expr AND_OP expr
       | expr OR_OP expr
       | ID opt_idexpr
       | '(' expr ')'
       | INTCON
       | CHARCON
       | STRINGCON
       | error {fprintf(stderr, "invalid expression\n");}
;


Another thing that won't work out: remember that inside the declaration there are not only 2 options, there are a lot.. so doing that would be like putting everything in the prog side rule
>> so doing that would be like putting everything in the prog side rule

Of course. What I showed are just some of the techniques you can use. Depending on the specific grammar you are using and the specific errors you want to catch, you will of course have to use certain techniques to restructure the rules rather than others ... There's no general rule, but as I said earlier : understanding how the parser works will help you a lot.

This is not a simple subject (as I've said earlier), and it will involve some effort to get it working right :)