Piping to lex/bison application

letharion asked
Last Modified: 2012-05-06
I've written an application using flex and bison to parse some data.

In pseudocode, I'd like to do this:

string A
program Parser
A = Parser(datatostdin)

The purpose of this is to automate parsing of large amounts of data.
I've been thinking that there should be at least two ways to do this.
A) I could somehow compile the generated C directly into my own C app, instead of compiling it into its own standalone program
B) I could pipe data to and from the parser

Have you done anything similar, and which way would you then recommend?
Will either of them not work as I think? Are there other ways to accomplish this?

Author

Commented:
It would appear that popen could be used to accomplish B.
There also seem to be problems with it: http://www.opengroup.org/onlinepubs/007908799/xsh/popen.html
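
For option B, feeding data to an external command through popen could be sketched roughly as follows. The helper name feed_command and the command strings are hypothetical. Note that popen only gives a one-way pipe, so the command's output has to go wherever the shell command sends it; that may be one of the limitations hinted at above.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen/pclose under strict -std modes */
#include <stdio.h>

/* Hypothetical helper: run `command` through the shell and write `data`
 * to its standard input. Returns the wait status reported by pclose(),
 * or -1 if the pipe could not be opened or the write failed. */
int feed_command(const char *command, const char *data)
{
    FILE *child_in = popen(command, "w");   /* "w": we write, the child reads */
    if (child_in == NULL)
        return -1;

    int write_failed = (fputs(data, child_in) == EOF);
    int status = pclose(child_in);          /* waits for the child to finish */

    return write_failed ? -1 : status;
}
```

Assuming the standalone parser is installed as a program called parser, a call such as feed_command("parser >> File", "99 10 1.5\n") would append its output to File.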
CERTIFIED EXPERT
Top Expert 2009
Commented:

Author

Commented:
Ah. Interesting. Then there are at least three ways ;)

This seems to generally be the best one, if not the simplest to implement.
CERTIFIED EXPERT
Top Expert 2009

Commented:
Play around with it a bit (use my code above for example), and see if you get it to work :) If it works, you can apply the same principles in your own code, and thus keep everything in the same executable without any piping or other workarounds :)

Author

Commented:
Yes, that's why I thought it was the best solution :)
Is there any end to your knowledge Infinity? ;)
CERTIFIED EXPERT
Top Expert 2009

Commented:
>> Is there any end to your knowledge Infinity? ;)

lol. I'd like to refer to my nickname to answer that ... But I'm afraid there is definitely a limit to my knowledge. Fortunately so, because if there weren't, it would be kind of boring, since there wouldn't be anything to learn any more.

It makes me think of one of my favorite quotes : "As the island of our knowledge grows, so does the shore of our ignorance" (John A. Wheeler)

Author

Commented:
Well said :)
I did a dirty hack for the time being:

popen("echo datatoparse | parser >> File", "r");

Which works well.
Do you mind if I leave this open for a while, until I get to fixing it up and maybe I have further questions on your suggestion?
CERTIFIED EXPERT
Top Expert 2009

Commented:
No problem.

Author

Commented:
Mysteriously but not to my surprise, my piping solution stopped working. I figure I should use a "proper" solution instead of fixing up a bad one, so here I am.

I have one question, that I may have understood the answer to previously and now forgotten, or I never realised it was a problem.

How would I interact with these new in/output functions? Reading from a memory buffer is excellent, but how do I tell the lexer where that memory buffer is? Can I cc the lexer to a library, and that gives me a "start" function to call which will accept a pointer to the memory?
CERTIFIED EXPERT
Top Expert 2009

Commented:
Since you write the three functions yourself, you can pretty much make them do whatever you want, including specifying the memory buffer of your choice. See the example code I posted.

Author

Commented:
Yes, ofc :) But I have program A that generates the content for the said memory buffer, and program B that runs the lexer. Program B doesn't immediately have access to any memory buffers that belong to program A.

Or am I missing something fundamental about your example?
CERTIFIED EXPERT
Top Expert 2009

Commented:
I thought the idea was not to have two separate programs, but to combine them into one ?
If not, you can still use the piping solution, or otherwise transfer the data from one program to the other.

Author

Commented:
Absolutely, that is the idea :) I must have been unclear

That's why I asked "Can I cc the lexer to a library?"

How do I combine them into one?
CERTIFIED EXPERT
Top Expert 2009

Commented:
>> That's why I asked "Can I cc the lexer to a library?"

Yes. I already responded to that here : http:#24467125


>> How do I combine them into one?

Whatever you prefer ... You can use the library approach if you want. Or you can integrate it in the existing code, or ...

Author

Commented:
Flex/Bison produces valid C code, so I can just include their work in my regular files.
It's not harder than that I guess. That was probably obvious to me while I was working with the programs, but I just didn't realise that now.
CERTIFIED EXPERT
Top Expert 2009

Commented:
>> Flex/Bison produces valid C code, so I can just include their work in my regular files.
>> It's not harder than that I guess.

Indeed :) You just need to provide the proper interface to use the generated lexer/parser, but that's not very complicated.

Author

Commented:
After including the compiled files in the rest of my project and attempting to run the previous "main()" function, now just named "test()", from my project, the parser seems to enter a while(true) {}.
I also get:
PFtIF.lex(8): warning: statement is unreachable
PFtIF.lex(9): warning: statement is unreachable
PFtIF.lex(10): warning: statement is unreachable
PFtIF.lex(11): warning: statement is unreachable
PFtIF.lex(12): warning: statement is unreachable
on the below code.

I started digging in lex.yy.c to figure out what was happening; I've posted the relevant lines below too.
I tried following the execution path, but I'm not sure what happens. yy_act becomes 8, which doesn't make any sense?
#define INITIAL 0
#define YY_END_OF_BUFFER 8
#define YY_STATE_EOF(state) (YY_END_OF_BUFFER + state + 1)
 
 
#define YY_USER_ACTION
#endif
 
/* Code executed at the end of each rule. */
#ifndef YY_BREAK
#define YY_BREAK break;
#endif
 
#define YY_RULE_SETUP \
   YY_USER_ACTION
 
/** The main scanner function which does all the work.
 */
YY_DECL
{
   register yy_state_type yy_current_state;
   register char *yy_cp, *yy_bp;
   register int yy_act;
 
#line 6 "PFtIF.lex"
 
#line 648 "lex.yy.c"
 
   if ( !(yy_init) )
      {
      (yy_init) = 1;
 
#ifdef YY_USER_INIT
      YY_USER_INIT;
#endif
 
      if ( ! (yy_start) )
         (yy_start) = 1;   /* first start state */
 
      if ( ! yyin )
         yyin = stdin;
 
      if ( ! yyout )
         yyout = stdout;
 
      if ( ! YY_CURRENT_BUFFER ) {
         yyensure_buffer_stack ();
         YY_CURRENT_BUFFER_LVALUE =
            yy_create_buffer(yyin,YY_BUF_SIZE );
      }
 
      yy_load_buffer_state( );
      }
 
   while ( 1 ) {  /* loops until end-of-file is reached */
      yy_cp = (yy_c_buf_p);
 
      /* Support of yytext. */
      *yy_cp = (yy_hold_char);
 
      /* yy_bp points to the position in yy_ch_buf of the start of
       * the current run.
       */
      yy_bp = yy_cp;
 
      yy_current_state = (yy_start);
yy_match:
      do {
         register YY_CHAR yy_c = yy_ec[YY_SC_TO_UI(*yy_cp)];
         if ( yy_accept[yy_current_state] ) {
            (yy_last_accepting_state) = yy_current_state;
            (yy_last_accepting_cpos) = yy_cp;
         }
         while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state ) {
            yy_current_state = (int) yy_def[yy_current_state];
            if ( yy_current_state >= 24 )
               yy_c = yy_meta[(unsigned int) yy_c];
         }
         yy_current_state = yy_nxt[yy_base[yy_current_state] + (unsigned int) yy_c];
         ++yy_cp;
      }
      while ( yy_base[yy_current_state] != 29 );
 
yy_find_action:
      yy_act = yy_accept[yy_current_state];
      if ( yy_act == 0 )
         { /* have to back up */
         yy_cp = (yy_last_accepting_cpos);
         yy_current_state = (yy_last_accepting_state);
         yy_act = yy_accept[yy_current_state];
         }
      printf("%d ", yy_act);  /* Hits once, yy_act is 8 */
      YY_DO_BEFORE_ACTION;
 
do_action:  /* This label is used only to access EOF actions. */
      printf("Test"); fflush(NULL);
 
      switch ( yy_act )
   { /* beginning of action switch */
         case 0: /* must back up */
         printf("0"); fflush(NULL);
         /* undo the effects of YY_DO_BEFORE_ACTION */
         *yy_cp = (yy_hold_char);
         yy_cp = (yy_last_accepting_cpos);
         yy_current_state = (yy_last_accepting_state);
         goto yy_find_action;
 
case 1:
YY_RULE_SETUP
#line 7 "PFtIF.lex"
return RETURN;
   YY_BREAK
case 2:
YY_RULE_SETUP
#line 8 "PFtIF.lex"
yylval.number = atof(yytext + 3); return CONSTANT;
   YY_BREAK
case 3:
YY_RULE_SETUP
#line 9 "PFtIF.lex"
yylval.number = atof(yytext); return OPERAND;
   YY_BREAK
case 4:
YY_RULE_SETUP
#line 10 "PFtIF.lex"
yylval.number = atof(yytext); return OPERAND;
   YY_BREAK
case 5:
YY_RULE_SETUP
#line 11 "PFtIF.lex"
yylval.number = atof(yytext); return OPERATOR;
   YY_BREAK
case 6:
/* rule 6 can match eol */
YY_RULE_SETUP
case 7:
YY_RULE_SETUP
#line 13 "PFtIF.lex"
ECHO;
   YY_BREAK
#line 767 "lex.yy.c"
case YY_STATE_EOF(INITIAL):
   yyterminate();
 
   case YY_END_OF_BUFFER:
      {
      /* Amount of text matched not including the EOB char. */
      int yy_amount_of_matched_text = (int) (yy_cp - (yytext_ptr)) - 1;
 
      /* Undo the effects of YY_DO_BEFORE_ACTION. */
      *yy_cp = (yy_hold_char);
      YY_RESTORE_YY_MORE_OFFSET


Author

Commented:
Hmm, if I don't include lex.yy.c, everything compiles fine, but I get "undefined symbol: _Z5yylexv", during runtime.

If I do include the file, I get the "statement unreachable" warnings.
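
The symbol _Z5yylexv is the C++-mangled name of yylex(), which suggests that the code calling the lexer is being compiled as C++ while no yylex with C++ linkage is being linked in. If lex.yy.c is built separately as C, one common way to make the names agree is to declare the generated entry points with C linkage. A sketch, assuming the default yylex/yyparse prototypes and a hypothetical header name:

```c
/* parser_api.h (hypothetical name): declarations shared by C and C++ callers. */
#ifndef PARSER_API_H
#define PARSER_API_H

#ifdef __cplusplus
extern "C" {            /* keep the names unmangled when seen by a C++ compiler */
#endif

int yylex(void);        /* defined in the flex output (lex.yy.c) */
int yyparse(void);      /* defined in the bison output (PFtIF.tab.c) */

#ifdef __cplusplus
}
#endif

#endif /* PARSER_API_H */
```

This only declares the functions; lex.yy.c and the bison output still have to be compiled and linked into the program for the symbols to resolve.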

Author

Commented:
That last post can probably be regarded as irrelevant, but got me thinking, what/which files are supposed to be included?
CERTIFIED EXPERT
Top Expert 2009

Commented:
In the code you posted, the '#endif' on line 7 - what #if does it correspond to ?


>> PFtIF.lex(8): warning: statement is unreachable

I assume PFtIF.lex is the code you posted ? Does the line 8 in the error message correspond to the line 8 in the code you posted ? If not, which line does it correspond to ?

Author

Commented:
I'm gonna try to find the conditions that prompt the error message, but meanwhile, here is the code generated.
PFtIF.tgz.zip
CERTIFIED EXPERT
Top Expert 2009

Commented:
It would be more useful if you could actually post the complete PFtIF.lex that caused that error.

But in any case, the reason for the unreachable statement warnings is because of this generated code where you have break's after a return statement (ie. the break can never be reached) :
case 1:
YY_RULE_SETUP
#line 7 "PFtIF.lex"
return RETURN;
	YY_BREAK
case 2:
YY_RULE_SETUP
#line 8 "PFtIF.lex"
yylval.number = atof(yytext + 3); return CONSTANT;
	YY_BREAK
case 3:
YY_RULE_SETUP
#line 9 "PFtIF.lex"
yylval.number = atof(yytext); return OPERAND;
	YY_BREAK
case 4:
YY_RULE_SETUP
#line 10 "PFtIF.lex"
yylval.number = atof(yytext); return OPERAND;
	YY_BREAK
case 5:
YY_RULE_SETUP
#line 11 "PFtIF.lex"
yylval.number = atof(yytext); return OPERATOR;
	YY_BREAK


Author

Commented:
Well, you pretty much have it there :)
But here it is in its entirety

%{
#include <stdio.h>
#include "PFtIF.tab.h"
%}

%%
99                         return RETURN;
10\ [+-]?[0-9]*"."[0-9]+   yylval.number = atof(yytext + 3); return CONSTANT;
[1-9][0-9][0-9]            yylval.number = atof(yytext); return OPERAND;
[7-9][0-9]                 yylval.number = atof(yytext); return OPERAND;
[0-6][0-9]*                yylval.number = atof(yytext); return OPERATOR;
[ \t\n]                    /* Ignore */;
%%

Doh!
ofc, as usual.
Hmm, most likely nothing to worry about, but if it's not a bug, then why are the breaks there?
CERTIFIED EXPERT
Top Expert 2009

Commented:
First of all, you probably want to put {}'s around those statements, like :

        10\ [+-]?[0-9]*"."[0-9]+   { yylval.number = atof(yytext + 3); return CONSTANT; }


>> then why are the breaks there?

To make sure that the different case statements are separated. Some compilers (especially C++ compilers) warn about unreachable statements. You can get rid of the warnings by re-defining YY_BREAK to nothing :

        #define YY_BREAK

But if you do that, then make absolutely sure that each rule ends in either a return or a break statement !!
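
To illustrate the warning about emptying YY_BREAK, here is a small stand-alone sketch. It is not flex output: the two macros and function names are made up, and the switch only imitates the shape of the generated action switch. It shows what happens to a rule that neither returns nor breaks once the macro expands to nothing.

```c
#define WITH_BREAK break;   /* plays the role of flex's default YY_BREAK */
#define NO_BREAK            /* plays the role of an emptied YY_BREAK */

/* Rule 1 does not return, so it relies on the macro to leave the switch. */
int actions_with_break(int act)
{
    int ran = 0;
    switch (act) {
    case 1:
        ran += 1;
        WITH_BREAK
    case 2:
        ran += 10;
        WITH_BREAK
    }
    return ran;
}

/* Same switch, but the macro expands to nothing. */
int actions_without_break(int act)
{
    int ran = 0;
    switch (act) {
    case 1:
        ran += 1;
        NO_BREAK   /* nothing here: control falls through into case 2 */
    case 2:
        ran += 10;
        NO_BREAK
    }
    return ran;
}
```

With the break in place, actions_with_break(1) runs only the first action and returns 1; without it, actions_without_break(1) also runs the second action and returns 11. In the lexer above, the "Ignore" rule for whitespace is the one that has no return.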

Author

Commented:
The way I understand things, in the below code the function nextSymbol() should be called repeatedly to feed data to the lexer.

I call lexerMain, which calls yyparse(), which I thought somewhere along the line would cause nextSymbol() to be called. However, this never happens.
The output is only "Lexer was called", and then the program appears to get stuck in an endless loop.
I have yet to determine where exactly that problem is.

Have I done something obviously wrong with the code below, or is that just too little data to tell?
#undef input
#define input() nextSymbol()
 
int nextSymbol() {
   printf("nextSymbol was called");
   fflush(NULL);
   return 0;
}
 
int lexerMain(float* prog) {
   printf("Lexer was called\n");
   progArray = prog;
   tos = stack; /* tos points to the top of stack */
   sp = stack; /* initialize sp */
   yyparse();
   return 0;
}


Author

Commented:
Looking closer at the source, it appears that I must re-define yyin:

#ifdef YY_STDINIT
    yyin = stdin;
    yyout = stdout;
#else
    yyin = (FILE *) 0;
    yyout = (FILE *) 0;
#endif
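
If the aim is to keep flex's default input code but have it read from memory instead of stdin, one option on POSIX.1-2008 systems is to point yyin at a stream created with fmemopen. This is only a sketch under that assumption; fmemopen was not suggested in this thread, and the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fmemopen under strict -std modes */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: wrap a NUL-terminated string in a read-only stream.
 * The string must stay alive until the stream is closed with fclose().
 * Returns NULL on failure (some C libraries reject an empty string). */
FILE *open_string_as_stream(const char *text)
{
    /* The cast drops const; with mode "r" the buffer is only read. */
    return fmemopen((void *)text, strlen(text), "r");
}
```

In code that can see the scanner's yyin, the idea would be to assign yyin = open_string_as_stream(data) before calling yyparse(), and to fclose(yyin) afterwards.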
CERTIFIED EXPERT
Top Expert 2009

Commented:
Did you check the code I posted in http:#24047760 ? Does it work for you ? If it does, then continue reading. If not, then let's work on that first  (start by explaining how it doesn't work) ...


>> Have I done some obviously wrong with the code below, or is that just to little data to tell?

Did you put this code in the correct location(s), as in the example I mentioned ?

Author

Commented:
I think the problem is that you are using lex, while I'm using flex. (Which is my bad, as I have clearly stated in the question that I'm using lex)

I'm currently reading:
http://books.google.se/books?id=YrzpxNYegEkC&pg=PA155&lpg=PA155&dq=redefining+flex+input&source=bl&ots=shHPMoXmdb&sig=6UoZubcG5ub9S_1VID6C6zzuq5s&hl=sv&ei=rHdLStS6CcHz_AaI0eCFCQ&sa=X&oi=book_result&ct=result&resnum=1
Which claims that input() can not be redefined with Flex.

Instead, I should redefine YY_INPUT, which seems to make sense, looking at the code for YY_INPUT.

I didn't realise you intended me to use your code, I thought it was only meant as "inspiration".
/* Gets input and stuffs it into "buf".  number of characters read, or YY_NULL,
 * is returned in "result".
 */
#ifndef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
   if ( YY_CURRENT_BUFFER_LVALUE->yy_is_interactive ) \
      { \
      int c = '*'; \
      int n; \
      for ( n = 0; n < max_size && \
              (c = getc( yyin )) != EOF && c != '\n'; ++n ) \
         buf[n] = (char) c; \
      if ( c == '\n' ) \
         buf[n++] = (char) c; \
      if ( c == EOF && ferror( yyin ) ) \
         YY_FATAL_ERROR( "input in flex scanner failed" ); \
      result = n; \
      } \
   else \
      { \
      errno=0; \
      while ( (result = fread(buf, 1, max_size, yyin))==0 && ferror(yyin)) \
         { \
         if( errno != EINTR) \
            { \
            YY_FATAL_ERROR( "input in flex scanner failed" ); \
            break; \
            } \
         errno=0; \
         clearerr(yyin); \
         } \
      }\
\
 
#endif


Author

Commented:
The "infinite loop" problem seems to be that the program is waiting for input.

However, this occurs even if I directly replace the YY_INPUT with my nextSymbol, which doesn't make any sense to me. Trying to step into nextSymbol with gdb "fails" because the program simply stops and waits for me to send input on stdin.

Author

Commented:
This question has strayed quite a bit from its topic :)
I'm accepting your original reply, and have started a new more accurate topic here:

https://www.experts-exchange.com/index.jsp?qid=24536755
CERTIFIED EXPERT
Top Expert 2009

Commented:
>> Which claims that input() can not be redefined with Flex.

That's correct :) Sorry, I thought this was for lex heh.


>> Instead, I should redefine YY_INPUT, which seems to make sense, looking at the code for YY_INPUT.

That's indeed how you do it in flex. But you seem to have it covered :)
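
For reference, a YY_INPUT replacement that reads from a memory buffer might be sketched like this. All names other than YY_INPUT are hypothetical. The code would need to sit in the %{ ... %} block at the top of the .lex file so that it comes before the "#ifndef YY_INPUT" default quoted earlier in this thread; otherwise the default, which reads from yyin, is the one that gets compiled in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state describing the memory the scanner should read. */
static const char *scan_data = NULL;
static size_t scan_len = 0;
static size_t scan_pos = 0;

/* Call before yyparse()/yylex() to choose the buffer to scan. */
void set_scan_buffer(const char *data, size_t len)
{
    scan_data = data;
    scan_len = len;
    scan_pos = 0;
}

/* Copy up to max_size unread bytes into buf and return how many were
 * copied. A return of 0 means end of input (YY_NULL in flex). */
size_t read_scan_buffer(char *buf, size_t max_size)
{
    size_t left = scan_len - scan_pos;
    size_t n = left < max_size ? left : max_size;
    if (n > 0) {
        memcpy(buf, scan_data + scan_pos, n);
        scan_pos += n;
    }
    return n;
}

/* Same three arguments as the default YY_INPUT quoted in this thread. */
#define YY_INPUT(buf, result, max_size) \
    do { (result) = (int)read_scan_buffer((buf), (size_t)(max_size)); } while (0)
```

With this in place, a wrapper such as lexerMain could call set_scan_buffer on its data before yyparse(), and the scanner would no longer wait on stdin.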


>> I didn't realise you intended me to use your code, I thought it was only meant as "inspiration".

That's what it was intended for indeed ;)  But it should be a fully working sample code (for lex), and if you see it working, you'll probably understand better how it works ... Which is why I suggested running it.

Author

Commented:
I took your code and put it in a file, ran:
lex file.lex
cc lex.yy.c
lex.yy.c:532:24: error: macro "input" passed 1 arguments, but takes just 0
lex.yy.c:1106:28: error: macro "input" passed 1 arguments, but takes just 0
lex.yy.c:1109: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token

So something's not quite right
CERTIFIED EXPERT
Top Expert 2009

Commented:
>> So somethings not quite right

Probably due to the lex version you're using ... Don't worry about it. Since you're using flex, there's not really a point to spend time on it heh :)