Solved

Piping to lex/bison application

Posted on 2009-04-02
Last Modified: 2012-05-06
I've written an application using flex and bison to parse some data.

In pseudocode, I'd like to do this:

string A
program Parser
A = Parser(datatostdin)

The purpose of this is to automate parsing of large amounts of data.
I've been thinking that there should be at least two ways to do this.
A) I could somehow compile the generated C code directly into my own C app, instead of compiling it into its own standalone program
B) I could pipe data to and from the parser

Have you done anything similar, and which way would you then recommend?
Will either of them not work as I think? Are there other ways to accomplish this?
Question by:letharion
33 Comments
 
LVL 6

Author Comment

by:letharion
It would appear that popen could be used to accomplish B.
There also seem to be problems with it: http://www.opengroup.org/onlinepubs/007908799/xsh/popen.html
 
LVL 53

Accepted Solution

by:
Infinity08 earned 500 total points
lex defines three macros: input (for getting the next input character), unput (for putting a character back on the input stream) and output (for writing the next output character).
By default, they read from the yyin file stream and write to the yyout file stream. These file streams can be overridden to point to non-default file streams.
But you can also override the three macros mentioned above to make the lexer get its I/O from elsewhere (from a memory buffer, for example).

Here's a basic example (you'll need to improve/extend it, make it more robust, etc.; it's untested, as I don't have lex available here):
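%{
#include <stdio.h>

/* Replace lex's default input() macro with our own reader. */
#undef input
#define input() myInput()
int myInput();

/* Likewise for unput(), which pushes a character back. */
#undef unput
#define unput(c) myUnput(c)
void myUnput(int c);
%}

%%

.    printf("char: %s\n", yytext);

%%

/* The lexer reads from this memory buffer instead of yyin. */
char buf[1024] = "abc";
char *pos = buf;

/* Return the next character; the terminating 0 signals end of input. */
int myInput() {
  return *pos++;
}

/* Step back one position and store the pushed-back character. */
void myUnput(int c) {
  *(--pos) = c;
}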
 
LVL 6

Author Comment

by:letharion
Ah. Interesting. Then there are at least three ways ;)

This seems to generally be the best one, if not the simplest to implement.
 
LVL 53

Expert Comment

by:Infinity08
Play around with it a bit (use my code above for example), and see if you get it to work :) If it works, you can apply the same principles in your own code, and thus keep everything in the same executable without any piping or other workarounds :)
 
LVL 6

Author Comment

by:letharion
Yes, that's why I thought it was the best solution :)
Is there any end to your knowledge Infinity? ;)
 
LVL 53

Expert Comment

by:Infinity08
>> Is there any end to your knowledge Infinity? ;)

lol. I'd like to refer to my nickname to answer that ... But I'm afraid there is definitely a limit to my knowledge. Fortunately so, because if there weren't, it would be kind of boring, since there wouldn't be anything to learn any more.

It makes me think of one of my favorite quotes : "As the island of our knowledge grows, so does the shore of our ignorance" (John A. Wheeler)
 
LVL 6

Author Comment

by:letharion
Well said :)
I did a dirty hack for the time being:

popen(echo datatoparse | parser >> File);

Which works well.
Do you mind if I leave this open for a while, until I get to fixing it up and maybe I have further questions on your suggestion?
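
For reference, a pipe-based call in real C might look like the sketch below. popen() takes a command string and a mode; opening the pipe with "w" lets the program write the data straight to the parser's standard input, so no echo is needed. The function name feed_command is made up for illustration, and the block assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen()/pclose() under strict C modes */
#include <stdio.h>

/* Write `data` to the standard input of `command` through a pipe.
   Returns the value of pclose() (the command's wait status, 0 when the
   command exits successfully), or -1 if the pipe could not be used. */
int feed_command(const char *command, const char *data)
{
    FILE *pipe = popen(command, "w");
    if (pipe == NULL)
        return -1;

    if (fputs(data, pipe) == EOF) {
        pclose(pipe);
        return -1;
    }

    /* pclose() flushes the stream and waits for the command to finish. */
    return pclose(pipe);
}
```

A call such as feed_command("./parser >> File", "99\n") would then stand in for the echo pipeline; the command string here is hypothetical.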
 
LVL 53

Expert Comment

by:Infinity08
No problem.
 
LVL 6

Author Comment

by:letharion
Mysteriously but not to my surprise, my piping solution stopped working. I figure I should use a "proper" solution instead of fixing up a bad one, so here I am.

I have one question, that I may have understood the answer to previously and now forgotten, or I never realised it was a problem.

How would I interact with these new in/output functions? Reading from a memory buffer is excellent, but how do I tell the lexer where that memory buffer is? Can I cc the lexer to a library, and that gives me a "start" function to call which will accept a pointer to the memory?
 
LVL 53

Expert Comment

by:Infinity08
Since you write the three functions yourself, you can pretty much make them do whatever you want, including specifying the memory buffer of your choice. See the example code I posted.
 
LVL 6

Author Comment

by:letharion
Yes, ofc :) But I have program A that generates the content for the said memory buffer, and program B that runs the lexer. Program B doesn't immediately have access to any memory buffers that belong to program A.

Or am I missing something fundamental about your example?
 
LVL 53

Expert Comment

by:Infinity08
I thought the idea was not to have two separate programs, but to combine them into one ?
If not, you can still use the piping solution, or otherwise transfer the data from one program to the other.
 
LVL 6

Author Comment

by:letharion
Absolutely, that is the idea :) I must have been unclear

That's why I asked "Can I cc the lexer to a library?"

How do I combine them into one?
 
LVL 53

Expert Comment

by:Infinity08
>> That's why I asked "Can I cc the lexer to a library?"

Yes. I already responded to that here : http:#24467125


>> How do I combine them into one?

Whatever you prefer ... You can use the library approach if you want. Or you can integrate it in the existing code, or ...
 
LVL 6

Author Comment

by:letharion
Flex/Bison produces valid C code, so I can just include their work in my regular files.
It's not harder than that I guess. That was probably obvious to me while I was working with the programs, but I just didn't realise that now.
 
LVL 53

Expert Comment

by:Infinity08
>> Flex/Bison produces valid C code, so I can just include their work in my regular files.
>> It's not harder than that I guess.

Indeed :) You just need to provide the proper interface to use the generated lexer/parser, but that's not very complicated.
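
One possible shape for that interface, as a sketch only: it assumes the scanner is generated by flex (as stated in the question) and uses flex's yy_scan_string() and yy_delete_buffer() to point the scanner at a string in memory. The function name parse_text and the file name parse_api.c are made up for illustration. The declarations are repeated by hand because this is assumed to be a separate C file that gets linked with lex.yy.c and the bison output.

```c
/* parse_api.c: hypothetical wrapper, linked with lex.yy.c and PFtIF.tab.c */

/* Declarations of symbols that flex and bison generate. */
typedef struct yy_buffer_state *YY_BUFFER_STATE;
YY_BUFFER_STATE yy_scan_string(const char *str);
void yy_delete_buffer(YY_BUFFER_STATE buffer);
int yyparse(void);

/* Parse a NUL-terminated string held in memory instead of reading yyin.
   Returns whatever yyparse() returns (0 on success). */
int parse_text(const char *text)
{
    /* The scanner switches to a buffer holding a copy of `text`. */
    YY_BUFFER_STATE buffer = yy_scan_string(text);
    int status = yyparse();

    /* Release the scanner's copy once parsing is done. */
    yy_delete_buffer(buffer);
    return status;
}
```

The rest of the program would then call parse_text() with its data and never touch stdin or a pipe.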

 
LVL 6

Author Comment

by:letharion
After including the compiled files in the rest of my project and attempting to run the previous "main()" function (now just named "test()") from my project, the parser seems to enter a while(true) {} loop.
I also get:
PFtIF.lex(8): warning: statement is unreachable
PFtIF.lex(9): warning: statement is unreachable
PFtIF.lex(10): warning: statement is unreachable
PFtIF.lex(11): warning: statement is unreachable
PFtIF.lex(12): warning: statement is unreachable
on the below code.

I started digging in lex.yy.c to figure out what was happening; I've posted the relevant lines below too.
I tried following the execution path, but I'm not sure what happens. yy_act becomes 8, which doesn't make any sense?
#define INITIAL 0
#define YY_END_OF_BUFFER 8
#define YY_STATE_EOF(state) (YY_END_OF_BUFFER + state + 1)


#define YY_USER_ACTION
#endif

/* Code executed at the end of each rule. */
#ifndef YY_BREAK
#define YY_BREAK break;
#endif

#define YY_RULE_SETUP \
   YY_USER_ACTION

/** The main scanner function which does all the work.
 */
YY_DECL
{
   register yy_state_type yy_current_state;
   register char *yy_cp, *yy_bp;
   register int yy_act;

#line 6 "PFtIF.lex"

#line 648 "lex.yy.c"

   if ( !(yy_init) )
      {
      (yy_init) = 1;

#ifdef YY_USER_INIT
      YY_USER_INIT;
#endif

      if ( ! (yy_start) )
         (yy_start) = 1;   /* first start state */

      if ( ! yyin )
         yyin = stdin;

      if ( ! yyout )
         yyout = stdout;

      if ( ! YY_CURRENT_BUFFER ) {
         yyensure_buffer_stack ();
         YY_CURRENT_BUFFER_LVALUE =
            yy_create_buffer(yyin,YY_BUF_SIZE );
      }

      yy_load_buffer_state( );
      }

   while ( 1 ) {  /* loops until end-of-file is reached */
      yy_cp = (yy_c_buf_p);

      /* Support of yytext. */
      *yy_cp = (yy_hold_char);

      /* yy_bp points to the position in yy_ch_buf of the start of
       * the current run.
       */
      yy_bp = yy_cp;

      yy_current_state = (yy_start);
yy_match:
      do {
         register YY_CHAR yy_c = yy_ec[YY_SC_TO_UI(*yy_cp)];
         if ( yy_accept[yy_current_state] ) {
            (yy_last_accepting_state) = yy_current_state;
            (yy_last_accepting_cpos) = yy_cp;
         }
         while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state ) {
            yy_current_state = (int) yy_def[yy_current_state];
            if ( yy_current_state >= 24 )
               yy_c = yy_meta[(unsigned int) yy_c];
         }
         yy_current_state = yy_nxt[yy_base[yy_current_state] + (unsigned int) yy_c];
         ++yy_cp;
      }
      while ( yy_base[yy_current_state] != 29 );

yy_find_action:
      yy_act = yy_accept[yy_current_state];
      if ( yy_act == 0 )
         { /* have to back up */
         yy_cp = (yy_last_accepting_cpos);
         yy_current_state = (yy_last_accepting_state);
         yy_act = yy_accept[yy_current_state];
         }
      printf("%d ", yy_act);  /* Hits once, yy_act is 8 */
      YY_DO_BEFORE_ACTION;

do_action:  /* This label is used only to access EOF actions. */
      printf("Test"); fflush(NULL);

      switch ( yy_act )
   { /* beginning of action switch */
         case 0: /* must back up */
         printf("0"); fflush(NULL);
         /* undo the effects of YY_DO_BEFORE_ACTION */
         *yy_cp = (yy_hold_char);
         yy_cp = (yy_last_accepting_cpos);
         yy_current_state = (yy_last_accepting_state);
         goto yy_find_action;

case 1:
YY_RULE_SETUP
#line 7 "PFtIF.lex"
return RETURN;
   YY_BREAK
case 2:
YY_RULE_SETUP
#line 8 "PFtIF.lex"
yylval.number = atof(yytext + 3); return CONSTANT;
   YY_BREAK
case 3:
YY_RULE_SETUP
#line 9 "PFtIF.lex"
yylval.number = atof(yytext); return OPERAND;
   YY_BREAK
case 4:
YY_RULE_SETUP
#line 10 "PFtIF.lex"
yylval.number = atof(yytext); return OPERAND;
   YY_BREAK
case 5:
YY_RULE_SETUP
#line 11 "PFtIF.lex"
yylval.number = atof(yytext); return OPERATOR;
   YY_BREAK
case 6:
/* rule 6 can match eol */
YY_RULE_SETUP
case 7:
YY_RULE_SETUP
#line 13 "PFtIF.lex"
ECHO;
   YY_BREAK
#line 767 "lex.yy.c"
case YY_STATE_EOF(INITIAL):
   yyterminate();

   case YY_END_OF_BUFFER:
      {
      /* Amount of text matched not including the EOB char. */
      int yy_amount_of_matched_text = (int) (yy_cp - (yytext_ptr)) - 1;

      /* Undo the effects of YY_DO_BEFORE_ACTION. */
      *yy_cp = (yy_hold_char);
      YY_RESTORE_YY_MORE_OFFSET
 
LVL 6

Author Comment

by:letharion
Hmm, if I don't include lex.yy.c, everything compiles fine, but I get "undefined symbol: _Z5yylexv", during runtime.

If I do include the file, I get the "statement unreachable" warnings.
 
LVL 6

Author Comment

by:letharion
That last post can probably be regarded as irrelevant, but got me thinking, what/which files are supposed to be included?
 
LVL 53

Expert Comment

by:Infinity08
In the code you posted, the '#endif' on line 7 - what #if does it correspond to ?


>> PFtIF.lex(8): warning: statement is unreachable

I assume PFtIF.lex is the code you posted ? Does the line 8 in the error message correspond to the line 8 in the code you posted ? If not, which line does it correspond to ?
 
LVL 6

Author Comment

by:letharion
I'm gonna try to find the conditions that prompt the error message, but meanwhile, here is the code generated.
PFtIF.tgz.zip
 
LVL 53

Expert Comment

by:Infinity08
It would be more useful if you could actually post the complete PFtIF.lex that caused that error.

But in any case, the unreachable statement warnings come from this generated code, where you have breaks after a return statement (i.e. the break can never be reached):
case 1:
YY_RULE_SETUP
#line 7 "PFtIF.lex"
return RETURN;
	YY_BREAK
case 2:
YY_RULE_SETUP
#line 8 "PFtIF.lex"
yylval.number = atof(yytext + 3); return CONSTANT;
	YY_BREAK
case 3:
YY_RULE_SETUP
#line 9 "PFtIF.lex"
yylval.number = atof(yytext); return OPERAND;
	YY_BREAK
case 4:
YY_RULE_SETUP
#line 10 "PFtIF.lex"
yylval.number = atof(yytext); return OPERAND;
	YY_BREAK
case 5:
YY_RULE_SETUP
#line 11 "PFtIF.lex"
yylval.number = atof(yytext); return OPERATOR;
	YY_BREAK
 
LVL 6

Author Comment

by:letharion
Well, you pretty much have it there :)
But here it is in its entirety

%{
#include <stdio.h>
#include "PFtIF.tab.h"
%}

%%
99                         return RETURN;
10\ [+-]?[0-9]*"."[0-9]+   yylval.number = atof(yytext + 3); return CONSTANT;
[1-9][0-9][0-9]            yylval.number = atof(yytext); return OPERAND;
[7-9][0-9]                 yylval.number = atof(yytext); return OPERAND;
[0-6][0-9]*                yylval.number = atof(yytext); return OPERATOR;
[ \t\n]                    /* Ignore */;
%%

Doh!
ofc, as usual.
Hmm, most likely nothing to worry about, but if it's not a bug, then why are the breaks there?
 
LVL 53

Expert Comment

by:Infinity08
First of all, you probably want to put {}'s around those statements, like :

        10\ [+-]?[0-9]*"."[0-9]+   { yylval.number = atof(yytext + 3); return CONSTANT; }


>> then why are the breaks there?

To make sure that the different case statements are separated. Some compilers (especially C++ compilers) warn about unreachable statements. You can get rid of the warnings by re-defining YY_BREAK to nothing :

        #define YY_BREAK

But if you do that, then make absolutely sure that each rule ends in either a return or a break statement !!
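
To see why that matters, here is a small stand-alone sketch in plain C that mimics the shape of the generated switch. It is not flex output: the function names and the token values 258 and 259 are made up for illustration. The first function keeps a break after every action, as the default YY_BREAK does; the second drops them, as an empty YY_BREAK would.

```c
/* Actions laid out the way flex emits them: every rule body is
   followed by YY_BREAK, which defaults to "break;". */
int scan_with_break(int act, int *ignored)
{
    switch (act) {
    case 1:
        return 258;      /* like "return RETURN;" */
        break;           /* never reached: this is what the warning is about */
    case 6:
        (*ignored)++;    /* like the "ignore whitespace" rule, no return */
        break;           /* needed: keeps case 6 out of case 7 */
    case 7:
        return 259;
        break;
    }
    return 0;
}

/* The same switch with YY_BREAK defined to nothing. */
int scan_without_break(int act, int *ignored)
{
    switch (act) {
    case 1:
        return 258;
    case 6:
        (*ignored)++;    /* no return and no break ... */
    case 7:
        return 259;      /* ... so case 6 falls through to here */
    }
    return 0;
}
```

With the breaks in place, action 6 only bumps the counter and leaves the switch; without them it also runs the code for action 7 and returns its token.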
 
LVL 6

Author Comment

by:letharion
The way I understand things, in the code below the function nextSymbol() should be called repeatedly to feed data to the lexer.

I call lexerMain, which calls yyparse(), which I thought would cause nextSymbol() to be called somewhere along the line. However, this never happens.
The output is only "Lexer was called", and then the program appears to get stuck in an endless loop.
I have yet to determine where exactly that problem is.

Have I done something obviously wrong with the code below, or is that just too little data to tell?
#undef input
#define input() nextSymbol()

int nextSymbol() {
   printf("nextSymbol was called");
   fflush(NULL);
   return 0;
}

int lexerMain(float* prog) {
   printf("Lexer was called\n");
   progArray = prog;
   tos = stack; /* tos points to the top of stack */
   sp = stack; /* initialize sp */
   yyparse();
   return 0;
}
 
LVL 6

Author Comment

by:letharion
Looking closer at the source, it appears that I must re-define yyin:

#ifdef YY_STDINIT
    yyin = stdin;
    yyout = stdout;
#else
    yyin = (FILE *) 0;
    yyout = (FILE *) 0;
#endif
 
LVL 53

Expert Comment

by:Infinity08
Did you check the code I posted in http:#24047760 ? Does it work for you ? If it does, then continue reading. If not, then let's work on that first  (start by explaining how it doesn't work) ...


>> Have I done something obviously wrong with the code below, or is that just too little data to tell?

Did you put this code in the correct location(s), as in the example I mentioned ?
 
LVL 6

Author Comment

by:letharion
I think the problem is that you are using lex, while I'm using flex. (Which is my bad, as I have clearly stated in the question that I'm using lex)

I'm currently reading:
http://books.google.se/books?id=YrzpxNYegEkC&pg=PA155&lpg=PA155&dq=redefining+flex+input&source=bl&ots=shHPMoXmdb&sig=6UoZubcG5ub9S_1VID6C6zzuq5s&hl=sv&ei=rHdLStS6CcHz_AaI0eCFCQ&sa=X&oi=book_result&ct=result&resnum=1
Which claims that input() can not be redefined with Flex.

Instead, I should redefine YY_INPUT, which seems to make sense, looking at the code for YY_INPUT.

I didn't realise you intended me to use your code, I thought it was only meant as "inspiration".
/* Gets input and stuffs it into "buf".  number of characters read, or YY_NULL,
 * is returned in "result".
 */
#ifndef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
   if ( YY_CURRENT_BUFFER_LVALUE->yy_is_interactive ) \
      { \
      int c = '*'; \
      int n; \
      for ( n = 0; n < max_size && \
              (c = getc( yyin )) != EOF && c != '\n'; ++n ) \
         buf[n] = (char) c; \
      if ( c == '\n' ) \
         buf[n++] = (char) c; \
      if ( c == EOF && ferror( yyin ) ) \
         YY_FATAL_ERROR( "input in flex scanner failed" ); \
      result = n; \
      } \
   else \
      { \
      errno=0; \
      while ( (result = fread(buf, 1, max_size, yyin))==0 && ferror(yyin)) \
         { \
         if( errno != EINTR) \
            { \
            YY_FATAL_ERROR( "input in flex scanner failed" ); \
            break; \
            } \
         errno=0; \
         clearerr(yyin); \
         } \
      }\
\

#endif
 
LVL 6

Author Comment

by:letharion
The "infinite loop" problem seems to be that the program is waiting for input.

However, this occurs even if I directly replace YY_INPUT with my nextSymbol, which doesn't make any sense to me. Trying to step into nextSymbol with gdb "fails" because the program simply stops and waits for me to send input on stdin.
 
LVL 6

Author Comment

by:letharion
This question has strayed quite a bit from its topic :)
I'm accepting your original reply, and have started a new, more accurate topic here:

http://www.experts-exchange.com/index.jsp?qid=24536755
 
LVL 53

Expert Comment

by:Infinity08
>> Which claims that input() can not be redefined with Flex.

That's correct :) Sorry, I thought this was for lex heh.


>> Instead, I should redefine YY_INPUT, which seems to make sense, looking at the code for YY_INPUT.

That's indeed how you do it in flex. But you seem to have it covered :)
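
As a rough illustration of the YY_INPUT route, the sketch below keeps a pointer into a memory buffer and hands the scanner up to max_size bytes per call. The helper names set_scanner_input and read_scanner_input are made up; only YY_INPUT and its (buf, result, max_size) parameters come from flex. Returning 0 corresponds to YY_NULL, which tells the scanner that the input has ended.

```c
#include <stddef.h>
#include <string.h>

/* Where the scanner should read from, and how much is left. */
static const char *scanner_src = NULL;
static size_t scanner_left = 0;

/* Point the reader at a block of memory before calling yyparse()/yylex(). */
void set_scanner_input(const char *data, size_t len)
{
    scanner_src = data;
    scanner_left = len;
}

/* Copy at most max_size bytes into buf and return how many were copied.
   A return value of 0 means end of input. */
size_t read_scanner_input(char *buf, size_t max_size)
{
    size_t n = scanner_left < max_size ? scanner_left : max_size;
    if (n > 0) {
        memcpy(buf, scanner_src, n);
        scanner_src += n;
        scanner_left -= n;
    }
    return n;
}

/* In the definitions section of the .lex file, inside %{ ... %}:
 *
 *   #undef YY_INPUT
 *   #define YY_INPUT(buf, result, max_size) { result = read_scanner_input(buf, max_size); }
 */
```

With this in place the program would call set_scanner_input() with its data and then yyparse(), and the scanner should no longer block on stdin.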


>> I didn't realise you intended me to use your code, I thought it was only meant as "inspiration".

That's what it was intended for indeed ;)  But it should be a fully working sample code (for lex), and if you see it working, you'll probably understand better how it works ... Which is why I suggested running it.
 
LVL 6

Author Comment

by:letharion
I took your code and put it in a file, ran:
lex file.lex
cc lex.yy.c
lex.yy.c:532:24: error: macro "input" passed 1 arguments, but takes just 0
lex.yy.c:1106:28: error: macro "input" passed 1 arguments, but takes just 0
lex.yy.c:1109: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token

So something's not quite right
 
LVL 53

Expert Comment

by:Infinity08
>> So something's not quite right

Probably due to the lex version you're using ... Don't worry about it. Since you're using flex, there's not really any point in spending time on it heh :)
