pass code to lexer
Hello,
How can I tweak the code [code snippet] so that it reads ASCII text and passes it to the lexer, which then returns one token per call?
At the moment it has a function that reads a file.
Thanks in advance for any help!
Not sure what you are asking. Can you clarify?
What do you mean by "read an ascii code/text"?
By default, lex/flex reads from the FILE * yyin, which is initially set to stdin. Are you able to run your lexer on a file or on standard console input? For example:
mylexer < testfile.txt
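As a rough sketch of how a driver could pick between a named file and standard input before handing the stream to the lexer: the helper name open_input() is hypothetical and not part of lex/flex. The assumption is that its result would be assigned to yyin before the first call to yylex().

```c
#include <stdio.h>

/* Hypothetical helper (not part of lex/flex): returns the stream the
 * lexer should read from.
 *   path == NULL  -> standard input
 *   otherwise     -> the opened file, or NULL if it cannot be opened */
FILE *open_input(const char *path)
{
    FILE *fp;

    if (path == NULL)
        return stdin;

    fp = fopen(path, "r");
    if (fp == NULL)
        perror(path);   /* report why the file could not be opened */
    return fp;
}
```

In the user-code section of the .l file, main() could then do something like yyin = open_input(argc > 1 ? argv[1] : NULL); and stop early if the result is NULL.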
ASKER
I mean, how can my lexer actually read the input file and do the appropriate tokenization?
At the moment the code only describes the lexical elements of a C-like language.
You need a main() that calls the lexer.
Have you learned about yylex() yet? yylex() is how the lexer is called. If you are working towards a full compiler, eventually you won't call yylex() directly (the parser will call it for you), but for now you need to.
Where is your main program?
If you don't write an explicit main(), the lex/flex library (linked with -ll or -lfl) provides a default one for you, which simply calls the lexer and returns.
Have you even compiled this yet?
I ran it through flex and gcc and it has C syntax errors.
The section of C code between your %{ %} delimiters won't compile in its current form. You are missing a closing brace for your read_file() function.
I can't test the rest of it because I don't have your token declarations. Where are your tokens (like PLUS, MINUS, K_IF, K_ELSE) declared? Normally you declare them in your parser grammar (yacc/bison) and use #include in your lex grammar to pull them in. If you aren't using a parser grammar yet, then you need to define them some other way.
ASKER
I haven't started working on the yacc/bison part yet, but it should be something like this, right?
extern "C"
{
    int yyparse(void);
    int yylex(void);
    int yywrap()
    {
        return 1;
    }
}
extern int yydebug;
main()
{
    yydebug=1;
    yyparse();
}
statement:
expression
| VARIABLE '=' expression
;
expression: INTEGER
| VARIABLE
| exp '+' exp
| exp '-' exp
| exp '*' exp
| exp '/' exp
| '(' expression ')'
;
operators_punctuation: '+'
| '-'
| '*'
| '('
| ')'
| ','
| ';'
| '='
| '/'
| '%'
| '||'
| '|'
| '&&'
| '&'
;
Yes, but back to my original response: you need to declare your tokens to compile your lexer. Right now, your lexer does not compile. Just because you can run lex/flex on it does not mean it generated a valid C program. You have to compile the program that lex generates:
lex lex.l
cc lex.yy.c
I recommend taking a step back and focusing on fixing your lexer properly, without any fancy stuff like opening a file. Just make it work. I think you are getting ahead of yourself by adding tokens to your grammar before your code compiles. If you have a grammar with one token and invalid C code, it's not worth anything except maybe to satisfy your professor's visual check. :)
Until you declare your tokens, your lexer will never compile into a valid executable program. I would start by declaring all of your tokens with #define in your lexer grammar, like so:
%{
#define PLUS 1
#define MINUS 2
// ...
#define K_IF 100
#define K_ELSE 101
%}
ASKER
So if I understood your response correctly, you meant something like this, right?
If so, does the lexer still need yylex() in order to be complete?
%{
#define COMMENT 1
#define VARIABLE 2
#define INTEGER 3
#define FLOAT 4
#define STRING 5
#define K_IF 6
#define K_ELSE 7
#define K_WHILE 8
#define K_INT 9
#define K_VOID 10
#define K_RETURN 11
#define K_FLOAT 12
#define PLUS 13
#define MINUS 14
#define TIMES 15
#define SLASH 16
#define LPAREN 17
#define RPAREN 18
#define SEMICOLON 19
#define COMMA 20
#define EQL 21
#define OR 22
#define OR2 23
#define AND 24
#define AND2 25
%}
LETTER [a-zA-Z_]
DIGIT [0-9]
LETTERDIGIT [a-zA-Z0-9_]
SIGN [-+]
STRINGCONSTANT \"[^"\n]*["\n]
CHARCONSTANT \'[^'\n]*\'
RANKSPEC \[[,]*\]
INTEGER {digit}+
VARIABLE [a-z_]({LETTERDIGIT})*
%%
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return TIMES; }
"/" { return SLASH; }
"(" { return LPAREN; }
")" { return RPAREN; }
";" { return SEMICOLON; }
"," { return COMMA; }
"=" { return EQL; }
"|" { return OR; }
"||" { return OR2; }
"&" { return AND; }
"&&" { return AND2; }
"if" { return K_IF; }
"else" { return K_ELSE; }
"do" { return K_DO; }
"int" { return K_INT; }
"return" { return K_RETURN; }
"void" { return K_VOID; }
"float" { return K_FLOAT; }
"while" { return WHILESYM; }
{LETTER}{LETTERDIGIT}* {
yylval.id = new Identifier(yytext);
yylval.id->line = line;
return(IDENTIFIER);
}
{VARIABLE}* {
yylval.lit = new Liter(yytext);
yylval.lit->type = t_var;
return(LITERAL);
}
{SIGN}?{DIGIT}+"."{DIGIT}+ {
yylval.lit = new Literal(yytext);
yylval.lit->type = t_float;
return(LITERAL);
}
{SIGN}?{DIGIT}+ {
yylval.lit = new Literal(yytext);
yylval.lit->type = t_int32;
return(LITERAL);
}
{STRINGCONSTANT} {
yylval.lit = new Literal(yytext);
yylval.lit->type = t_string;
return(LITERAL);
}
[ \t\n\r] /* skip whitespace */
. { printf("Unknown character [%c]\n",yytext[0]);
return UNKNOWN; }
%%
int yywrap(void){return 1;}
>So if I understood your response correctly, you meant something like this, right?
Yes, you are now on the right track. Now what happens when you generate your C program and try to compile it? Did you try it? You'll see that you didn't declare the LITERAL and IDENTIFIER tokens.
>If so, does the lexer still need yylex() in order to be complete?
Just so you are clear, your lexer IS yylex(). However, you do need to call it explicitly from a main() if you want to consume and print tokens. This could be in your main():
int token;
/* yylex() returns 0 at end of input, which ends the loop */
while ((token = yylex()) != 0) {
    printf("lexed token: %d\n", token);
}
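To make "one token per call" concrete, here is a minimal hand-written sketch of the contract that yylex() follows: each call consumes a bit more of the input and returns one integer token code, and 0 signals the end of input. This is not what flex generates. The scanner struct, next_token() and the TOK_ names are hypothetical and only illustrate the idea on an in-memory string. (With flex specifically, though not classic lex, yy_scan_string() can point the generated scanner at a string instead of a FILE *.)

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative token codes; the numbers are arbitrary, like a #define list. */
enum {
    TOK_EOF = 0,        /* end of input, same convention as yylex() */
    TOK_PLUS = 1,
    TOK_INTEGER = 2,
    TOK_IDENTIFIER = 3,
    TOK_UNKNOWN = 4
};

/* Hypothetical scanner state: the text being scanned and the read position. */
struct scanner {
    const char *text;
    size_t pos;
};

/* Returns the next token code, or TOK_EOF (0) once the text is used up. */
int next_token(struct scanner *s)
{
    const char *p = s->text + s->pos;
    int token;

    while (isspace((unsigned char)*p))      /* skip whitespace */
        p++;

    if (*p == '\0') {
        token = TOK_EOF;
    } else if (*p == '+') {
        p++;
        token = TOK_PLUS;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))  /* consume the whole number */
            p++;
        token = TOK_INTEGER;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        token = TOK_IDENTIFIER;
    } else {
        p++;                                /* consume one unknown character */
        token = TOK_UNKNOWN;
    }

    s->pos = (size_t)(p - s->text);         /* remember where to resume */
    return token;
}
```

A caller would loop exactly as with yylex(): initialise a struct scanner with the text and position 0, then call next_token() until it returns 0.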
Also, please look at the snippet below, which is in your program. It comes from a sample I gave you in another question, but you added it to your grammar unchanged and expected it to work. You cannot do that. My sample used types such as Identifier, Literal, t_literal, t_float and t_string, and you haven't declared or defined any of them. Remove all that code for now so we can fix the lexer. Just leave empty rules with a token return value.
Make SURE to declare your TOKENS! :)
{LETTER}{LETTERDIGIT}* {
yylval.id = new Identifier(yytext); <-- REMOVE LINE
yylval.id->line = line; <-- REMOVE LINE
return(IDENTIFIER);
}
{VARIABLE}* {
yylval.lit = new Liter(yytext); <-- REMOVE LINE
yylval.lit->type = t_var; <-- REMOVE LINE
return(LITERAL);
}
{SIGN}?{DIGIT}+"."{DIGIT}+ {
yylval.lit = new Literal(yytext); <-- REMOVE LINE
yylval.lit->type = t_float; <-- REMOVE LINE
return(LITERAL);
}
{SIGN}?{DIGIT}+ {
yylval.lit = new Literal(yytext); <-- REMOVE LINE
yylval.lit->type = t_int32; <-- REMOVE LINE
return(LITERAL);
}
{STRINGCONSTANT} {
yylval.lit = new Literal(yytext); <-- REMOVE LINE
yylval.lit->type = t_string; <-- REMOVE LINE
return(LITERAL);
}
ASKER
What about now?
%{
#define COMMENT 1
#define VARIABLE 2
#define INTEGER 3
#define FLOAT 4
#define STRING 5
#define K_IF 6
#define K_ELSE 7
#define K_WHILE 8
#define K_INT 9
#define K_VOID 10
#define K_RETURN 11
#define K_FLOAT 12
#define PLUS 13
#define MINUS 14
#define TIMES 15
#define SLASH 16
#define LPAREN 17
#define RPAREN 18
#define SEMICOLON 19
#define COMMA 20
#define EQL 21
#define OR 22
#define OR2 23
#define AND 24
#define AND2 25
#define LITERAL 26
#define IDENTIFIER 27
%}
LETTER [a-zA-Z_]
DIGIT [0-9]
LETTERDIGIT [a-zA-Z0-9_]
SIGN [-+]
STRINGCONSTANT \"[^"\n]*["\n]
CHARCONSTANT \'[^'\n]*\'
RANKSPEC \[[,]*\]
INTEGER {digit}+
VARIABLE [a-z_]({LETTERDIGIT})*
COMMENT "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
%%
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return TIMES; }
"/" { return SLASH; }
"(" { return LPAREN; }
")" { return RPAREN; }
";" { return SEMICOLON; }
"," { return COMMA; }
"=" { return EQL; }
"|" { return OR; }
"||" { return OR2; }
"&" { return AND; }
"&&" { return AND2; }
"if" { return K_IF; }
"else" { return K_ELSE; }
"do" { return K_DO; }
"int" { return K_INT; }
"return" { return K_RETURN; }
"void" { return K_VOID; }
"float" { return K_FLOAT; }
"while" { return WHILESYM; }
{LETTER}{LETTERDIGIT}* {
yylval.id = new Identifier(yytext);
yylval.id->line = line;
return(IDENTIFIER);
}
{VARIABLE}* {
yylval.lit = new Liter(yytext);
yylval.lit->type = t_var;
return(LITERAL);
}
{COMMENT} {
yylval.lit = new Liter(yytext);
yylval.lit->type = t_comment;
return(LITERAL);
}
{SIGN}?{DIGIT}+"."{DIGIT}+ {
yylval.lit = new Literal(yytext);
yylval.lit->type = t_float;
return(LITERAL);
}
{SIGN}?{DIGIT}+ {
yylval.lit = new Literal(yytext);
yylval.lit->type = t_int32;
return(LITERAL);
}
{STRINGCONSTANT} {
yylval.lit = new Literal(yytext);
yylval.lit->type = t_string;
return(LITERAL);
}
[ \t\n\r] /* skip whitespace */
. { printf("Unknown character [%c]\n",yytext[0]);
return UNKNOWN; }
%%
int main(void){
int token;
while(token = yylex()) {
printf("lexed token: %d\n", token);
}
int yywrap(void){return 1;}
ASKER
[skip my previous post]
OK, I removed them!
%{
#define COMMENT 1
#define VARIABLE 2
#define INTEGER 3
#define FLOAT 4
#define STRING 5
#define K_IF 6
#define K_ELSE 7
#define K_WHILE 8
#define K_INT 9
#define K_VOID 10
#define K_RETURN 11
#define K_FLOAT 12
#define PLUS 13
#define MINUS 14
#define TIMES 15
#define SLASH 16
#define LPAREN 17
#define RPAREN 18
#define SEMICOLON 19
#define COMMA 20
#define EQL 21
#define OR 22
#define OR2 23
#define AND 24
#define AND2 25
#define LITERAL 26
#define IDENTIFIER 27
%}
LETTER [a-zA-Z_]
DIGIT [0-9]
LETTERDIGIT [a-zA-Z0-9_]
SIGN [-+]
STRINGCONSTANT \"[^"\n]*["\n]
CHARCONSTANT \'[^'\n]*\'
RANKSPEC \[[,]*\]
INTEGER {digit}+
VARIABLE [a-z_]({LETTERDIGIT})*
COMMENT "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
%%
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return TIMES; }
"/" { return SLASH; }
"(" { return LPAREN; }
")" { return RPAREN; }
";" { return SEMICOLON; }
"," { return COMMA; }
"=" { return EQL; }
"|" { return OR; }
"||" { return OR2; }
"&" { return AND; }
"&&" { return AND2; }
"if" { return K_IF; }
"else" { return K_ELSE; }
"do" { return K_DO; }
"int" { return K_INT; }
"return" { return K_RETURN; }
"void" { return K_VOID; }
"float" { return K_FLOAT; }
"while" { return WHILESYM; }
{LETTER}{LETTERDIGIT}* {
return(IDENTIFIER);
}
{VARIABLE}* {
return(LITERAL);
}
{COMMENT} {
return(LITERAL);
}
{SIGN}?{DIGIT}+"."{DIGIT}+ {
return(LITERAL);
}
{SIGN}?{DIGIT}+ {
return(LITERAL);
}
{STRINGCONSTANT} {
return(LITERAL);
}
[ \t\n\r] /* skip whitespace */
. { printf("Unknown character [%c]\n",yytext[0]);
return UNKNOWN; }
%%
int main(void){
int token;
while(token = yylex()) {
printf("lexed token: %d\n", token);
}
int yywrap(void){return 1;}
ASKER
The integer rule could be:
{SIGN}?{DIGIT}+ {
printf("INTEGER\n"); sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
Is that right?
Have you tried compiling your lexer yet?
ASKER
I haven't tried yet because I don't know how to do so on Mac OS X.
%{
#define COMMENT 1
#define VARIABLE 2
#define INTEGER 3
#define FLOAT 4
#define STRING 5
#define T_IF 6
#define T_ELSE 7
#define T_WHILE 8
#define T_INT 9
#define T_VOID 10
#define T_RETURN 11
#define T_FLOAT 12
#define PLUS 13
#define MINUS 14
#define TIMES 15
#define SLASH 16
#define LPAREN 17
#define RPAREN 18
#define SEMICOLON 19
#define COMMA 20
#define EQL 21
#define OR 22
#define OR2 23
#define AND 24
#define AND2 25
#define LITERAL 26
#define IDENTIFIER 27
%}
LETTER [a-zA-Z_]
DIGIT [0-9]
LETTERDIGIT [a-zA-Z0-9_]
SIGN [-+]
STRINGCONSTANT \"[^"\n]*["\n]
CHARCONSTANT \'[^'\n]*\'
RANKSPEC \[[,]*\]
INTEGER {digit}+
VARIABLE [a-z_]({LETTERDIGIT})*
COMMENT "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
%%
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return TIMES; }
"/" { return SLASH; }
"(" { return LPAREN; }
")" { return RPAREN; }
";" { return SEMICOLON; }
"," { return COMMA; }
"=" { return EQL; }
"|" { return OR; }
"||" { return OR2; }
"&" { return AND; }
"&&" { return AND2; }
"if" { return T_IF; }
"else" { return T_ELSE; }
"do" { return T_DO; }
"int" { return T_INT; }
"return" { return T_RETURN; }
"void" { return T_VOID; }
"float" { return T_FLOAT; }
"while" { return T_WHILE; }
{LETTER}{LETTERDIGIT}* {
return(IDENTIFIER);
}
{VARIABLE}* {
printf("VARIABLE\n"); yylval.lexeme=(char*)malloc(yyleng+1);
strcpy(yyval.lexeme, yytext);
/*return T_VARIABLE*/
return(LITERAL);
}
{COMMENT} {
return(LITERAL);
}
{SIGN}?{DIGIT}+"."{DIGIT}+ {
printf("FLOAT\n"); sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
{SIGN}?{DIGIT}+ {
printf("INTEGER\n"); sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
{STRINGCONSTANT} {
return(LITERAL);
}
[ \t\n\r] /* skip whitespace */
. { printf("Unknown character [%c]\n",yytext[0]);
return UNKNOWN; }
%%
int main(void){
int token;
while(token = yylex()) {
printf("lexed token: %d\n", token);
}
int yywrap(void){return 1;}
Doesn't OS X have standard lex or flex? And gcc? If you want much more help from me, you had better figure out how to run your tools, or switch OS ;).
Try this:
> lex mygrammar.l
It should produce an output file; look for a .c file (lex.yy.c by default).
Then do:
> cc lex.yy.c
ASKER
I ran lex mygrammar.l, but it returned four errors:
mygrammar.l:115: bad character: }
mygrammar.l:117: name defined twice
mygrammar.l:119: bad character: }
mygrammar.l:120: premature EOF
Why do I get these errors?
%{
#define COMMENT 1
#define VARIABLE 2
#define INTEGER 3
#define FLOAT 4
#define STRING 5
#define T_IF 6
#define T_ELSE 7
#define T_WHILE 8
#define T_INT 9
#define T_VOID 10
#define T_RETURN 11
#define T_FLOAT 12
#define PLUS 13
#define MINUS 14
#define TIMES 15
#define SLASH 16
#define LPAREN 17
#define RPAREN 18
#define SEMICOLON 19
#define COMMA 20
#define EQL 21
#define OR 22
#define OR2 23
#define AND 24
#define AND2 25
#define LITERAL 26
#define IDENTIFIER 27
%}
LETTER [a-zA-Z_]
DIGIT [0-9]
LETTERDIGIT [a-zA-Z0-9_]
SIGN [-+]
STRINGCONSTANT \"[^"\n]*["\n]
CHARCONSTANT \'[^'\n]*\'
RANKSPEC \[[,]*\]
INTEGER {digit}+
VARIABLE [a-z_]({LETTERDIGIT})*
COMMENT "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
%%
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return TIMES; }
"/" { return SLASH; }
"(" { return LPAREN; }
")" { return RPAREN; }
";" { return SEMICOLON; }
"," { return COMMA; }
"=" { return EQL; }
"|" { return OR; }
"||" { return OR2; }
"&" { return AND; }
"&&" { return AND2; }
"if" { return T_IF; }
"else" { return T_ELSE; }
"do" { return T_DO; }
"int" { return T_INT; }
"return" { return T_RETURN; }
"void" { return T_VOID; }
"float" { return T_FLOAT; }
"while" { return T_WHILE; }
{LETTER}{LETTERDIGIT}* {
return(IDENTIFIER);
}
{VARIABLE}* {
printf("VARIABLE\n"); yylval.lexeme=(char*)malloc(yyleng+1);
strcpy(yyval.lexeme, yytext);
/*return T_VARIABLE*/
return(LITERAL);
}
{COMMENT} {
return(LITERAL);
}
{SIGN}?{DIGIT}+"."{DIGIT}+ {
printf("FLOAT\n"); sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
{SIGN}?{DIGIT}+ {
printf("INTEGER\n"); sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
{STRINGCONSTANT} {
return(LITERAL);
}
[ \t\n\r] /* skip whitespace */
. { printf("Unknown character [%c]\n",yytext[0]);
return UNKNOWN; }
%%
int main(void){
int token;
while(token = yylex()) {
printf("lexed token: %d\n", token);
}
}
int yywrap(void){
return 1;
}
Your formatting is messed up.
Your %% needs to be at the beginning of a line, for one.
Your formatting is really important in a grammar like this.
You should not have whitespace (space or tables) at the beginning of a line with a regular expression / rule on it.
I did it on my local copy and lex compiles it.
Edit your whole file and, for all rules, take out the leading whitespace, like this:
You have (with leading whitespace):
    LETTER [a-zA-Z_]
    DIGIT [0-9]
Fix as:
LETTER [a-zA-Z_]
DIGIT [0-9]
You have (with leading whitespace):
    {LETTER}{LETTERDIGIT}* {
    return(IDENTIFIER);
    }
Fix as:
{LETTER}{LETTERDIGIT}* {
    return(IDENTIFIER);
}
When I said tables I meant tabs
ASKER
OK, now lex mygrammar.l works! But cc lex.yy.c returns multiple errors:
mygrammar.l: In function yylex:
mygrammar.l:80: error: yylval undeclared (first use in this function)
mygrammar.l:80: error: (Each undeclared identifier is reported only once
mygrammar.l:80: error: for each function it appears in.)
mygrammar.l:81: error: yyval undeclared (first use in this function)
mygrammar.l: At top level:
mygrammar.l:118: error: syntax error before % token
As for yylval and yyval, I tried defining them with #define YYVAL, but it didn't work.
%{
#define COMMENT 1
#define VARIABLE 2
#define INTEGER 3
#define FLOAT 4
#define STRING 5
#define T_IF 6
#define T_ELSE 7
#define T_WHILE 8
#define T_INT 9
#define T_VOID 10
#define T_DO 11
#define T_RETURN 12
#define T_FLOAT 13
#define PLUS 14
#define MINUS 15
#define TIMES 16
#define SLASH 17
#define LPAREN 18
#define RPAREN 19
#define SEMICOLON 20
#define COMMA 21
#define EQL 22
#define OR 23
#define OR2 24
#define AND 25
#define AND2 26
#define LITERAL 27
#define IDENTIFIER 28
#define UNKNOWN 29
%}
LETTER [a-zA-Z_]
DIGIT [0-9]
LETTERDIGIT [a-zA-Z0-9_]
SIGN [-+]
STRINGCONSTANT \"[^"\n]*["\n]
CHARCONSTANT \'[^'\n]*\'
RANKSPEC \[[,]*\]
INTEGER {digit}+
VARIABLE [a-z_]({LETTERDIGIT})*
COMMENT "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
%%
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return TIMES; }
"/" { return SLASH; }
"(" { return LPAREN; }
")" { return RPAREN; }
";" { return SEMICOLON; }
"," { return COMMA; }
"=" { return EQL; }
"|" { return OR; }
"||" { return OR2; }
"&" { return AND; }
"&&" { return AND2; }
"if" { return T_IF; }
"else" { return T_ELSE; }
"do" { return T_DO; }
"int" { return T_INT; }
"return" { return T_RETURN; }
"void" { return T_VOID; }
"float" { return T_FLOAT; }
"while" { return T_WHILE; }
{LETTER}{LETTERDIGIT}* {
return(IDENTIFIER);
}
{VARIABLE}* {
printf("VARIABLE\n"); yylval.lexeme=(char*)malloc(yyleng+1);
strcpy(yyval.lexeme, yytext);
/*return T_VARIABLE*/
return(LITERAL);
}
{COMMENT} {
return(LITERAL);
}
{SIGN}?{DIGIT}+"."{DIGIT}+ {
printf("FLOAT\n"); sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
{SIGN}?{DIGIT}+ {
printf("INTEGER\n"); sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
{STRINGCONSTANT} {
return(LITERAL);
}
[ \t\n\r] /* skip whitespace */
. { printf("Unknown character [%c]\n",yytext[0]);
return UNKNOWN; }
%%
int main(void){
int token;
while(token = yylex()) {
printf("lexed token: %d\n", token);
}
}
%%
int yywrap(void){
return 1;
}
After you fix your whitespace indenting, try to compile the grammar to a .c file. Then, when you try to compile the .c file, you'll see that you have missing declarations.
T_DO is not defined as a token.
yyval is undefined; it is actually yylval that you want. However, I recommend commenting out any references to yylval until you start integrating your parser with yacc or bison. yylval comes from yacc, not lex, and unless you define your own yylval structure, it won't exist.
I recommend you stop adding or changing code until you can compile the .l file to a .c file, and then compile the .c file to an executable. I have posted the directions; please work through the last two posts. I will follow up tomorrow.
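For later reference, "defining your own yylval structure" could look roughly like the sketch below. It is only an assumption about what a stand-alone lexer might use before yacc/bison supplies the real definition: the member names value and lexeme follow the asker's rules, fvalue and the three set_ helpers are hypothetical, and strtol()/strtod() are swapped in for sscanf() because "%d" cannot convert a floating-point literal.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the semantic-value type yacc/bison would normally provide. */
typedef union {
    long value;     /* integer literals */
    double fvalue;  /* floating-point literals (hypothetical member) */
    char *lexeme;   /* heap-allocated copy of the matched text */
} YYSTYPE;

YYSTYPE yylval;

/* Converts an integer literal; returns 0 on success, -1 on bad input. */
int set_int(const char *text)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;
    yylval.value = v;
    return 0;
}

/* Converts a floating-point literal; returns 0 on success, -1 on bad input. */
int set_float(const char *text)
{
    char *end;
    double v;

    errno = 0;
    v = strtod(text, &end);
    if (errno != 0 || end == text || *end != '\0')
        return -1;
    yylval.fvalue = v;
    return 0;
}

/* Copies len characters of text; returns 0 on success, -1 if out of memory. */
int set_lexeme(const char *text, size_t len)
{
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);
    copy[len] = '\0';
    yylval.lexeme = copy;   /* the caller is responsible for free() */
    return 0;
}
```

Inside a rule action, the calls would presumably be set_int(yytext), set_float(yytext) or set_lexeme(yytext, yyleng). Because the members share one union, only the member set last is valid.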
The reason I recommend commenting out or removing yylval for now is that you are trying to build a LEXER first. All you want is to convert the strings in your language to discrete integer tokens. That's the purpose of a lexer. It has to return the token value to a parser so the parser knows what to do.
"do" converts to T_DO (value 11)
"while" converts to T_WHILE (value 8)
You at least want to be able to run your lexer and have it print out all of the token values.
int token;
while(token = yylex())
    printf("TOKEN %d\n", token);
Once you get that far, you have a working lexer, and it is time to move on to the parser.
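While testing, bare numbers such as 11 or 8 are hard to read, so a small lookup table can help. The sketch below assumes the numbering from the #define list in which T_DO is 11 and UNKNOWN is 29. token_name() is a hypothetical helper, not something lex provides, and the table has to be kept in step with the #define list by hand.

```c
#include <stddef.h>

/* Names indexed by token code; index 0 is the end-of-input value. */
static const char *const token_names[] = {
    "EOF", "COMMENT", "VARIABLE", "INTEGER", "FLOAT", "STRING",
    "T_IF", "T_ELSE", "T_WHILE", "T_INT", "T_VOID", "T_DO",
    "T_RETURN", "T_FLOAT", "PLUS", "MINUS", "TIMES", "SLASH",
    "LPAREN", "RPAREN", "SEMICOLON", "COMMA", "EQL", "OR",
    "OR2", "AND", "AND2", "LITERAL", "IDENTIFIER", "UNKNOWN"
};

/* Returns a printable name for a token code, or "?" if out of range. */
const char *token_name(int token)
{
    size_t count = sizeof token_names / sizeof token_names[0];

    if (token < 0 || (size_t)token >= count)
        return "?";
    return token_names[token];
}
```

The print statement in the loop could then become printf("TOKEN %d (%s)\n", token, token_name(token));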
ASKER
I commented out the yylval lines, but I still get the last error:
mygrammar.l:117: error: syntax error before % token
I tried commenting out the int yywrap ... line, but then it returned:
Undefined symbols:
"_yywrap", referenced from:
_yylex in ccPDTz1b.o
_input in ccPDTz1b.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
%{
#define COMMENT 1
#define VARIABLE 2
#define INTEGER 3
#define FLOAT 4
#define STRING 5
#define T_IF 6
#define T_ELSE 7
#define T_WHILE 8
#define T_INT 9
#define T_VOID 10
#define T_DO 11
#define T_RETURN 12
#define T_FLOAT 13
#define PLUS 14
#define MINUS 15
#define TIMES 16
#define SLASH 17
#define LPAREN 18
#define RPAREN 19
#define SEMICOLON 20
#define COMMA 21
#define EQL 22
#define OR 23
#define OR2 24
#define AND 25
#define AND2 26
#define LITERAL 27
#define IDENTIFIER 28
#define UNKNOWN 29
%}
LETTER [a-zA-Z_]
DIGIT [0-9]
LETTERDIGIT [a-zA-Z0-9_]
SIGN [-+]
STRINGCONSTANT \"[^"\n]*["\n]
CHARCONSTANT \'[^'\n]*\'
RANKSPEC \[[,]*\]
INTEGER {digit}+
VARIABLE [a-z_]({LETTERDIGIT})*
COMMENT "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
%%
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return TIMES; }
"/" { return SLASH; }
"(" { return LPAREN; }
")" { return RPAREN; }
";" { return SEMICOLON; }
"," { return COMMA; }
"=" { return EQL; }
"|" { return OR; }
"||" { return OR2; }
"&" { return AND; }
"&&" { return AND2; }
"if" { return T_IF; }
"else" { return T_ELSE; }
"do" { return T_DO; }
"int" { return T_INT; }
"return" { return T_RETURN; }
"void" { return T_VOID; }
"float" { return T_FLOAT; }
"while" { return T_WHILE; }
{LETTER}{LETTERDIGIT}* {
return(IDENTIFIER);
}
{VARIABLE}* {
// printf("VARIABLE\n"); yylval.lexeme=(char*)malloc(yyleng+1);
// strcpy(yyval.lexeme, yytext);
/*return T_VARIABLE*/
return(LITERAL);
}
{COMMENT} {
return(LITERAL);
}
{SIGN}?{DIGIT}+"."{DIGIT}+ {
// printf("FLOAT\n");sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
{SIGN}?{DIGIT}+ {
// printf("INTEGER\n"); sscanf(yytext,"%d", &(yyval.value));
return(LITERAL);
}
{STRINGCONSTANT} {
return(LITERAL);
}
[ \t\n\r] /* skip whitespace */
. { printf("Unknown character [%c]\n",yytext[0]);
return UNKNOWN; }
%%
int main(void){
int token;
while(token = yylex()) {
printf("lexed token: %d\n", token);
}
}
%%
int yywrap(void){return 1;}
ASKER CERTIFIED SOLUTION
ASKER
OK, now it compiles!
Congratulations. You now have a lexer. Now run it and start typing some test strings.
I ran it:
[msmith@vice ~]$ flex grammar.l
[msmith@vice ~]$ gcc lex.yy.c
[msmith@vice ~]$ a.out
[msmith@vice ~]$ ./a.out
if
lexed token: 6
test
lexed token: 28
Make sure, before proceeding:
1) Make a backup of this file! :) If it gets screwed up in the next phase, you can at least backtrack.
2) Every time you make a change or a significant addition, test-compile it to make sure. Don't make large sweeping changes without compiling incrementally; it is easier to fix things that way until you become more comfortable with debugging grammar files.
I think you have accomplished your task. If you need help with the next step (the parser), you can open a new question and I'll be happy to help.