kuntilanak
asked on
symbol table for a compiler
can you guys help me to find an implementation of a symbol table using a hash table in C ?
ASKER
I sometimes see :: in a C program, what does that basically mean? And creating a parse tree, do I need to do that from a symbol table?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
well I've got part of my parser done.. all I need to do right now is to build up a symbol table, do semantic checking, and type checking. I already have an implementation of a hash table that I used to make in the past days, however I don't know what to modify it with and whether there's a lot to modify.
What kind of tokens/symbols does the language have that you're writing the parser for ?
ASKER
infinity if you remember about that grammar where I asked help for the error handling. Now I am done with that and I need to work on with the symbol table and syntax tree.
For all the others it's basically a c-- language that I am dealing with, it's a subset of c but with much simpler grammar. It produces function declaration, for loop, and while loop.
For all the others it's basically a c-- language that I am dealing with, it's a subset of c but with much simpler grammar. It produces function declaration, for loop, and while loop.
ASKER
to make my question much clearer, I am quite confuse what I am supposed to store in the symbol table. I am pretty sure that it will be a struct that has something inside of it. However, what is inside that struct? So far from brainstorming here's what I had in mind:
1. the type of that variable or function
2. the scope, can be local or global ( I am not really sure whether we need to have 2 symbol table,
one is for the local symbol table and the other one is for the global symbol table)
3. the name of the variable? ? ( I am not really sure on this one, maybe we do need this as we need to
check for duplicates of variable names which is not allowed)
4. ????
I know I am missing something else here, some guidance would help
The hash table needs to have a lookup operation and a delete operation,delete is so that whenever the local scope or a function whose scope is local has been executed and no longer needed anymore. For the lookup function, we need to have several different lookup functions right? Or is it just basically looking up for a struct?
Other doubts about this symbol table and type checking, from my research and examples of code, I see that inside the parser people usually have $$.type or $1.type, this is nice as we need to match each type if it's correct or not. However, how do we manipulate the code so that in every part of the code I can do a $$.type or $1.type.
1. the type of that variable or function
2. the scope, can be local or global ( I am not really sure whether we need to have 2 symbol table,
one is for the local symbol table and the other one is for the global symbol table)
3. the name of the variable? ? ( I am not really sure on this one, maybe we do need this as we need to
check for duplicates of variable names which is not allowed)
4. ????
I know I am missing something else here, some guidance would help
The hash table needs to have a lookup operation and a delete operation,delete is so that whenever the local scope or a function whose scope is local has been executed and no longer needed anymore. For the lookup function, we need to have several different lookup functions right? Or is it just basically looking up for a struct?
Other doubts about this symbol table and type checking, from my research and examples of code, I see that inside the parser people usually have $$.type or $1.type, this is nice as we need to match each type if it's correct or not. However, how do we manipulate the code so that in every part of the code I can do a $$.type or $1.type.
>> infinity if you remember about that grammar where I asked help for the error handling.
Apologies, I didn't recognize the name :)
>> I am pretty sure that it will be a struct that has something inside of it.
symbols are not only variables, but also functions, classes, etc. (having different symbol tables might be a good idea) As you noted, you'll need names, types, scope. Take a look at the output of nm, which gives a good idea of what you can keep in there.
>> For the lookup function, we need to have several different lookup functions right? Or is it just basically looking up for a struct?
I think, generally, you'll want to look up based on the symbol name.
>> However, how do we manipulate the code so that in every part of the code I can do a $$.type or $1.type.
You'll want to make yylvalue a struct that contains (a) a type and (b) a union.
Apologies, I didn't recognize the name :)
>> I am pretty sure that it will be a struct that has something inside of it.
symbols are not only variables, but also functions, classes, etc. (having different symbol tables might be a good idea) As you noted, you'll need names, types, scope. Take a look at the output of nm, which gives a good idea of what you can keep in there.
>> For the lookup function, we need to have several different lookup functions right? Or is it just basically looking up for a struct?
I think, generally, you'll want to look up based on the symbol name.
>> However, how do we manipulate the code so that in every part of the code I can do a $$.type or $1.type.
You'll want to make yylvalue a struct that contains (a) a type and (b) a union.
ASKER
can you show me an example where a struct has a union inside of it?? you mentioned that the lookup is when we found a symbol and a symbol here can be a function as well as integers, so we should have different structs for functions and variables?
ASKER
and for information I have attached the grammar of my compiler in a grammar.y. I've also found this code floating on the web, I think it might help my task much easier, as it almost gives me what I need.
http://www.cs.umu.se/kurser/TDBD05/HT04/Lab/Lab1/symbol.h
http://www.cs.umu.se/kurser/TDBD05/HT04/Lab/Lab1/symbol.c
However I am a little bit dull understanding the codes there, so can you please help me to explain what this does:
1. There's this notation of quad merge, quad list.. what is that?
2. The code includes a mips.h, which I think I won't be actually needing, what parts of the code should
I remove besides removing the include mips.h on top?
3. struct _callgraph *callgraphnode; /* Node in callgraph */ --> what is this for?
I think there are a lot of un-necessary things that I can remove from that code so that it suits my need, It's not going to be a lot of work as most of the things are done, I just need to change the implementation a bit so that it suits what I need for my grammar. If you could help me to give a few list of what I can remove from that long code.. I would appreciate it
grammar.txt
http://www.cs.umu.se/kurser/TDBD05/HT04/Lab/Lab1/symbol.h
http://www.cs.umu.se/kurser/TDBD05/HT04/Lab/Lab1/symbol.c
However I am a little bit dull understanding the codes there, so can you please help me to explain what this does:
1. There's this notation of quad merge, quad list.. what is that?
2. The code includes a mips.h, which I think I won't be actually needing, what parts of the code should
I remove besides removing the include mips.h on top?
3. struct _callgraph *callgraphnode; /* Node in callgraph */ --> what is this for?
I think there are a lot of un-necessary things that I can remove from that code so that it suits my need, It's not going to be a lot of work as most of the things are done, I just need to change the implementation a bit so that it suits what I need for my grammar. If you could help me to give a few list of what I can remove from that long code.. I would appreciate it
grammar.txt
>> can you show me an example where a struct has a union inside of it??
For example :
typedef struct Data {
int type;
union {
float f;
char *str;
int v;
} d;
} Data;
The 'type' field indicates which part of the union should be used.
Note that you can make the type an enum for further clarity ...
>> a symbol here can be a function as well as integers, so we should have different structs for functions and variables?
Not "should" ... But it might be interesting, depending on how you use the symbols, to have different symbol tables. It also depends on the language you're dealing with ... If functions can have the same names as identifiers for example, that could impact your choice.
>> 1. There's this notation of quad merge, quad list.. what is that?
No idea. I've never heard the term "quad" used in that context. But they seem to be simple linked lists.
>> 2. The code includes a mips.h, which I think I won't be actually needing, what parts of the code should I remove besides removing the include mips.h on top?
mips is a processor architecture. So, this code is probably designed specifically for that architecture. You'll need to remove the parts specific for that architecture (if any), and potentially port the rest of the code to suit your architecture better.
>> 3. struct _callgraph *callgraphnode; /* Node in callgraph */ --> what is this for?
I haven't analyzed it further, but it seems to be used for identifying the call context of the symbol (which defines its scope).
For example :
typedef struct Data {
int type;
union {
float f;
char *str;
int v;
} d;
} Data;
The 'type' field indicates which part of the union should be used.
Note that you can make the type an enum for further clarity ...
>> a symbol here can be a function as well as integers, so we should have different structs for functions and variables?
Not "should" ... But it might be interesting, depending on how you use the symbols, to have different symbol tables. It also depends on the language you're dealing with ... If functions can have the same names as identifiers for example, that could impact your choice.
>> 1. There's this notation of quad merge, quad list.. what is that?
No idea. I've never heard the term "quad" used in that context. But they seem to be simple linked lists.
>> 2. The code includes a mips.h, which I think I won't be actually needing, what parts of the code should I remove besides removing the include mips.h on top?
mips is a processor architecture. So, this code is probably designed specifically for that architecture. You'll need to remove the parts specific for that architecture (if any), and potentially port the rest of the code to suit your architecture better.
>> 3. struct _callgraph *callgraphnode; /* Node in callgraph */ --> what is this for?
I haven't analyzed it further, but it seems to be used for identifying the call context of the symbol (which defines its scope).
ASKER
okay alongside of creating the symbol table, how would I create a syntax tree along with it? I guess I'll just copy and paste the struct that the above link is using here:
as you can see below, there's this notion of offset and level, I have no idea on what that is...
as you can see below, there's this notion of offset and level, I have no idea on what that is...
struct _symbol {
char id[MAXIDLENGTH]; /* identifier name */
char name[MAXIDLENGTH+1]; /* name to use in assembler code */
int type; /* INTEGER, REAL, BOOLEAN, NOTYPE */
int class; /* VAR, FUNC, PAR, CONST, TEMP, UNDEF */
int used; /* zero if not in scope */
int level; /* static level */
int offset; /* offset in block */
int nrparms; /* Number of parameters */
int value; /* Value for integer constant */
char *string; /* Value for string constant */
int live; /* one if symbol is live */
int live_on_exit; /* one if symbol is live */
struct _icode *next_use; /* next use of symbol */
int framesize; /* size of a-record for function */
struct _callgraph *callgraphnode; /* Node in callgraph */
struct _location *loc; /* List of locations for symbol */
struct _symbol *nextsym; /* next symbol */
};
>> how would I create a syntax tree along with it?
While you're parsing, you build up the parse tree. For example an expression 'expr + expr' might become an addition node with two children (one for each expression that needs to be added).
>> there's this notion of offset and level, I have no idea on what that is...
They've apparently got to do with the scope. The question is whether you need all that, or whether you have your own preferred way to deal with things like scope.
While you're parsing, you build up the parse tree. For example an expression 'expr + expr' might become an addition node with two children (one for each expression that needs to be added).
>> there's this notion of offset and level, I have no idea on what that is...
They've apparently got to do with the scope. The question is whether you need all that, or whether you have your own preferred way to deal with things like scope.
ASKER
ok, so far what I understand is that is that inside my parser I will need to include something like:
include "syntax.h"
which is a syntax tree...
my confusion of a syntax tree is that, a node can have various number of children. so do I need to define also various different struct for each? an example the expr '+' expr has two children as you mentioned, which is expr and expr, a for loop (I think) will have 4 children.. and so on.. so how do I deal with this? What do I need to store inside a node of a syntax tree.. an example of an implementation would help.
include "syntax.h"
which is a syntax tree...
my confusion of a syntax tree is that, a node can have various number of children. so do I need to define also various different struct for each? an example the expr '+' expr has two children as you mentioned, which is expr and expr, a for loop (I think) will have 4 children.. and so on.. so how do I deal with this? What do I need to store inside a node of a syntax tree.. an example of an implementation would help.
>> so how do I deal with this?
Depending on the type of the node, you know how many children to expect. You'll have different kinds of nodes, but they can all be implemented as a common struct with members that are common to all nodes and using a union for the parts that are different among the nodes.
>> What do I need to store inside a node of a syntax tree..
Basically, all information you'll need for whatever you plan to do with the parse tree. I don't know what you plan to do with it, but typically, the tree itself will represent the structure of the input, and the nodes will contain specific information like values, symbols, scope, etc.
>> an example of an implementation would help.
I've got some code lying around here, but it'll likely confuse you more than help you, since it's quite specific;
You'll probably find a lot of help in this book (freely available) :
http://www.cs.vu.nl/~dick/PTAPG.html
I know I did back when I read it, and I'm still referring to it regularly ;)
Depending on the type of the node, you know how many children to expect. You'll have different kinds of nodes, but they can all be implemented as a common struct with members that are common to all nodes and using a union for the parts that are different among the nodes.
>> What do I need to store inside a node of a syntax tree..
Basically, all information you'll need for whatever you plan to do with the parse tree. I don't know what you plan to do with it, but typically, the tree itself will represent the structure of the input, and the nodes will contain specific information like values, symbols, scope, etc.
>> an example of an implementation would help.
I've got some code lying around here, but it'll likely confuse you more than help you, since it's quite specific;
You'll probably find a lot of help in this book (freely available) :
http://www.cs.vu.nl/~dick/PTAPG.html
I know I did back when I read it, and I'm still referring to it regularly ;)
ASKER
ok, I think that I will only need two things here:
one is the node type , which denotes if it's a +, -, /,for,while,etc
and the other one is the expression type, such as, int,char, bool
and the children of that node, you mentioned that I should use the union for storing other nodes.. am I right? I am not quite clear on that explanation..
one more thing, what kind of operations does this syntax tree needs to have?
one is the node type , which denotes if it's a +, -, /,for,while,etc
and the other one is the expression type, such as, int,char, bool
and the children of that node, you mentioned that I should use the union for storing other nodes.. am I right? I am not quite clear on that explanation..
one more thing, what kind of operations does this syntax tree needs to have?
>> and the other one is the expression type, such as, int,char, bool
Note that an expression can be a lot of things. For example :
5 + 4 * 3
could be represented in a parse tree as :
+
/ \
5 *
/ \
4 3
>> you mentioned that I should use the union for storing other nodes..
A union is a C construct that allows to store different types in the same memory location. For example :
union test {
int value;
char *str;
}
defines a type 'test' whose value can be interpreted either as an int or a char*. Depending on the context, you can store either of the two in there, but not both at the same time.
This comes in handy when you need to store some data, but you don't know beforehand what the type of the data will be.
When it comes to a parse tree, this can be used to represent the varying parts of the nodes.
Here's some more information on unions in case I wasn't clear enough :
http://www.cplusplus.com/doc/tutorial/other_data_types.html
>> one more thing, what kind of operations does this syntax tree needs to have?
At the very least, you'll need to be able to add nodes to it, and to "walk" through the tree. What further operations you need to be able to do on it depends entirely on what you'll use the parse tree for.
Note that an expression can be a lot of things. For example :
5 + 4 * 3
could be represented in a parse tree as :
+
/ \
5 *
/ \
4 3
>> you mentioned that I should use the union for storing other nodes..
A union is a C construct that allows to store different types in the same memory location. For example :
union test {
int value;
char *str;
}
defines a type 'test' whose value can be interpreted either as an int or a char*. Depending on the context, you can store either of the two in there, but not both at the same time.
This comes in handy when you need to store some data, but you don't know beforehand what the type of the data will be.
When it comes to a parse tree, this can be used to represent the varying parts of the nodes.
Here's some more information on unions in case I wasn't clear enough :
http://www.cplusplus.com/doc/tutorial/other_data_types.html
>> one more thing, what kind of operations does this syntax tree needs to have?
At the very least, you'll need to be able to add nodes to it, and to "walk" through the tree. What further operations you need to be able to do on it depends entirely on what you'll use the parse tree for.
ASKER
yes I understand that there can be many different combinations of an expression.. especially in the boolean part.. I really don;t know where to start in this parse tree..
>> I really don;t know where to start in this parse tree..
A sound basis in data structures is required for this, as well as a good understanding of the language you are implementing the parser for.
Knowing the language (as well as what you plan to do with the parse tree) allows you to identify what you need to put in the parse tree, and what kind of operations you'll need to perform on the parse tree.
This information is crucial to make the correct choice for a parse tree data structure.
I'm afraid you'll have to analyze your needs first, before you can make a good parse tree design.
You can get ideas from existing implementations of course, but usually, a parse tree will be very specific to the application it's written for, and there are few cases where you can simply re-use an existing implementation.
A sound basis in data structures is required for this, as well as a good understanding of the language you are implementing the parser for.
Knowing the language (as well as what you plan to do with the parse tree) allows you to identify what you need to put in the parse tree, and what kind of operations you'll need to perform on the parse tree.
This information is crucial to make the correct choice for a parse tree data structure.
I'm afraid you'll have to analyze your needs first, before you can make a good parse tree design.
You can get ideas from existing implementations of course, but usually, a parse tree will be very specific to the application it's written for, and there are few cases where you can simply re-use an existing implementation.
ASKER
I've found an implementation where they build the syntax tree first and then they build a symbol table by traversing the syntax tree, is such thing possible? or does it just makes life more complicated.. one more thing, in my .l and .y file I included several other .h file which has a corresponding .c file, I am confused with my makefile though, here's what I have:
What I included in the .h file is a file called globals.h, util.h, and scan.h, only util.h and scan.h has a corresponding.c file and globals.h is just a bunch of declarations.. I am not an expert on makefiles so I need some help
What I included in the .h file is a file called globals.h, util.h, and scan.h, only util.h and scan.h has a corresponding.c file and globals.h is just a bunch of declarations.. I am not an expert on makefiles so I need some help
CC = gcc
CFLAGS = -g
CFILES = main.c lex.yy.c y.tab.c
OFILES = main.o lex.yy.o y.tab.o
HFILES = y.tab.h
compile: $(OFILES)
$(CC) $(CFLAGS) $(OFILES) -lfl -o compile
y.tab.c : parse.y
yacc -v -d parse.y
lex.yy.c : scan.l y.tab.h
flex scan.l
y.tab.h : parse.y
yacc -v -d parse.y
clean:
/bin/rm -f compile lex.yy.c y.tab.c y.tab.h *.BAK *.o
ASKER
oh and one more question:
is the syntax tree created no matter if the input is valid or not? therefore my question is that is there a general convention where we should create a symbol table first and then create a syntax tree or the process can be reversed? Thanks
is the syntax tree created no matter if the input is valid or not? therefore my question is that is there a general convention where we should create a symbol table first and then create a syntax tree or the process can be reversed? Thanks
>> I've found an implementation where they build the syntax tree first and then they build a symbol table by traversing the syntax tree, is such thing possible? or does it just makes life more complicated..
The advantage is that it makes the parsing phase a bit easier. However, building the symbol table after parsing has finished means that the parse tree will have to be updated with these symbols too. For example, the same symbol will likely be used in different locations, and these different locations in the parse tree have to refer to the same symbol.
>> I am not an expert on makefiles so I need some help
I generally do something like below (note that you place everything in one directory - it's probably better to separate different modules in different folders, which will be especially useful when the project gets larger).
>> is the syntax tree created no matter if the input is valid or not?
That depends on your error handling. But at least it will be built up to the point where the error is discovered.
>> therefore my question is that is there a general convention where we should create a symbol table first and then create a syntax tree or the process can be reversed?
Creating the symbol table first is an option. It will require you to do two passes over the input though - once for building the symbol table, and a second time for building the parse tree.
The advantage is that it makes the parsing phase a bit easier. However, building the symbol table after parsing has finished means that the parse tree will have to be updated with these symbols too. For example, the same symbol will likely be used in different locations, and these different locations in the parse tree have to refer to the same symbol.
>> I am not an expert on makefiles so I need some help
I generally do something like below (note that you place everything in one directory - it's probably better to separate different modules in different folders, which will be especially useful when the project gets larger).
>> is the syntax tree created no matter if the input is valid or not?
That depends on your error handling. But at least it will be built up to the point where the error is discovered.
>> therefore my question is that is there a general convention where we should create a symbol table first and then create a syntax tree or the process can be reversed?
Creating the symbol table first is an option. It will require you to do two passes over the input though - once for building the symbol table, and a second time for building the parse tree.
CC = gcc
LEX = flex
YACC = yacc
RM = /bin/rm -f
CFLAGS = -g -Wall -ansi
LEXFLAGS = -sp -Cfe
OBJ = main.o scan.o parse.o
BIN = compile
LIBS = -lfl
LEXSRCGEN = scan.c
YACCSRCGEN = parse.c parse.h
SRCGEN = $(YACCSRCGEN) $(LEXSRCGEN)
YACCDEBUGGEN = y.output
DEBUGGEN = $(YACCDEBUGGEN)
all: all-before $(BIN) all-after
all-before: parse.o scan.o
all-after:
$(BIN): $(OBJ)
$(CC) $(OBJ) -o "$(BIN)" $(LIBS)
main.o: main.c
$(CC) -c main.c -o main.o $(CFLAGS)
scan.o: scan.l parse.h
$(RM) $(LEXSRCGEN)
$(LEX) -t $(LEXFLAGS) scan.l > scan.c
$(CC) -c scan.c -o scan.o $(CFLAGS)
parse.h: parse.o
parse.o: parse.y
$(RM) $(YACCSRCGEN) $(YACCDEBUGGEN) y.tab.c y.tab.h
$(YACC) -v -d parse.y
$(MV) y.tab.c parse.c
$(MV) y.tab.h parse.h
$(CC) -c parse.c -o parse.o -DYYDEBUG=1 $(CFLAGS)
clean:
${RM} $(SRCGEN) $(DEBUGGEN) $(OBJ) $(BIN)
ASKER
>>For example, the same symbol will likely be used in different locations, and these different locations in >>the parse tree have to refer to the same symbol.
I don't get what you mean by this, can you please clarify?
I don't get what you mean by this, can you please clarify?
ASKER
>>That depends on your error handling. But at least it will be built up to the point where the error is >>discovered.
Well I mean here disregard the error handling, what I mean is that. Even though there's an error in the type checking, will the syntax tree still be created?
Well I mean here disregard the error handling, what I mean is that. Even though there's an error in the type checking, will the syntax tree still be created?
>> >>For example, the same symbol will likely be used in different locations, and these different locations in
>> >>the parse tree have to refer to the same symbol.
>>
>> I don't get what you mean by this, can you please clarify?
Sure. Suppose you have :
int value = 5;
value += 3;
return value;
These three lines all use the 'value' symbol. All three lines have to refer to the same 'value' symbol in order for a correct interpretation of the input.
>> Even though there's an error in the type checking, will the syntax tree still be created?
What do you mean ? If you start building the parse tree, then the parse tree will be created. If an error occurs while building the parse tree, then the parse tree will be built up to the location of the error. If not error occurs, then it will be completely built.
>> >>the parse tree have to refer to the same symbol.
>>
>> I don't get what you mean by this, can you please clarify?
Sure. Suppose you have :
int value = 5;
value += 3;
return value;
These three lines all use the 'value' symbol. All three lines have to refer to the same 'value' symbol in order for a correct interpretation of the input.
>> Even though there's an error in the type checking, will the syntax tree still be created?
What do you mean ? If you start building the parse tree, then the parse tree will be created. If an error occurs while building the parse tree, then the parse tree will be built up to the location of the error. If not error occurs, then it will be completely built.
ASKER
well, while I am building the syntax tree.. I don't do type checking.. are you an error like a multiple ID has been declared? something like that?
So, how will you keep your symbol table and parse tree consistent ?
ASKER
well..I think I mis-paraphrased what I am trying to say here, what I meant is that, first in the parser.y I created nodes of the syntax tree, after all that is done.. I created the symbol table by traversing from the root of the syntax tree from the parsing.. after the symbol table is done correctly, I do type checking.. that's basically my steps here.. I just have some doubts here, whether this step is fine or not..
You can do it that way. If it works, it's fine ;) As long as you have a well-defined plan of action, there should be no issues. I'm just trying to help you cover everything.
ASKER
Yes, I am just worried that in the steps above.. I am missing something that I don't see.. one other thing,I am not sure the specs allow this, it says:
If an identifier is declared to have scope local to a function, then all uses of that identifier within that function refer to this local entity; if an identifier is not declared as local to a function, but is declared as a global, then any use of that identifier within that function refers to the entity with global scope.
Then say that I have the code which is something like this:
If an identifier is declared to have scope local to a function, then all uses of that identifier within that function refer to this local entity; if an identifier is not declared as local to a function, but is declared as a global, then any use of that identifier within that function refers to the entity with global scope.
Then say that I have the code which is something like this:
int x;
int foo(void){
x = 5; ==> look in here
}
do you think that based on the specs I pasted above, this will be an error ??
Since you construct your symbol table AFTER constructing the parse tree, you'll have to do another pass over the parse tree to make sure that every symbol usage is correct. In your example, you'll have to make sure that the node for "x = 5;" refers to the global 'x' symbol in the symbol table.
ASKER
yes, that's what I've been doing.. first I create nodes that will be members of the syntax tree at the parsing process, store the root, etc... secondly I do a pass over that tree, which is actually a syntax tree.. remember it's not a parse tree..but a syntax tree.. as the node is the type of operators and it has child representing other nodes.. i do a traversal over that tree to make a symbol table, once the symbol table is done.. it traverses the tree once again to do type checking based on the symbol table that has been created...
the question I am asking is that, based on the specs above I copy and pasted.. is it allowable to have nested scopes? or will the code below generate an error?
the question I am asking is that, based on the specs above I copy and pasted.. is it allowable to have nested scopes? or will the code below generate an error?
int x;
int foo(void){
if ( 5>2){
int b;
for (int i = 5; i < 25; i++){
int z;
if (z > 20){
x = 55;
}
}
return x;
}
>> remember it's not a parse tree..but a syntax tree..
A syntax tree is an evolution of a parse tree - they basically contain the same information (just structured differently, and abstracted further in the case of a syntax tree), so for the current discussion I consider the terms inter-changeable. I'll use the term syntax tree if you prefer though ;)
>> the question I am asking is that, based on the specs above I copy and pasted.. is it allowable to have nested scopes?
Why would it be an error. The specification you posted describes the difference between local and global variables, and how they are valid in a local scope. Basically, what it says is that a locally declared variable x will hide a globally declared variable x.
So :
A syntax tree is an evolution of a parse tree - they basically contain the same information (just structured differently, and abstracted further in the case of a syntax tree), so for the current discussion I consider the terms inter-changeable. I'll use the term syntax tree if you prefer though ;)
>> the question I am asking is that, based on the specs above I copy and pasted.. is it allowable to have nested scopes?
Why would it be an error. The specification you posted describes the difference between local and global variables, and how they are valid in a local scope. Basically, what it says is that a locally declared variable x will hide a globally declared variable x.
So :
case 1 :
------
int x; <--- globally declared x
void fun() {
x; <--- will refer to the globally declared x
}
case 2 :
------
void fun() {
int x; <--- locally declared x
x; <--- will refer to the locally declared x
}
case 3 :
------
int x; <--- globally declared x
void fun() {
int x; <--- locally declared x
x; <--- will refer to the locally declared x (the globally declared x is 'hidden')
}
case 4 :
------
void fun() {
x; <--- error : no variable x declared
}
ASKER
okay.. I think I got it.. as I think I saw something in the text that says that there's only 2 scope, which is local and global.. which is kind of true but kind of absurd for nested statements... as inside a nested statements it can have a local local variable.. etc, right?
and then when I do the type checking on the syntax tree once again.. and consult the symbol table and found an error.. does it erases the node that has an error... or it just leaves it
and I read online that a symbol table can have a delete operator.. what is the purpose of this? as of right now my current symbol table implementation does not have a delete operation, so I was just wondering what it would be useful for
and then when I do the type checking on the syntax tree once again.. and consult the symbol table and found an error.. does it erases the node that has an error... or it just leaves it
and I read online that a symbol table can have a delete operator.. what is the purpose of this? as of right now my current symbol table implementation does not have a delete operation, so I was just wondering what it would be useful for
ASKER
oh and I forgot so based on the code:
it is a valid code? x = 55 will use x that is declared in the global area?
it is a valid code? x = 55 will use x that is declared in the global area?
int x;
int foo(void){
if ( 5>2){
int b;
for (int i = 5; i < 25; i++){
int z;
if (z > 20){
x = 55; ==============> look here
}
}
return x;
}
>> as inside a nested statements it can have a local local variable.. etc, right?
Exactly. See the example code below.
In other words, the declaration from the nearest enclosing scope is taken.
>> does it erases the node that has an error... or it just leaves it
That depends on what you want to do. In case of an error, it's usual to fail gracefully, and report the error to the user, so he can fix it and try again.
>> and I read online that a symbol table can have a delete operator..
I don't know where you read it, so I can't be sure what they were referring to, but the most logical would be that you add an operation to remove a symbol from the symbol table. This can be done as soon as a symbol is no longer needed (for example because its scope has ended).
Take a look at stacks to manage nested scopes ...
>> it is a valid code? x = 55 will use x that is declared in the global area?
Yes, it is, and yes, it will.
(ignoring the missing } and the declaration inside the for initialization part)
Exactly. See the example code below.
In other words, the declaration from the nearest enclosing scope is taken.
>> does it erases the node that has an error... or it just leaves it
That depends on what you want to do. In case of an error, it's usual to fail gracefully, and report the error to the user, so he can fix it and try again.
>> and I read online that a symbol table can have a delete operator..
I don't know where you read it, so I can't be sure what they were referring to, but the most logical would be that you add an operation to remove a symbol from the symbol table. This can be done as soon as a symbol is no longer needed (for example because its scope has ended).
Take a look at stacks to manage nested scopes ...
>> it is a valid code? x = 55 will use x that is declared in the global area?
Yes, it is, and yes, it will.
(ignoring the missing } and the declaration inside the for initialization part)
int x; <--- globally declared x
void fun() {
int x; <--- locally declared x (1)
{
int x; <--- locally declared x (2)
{
x; <--- will refer to the locally declared x (2) (the locally declared x (1) and the globally declared x are 'hidden')
}
x; <--- will refer to the locally declared x (2) (the locally declared x (1) and the globally declared x are 'hidden')
}
x; <--- will refer to the locally declared x (1) (the globally declared x is 'hidden')
}
ASKER
okay, I also have read about this stack thing to manage scopes.. lets use my example again below. Basically as far as I understand about this stack thing is that it pops a symbol table and then when a declaration is used it looks at the top of the stack and see if a declaration has been made there and if it doesn't then it fails.. so in this case the variable x is at the bottom of the stack and when processing x = 55 the top of the stack, doesn't contain any declaration of x.. so how will it handle it?
int foo(void){
if ( 5>2){
int b;
for (int i = 5; i < 25; i++){
int z;
if (z > 20){
x = 55; ==============> look here
}
}
return x;
}
}
There are different ways to handle it. The way you describe it, means that you have to walk the stack (from top to bottom) until you find a declaration for the variable x. The lowest level in the stack will be the global scope.
ASKER
based on the specs I wrote above, should I be able to look deep down inside the stack or just look at what's currently on the top of the stack ?? sorry for asking this as English is not my mothers language so I might understaand it differently from what's it's suppose to mean.. I think looking deep down through the stack makes more sense...
>> should I be able to look deep down inside the stack or just look at what's currently on the top of the stack ??
That depends on how you implement the stack. But the way you described it, you might have to look all the way down the stack to the global scope (if none of the enclosing local scopes contain the symbol declaration).
That depends on how you implement the stack. But the way you described it, you might have to look all the way down the stack to the global scope (if none of the enclosing local scopes contain the symbol declaration).
ASKER
okay I think I got it, I am now just confused on an array of int, say I have:
the specs says that an array of int is compatible with an array of int.. so.. does that means that it's not compatible with an int?
the specs says that an array of int is compatible with an array of int.. so.. does that means that it's not compatible with an int?
int test[20];
test[20] = 5
>> so.. does that means that it's not compatible with an int?
An array is an array, it's not a single int. So, an array of int and an int are not compatible.
Note however that you can assign an int to a certain index position in the array :
test[3] = 5;
will assign the value 5 to the 4-th int in the array 'test'.
An array is an array, it's not a single int. So, an array of int and an int are not compatible.
Note however that you can assign an int to a certain index position in the array :
test[3] = 5;
will assign the value 5 to the 4-th int in the array 'test'.
ASKER
okay, so I can only assign an array of int with an array of int, can you show me that in C...
>> so I can only assign an array of int with an array of int
No, that's not possible in C. You cannot assign an array.
No, that's not possible in C. You cannot assign an array.
ASKER
then what does it mean in the specs if it says:
an array of int is compatible with an array of int..
an array of int is compatible with an array of int..
>> an array of int is compatible with an array of int..
Can you show the full text where this statement occurs ? I'll need the context to answer that question ;)
Can you show the full text where this statement occurs ? I'll need the context to answer that question ;)
ASKER
Variables must be declared before they are used. Functions must have their argument types and return value specified (either via a prototype or via a definition) before they are called. If an identifier is declared to have scope local to a function, then all uses of that identifier within that function refer to this local entity; if an identifier is not declared as local to a function, but is declared as a global, then any use of that identifier within that function refers to the entity with global scope. The following rules guide the checking of type consistency. The notion of two types being compatible is defined as follows:
1. int is compatible with int, and char is compatible with char;
2. int is compatible with char, and vice versa;
3. an array of int is compatible with an array of int, and an array of char is compatible with an array of char; and
4. any pair of types not covered by one of the rules given above is not compatible.
1. int is compatible with int, and char is compatible with char;
2. int is compatible with char, and vice versa;
3. an array of int is compatible with an array of int, and an array of char is compatible with an array of char; and
4. any pair of types not covered by one of the rules given above is not compatible.
>> The following rules guide the checking of type consistency.
This is about type consistency, not about whether you can assign one to the other. For example, if a function takes an array of int as argument, then the calling code HAS to pass it a compatible type - in this case an array of int.
This is about type consistency, not about whether you can assign one to the other. For example, if a function takes an array of int as argument, then the calling code HAS to pass it a compatible type - in this case an array of int.
ASKER
Hmmm...some more concrete example would help I guess..
int fun(int arr[]) {
}
int arr[10];
fun(arr);
}
int arr[10];
fun(arr);
ASKER
so does the compiler actually has another type which is called an array of int and an array of char...
They are different types, yes.
ASKER
so when I declare something like:
int a, b[]
how does it know that b is a type of array of int and a is just an int.. doesn't the parser just says that the type of both is an int
int a, b[]
how does it know that b is a type of array of int and a is just an int.. doesn't the parser just says that the type of both is an int
>> doesn't the parser just says that the type of both is an int
Not if your parser parses the data correctly.
Not if your parser parses the data correctly.
ASKER
okay, then I guess I'll just have to figure out a way so that it can find a way to separate that an id with a[] is an array and without it whatever type it is declared... and is something like below valid? As far as I am concerned by looking at the specs it doesn't as the an array of int is only compatible with an array of int, while "THIS IS A TEST" is an array of char, am I right?
void fun(int a){
int test[200];
test = "THIS IS A TEST";
}
>> I'll just have to figure out a way so that it can find a way to separate that an id with a[] is an array and without it whatever type it is declared...
The grammar should already account for that.
>> and is something like below valid?
Not in C. You cannot assign anything to an array, not even another array of the same type.
The grammar should already account for that.
>> and is something like below valid?
Not in C. You cannot assign anything to an array, not even another array of the same type.
ASKER
okay, so then all I can assign is an element in an array? am I right?
I can assign test[159], but not just test...
I can assign test[159], but not just test...
If you're talking about C, yes.
ASKER
okay got it, and in your example you mentioned, are you trying to say:
int fun(int arr[]);
int fun(int arr[]);
int fun(int arr[]) {
}
int arr[10];
fun(arr);
>> are you trying to say:
No, I was trying to say what I said ;) Why ?
No, I was trying to say what I said ;) Why ?
ASKER
any other places where you can think that an array of int must be compatible with an array of int, besides that example you gave me
If you are trying to write a parser for the C language, it's probably best to take a look at the C standard (ISO-IEC 9899). It will answer all these kind of questions and a lot more.
ASKER
it's not actually a C compiler, but it's a subset of it.. so I guess all the rules that C accepts my compiler also accepts..
ASKER
oh and there's also this rule:
Any function called from within an expression must not have return type void. Any function call that is a statement must have return type void.
Does it mean that say I have:
fun cannot return void here?
and fee must return void?
Any function called from within an expression must not have return type void. Any function call that is a statement must have return type void.
Does it mean that say I have:
fun cannot return void here?
and fee must return void?
int foo(void){
int x;
x = fun(5);
fee(23);
return 2;
}
>> Any function called from within an expression must not have return type void. Any function call that is a statement must have return type void.
That's not true in C.
>> Does it mean that say I have:
That's what they say, yes.
That's not true in C.
>> Does it mean that say I have:
That's what they say, yes.
ASKER
can someone please help me on how to fix this error:
I actually don't want call to be :
var LP args RP
instead I want it to be:
id LP args RP
However if I do this then $$ -> attr.name = savedIdName won't give me the correct name... Basically in order for it to be correct, I need to have a variable xxxx to be equal to savedIdName before it hits args.. how can I do this?
I actually don't want call to be :
var LP args RP
instead I want it to be:
id LP args RP
However if I do this then $$ -> attr.name = savedIdName won't give me the correct name... Basically in order for it to be correct, I need to have a variable xxxx to be equal to savedIdName before it hits args.. how can I do this?
call : var LP args RP
{ $$ = newStmtNode(CallK);
fprintf(stderr, "call function name is %s\n", savedFunName);
$$ -> attr.name = $1->attr.name;
$$ -> child[0] = $3;
}
| var LP args error {fprintf(stderr, "Missing a right parentheses \n");}
;
var : id
{ $$ = newExpNode(IdK);
$$ -> attr.name = savedIdName;
$$ ->child[0] = NULL;
}
| id LSP exp RSP
{ $$ = newExpNode(IdK);
$$ -> arrayType = TRUE;
$$ -> attr.name = savedIdName;
$$ ->child[0] = $3;
}
| id LSP exp error {fprintf(stderr, "Missing a right parentheses \n");}
| id LSP error {fprintf(stderr, "Error in expression inside array \n");}
;
id : ID { savedIdName = copyString(tokenString);}
;
Is this still related to your original question (which was about symbol tables) ? If not, then you should post a new question, so that the PAQ database remains easy to search.
ASKER
although it's not directly related , but I think it's still appropriate in the context as this is important for building my symbol table
>> Basically in order for it to be correct, I need to have a variable xxxx to be equal to savedIdName before it hits args.. how can I do this?
I thought you were constructing the symbol table after parsing the file ?
I thought you were constructing the symbol table after parsing the file ?
ASKER
I construct the symbol table at the same time I parse the file...
Earlier you said you did it in 3 passes - so how are you doing it now then ?
ASKER
basically in my main program, I have two functions.. one is called buildSymboltable and the other is called typecheck. both functions takes a function called parse as the arguments. Parse basically just calls yyparse() and returns the starting root of the syntax tree
>> both functions takes a function called parse as the arguments.
Meaning that you parse the input twice ?? Why ?
Meaning that you parse the input twice ?? Why ?
ASKER
yes, basically I parse it twice.. I know it's kind of useless..but to do the type checking what I did was start from the root of the syntax tree and as I now has more information of the symbol table.. the second traversal was basically to parse it... I know this is not really efficient and I will try to improve this in the future.. as I don't have that much time left.. so back to topic, how can I resolve the shift/reduce conflict above
>> how can I resolve the shift/reduce conflict above
I'm sorry. What shift/reduce conflict are you talking about ?
>> I know this is not really efficient and I will try to improve this in the future.. as I don't have that much time left..
It's no big deal. Instead of :
buildSymbolTable(parse());
typecheck(parse());
you can do :
syntaxTree = parse();
buildSymbolTable(syntaxTre e);
typecheck(syntaxTree);
I'm sorry. What shift/reduce conflict are you talking about ?
>> I know this is not really efficient and I will try to improve this in the future.. as I don't have that much time left..
It's no big deal. Instead of :
buildSymbolTable(parse());
typecheck(parse());
you can do :
syntaxTree = parse();
buildSymbolTable(syntaxTre
typecheck(syntaxTree);
ASKER
look at the call production rule above, I actually want it to be id lp args rp instead of var lp exp rp, but args can be a var, if that's so then savedIdName would be overwritten and that's not the result I want
>> if that's so then savedIdName would be overwritten and that's not the result I want
That's not a shift/reduce conflict ;)
Why do you use savedIdName ? Why not return the name ? Or alternatively, use the name as soon as you parse it, rather than wait until the end of the rule.
That's not a shift/reduce conflict ;)
Why do you use savedIdName ? Why not return the name ? Or alternatively, use the name as soon as you parse it, rather than wait until the end of the rule.
ASKER
savedIdName is basically the name of the identifier that the scanner found and that is assigned to a struct that I have, which is :
$$ -> attr.name = $1->attr.name;
if I write it as:
id LSP exp RSP
then the semantics would be :
$$ ->attr.name = savedIdName
that's the purpose why I need savedIdName to return the name of the first id not the id which is on the expression. I've made several different approach to this problem, such as :
id dummy LSP exp RSP
dummy : {savedCallName = savedIdName;}
but this then will cause me to have a shift reduce conflict later on, which I don't have right now when using the var LSP exp RSP, I've tried to move that dummy rule elsewhere, as long as it's before it hits exp (or to be more precise var), but still it gave me this stupid shift/reduce conflict. I am not really clear my self what you meant by use the name as soon as you parse it, as I will need $$ -> attr.name to have the name id, as I need this in the future for my type checking and symbol table.. if I have to modify on how it works this then that would just screw eveyrthing up.
$$ -> attr.name = $1->attr.name;
if I write it as:
id LSP exp RSP
then the semantics would be :
$$ ->attr.name = savedIdName
that's the purpose why I need savedIdName to return the name of the first id not the id which is on the expression. I've made several different approach to this problem, such as :
id dummy LSP exp RSP
dummy : {savedCallName = savedIdName;}
but this then will cause me to have a shift reduce conflict later on, which I don't have right now when using the var LSP exp RSP, I've tried to move that dummy rule elsewhere, as long as it's before it hits exp (or to be more precise var), but still it gave me this stupid shift/reduce conflict. I am not really clear my self what you meant by use the name as soon as you parse it, as I will need $$ -> attr.name to have the name id, as I need this in the future for my type checking and symbol table.. if I have to modify on how it works this then that would just screw eveyrthing up.
ASKER
okay nevermind the problem above, I got it settled now.. now regarding type checking and the symbol table, the spec says something like this:
A function whose prototype is preceded by the keyword extern must not be defined in the program being processed.
what does it mean? does it mean that if I see a function preceeded with the extern keyword I don't put that into my function table/symbol table?
A function whose prototype is preceded by the keyword extern must not be defined in the program being processed.
what does it mean? does it mean that if I see a function preceeded with the extern keyword I don't put that into my function table/symbol table?
It is a valid symbol - it just means that its definition will be found in some other object file.
ASKER
so we don't actually put the definition again in that symbol table right?
The symbol table is for symbols. The definition should be elsewhere. Note that it's the linker's job to find the correct implementation for each requested symbol.
ASKER
ok sot he point that the specs says:
A function whose prototype is preceded by the keyword extern must not be defined in the program being processed.
means that if I already have:
extern int foo(void)
then I can't have some other int foo(void) inside the code right?
A function whose prototype is preceded by the keyword extern must not be defined in the program being processed.
means that if I already have:
extern int foo(void)
then I can't have some other int foo(void) inside the code right?
No, it means that the function definition (implementation) might be in some other compilation unit. That's all.
ASKER
okay and more issue or doubts I am having, is such thing possible if you look at this rule:
int is compatible with char, and vice versa;
int is compatible with char, and vice versa;
int a[5];
int y[20];
int x;
x = y[a[2]];
Sure.
ASKER
ok back to the extern stuff...
If I have extern int foo(void), I still can define int foo(void) underneath in the same code??
If I have extern int foo(void), I still can define int foo(void) underneath in the same code??
Sure.
ASKER
so then basically the type checking should be checking if there's another funcion declaration which is foo but not extern
?? Why ?
ASKER
you see I am not really clear by this extern stuff, so say I have
extern int foo(void)
then what can't I have after that? what type of error checking do I need to check if there's a function definition that has an extern?
extern int foo(void)
then what can't I have after that? what type of error checking do I need to check if there's a function definition that has an extern?
A definition is not the same as a declaration. This is a function declaration :
int fun();
This is a function definition :
int fun() {
return 0;
}
(its implementation)
If you have an extern function declaration, it means that its definition is in some other compilation unit.
int fun();
This is a function definition :
int fun() {
return 0;
}
(its implementation)
If you have an extern function declaration, it means that its definition is in some other compilation unit.
ASKER
okay, I understand that now.. however if I have:
how does the function that uses fun knows where it should look fun at? is it the extern one or the one I declared in file.
or the below code will result an error?
how does the function that uses fun knows where it should look fun at? is it the extern one or the one I declared in file.
or the below code will result an error?
extern int fun(void);
int fun(){
return 5;
}
As I said earlier : that's the linker's job. The compiler does not care.
The linker will complain if :
(a) it doesn't find the function definition it needs
(b) it finds multiple definitions for the same function
The linker will complain if :
(a) it doesn't find the function definition it needs
(b) it finds multiple definitions for the same function
ASKER
okay, but I think as the specs write this.. I think now it's the compiler's job to see that if :
extern int fun(void);
int fun(){
return 5;
}
it should complain
extern int fun(void);
int fun(){
return 5;
}
it should complain
>> I think now it's the compiler's job
Not in any major C compiler currently existing ;) There's no point to already check it in the compiler, since all a compiler does is translate input files to object files. It doesn't create the executable, and it isn't aware of other object files.
Not in any major C compiler currently existing ;) There's no point to already check it in the compiler, since all a compiler does is translate input files to object files. It doesn't create the executable, and it isn't aware of other object files.
ASKER
another question:
is the following legal?
is the following legal?
int fun(void);
int fun(void){
char a[20];
int h;
int j;
return 5;
}
int fee(void){
int h;
int j;
return 2;
}
May I remind you that the original question was :
>> can you guys help me to find an implementation of a symbol table using a hash table in C ?
I've answered a LOT of your extra questions that are not directly related to this original question. But you have to realize that this is not the way EE works ... The idea is to have a question with an answer that is then stored in the PAQ database so it can be found by others with the same question in the future.
Grouping multiple questions into one does not make searching answers easier.
So, please make this your last off-topic question, and consider closing this question, and opening a new one.
>> is the following legal?
Yes it is.
>> can you guys help me to find an implementation of a symbol table using a hash table in C ?
I've answered a LOT of your extra questions that are not directly related to this original question. But you have to realize that this is not the way EE works ... The idea is to have a question with an answer that is then stored in the PAQ database so it can be found by others with the same question in the future.
Grouping multiple questions into one does not make searching answers easier.
So, please make this your last off-topic question, and consider closing this question, and opening a new one.
>> is the following legal?
Yes it is.
http://www.iis.sinica.edu.
and the following docs:
http://www.mec.ac.in/resou
http://www.ks.uiuc.edu/Res