Frank-22

asked on

Building a compiler

Hi

I'm in the middle of constructing a compiler for a new programming language.

I have managed to build a basic compiler consisting of the following:

[code]
/********************global.h**************************/

#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <string.h>   /* strcmp, strlen, strcpy */
#include <ctype.h>

#define BSIZE 128
#define NONE -1
#define EOS '\0'

#define NUM 256
#define DIV 257
#define MOD 258
#define ID 259
#define DONE 260

extern int tokenval;   /* defined in the lexer */
extern int lineno;     /* defined in the lexer */

struct entry {
  char *lexptr;
  int token;
};

extern struct entry symtable[];   /* defined in symbol.c */

/***************************emitter.c*******************************/

#include "global.h"

emit(t, tval)
     int t, tval;
{
  switch(t) {
  case '+': case '-': case '*': case '/':
    printf("%c\n", t); break;
  case DIV:
    printf("DIV\n"); break;
  case MOD:
    printf("MOD\n"); break;
  case NUM:
    printf("%d\n", tval); break;
  case ID:
    printf("%s\n", symtable[tval].lexptr); break;
  default:
    printf("token %d, tokenval %d\n", t, tval);
  }
}

/******************error.c***************************/

#include "global.h"

error(m)
     char *m;
{
  fprintf(stderr, "line %d: %s\n", lineno, m);
  exit(1);
}

/**************************init.c************************************/

#include "global.h"

struct entry keywords[] = {
  "div", DIV,
  "mod", MOD,
  0, 0
};

init()
{
  struct entry *p;
  for (p = keywords; p->token; p++)
    insert(p->lexptr, p->token);
}

/*****************************lexer************************/
#include "global.h"

char lexbuf[BSIZE];
int lineno = 1;
int tokenval = NONE;

int lexan()
{
  int t;
  while(1) {
    t = getchar();
    if (t == ' ' || t == '\t');
    else if (t == '\n')
      lineno = lineno + 1;
    else if (isdigit(t)) {
      ungetc(t, stdin);
      scanf("%d", &tokenval);
      return NUM;
    }
    else if (isalpha(t)) {
      int p, b = 0;
      while (isalnum(t)) {
        lexbuf[b] = t;
        t = getchar();
        b = b + 1;
        if (b >= BSIZE)
          error("We have a compiler error ha ha");
      }
      lexbuf[b] = EOS;
      if (t != EOF)
        ungetc(t, stdin);
      p = lookup(lexbuf);
      if (p == 0)
        p = insert(lexbuf, ID);
      tokenval = p;
      return symtable[p].token;
    }
    else if ( t == EOF)
      return DONE;

    else {
      tokenval = NONE;
      return t;
    }
  }
}

/*****************************main.c************************/
#include "global.h"

main()
{
  init();
  parse();
  exit(0);
}

/*********************************parser************************/
#include "global.h"

int lookahead;

parse()
{
  lookahead = lexan();
  while (lookahead != DONE ) {
    expr(); match(';');
  }
}
expr()
{
  int t;
  term();
  while(1)
    switch (lookahead) {
    case '+': case '-':
      t = lookahead;
      match(lookahead); term(); emit(t, NONE);
      continue;
    default:
      return;
    }
}
term()
{
  int t;
  factor();
  while(1)
    switch(lookahead) {
    case '*': case '/': case DIV: case MOD:
      t = lookahead;
      match(lookahead); factor(); emit(t,NONE);
      continue;
    default:
      return;
    }
}
factor()
{
  switch(lookahead) {
  case '(':
    match('('); expr(); match(')'); break;
  case NUM:
    emit(NUM, tokenval); match(NUM); break;
  case ID:
    emit(ID, tokenval); match(ID); break;
  default:
    error("syntax error ha ha");
  }
}
match(t)
     int t;
{
  if (lookahead == t)
    lookahead = lexan();
  else error("syntax error again haha");
}

/******************************symbol.c***********************************/

#include "global.h"

#define STRMAX 999
#define SYMMAX 100

char lexemes[STRMAX];
int lastchar = -1;
struct entry symtable[SYMMAX];
int lastentry = 0;

int lookup(s)
     char s[];
{
  int p;
  for (p = lastentry; p > 0; p = p - 1)
    if (strcmp(symtable[p].lexptr, s) == 0)
      return p;
  return 0;
}
int insert(s, tok)
char s[];
int tok;
{
  int len;
  len = strlen(s);
  if (lastentry + 1 >= SYMMAX)
    error("symbol table full");
  if (lastchar + len + 1 >= STRMAX)
    error("lexemes array is full");
  lastentry = lastentry + 1;
  symtable[lastentry].token = tok;
  symtable[lastentry].lexptr = &lexemes[lastchar + 1];
  lastchar = lastchar + len + 1;
  strcpy(symtable[lastentry].lexptr, s);
  return lastentry;
}

[/code]

I have a couple of questions. First, how do I make the compiler read keywords and statements from a file with a specific extension, .oebj?

Secondly, when programming I have noticed that keywords are highlighted after they have been typed in. Do I need to implement anything specific in order to get this to work in my programming language?

Sincerely Frank
ASKER CERTIFIED SOLUTION
ankuratvb
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Kent Olsen
Hi Frank,

Compiler construction is one of the few frontiers left in the software development world.  And the available tools make it far less challenging (and fun) than it used to be.

If you'll post the "specific file format" we'll be glad to offer suggestions.


Kent
Frank-22

ASKER

Hi,

The file format I would like to use for my programming language is .oeb.

Sincerely

Frank
Hi Frank,

Which "oeb" format are you trying to use?  I know of at least three different formats that use the .oeb extension.  (I assume that you mean Open EBook, but I'd like to make sure before wandering into the deep end of the pool.)


Kent
Hi again,

I didn't know that file extension was in use. I'm implementing a theoretical programming language which has not been implemented before, so I find it necessary to come up with a completely new file extension.

I found that .oebj isn't in use, so I have chosen that one instead.

What I would like to be able to do is the following:

I have compiled all the files shown in my initial post into one compiler program called "coebj.exe". I would like to be able to write "coebj test-file.oebj". This would compile the imaginary source file test-file.oebj into a .exe program.

Any ideas on how I do that?


Sincerely

Frank
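
One possible way to get "coebj test-file.oebj" working with the posted code: lexan() reads from stdin through getchar() and scanf(), so the file named on the command line can be attached to stdin before parse() runs. The sketch below assumes exactly that approach. The helper names has_extension and open_source are made up for illustration and are not part of the posted code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if path ends with ext (e.g. ".oebj"). */
static int has_extension(const char *path, const char *ext)
{
    size_t plen = strlen(path), elen = strlen(ext);
    return plen > elen && strcmp(path + plen - elen, ext) == 0;
}

/* Hypothetical helper: make stdin read from the named source file,
   so getchar()/scanf() in the lexer need no changes.
   Returns 0 on success, -1 on failure. */
static int open_source(const char *path)
{
    if (!has_extension(path, ".oebj")) {
        fprintf(stderr, "%s: expected a .oebj file\n", path);
        return -1;
    }
    if (freopen(path, "r", stdin) == NULL) {
        perror(path);
        return -1;
    }
    return 0;
}
```

In main() this would be called before init() and parse(), with main declared as main(argc, argv) and something like: if (argc != 2 || open_source(argv[1]) != 0) exit(1);. Note that this only covers reading the source file. Producing a .exe still requires a code generation step.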


 
Ahhh...

Now I better understand.

Your code looks a lot more like a parser than a compiler.  (The parser, of course, comes first.)  I don't see anything that indicates any type of code generation.

Code generation is, of course, machine dependent.  If you're targeting an Intel or AMD processor you'll generate one set of instructions (binary data).  If you're targeting an IBM mainframe, IBM RS6000, HP PA-RISC, or another processor, you'll have to generate instructions for that processor.

Are you familiar with code generation?


Kent
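
To make the code generation point concrete: the posted emit() prints postfix operations (a number, then an operator), and each of those can be mapped to a few machine instructions. The sketch below is only an illustration under stated assumptions. It targets x86-64, writes assembly text in Intel syntax rather than binary, and keeps all operands on the hardware stack. The function names gen and binop are hypothetical, and identifiers (ID) are not handled because they would need storage allocation.

```c
#include <stdio.h>

/* Token codes copied from global.h in the original post. */
#define NUM 256
#define DIV 257
#define MOD 258

/* Hypothetical helper: code for a binary operator.
   Pops the right operand into rbx and the left into rax,
   applies op, then pushes the result held in rax. */
static int binop(char *buf, size_t n, const char *op)
{
    snprintf(buf, n,
             "    pop rbx\n"
             "    pop rax\n"
             "%s"
             "    push rax\n", op);
    return 0;
}

/* Hypothetical generator: writes assembly text for one postfix
   operation into buf. Returns 0 on success, -1 for a token it
   does not handle. */
static int gen(char *buf, size_t n, int t, int tval)
{
    switch (t) {
    case NUM:
        snprintf(buf, n, "    push %d\n", tval);
        return 0;
    case '+': return binop(buf, n, "    add rax, rbx\n");
    case '-': return binop(buf, n, "    sub rax, rbx\n");
    case '*': return binop(buf, n, "    imul rax, rbx\n");
    case '/': case DIV:
        /* cqo sign-extends rax into rdx:rax; idiv leaves the quotient in rax */
        return binop(buf, n, "    cqo\n    idiv rbx\n");
    case MOD:
        /* idiv leaves the remainder in rdx */
        return binop(buf, n, "    cqo\n    idiv rbx\n    mov rax, rdx\n");
    default:
        return -1;
    }
}
```

Calling gen() from the places where the parser currently calls emit() would turn an input such as 2+3; into a push/push/add sequence. Turning that text into a running program still needs an assembler and a linker, which this sketch does not cover.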
Hi Kdo,

I'm a bit familiar with code generation. To make it easier, I would like my programming language to run on Intel and AMD machines.

As of now the code lacks some basic abilities, such as recognizing characters, numbers, delimiters and basic keywords such as "if", "else", "while", "do", etc.

How and where do I define these?

Do I define them in the parser.c file?

Sincerely Frank
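
In the code posted at the top of the thread, reserved words are not defined in the parser. They sit in the keywords[] table in init.c, their token codes are #defines in global.h, and lexan() returns the stored token when a name is found in the symbol table. So one plausible route is to add new token codes and new table rows, and then have the parser match on those tokens. The standalone sketch below shows the same lookup idea. The token values 261 to 265 and the function name classify are assumptions for illustration only.

```c
#include <string.h>

/* ID is from global.h in the original post; the other codes are
   assumed values continuing after DONE (260). */
#define ID    259
#define IF    261
#define THEN  262
#define ELSE  263
#define WHILE 264
#define DO    265

struct keyword {
    const char *lexeme;
    int token;
};

/* Reserved words and the token each one maps to. */
static const struct keyword keywords[] = {
    { "if", IF }, { "then", THEN }, { "else", ELSE },
    { "while", WHILE }, { "do", DO },
};

/* Hypothetical helper: returns the keyword's token code,
   or ID for any other name. */
static int classify(const char *lexeme)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].lexeme) == 0)
            return keywords[i].token;
    return ID;
}
```

With the existing design the equivalent change would be extra rows such as "if", IF in the keywords[] initializer, since init() already inserts every row into the symbol table before parsing starts.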

You can handcraft the parser if the grammar is not too complicated.
The most popular way of doing that is to write a "recursive descent parser".
You may be able to find an online working example.


Hi

I have almost finished writing my grammar. Maybe you guys can help and root out errors, if there are any.

Here are the basics:

0-9 -> int

a-z, A-Z -> char

+, -, *, /, =, <, >, =>, <=, ^ .   -> Arithmetic operators.

(), [], {}, | -> Delimiters.

/{ \}  -> Comments (must be ignored by the compiler).

Expressions:

a + b = c      
a  - b  |    
a * b  |      
a / b   |    
 
a + b < c
a  - b  |    
a * b  |      
a / b   |    

a + b > c
a - b  |
a * b |
a / b  |

(a +b) +c  = d  -> Evaluating Algebraic properties of regular expressions.
(a - b) * c |
a * (b - c) |


X := expr; -> A variable is declared with X as its name, separated by a := followed by the type of the variable and then its value, and finally a ';' that indicates the end of the variable declaration.

a := type b;  -> Declaring a variable of a specific type (Int).

b := type 'x';  -> Declaring a variable of a specific type (Char).

c := type "y";   -> Declaring a variable of a specific type (String).

If an expression starts with a '(' or '[', a scan for the closing ')' is conducted.

Reserved keywords are: if, then, else, while, do, begin, end, Func, Proc, Bool, Int, String.

This is my grammar as of now. It lacks procedures and if, then, else, while, do.

Does anybody have a good suggestion on how to write a grammar for those?

Again, if there are any inconsistencies in my grammar, please correct me!

Sincerely

Frank
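
One common shape for the missing statement rules, offered as a suggestion rather than the grammar of this language, is:

stmt -> if cond then stmt [ else stmt ]
      | while cond do stmt
      | begin stmt { ; stmt } end
      | id := operand

cond -> operand relop operand

The sketch below is a small recursive descent recognizer for exactly those rules. It makes several simplifying assumptions that are not in the posts above: tokens are separated by whitespace, an operand is a single name or number (a real version would call expr() instead), and all function names are invented for the example.

```c
#include <ctype.h>
#include <string.h>

static const char *src;   /* remaining input text */
static char tok[64];      /* current token, "" at end of input */

/* Read the next whitespace-separated word into tok. */
static void next(void)
{
    size_t n = 0;
    while (isspace((unsigned char)*src)) src++;
    while (*src && !isspace((unsigned char)*src) && n < sizeof tok - 1)
        tok[n++] = *src++;
    tok[n] = '\0';
}

/* Consume the current token if it equals s; 1 on success, 0 otherwise. */
static int eat(const char *s)
{
    if (strcmp(tok, s) != 0) return 0;
    next();
    return 1;
}

static int is_keyword(void)
{
    static const char *kw[] = { "if", "then", "else", "while", "do", "begin", "end" };
    size_t i;
    for (i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strcmp(tok, kw[i]) == 0) return 1;
    return 0;
}

/* operand -> name | number (simplified stand-in for expr) */
static int operand(void)
{
    if (!isalnum((unsigned char)tok[0]) || is_keyword()) return 0;
    next();
    return 1;
}

/* cond -> operand relop operand */
static int cond(void)
{
    if (!operand()) return 0;
    if (!(eat("<") || eat(">") || eat("=") || eat("<=") || eat("=>"))) return 0;
    return operand();
}

static int stmt(void)
{
    if (eat("if")) {
        if (!cond() || !eat("then") || !stmt()) return 0;
        if (eat("else")) return stmt();   /* else binds to the nearest if */
        return 1;
    }
    if (eat("while"))
        return cond() && eat("do") && stmt();
    if (eat("begin")) {
        if (!stmt()) return 0;
        while (eat(";"))
            if (!stmt()) return 0;
        return eat("end");
    }
    /* assignment: id := operand */
    if (!isalpha((unsigned char)tok[0]) || is_keyword()) return 0;
    next();
    return eat(":=") && operand();
}

/* Returns 1 if text is exactly one well-formed statement, else 0. */
static int parse_stmt(const char *text)
{
    src = text;
    next();
    return stmt() && tok[0] == '\0';
}
```

For example, parse_stmt("while i < 10 do begin i := j ; k := 2 end") is accepted, while parse_stmt("if a < b x := 1") is rejected because then is missing. Each grammar rule maps to one function, which is the same pattern expr(), term() and factor() follow in the posted parser.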