Solved

parsing algebraic expression into binary tree?

Posted on 1997-05-03
12
915 Views
Last Modified: 2013-11-15
If I want to parse an algebraic expression into a binary tree, how can I determine where the root is?

I have written a parser to solve an expression as it parses it (using algebraic rules) but this would be too slow if the expression must be used multiple times.  I decided to first parse the expression into a binary tree, then solve the expression using the tree.  Here's 2 examples...

(a+b)*c

my parser would work like this...
add a + b...get result...multiply result by c...
the tree could represent this by...

          *
        /   \
      +      c
    /    \
   a     b

(sorry if it is difficult to read the tree :( )
but,

a/b+(c-d)

my parser would work like this...
divide a by b...get result1...subtract d from c...get result2...add result2 to result1
the tree couldn't be represented in this fashion...

                            divide
                           /        \
                        a            b

HEY, how do I continue with the tree, divide shouldn't be the root, it should be...
                               +
                            /      \
                         div        -
                        /    \     /   \
                      a      b  c     d
...but my parser didn't do this.  So how can I find the correct root using only the algebraic rules?  Is there another way?  An algorithm to implement this correctly or an example code with explanation will be immensely helpful.

Thanks,

Scott
0
Comment
Question by:saspencer
  • 6
  • 3
  • 2
  • +1
12 Comments
 

Author Comment

by:saspencer
ID: 1163240
Edited text of question
0
 
LVL 5

Expert Comment

by:yonat
ID: 1163241
From the comp.lang.c FAQ at
http://www.eskimo.com/~scs/C-faq/q18.14.html

Two available packages are "defunc", available from
    ftp://sunsite.unc.edu/pub/packages/development/libraries/
                          defunc-1.3.tar.Z
and "parse" at ftp://lamont.ldgo.columbia.edu . Other options include the S-Lang interpreter, available from
    ftp://amy.tch.harvard.edu/pub/slang
and the shareware Cmm (``C-minus-minus'' or ``C minus the hard
stuff'').
There is also some parsing/evaluation code in Software Solutions
in C (chapter 12, pp. 235-55).
0
 

Author Comment

by:saspencer
ID: 1163242
A couple of the links were broken and I didn't have anything to open the tar.Z extension.  At any rate I would rather you explained a general algorithm and approach for this.  Some sample code would be even better.  One idea I had was to parse the entire expression into a stack.  Then the last item on the stack WILL be the root to the tree.  Is there a better way?
0
 
LVL 5

Expert Comment

by:yonat
ID: 1163243
> I didn't have anything to open the tar.Z
Then you probably use MS Windows - try downloading WinZip from
http://www.winzip.com .

> One idea I had was to parse the entire expression into a stack
For small expression, this is probably OK.
For larger ones, you can stop to calculate an already parsed
sub-expression whenever
1. Parenthesis are closed; or
2. You reach a lower priority operator. (for example '+' after
   a '/' sub-expression, or '*' after a '^' (power), and so on.)

You can find yet another example source code in the book "The C++
Programming Language" by Bjarne Stroustrup, chapter 3. This
example is, however, much simpler and less general from the
others.

Hope this helps.
0
 

Author Comment

by:saspencer
ID: 1163244
Maybe my question wasn't clear.  I need someone to EXPLAIN an algorithm to me to parse an expression into a binary tree.  Code with explanation accompanying the algorithm will be greatly appreciated.  I don't need links to someone's unexplained and outdated code.  I don't have the resources to go out and get all these books either.

Thanks
0
 
LVL 5

Expert Comment

by:yonat
ID: 1163245
There is no *one* algorithm. Below is a simple one. It parses
recoursivly (using xparse() ), until it reaches a number
(getoperand() ). Then it applies the arithmetic operation to it
(apply() ).

int getoperand(char **p);           // reads a number
int apply(int x, char op, int y);   // applies an operation
int xparse(char **p);               // parses an expression


int getoperand(char **p)
{
     int val;
     char ch;

     if (**p == '(') {          // sub-expression
          (*p)++;
          return xparse (p);
     }

     val = 0;

     while (1) {                // number
          ch = **p;
          if (ch < '0' || ch > '9')
               return val;
          val = val * 10 + ch - '0';
          (*p)++;
     }
}

int apply(int x, char op, int y)
{
     switch (op) {
          case '+': return x + y;
          case '-': return x - y;
          case '*': return x * y;
          case '/': return x / y;
     }
}

int xparse(char **p)
{
     int x, y;
     char op, lastop;
 
     x = 0;
     lastop = '+';
     while (1) {
          y = getoperand (p);
          op = **p;
          (*p)++;
          x = apply (x, lastop, y);
          if (op == '\0' || op == ')')
               return x;
          lastop = op;
     }
}

int arithmeticparse(char *line)
{
     char **p;
     p = &line;
     return xparse (p);
}


#ifdef TEST
/*==========================================================================*/
#include <stdio.h>

int main()
{
     int n;
     n = arithmeticparse ("5+6");
     printf ("5+6 = %d\n", n);
     n = arithmeticparse ("5*(6+2)");
     printf ("5*(6+2) = %d\n", n);
     n = arithmeticparse ("5*6+2");
     printf ("5*6+2 = %d\n", n);
     n = arithmeticparse ("2+(5*6)");
     printf ("2+(5*6) = %d\n", n);

     return 0;
}
#endif
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:saspencer
ID: 1163246
I have already written a very similar routine.  From my original question, what I don't have an algorithm for is getting it into a BINARY TREE.  In the method above, you will have to reparse the expression each time to evaluate the answer.  I need it to be faster.  The idea I had was to implement something like the above routine and , instead of evaluating it while parsing, set it up as a binary tree so that anytime I need to evaluated it, it doesn't need to be parsed, just step through the tree.  Hopefully much quicker.

:)
0
 
LVL 10

Expert Comment

by:RONSLOW
ID: 1163247
Here is some (untested) code that builds an expression tree (which you can later traverse to calculate the result)...

If you like this, please reject other answer and I can post this as an answer (and get some more points :-)


/* this is the token type for the token stream */
enum OpType {
      ENDEXPR = 0,
      ADD = '+',
      SUB = '-',
      MUL = '*',
      DIV = '/',
      LPAREN = '(',
      RPAREN = ')',
      CONST = '#'
};

/* here is a token */
struct Token {
      OpType op;      /* one of the above symbols */
      double number;      /* if op==CONST */
};

/* here is a node for the expression tree */
struct Node {
      OpType op;      /* type of node (operation or constant) */
      double number;      /* if op==CONST */
      struct Node* pLhs;      /* lhs sub-expression tree */
      struct Node* pRhs;      /* rhs sub-expression tree */
}

/* create a new node with specified op type and args */
Node* NewNode (OpType op, Node* pLhs, Node* pRhs) {
      Node* pNode = malloc(sizeof(Node));
      pNode->op = op;
      pNode->number = 0.0;
      pNode->pLhs = pLhs;
      pNode->pRhs = pRhs;
      return pNode;
}

/* create a new node for a constant number */
Node* NewNodeNumber (double number) {
      Node* pNode = malloc(sizeof(Node));
      pNode->op = CONST;
      pNode->number = number;
      pNode->pLhs = NULL;
      pNode->pRhs = NULL;
      return pNode;
}

Node* IsConst(Node* pNode) {
      return pNode && pNode->op == CONST;
}

Node* IsOp(Node* pNode) {
      return pNode && pNode->op != CONST;
}

/* return Lhs pointer for node (if any) */
Node* Lhs(Node* pNode) {
      return IsOp(pNode) ? pNode->pLhs : NULL;
}

/* return Rhs pointer for node (if any) */
Node* Rhs(Node* pNode) {
      return IsOp(pNode) ? pNode->pRhs : NULL;
}

/* return number value for node (if any) */
double Number(Node* pNode) {
      return IsConst(pNode) ? pNode->number : 0.0;
}

/* use this to delete a node and everything under it (for cleanup) */
void DeleteNode (Node* pNode) {
      if (pNode) {
            if (IsOp(pNode)) {
                  DeleteNode(Lhs(pNode));
                  DeleteNode(Rhs(pNode));
            }
            free(pNode);
      }
}

/* assume an array of tokens has been built elswhere */
/* NOTE: token stream MUST finish with an ENDEXPR token */
/* then build an expression tree from the tokens */
/* and return a poitner to the root of the tree */
Node* BuildTree (Token tokens[]) {
      int p = 0;
      return BuildAddSub(tokens,&p);
}

/* build a tree for an add or subtract operation */
Node* BuildAddSub (Token tokens[], int& p) {
      Node* pNode = BuildMulDiv(tokens,p);
      while (pNode) {
            OpType op = tokens[*p].op;
            if (op == ADD || op == SUB) {
                  Node* pOther;
                  (*p)++;
                  pOther = BuildMulDiv(tokens,p);
                  pNode = NewNode(op,pNode,pOther);
            } else {
                  break;
            }
      }
      return pNode;
}

/* build a tree for a multiply or divide operation */
Node* BuildMulDiv (Token tokens[], int& p) {
      Node* pNode = BuildSimple(tokens,p);
      while (pNode) {
            OpType op = tokens[*p].op;
            if (op == MUL || op == DIV) {
                  Node* pOther;
                  (*p)++;
                  pOther = BuildSimple(tokens,p);
                  pNode = NewNode(op,pNode,pOther);
            } else {
                  break;
            }
      }
      return pNode;
}

/* build a tree for a () expression or a number */
Node* BuildSimple (Token tokens[], int& p) {
      Node* pNode = NULL;
      OpType op = tokens[*p].op;
      switch (op) {
      case LPAREN:
            (*p)++;
            pNode = BuildAddSub(tokens,p); /* recursion here */
            op = tokens[*p].op;
            if (op != RPAREN) return NULL;
            (*p)++;
            break;
      case CONST:
            pNode = NewNodeNubmer(tokens[*p].number);
            (*p)++;
            break;
      }
      return pNode;
}

0
 

Author Comment

by:saspencer
ID: 1163248
RONSLOW:  That is exactly what I needed.  I haven't used binary trees for my data structures much.  If you changed the C stuff to C++ (malloc to new, etc.) I would be ESPECIALLY grateful.  :)  
0
 

Expert Comment

by:prade
ID: 1163249
Any more wishes? Perhaps to develop a gui for the program?
(ESPECIALLY sarcastic)
0
 
LVL 10

Accepted Solution

by:
RONSLOW earned 200 total points
ID: 1163250
I assume then that my above comment is an answer..

here 'tis again so you can grade me -- Thanks :-)

/* this is the token type for the token stream */
enum OpType {
ENDEXPR = 0,
ADD = '+',
SUB = '-',
MUL = '*',
DIV = '/',
LPAREN = '(',
RPAREN = ')',
CONST = '#'
};

/* here is a token */
struct Token {
OpType op; /* one of the above symbols */
double number; /* if op==CONST */
};

/* here is a node for the expression tree */
struct Node {
OpType op; /* type of node (operation or constant) */
double number; /* if op==CONST */
struct Node* pLhs; /* lhs sub-expression tree */
struct Node* pRhs; /* rhs sub-expression tree */
}

/* create a new node with specified op type and args */
Node* NewNode (OpType op, Node* pLhs, Node* pRhs) {
Node* pNode = malloc(sizeof(Node));
pNode->op = op;
pNode->number = 0.0;
pNode->pLhs = pLhs;
pNode->pRhs = pRhs;
return pNode;
}

/* create a new node for a constant number */
Node* NewNodeNumber (double number) {
Node* pNode = malloc(sizeof(Node));
pNode->op = CONST;
pNode->number = number;
pNode->pLhs = NULL;
pNode->pRhs = NULL;
return pNode;
}

Node* IsConst(Node* pNode) {
return pNode && pNode->op == CONST;
}

Node* IsOp(Node* pNode) {
return pNode && pNode->op != CONST;
}

/* return Lhs pointer for node (if any) */
Node* Lhs(Node* pNode) {
return IsOp(pNode) ? pNode->pLhs : NULL;
}

/* return Rhs pointer for node (if any) */
Node* Rhs(Node* pNode) {
return IsOp(pNode) ? pNode->pRhs : NULL;
}

/* return number value for node (if any) */
double Number(Node* pNode) {
return IsConst(pNode) ? pNode->number : 0.0;
}

/* use this to delete a node and everything under it (for cleanup) */
void DeleteNode (Node* pNode) {
if (pNode) {
if (IsOp(pNode)) {
DeleteNode(Lhs(pNode));
DeleteNode(Rhs(pNode));
}
free(pNode);
}
}

/* assume an array of tokens has been built elswhere */
/* NOTE: token stream MUST finish with an ENDEXPR token */
/* then build an expression tree from the tokens */
/* and return a poitner to the root of the tree */
Node* BuildTree (Token tokens[]) {
int p = 0;
return BuildAddSub(tokens,&p);
}

/* build a tree for an add or subtract operation */
Node* BuildAddSub (Token tokens[], int& p) {
Node* pNode = BuildMulDiv(tokens,p);
while (pNode) {
OpType op = tokens[*p].op;
if (op == ADD || op == SUB) {
Node* pOther;
(*p)++;
pOther = BuildMulDiv(tokens,p);
pNode = NewNode(op,pNode,pOther);
} else {
break;
}
}
return pNode;
}

/* build a tree for a multiply or divide operation */
Node* BuildMulDiv (Token tokens[], int& p) {
Node* pNode = BuildSimple(tokens,p);
while (pNode) {
OpType op = tokens[*p].op;
if (op == MUL || op == DIV) {
Node* pOther;
(*p)++;
pOther = BuildSimple(tokens,p);
pNode = NewNode(op,pNode,pOther);
} else {
break;
}
}
return pNode;
}

/* build a tree for a () expression or a number */
Node* BuildSimple (Token tokens[], int& p) {
Node* pNode = NULL;
OpType op = tokens[*p].op;
switch (op) {
case LPAREN:
(*p)++;
pNode = BuildAddSub(tokens,p); /* recursion here */
op = tokens[*p].op;
if (op != RPAREN) return NULL;
(*p)++;
break;
case CONST:
pNode = NewNodeNubmer(tokens[*p].number);
(*p)++;
break;
}
return pNode;
}

0
 

Author Comment

by:saspencer
ID: 1163251
Thanks again for your help.
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

I previously wrote an article addressing the use of UBCD4WIN and SARDU. All are great, but I have always been an advocate of SARDU. Recently it was suggested that I go back and take a look at Easy2Boot in comparison.
This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
This video demonstrates how to use each tool, their shortcuts, where and when to use them, and how to use the keyboard to improve workflow.
This video will demonstrate how to find the puppet warp tool from the edit menu and where to put the points to edit.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now