Solved

a little Java lexer regex question

Posted on 2014-03-07
Last Modified: 2014-03-08
hi Guys,

I'm writing a little lexer for a simple calculator. Nothing fancy. I'm trying to learn, so I want to write it myself.

I have this pattern:

(?<SIN>(?<=sin\\()([-]?[0-9.]+)(?=\\)))

that will match the number inside the parentheses of sin(xx), where xx is any number, so "sin(2.3)" will give me this token: ["2.3"]

That would be great, except my matcher also catches parentheses with these expressions:
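A quick way to check the lookaround idea, as a minimal sketch. It assumes the intent is a lookbehind `(?<=sin\\()` before the number and a lookahead `(?=\\))` after it; the class and method names are made up for illustration. Lookarounds are zero-width, so the parentheses are tested but never consumed, which is why a separate parenthesis rule can still match them afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the asker's code.
class LookaroundSketch {
    // Lookbehind requires "sin(" just before the number,
    // lookahead requires ")" just after it; neither is consumed.
    private static final Pattern SIN_ARG =
        Pattern.compile("(?<SIN>(?<=sin\\()-?[0-9.]+(?=\\)))");

    // Collects every number that sits directly inside sin(...).
    static List<String> sinArguments(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = SIN_ARG.matcher(input);
        while (m.find()) {
            found.add(m.group("SIN"));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(sinArguments("(1+2)*sin(2.3)"));
    }
}
```

On "(1+2)*sin(2.3)" this finds only 2.3, because the 1 and 2 are not preceded by "sin(".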

(?<LEFTPARENS>\\()|(?<RIGHTPARENS>\\))

So I end up with these tokens: ["(", "2.3", ")"] but I only want ["2.3"]

Is there a way to tell the matcher to skip the part of the string that is matched by another group?
Question by:Kyle Hamilton
 
LVL 31

Expert Comment

by:farzanj
ID: 39914187
I don't have your code, so I don't know exactly what you are trying to do.

But if you want to capture only what is in the parentheses of sin, this works for me:

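    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Class name is arbitrary; it only wraps main so the snippet compiles.
    public class SinDemo
    {
        public static void main(String args[])
        {
            // Capture group 1 is the number between "sin(" and ")".
            Pattern p = Pattern.compile("sin\\((-?\\d+(?:\\.\\d+)?)\\)");
            String  s = "sin(2.3)";

            Matcher m = p.matcher(s);

            if (m.find())
            {
                System.out.println(m.group(1));
            }
        }
    }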

 
LVL 86

Expert Comment

by:CEHJ
ID: 39914348
It must be said that regex is not the right tool for the job. For instance, the last code doesn't match
 'sin(2.3 )'
 'sin(.23)'
and that's just a very simple expression. If your objective is to learn regex then this is not really a good context in which to do it
 
LVL 31

Expert Comment

by:farzanj
ID: 39914496
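This would catch the second case, 'sin(.23)' (the trailing space in 'sin(2.3 )' would still need a \\s* before the closing parenthesis):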
sin\\((-?\\d*(?:\\.\\d+)?)\\)
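For reference, a minimal sketch of a pattern that accepts both inputs mentioned above, 'sin(2.3 )' and 'sin(.23)', while still rejecting an empty 'sin()'. The class and method names are hypothetical, and the number syntax (optional minus, digits with optional fraction, or a bare fraction) is an assumption about what the calculator should allow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class SinArgumentSketch {
    // \\s* tolerates spaces inside the parentheses.
    // The number is either digits with an optional fraction ("2", "2.3", "2.")
    // or a bare fraction (".23"), so at least one digit is always required.
    private static final Pattern SIN = Pattern.compile(
        "sin\\(\\s*(-?(?:\\d+(?:\\.\\d*)?|\\.\\d+))\\s*\\)");

    // Returns the numeric text inside the first sin(...), or null if none is found.
    static String argumentOf(String input) {
        Matcher m = SIN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(argumentOf("sin(2.3 )"));
        System.out.println(argumentOf("sin(.23)"));
    }
}
```

Requiring a digit on one side of the decimal point is what keeps 'sin()' and 'sin(.)' from matching.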

 
LVL 86

Expert Comment

by:CEHJ
ID: 39914508
Yes, but beyond simple expressions, the approach really doesn't scale
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914533
the objective is to write a lexer/tokenizer for simple math expressions. afaik regular expressions are one way to do that. i don't know of other ways, but what i don't want to do is write a pseudo state machine that reads the input letter by letter.

the expression for extracting the number from the sin function is not the issue.

the issue is that besides the sin expression i have a parenthesis expression for picking up parentheses. my question is, how to pick up parentheses but not ones already picked up by other expressions.


given this input string:

(1+2)*sin(2.3)

i want to end up with these tokens:

(, 1, 2, +, ), *, 2.3

i will post my whole pattern in a bit. i'm not at my computer.

thanks
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 250 total points
ID: 39914542
Hi CEHJ,  Just a question.  What is YACC and what is it based on?
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914546
or more precisely, these tokens:

LEFTPARENS: (
NUMBER: 1
OPERATOR: +
NUMBER: 2
RIGHTPARENS: )
OPERATOR: *
SIN: 2.3


(i had the order wrong in my previous post. i don't want to confuse things. the plus sign should have come before the 2. sorry. )
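One way to get exactly that token list is to let the SIN alternative consume the whole "sin(...)" text and put it first in the alternation, so the parenthesis rules never see those characters. The following is only a sketch under assumptions: the NUMBER and OPERATOR rules are guesses at the rest of the pattern, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the asker's actual lexer.
class CalcLexerSketch {
    // Group names checked after each match, in the same order as the alternation.
    private static final String[] KINDS =
        {"SIN", "NUMBER", "OPERATOR", "LEFTPARENS", "RIGHTPARENS"};

    // SIN comes first and matches "sin(", the number and ")" as one unit;
    // only the number is captured in the named group.
    private static final Pattern TOKEN = Pattern.compile(
          "sin\\((?<SIN>-?[0-9]*\\.?[0-9]+)\\)"
        + "|(?<NUMBER>[0-9]*\\.?[0-9]+)"
        + "|(?<OPERATOR>[-+*/])"
        + "|(?<LEFTPARENS>\\()"
        + "|(?<RIGHTPARENS>\\))");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() resumes after the end of the previous match, so characters
        // consumed by the SIN alternative are never offered to the other rules.
        // Note: characters that match no rule are silently skipped here.
        while (m.find()) {
            for (String kind : KINDS) {
                String text = m.group(kind);
                if (text != null) {
                    tokens.add(kind + ": " + text);
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("(1+2)*sin(2.3)").forEach(System.out::println);
    }
}
```

For "(1+2)*sin(2.3)" this prints LEFTPARENS: (, NUMBER: 1, OPERATOR: +, NUMBER: 2, RIGHTPARENS: ), OPERATOR: *, SIN: 2.3, one per line. The difference from the lookaround version is that the parentheses around 2.3 are consumed rather than merely inspected.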
 
LVL 31

Expert Comment

by:farzanj
ID: 39914558
Something like:
(\\()?(\\d+([-+*\\/]\\d+)*)(\\))?[*\\/]sin\\((-?\\d*(?:\\.\\d+)?)\\)

 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914584
maybe i'm gonna need that FSM after all. looks like i was skipping the "scanner" phase of the tokenization process, and  going straight to the "evaluator" phase.

tokenization section:
http://en.m.wikipedia.org/wiki/Lexical_analyzer
 
LVL 31

Expert Comment

by:farzanj
ID: 39914598
I don't know what you are trying to do.  I answered whatever you asked.  Regular expressions are implemented as FSMs.  This is how compilers are written.  YACC is a tool used to write compilers; it generates parsers for programming languages from a BNF-like grammar.  The tokens it consumes are usually described with regular expressions in a companion lexer such as lex.
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914610
hi farzanj,

I appreciate all the help. I'm sorry if my question is not clear - I should have given it a different title. My question doesn't have to do with regex, it has to do with lexical analysis and tokenization.

I'm aware that regular expressions are implemented with FSMs. When I mentioned the FSM before, I was not referring to a regex engine implementation. I meant the "scanner" phase of the tokenization process, which employs its own FSM.

For now, I decided not to do everything in one step: I catch the entire sin(x) function, then process it again later to extract the number. To do this whole project "properly" I would rewrite it according to the Wikipedia page I posted earlier.

My code is on github, if that helps:
https://github.com/kyleiwaniec/cos210/blob/master/Spring2014/Calculator/InfixToPostfix.java

with this sample input:

(2+3)*sin(2.3)

I now get:

OPERATOR : (
NUMBER : 2
OPERATOR : +
NUMBER : 3
OPERATOR : )
OPERATOR : *
SIN : sin(2.3)  // process again to extract number


( I am not trying to write a full-fledged lexer/parser. Just something small for a very simple calculator. At the moment all the code lives in one file, that's just for convenience ).
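The "process again" step could look roughly like the sketch below. It is not taken from the GitHub file; the class name, method name and the exact pattern are assumptions, and it simply expects a SIN token of the form sin(number).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical second pass over a SIN token such as "sin(2.3)".
class SinSecondPassSketch {
    // The whole token must be sin( number ); group 1 is the number text.
    private static final Pattern SIN_CALL =
        Pattern.compile("sin\\((-?[0-9.]+)\\)");

    // Extracts the numeric argument from a SIN token.
    static double argumentOf(String sinToken) {
        Matcher m = SIN_CALL.matcher(sinToken);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a sin(...) token: " + sinToken);
        }
        // parseDouble throws NumberFormatException for text like "1.2.3",
        // which [0-9.]+ would otherwise let through.
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        double x = argumentOf("sin(2.3)");
        System.out.println(Math.sin(x));
    }
}
```

Using matches() rather than find() makes the second pass reject anything that is not exactly one sin(...) token.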
 
LVL 86

Accepted Solution

by:
CEHJ earned 250 total points
ID: 39914856
You might like to look at https://javacc.java.net/ though i haven't used it myself
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914902
Thanks CEHJ.

That's much more than what I was looking for. I wanted to write a lexer from scratch - a very basic one.

I think I better close this question. I didn't phrase it properly, and it's probably too broad a question anyway.
 
LVL 25

Author Closing Comment

by:Kyle Hamilton
ID: 39915001
I'm assigning points this way because it led me to try to clarify my own question in my own mind. Thanks for the help.
 
LVL 86

Expert Comment

by:CEHJ
ID: 39915026
OK. Maybe you can give me some lessons on it once you're au fait ;)
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39915032
lol - don't hold your breath!
:))