Solved

a little Java lexer regex question

Posted on 2014-03-07
Last Modified: 2014-03-08
hi Guys,

I'm writing a little lexer for a simple calculator. Nothing fancy. I'm trying to learn, so I want to write it myself.

I have this pattern:

(?<SIN>(?!sin\\()([-]?[0-9.]+)(?=\\)))


that will match the number inside the parentheses of sin(xx), where xx is any number, so "sin(2.3)" will give me this token: ["2.3"]

That would be great, except that my matcher also catches parentheses with these expressions:

(?<LEFTPARENS>\\()|(?<RIGHTPARENS>\\))


So I end up with these tokens: ["(", "2.3", ")"] but I only want ["2.3"]

Is there a way to tell the matcher to skip the part of the string that is matched by another group?
Question by:Kyle Hamilton
16 Comments
 
LVL 31

Expert Comment

by:farzanj
ID: 39914187
I don't have your code, so I don't know exactly what you are trying to do

But if you want to capture only what is in parentheses of sin, this works for me

    public static void main(String args[])
    {
        Pattern p = Pattern.compile("sin\\((-?\\d+(?:\\.\\d+)?)\\)");
        String  s = "sin(2.3)";

        Matcher m = p.matcher(s);

        if (m.find())
        {
            System.out.println(m.group(1));
        }
    }

 
LVL 86

Expert Comment

by:CEHJ
ID: 39914348
It must be said that regex is not the right tool for the job. For instance, the pattern in the previous comment doesn't match
 'sin(2.3 )'
 'sin(.23)'
and that's just a very simple expression. If your objective is to learn regex, then this is not really a good context in which to do it.
 
LVL 31

Expert Comment

by:farzanj
ID: 39914496
This would catch it:
sin\\((-?\\d*(?:\\.\\d+)?)\\)
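
A note on the pattern above: it now accepts sin(.23), but it still rejects sin(2.3 ), and because every part of the number is optional it also accepts sin() and sin(-). The following is only a sketch of one way to tighten it, assuming whitespace inside the parentheses should be tolerated and at least one digit is required. The names SinArg, SIN_CALL and argOf are made up for illustration and are not from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinArg {
    // Optional sign, then either digits with an optional fraction ("2", "2.3")
    // or a bare fraction (".23"). \s* allows padding inside the parentheses.
    private static final Pattern SIN_CALL =
        Pattern.compile("sin\\(\\s*(-?(?:\\d+(?:\\.\\d+)?|\\.\\d+))\\s*\\)");

    /** Returns the numeric text inside sin(...), or null if the whole input is not a sin call. */
    static String argOf(String input) {
        Matcher m = SIN_CALL.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"sin(2.3)", "sin(2.3 )", "sin(.23)", "sin()", "sin(-)"}) {
            System.out.println(s + " -> " + argOf(s));
        }
    }
}
```

With this pattern the first three inputs yield 2.3, 2.3 and .23, while sin() and sin(-) are rejected.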
 
LVL 86

Expert Comment

by:CEHJ
ID: 39914508
Yes, but beyond simple expressions, the approach really doesn't scale
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914533
the objective is to write a lexer/tokenizer for simple math expressions. afaik regular expressions are one way to do that. i don't know of other ways, but what i don't want to do is write a pseudo state machine that reads the input letter by letter.

the expression for extracting the number from the sin function is not the issue.

the issue is that besides the sin expression i have a parenthesis expression for picking up parentheses. my question is, how to pick up parentheses but not ones already picked up by other expressions.


given this input string:

(1+2)*sin(2.3)

i want to end up with these tokens:

(, 1, 2, +, ), *, 2.3

i will post my whole pattern in a bit. i'm not at my computer.

thanks
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 1000 total points
ID: 39914542
Hi CEHJ,  Just a question.  What is YACC and what is it based on?
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914546
or more precisely, these tokens:

LEFTPARENS: (
NUMBER: 1
OPERATOR: +
NUMBER: 2
RIGHTPARENS: )
OPERATOR: *
SIN: 2.3


(i have the order wrong in previous post. i don't want to confuse things. plus sign should have come before the 2. sorry. )
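
One possible way to get exactly this token list in a single pass: lookarounds such as (?=\\)) only test the neighbouring characters without consuming them, so the parentheses around the sin argument stay available for the parenthesis groups to match. If the SIN alternative instead consumes "sin(" and ")" itself and only captures the number in a named group, the matcher has already moved past those characters. The sketch below assumes this; the class name CalcLexer and the NUMBER and OPERATOR sub-patterns are guesses for illustration, since the full pattern was not posted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CalcLexer {
    // SIN comes first and consumes "sin(", the number and ")" itself, so the
    // parenthesis alternatives never see those characters.
    private static final Pattern TOKEN = Pattern.compile(
          "sin\\((?<SIN>-?(?:\\d+(?:\\.\\d+)?|\\.\\d+))\\)"
        + "|(?<NUMBER>\\d+(?:\\.\\d+)?|\\.\\d+)"
        + "|(?<OPERATOR>[-+*/])"
        + "|(?<LEFTPARENS>\\()"
        + "|(?<RIGHTPARENS>\\))");

    private static final String[] TYPES =
        {"SIN", "NUMBER", "OPERATOR", "LEFTPARENS", "RIGHTPARENS"};

    /** Returns tokens formatted as "TYPE: text", in input order. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one named group is non-null for each match.
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    tokens.add(type + ": " + m.group(type));
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("(1+2)*sin(2.3)").forEach(System.out::println);
    }
}
```

For (1+2)*sin(2.3) this prints the seven tokens listed above, ending with SIN: 2.3. Note that find() silently skips any characters that no alternative matches, so a real lexer would want to check for gaps between matches and report them.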
 
LVL 31

Expert Comment

by:farzanj
ID: 39914558
Something like:
(\\()?(\\d+([-+*\\/]\\d+)*)(\\))?[*\\/]sin\\((-?\\d*(?:\\.\\d+)?)\\)

 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914584
maybe i'm gonna need that FSM after all. looks like i was skipping the "scanner" phase of the tokenization process, and  going straight to the "evaluator" phase.

tokenization section:
http://en.m.wikipedia.org/wiki/Lexical_analyzer
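
For comparison, a hand-written "scanner" phase of the kind mentioned above can stay fairly small. This is only a rough sketch under my own assumptions (the class name CharScanner and the set of accepted characters are invented here): it walks the input character by character, groups characters into raw lexemes, and leaves it to a later "evaluator" phase to decide what each lexeme means.

```java
import java.util.ArrayList;
import java.util.List;

final class CharScanner {
    /** Splits the input into raw lexemes; classifying them is left to a later phase. */
    static List<String> scan(String input) {
        List<String> lexemes = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;                       // skip blanks between lexemes
                continue;
            } else if (Character.isDigit(c) || c == '.') {
                // NUMBER state: stay here while digits or dots keep coming
                while (i < input.length()
                        && (Character.isDigit(input.charAt(i)) || input.charAt(i) == '.')) {
                    i++;
                }
            } else if (Character.isLetter(c)) {
                // NAME state: collects identifiers such as "sin"
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
            } else if ("+-*/()".indexOf(c) >= 0) {
                i++;                       // single-character lexeme
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at index " + i);
            }
            lexemes.add(input.substring(start, i));
        }
        return lexemes;
    }

    public static void main(String[] args) {
        System.out.println(scan("(1+2)*sin(2.3)"));
    }
}
```

For (1+2)*sin(2.3) the scanner returns the ten lexemes (, 1, +, 2, ), *, sin, (, 2.3 and ). It deliberately does not validate numbers (it would accept 1.2.3 as one lexeme), so that check belongs to whatever phase turns lexemes into typed tokens.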
 
LVL 31

Expert Comment

by:farzanj
ID: 39914598
I don't know what you are trying to do.  I answered whatever you asked.  Regular expressions are implemented as FSMs.  This is how compilers are written.  YACC is a tool used to write compilers: it generates parsers for programming languages from a BNF-like grammar, and is usually paired with a regex-based lexer generator such as Lex.
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914610
hi farzanj,

I appreciate all the help. I'm sorry if my question is not clear - I should have given it a different title. My question doesn't have to do with regex, it has to do with lexical analysis and tokenization.

I'm aware that regular expressions are implemented with FSMs. When I mentioned the FSM before, I was not referring to a regex engine implementation, but to the "scanner" phase of the tokenization process, which employs its own FSM.

For now, I decided not to do everything in one step: I catch the entire sin(x) function, then process it again later to extract the number. To do this whole project "properly" I would rewrite it according to the Wikipedia page I posted earlier.

My code is on github, if that helps:
https://github.com/kyleiwaniec/cos210/blob/master/Spring2014/Calculator/InfixToPostfix.java

with this sample input:

(2+3)*sin(2.3)

I now get:

OPERATOR : (
NUMBER : 2
OPERATOR : +
NUMBER : 3
OPERATOR : )
OPERATOR : *
SIN : sin(2.3)  // process again to extract number


( I am not trying to write a full-fledged lexer/parser. Just something small for a very simple calculator. At the moment all the code lives in one file, that's just for convenience ).
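
The second step of the two-step approach described in this post might look roughly like the sketch below. It is not taken from the linked repository; SinToken, ARG and argument are hypothetical names, and it assumes the first pass hands over the whole lexeme, e.g. "sin(2.3)".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinToken {
    // Second pass: applied only to a lexeme the first pass already tagged as SIN.
    private static final Pattern ARG = Pattern.compile("sin\\((-?\\d*\\.?\\d+)\\)");

    /** Extracts the numeric argument from a lexeme such as "sin(2.3)". */
    static double argument(String lexeme) {
        Matcher m = ARG.matcher(lexeme);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a sin token: " + lexeme);
        }
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        double x = argument("sin(2.3)");
        System.out.println(x + " -> " + Math.sin(x));
    }
}
```

Keeping the extraction separate like this means the main token pattern only has to recognise where a sin call starts and ends, at the cost of matching that piece of text twice.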
 
LVL 86

Accepted Solution

by:
CEHJ earned 1000 total points
ID: 39914856
You might like to look at https://javacc.java.net/ though i haven't used it myself
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914902
Thanks CEHJ.

That's much more than what I was looking for. I wanted to write a lexer from scratch - a very basic one.

I think I better close this question. I didn't phrase it properly, and it's probably too broad a question anyway.
 
LVL 25

Author Closing Comment

by:Kyle Hamilton
ID: 39915001
I'm assigning points this way because it led me to try to clarify my own question in my own mind. Thanks for the help.
 
LVL 86

Expert Comment

by:CEHJ
ID: 39915026
OK. Maybe you can give me some lessons on it once you're au fait ;)
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39915032
lol - don't hold your breath!
:))