Solved

a little Java lexer regex question

Posted on 2014-03-07
Last Modified: 2014-03-08
hi Guys,

I'm writing a little lexer for a simple calculator. Nothing fancy. I'm trying to learn, so I want to write it myself.

I have this pattern:

(?<SIN>(?!sin\\()([-]?[0-9.]+)(?=\\)))

that will match the number inside the parentheses of sin(x), where x is any number, so "sin(2.3)" will give me this token: ["2.3"]
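
One detail worth checking in the pattern above: `(?!sin\\()` is a negative lookahead, so it only asserts that the text starting at the number is not "sin(", which is always true. The sketch below assumes the intent was a lookbehind, `(?<=sin\\()`, and compares the two on a longer input. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinLookbehindSketch {
    // The pattern as posted: the negative lookahead never rejects anything,
    // so any number followed by ')' is reported as SIN.
    static final Pattern AS_POSTED =
            Pattern.compile("(?<SIN>(?!sin\\()([-]?[0-9.]+)(?=\\)))");

    // Assumed intent: require "sin(" immediately before the number.
    static final Pattern WITH_LOOKBEHIND =
            Pattern.compile("(?<SIN>(?<=sin\\()([-]?[0-9.]+)(?=\\)))");

    // Collects every SIN match found in the input, left to right.
    static List<String> sinMatches(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group("SIN"));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "(1+2)*sin(2.3)";
        System.out.println(sinMatches(AS_POSTED, input));       // [2, 2.3]
        System.out.println(sinMatches(WITH_LOOKBEHIND, input)); // [2.3]
    }
}
```

Either way, lookarounds consume no characters, so the "(" and ")" around the number are still available to any other alternative that matches parentheses.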

That would be great, except my matcher also catches parentheses with these expressions:

(?<LEFTPARENS>\\()|(?<RIGHTPARENS>\\))

So I end up with these tokens: ["(", "2.3", ")"] but I only want ["2.3"]

Is there a way to tell the matcher to skip the part of the string that is matched by another group?
Question by:Kyle Hamilton
LVL 31

Expert Comment

by:farzanj
ID: 39914187
I don't have your code, so I don't know exactly what you are trying to do.

But if you want to capture only what is inside the parentheses of sin, this works for me:

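    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class SinMatch
    {
        public static void main(String args[])
        {
            // Group 1 captures an optionally negative number with an optional fraction
            Pattern p = Pattern.compile("sin\\((-?\\d+(?:\\.\\d+)?)\\)");
            String  s = "sin(2.3)";

            Matcher m = p.matcher(s);

            if (m.find())
            {
                // Prints only the captured number, without "sin(" and ")"
                System.out.println(m.group(1));
            }
        }
    }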

 
LVL 86

Expert Comment

by:CEHJ
ID: 39914348
It must be said that regex is not the right tool for the job. For instance, the code above doesn't match
 'sin(2.3 )'
 'sin(.23)'
and that's just a very simple expression. If your objective is to learn regex, then this is not really a good context in which to do it.
 
LVL 31

Expert Comment

by:farzanj
ID: 39914496
This would catch it:
sin\\((-?\\d*(?:\\.\\d+)?)\\)
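
To see which of the inputs mentioned above each pattern accepts, a small check along these lines can help. `FIRST` and `SECOND` are the two patterns posted in this thread; `LENIENT` is a hypothetical variant (not from the thread) that tolerates spaces around the number and requires at least one digit. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinPatternCheck {
    // The two patterns posted in this thread.
    static final Pattern FIRST = Pattern.compile("sin\\((-?\\d+(?:\\.\\d+)?)\\)");
    static final Pattern SECOND = Pattern.compile("sin\\((-?\\d*(?:\\.\\d+)?)\\)");

    // Hypothetical variant: optional whitespace around the number, and the
    // number must contain at least one digit ("2", "2.", "2.3" or ".23").
    static final Pattern LENIENT =
            Pattern.compile("sin\\(\\s*(-?(?:\\d+(?:\\.\\d*)?|\\.\\d+))\\s*\\)");

    // Returns the captured argument, or null when the pattern finds no match.
    static String argument(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"sin(2.3)", "sin(2.3 )", "sin(.23)", "sin()"};
        for (String s : inputs) {
            System.out.printf("%-10s first=%s second=%s lenient=%s%n",
                    s, argument(FIRST, s), argument(SECOND, s), argument(LENIENT, s));
        }
    }
}
```

With these inputs, the second pattern picks up 'sin(.23)' but still misses 'sin(2.3 )', and because every part of its number is optional it also accepts 'sin()' with an empty capture.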

 
LVL 86

Expert Comment

by:CEHJ
ID: 39914508
Yes, but beyond simple expressions, the approach really doesn't scale
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914533
the objective is to write a lexer/tokenizer for simple math expressions. afaik regular expressions are one way to do that. i don't know of other ways, but what i don't want to do is write a pseudo state machine that reads the input letter by letter.

the expression for extracting the number from the sin function is not the issue.

the issue is that besides the sin expression i have a parenthesis expression for picking up parentheses. my question is how to pick up parentheses, but not the ones already picked up by other expressions.


given this input string:

(1+2)*sin(2.3)

i want to end up with these tokens:

(, 1, 2, +, ), *, 2.3

i will post my whole pattern in a bit. i'm not at my computer.

thanks
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 1000 total points
ID: 39914542
Hi CEHJ,  Just a question.  What is YACC and what is it based on?
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914546
or more precisely, these tokens:

LEFTPARENS: (
NUMBER: 1
OPERATOR: +
NUMBER: 2
RIGHTPARENS: )
OPERATOR: *
SIN: 2.3


(i have the order wrong in previous post. i don't want to confuse things. plus sign should have come before the 2. sorry. )
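
One way to get exactly this token list in a single pass is to let the SIN alternative consume "sin(" and ")" itself and capture only the number. `Matcher.find()` resumes after the end of the previous match, so those parentheses are never offered to the parenthesis alternatives. This is a minimal sketch, not the poster's actual code: the NUMBER and OPERATOR sub-patterns and all class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CalcLexerSketch {
    // Group names, in the order they appear in the alternation below.
    static final String[] TYPES =
            {"SIN", "NUMBER", "OPERATOR", "LEFTPARENS", "RIGHTPARENS"};

    // SIN comes first and matches the whole call; only the number is captured.
    static final Pattern TOKENS = Pattern.compile(
            "sin\\((?<SIN>-?[0-9.]+)\\)"
          + "|(?<NUMBER>[0-9.]+)"
          + "|(?<OPERATOR>[-+*/])"
          + "|(?<LEFTPARENS>\\()"
          + "|(?<RIGHTPARENS>\\))");

    // Returns tokens formatted as "TYPE: text", in input order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKENS.matcher(input);
        while (m.find()) {
            for (String type : TYPES) {
                String text = m.group(type); // null if this group did not take part
                if (text != null) {
                    tokens.add(type + ": " + text);
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("(1+2)*sin(2.3)")) {
            System.out.println(token);
        }
    }
}
```

Two caveats with this sketch: `find()` silently skips any characters that no alternative matches, and `[0-9.]+` (kept from the original pattern) would also accept something like "1.2.3".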
 
LVL 31

Expert Comment

by:farzanj
ID: 39914558
Something like:
(\\()?(\\d+([-+*\\/]\\d+)*)(\\))?[*\\/]sin\\((-?\\d*(?:\\.\\d+)?)\\)

 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914584
maybe i'm gonna need that FSM after all. looks like i was skipping the "scanner" phase of the tokenization process, and  going straight to the "evaluator" phase.

tokenization section:
http://en.m.wikipedia.org/wiki/Lexical_analyzer
 
LVL 31

Expert Comment

by:farzanj
ID: 39914598
I don't know what you are trying to do.  I answered whatever you asked.  Regular expressions are implemented as FSMs.  This is how the lexers in compilers are written.  YACC is a tool used to write compilers; it generates parsers for programming languages from a BNF-like grammar, and it is usually paired with lex, which builds the tokenizer from regular expressions.
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914610
hi farzanj,

I appreciate all the help. I'm sorry if my question is not clear - I should have given it a different title. My question doesn't have to do with regex, it has to do with lexical analysis and tokenization.

I'm aware that regular expressions are implemented with FSMs. When I mentioned the FSM before, I was not referring to a regex engine implementation, but to the "scanner" phase of the tokenization process, which employs its own FSM.

For now, I decided not to do everything in one step: I catch the entire sin(x) function, then process it again later to extract the number. To do this whole project "properly" I would rewrite it according to the wikipedia page I posted earlier.

My code is on github, if that helps:
https://github.com/kyleiwaniec/cos210/blob/master/Spring2014/Calculator/InfixToPostfix.java

with this sample input:

(2+3)*sin(2.3)

I now get:

OPERATOR : (
NUMBER : 2
OPERATOR : +
NUMBER : 3
OPERATOR : )
OPERATOR : *
SIN : sin(2.3)  // process again to extract number


( I am not trying to write a full-fledged lexer/parser. Just something small for a very simple calculator. At the moment all the code lives in one file, that's just for convenience ).
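
The second pass over a SIN lexeme such as "sin(2.3)" could look roughly like the sketch below. It is not taken from the linked repository; the pattern, class name and method name are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinSecondPassSketch {
    // Matches a complete "sin(<number>)" lexeme and captures the number.
    static final Pattern SIN_CALL = Pattern.compile("sin\\((-?[0-9.]+)\\)");

    // Extracts the numeric argument from a lexeme produced by the first pass.
    static double sinArgument(String lexeme) {
        Matcher m = SIN_CALL.matcher(lexeme);
        if (!m.matches()) { // matches() requires the whole lexeme to fit
            throw new IllegalArgumentException("not a sin(x) lexeme: " + lexeme);
        }
        // parseDouble rejects malformed numbers such as "1.2.3"
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        double x = sinArgument("sin(2.3)");
        System.out.println(x);           // the extracted argument
        System.out.println(Math.sin(x)); // the value of the call
    }
}
```

Using `matches()` rather than `find()` here means a lexeme with anything extra around the call is rejected instead of being partially accepted.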
 
LVL 86

Accepted Solution

by:
CEHJ earned 1000 total points
ID: 39914856
You might like to look at https://javacc.java.net/ though i haven't used it myself
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914902
Thanks CEHJ.

That's much more than what I was looking for. I wanted to write a lexer from scratch - a very basic one.

I think I better close this question. I didn't phrase it properly, and it's probably too broad a question anyway.
 
LVL 25

Author Closing Comment

by:Kyle Hamilton
ID: 39915001
I'm assigning points this way because it led me to try to clarify my own question in my own mind. Thanks for the help.
 
LVL 86

Expert Comment

by:CEHJ
ID: 39915026
OK. Maybe you can give me some lessons on it once you're au fait ;)
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39915032
lol - don't hold your breath!
:))