Solved

a little Java lexer regex question

Posted on 2014-03-07
16
417 Views
Last Modified: 2014-03-08
hi Guys,

I'm writing a little lexer for a simple calculator. Nothing fancy. I'm trying to learn, so I want to write it myself.

I have this pattern:

(?<SIN>(?!sin\\()([-]?[0-9.]+)(?=\\)))

Open in new window


that will match the number inside the parenthesis of sin(xx), where x is any number, so "sin(2.3)" will give me this token: ["2.3"]

That would be great, except, my matcher also catches parentheses by these expressions:

(?<LEFTPARENS>\\()|(?<RIGHTTPARENS>\\))

Open in new window


So I end up with these tokens: ["(", "2.3", ")"] but I only want ["2.3"]

Is there a way to tell the matcher to skip the part of the string that is matched by another group?
0
Comment
Question by:Kyle Hamilton
  • 7
  • 5
  • 4
16 Comments
 
LVL 31

Expert Comment

by:farzanj
ID: 39914187
I don't have your code, so I don't know exactly what you are trying to do

But if you want to capture only what is in parentheses of sin, this works for me

    public static void main(String args[])
    {
        Pattern p = Pattern.compile("sin\\((-?\\d+(?:\\.\\d+)?)\\)");
        String  s = "sin(2.3)";

        Matcher m = p.matcher(s);

        if (m.find())
        {
            System.out.println(m.group(1));
        }
    }

Open in new window

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39914348
It must be said that regex is not the right tool for the job. For instance, the last code doesn't match
 'sin(2.3 )'
 'sin(.23)'
and that's just a very simple expression. If your objective is to learn regex then this is not really a good context in which to do it
0
 
LVL 31

Expert Comment

by:farzanj
ID: 39914496
This would catch it:
sin\\((-?\\d*(?:\\.\\d+)?)\\)
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
LVL 86

Expert Comment

by:CEHJ
ID: 39914508
Yes, but beyond simple expressions, the approach really doesn't scale
0
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914533
the objective is to write a lexer/tokenizer for simple math expressions. asfaik regular expressions are one way to do that. i dont know of other ways, but what i dont want to do is write a psuedo state machine that reads the input letter by letter.

the expression for extracting the number from the sin function is not the issue.

the issue is that besides the sin expression i have a parenthesis expression for picking up parentheses. my question is, how to pick up parentheses but not ones already picked up by other exressions.


given this input string:

(1+2)*sin(2.3)

i want to end up with these tokens:

(, 1, 2, +, ), *, 2.3

i will post my whole pattern in a bit. i'm mot at my computer.

thanks
0
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 250 total points
ID: 39914542
Hi CEHJ,  Just a question.  What is YACC and what is it based on?
0
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914546
or more precisely, these tokens:

LEFTPARENS: (
NUMBER: 1
OPERATOR: +
NUMBER: 2
RIGHTPARENS: )
OPERATOR: *
SIN: 2.3


(i have the order wrong in previous post. i don't want to confuse things. plus sign should have come before the 2. sorry. )
0
 
LVL 31

Expert Comment

by:farzanj
ID: 39914558
Something like:
(\\()?(\\d+([+-*\\/]\\d+)*)(\\))?[*\\/]sin\\((-?\\d*(?:\\.\\d+)?)\\)

Open in new window

0
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914584
maybe i'm gonna need that FSM after all. looks like i was skipping the "scanner" phase of the tokenization process, and  going straight to the "evaluator" phase.

tokenization section:
http://en.m.wikipedia.org/wiki/Lexical_analyzer
0
 
LVL 31

Expert Comment

by:farzanj
ID: 39914598
I don't know what you are trying to do.  I answered whatever you asked.  Regular expression implements FSM.  This is how compilers are written.  YACC is a tool used to write compilers and it creates parsers for programming languages.  It uses regex to write BNF.
0
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914610
hi farzanj,

I appreciate all the help. I'm sorry if my question is not clear - I should have given it a different title. My question doesn't have to do with regex, it has to do with lexical analysis and tokenization.

I'm aware the regular expressions are implemented with FSMs. When I mentioned the FSM before, it was not referring to a regex engine implementation. It was the "scanner" phase of the tokenization process which employs its own FSM.

For now, I decided not to do everything in one step, and catch the entire sin(x) function then process it again later to extract the number. To do this whole project "properly" I would rewrite it according to the wikipedia page I posted earlier.

My code is on github, if that helps:
https://github.com/kyleiwaniec/cos210/blob/master/Spring2014/Calculator/InfixToPostfix.java

with this sample input:

(2+3)*sin(2.3)

I now get:

OPERATOR : (
NUMBER : 2
OPERATOR : +
NUMBER : 3
OPERATOR : )
OPERATOR : *
SIN : sin(2.3)  // process again to extract number


( I am not trying to write a full fledged lexer/parser. Just something small for a very simple calculator. At the moment all the code lives in one file, that's just for convenience ).
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 250 total points
ID: 39914856
You might like to look at https://javacc.java.net/ though i haven't used it myself
0
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39914902
Thanks CEHJ.

That's much more than what I was looking for. I wanted to write a lexer from scratch - a very basic one.

I think I better close this question. I didn't phrase it properly, and it's probably too broad a question anyway.
0
 
LVL 25

Author Closing Comment

by:Kyle Hamilton
ID: 39915001
I'm assigning points this way because it led me to try to clarify my own question in my own mind. Thanks for the help.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39915026
OK. Maybe you can give me some lessons on it once you're au fait ;)
0
 
LVL 25

Author Comment

by:Kyle Hamilton
ID: 39915032
lol - dont hold your breath!
:))
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A short article about a problem I had getting the GPS LocationListener working.
Java functions are among the best things for programmers to work with as Java sites can be very easy to read and prepare. Java especially simplifies many processes in the coding industry as it helps integrate many forms of technology and different d…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question