  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1733

re2c example program

I'd like to evaluate the re2c utility (regular expression 2 c). The documentation indicates it can take a script that defines a regular expression and generate the "scanner" routine.
www.re2c.org

There are some examples, but I'm finding them more obscure than I care to wrestle with. I want to be able to evaluate re2c and compare it to the boost regex-related libraries (regex, xpressive, and spirit) and to "hand-tuned" recognizers.

Here's a "stub" of the program I would like to get working:

char* scanner(char* str)  // generated by re2c
{
}

int main(void)
{
   char  testStr[] =
          "Alternate days of the week are Tue and Thursday and Sat and Monday. "
          "And then Monday and Wed and Friday and Sun. ";

   boost::regex reg("((Sunday|Sun)|(Monday|Mon)|(Tuesday|Tue)|"
             "(Wednesday|Wed)|(Thursday|Thu)|(Friday|Fri)|(Saturday|Sat))");

  int pos = 0;
  char* result;

  while ((result = scanner(&testStr[pos])) != NULL) {
     printf("%.10s\n", result);  // not aware of what is returned
     pos += strlen(result);       // not aware of how to advance to next search
  }
  return 0;
}

Asked by: newton-allan

1 Solution

Nopius commented:
I've just read the manual, and how to use the scanner seems fairly clear to me.
It always matches starting at the first character it is given, so you cannot use '^' in your regular expressions.

Here is a working example for your regex:

#define NULL            ((char*) 0)

static char *q;

char *scan(char *p){
#define YYCTYPE         char
#define YYCURSOR        p
#define YYLIMIT         p
#define YYMARKER        q
#define YYFILL(n)
/*!re2c
  (("Sunday"|"Sun")|("Monday"|"Mon")|("Tuesday"|"Tue")|("Wednesday"|"Wed")|("Thursday"|"Thu")|("Friday"|"Fri")|("Saturday"|"Sat"))          {return YYCURSOR;}
  [\000-\377]     {return NULL;}
*/
}

int
main()
{
   char  *testStr =
          "Alternate days of the week are Tue and Thursday and Sat and Monday. "
          "And then Monday and Wed and Friday and Sun. ";
  char *match;
  char *curr;
  char buff[32]; /* the longest possible match */

  curr=testStr;
  while (*curr != '\0') {
     match=scan(curr);
     if (match)
     {
      bzero(buff, sizeof buff);
      memcpy(buff, curr, q-curr);
      printf("res=%.10s\n", buff);  // not aware of what is returned
     }
     curr++;
  }
}
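
The driver loop above works because the scanner is anchored: it only tries to match at the position it is handed, and the caller slides that position along the input. For comparison with "hand-tuned" recognizers, here is a minimal hand-written sketch of the same pattern in plain C. No re2c is involved, and the names match_day_at and count_days are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Day names, long form listed before short form so the longer spelling wins. */
static const char *const k_days[] = {
    "Sunday", "Sun", "Monday", "Mon", "Tuesday", "Tue", "Wednesday", "Wed",
    "Thursday", "Thu", "Friday", "Fri", "Saturday", "Sat"
};

/* Anchored match: returns the length of the day name that starts exactly
   at s, or 0 if none does. It never searches further along the string. */
size_t match_day_at(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof k_days / sizeof k_days[0]; i++) {
        size_t n = strlen(k_days[i]);
        if (strncmp(s, k_days[i], n) == 0)
            return n;
    }
    return 0;
}

/* Unanchored search built on top: try each start position in turn. */
size_t count_days(const char *text)
{
    size_t count = 0;
    while (*text != '\0') {
        size_t n = match_day_at(text);
        if (n > 0) {
            count++;
            text += n;   /* skip the whole token */
        } else {
            text++;      /* nothing starts here; slide one character */
        }
    }
    return count;
}
```

Unlike the loop above, this sketch advances past the whole token after a match rather than by one character; for this input either choice finds the same tokens.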

newton-allan (Author) commented:
Wow ...

Couple of glitches, but otherwise does almost all of what I asked:

* bzero is non-standard (from mks?)
* Needs .h files for printf, memcpy, and memset (to replace bzero)
* redefinition of NULL gives warning
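
One way to address the first two points with standard C only is sketched below: memset and memcpy come from <string.h> (printf would need <stdio.h>), and NULL is already provided by <stddef.h>, so it does not need to be defined by hand. The helper name copy_token is hypothetical and not part of re2c.

```c
#include <stddef.h>   /* size_t, NULL */
#include <string.h>   /* memcpy, memset */

/* Copies the token [start, end) into dst as a NUL-terminated string,
   truncating if it does not fit. Returns the number of characters copied. */
size_t copy_token(char *dst, size_t dst_size, const char *start, const char *end)
{
    size_t len = (size_t)(end - start);
    if (dst_size == 0)
        return 0;
    if (len > dst_size - 1)
        len = dst_size - 1;     /* leave room for the terminator */
    memset(dst, 0, dst_size);   /* standard equivalent of bzero(dst, dst_size) */
    memcpy(dst, start, len);
    return len;
}
```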

Two remaining questions:

* What should YYFILL(n) be? This becomes a "nop" with a compiler warning about "if statement being empty  ... empty controlled statement found; is this the intent?".

Also, I think this #define has something to do with the result always being three letters long instead of the complete token:
res=Tue¦¦¦¦¦¦¦
res=Thu¦¦¦¦¦¦¦
res=Sat¦¦¦¦¦¦¦
res=Mon¦¦¦¦¦¦¦
res=Mon¦¦¦¦¦¦¦
etc.

And finally (outside of the original question) ... can the scanner figure out and make known with "out" reference variables the "match-index", position/offset, and length of the token that was matched?
MatchIndex 0 = Sunday or Sun
MatchIndex 1 = Monday or Mon
MatchIndex 2 = Tuesday or Tue
etc.

int matchIndex, len, pos;
char* res = scan(curr, &matchIndex, &len, &pos);

So that something like the following could be shown:
Found      Pos   Length   MatchIndex
--------   ---   ------   ----------
Tue         31        3            2
Thursday    39        8            4
etc.

Thanks VERY MUCH for your help on this. I was baffled.
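
To make the requested interface concrete, here is a hand-written sketch in plain C of a search function that reports the match index, length, and position through "out" parameters. It is not re2c output; the name scan_day, the extra from parameter, and the two lookup tables are assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

static const char *const k_long[]  = { "Sunday", "Monday", "Tuesday", "Wednesday",
                                       "Thursday", "Friday", "Saturday" };
static const char *const k_short[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

/* Searches text, starting at offset from (which must not exceed the string
   length), for the next day name. On success returns a pointer to the token
   and fills in the out-parameters; returns NULL if there is no further match. */
const char *scan_day(const char *text, size_t from,
                     int *match_index, size_t *len, size_t *pos)
{
    size_t i;
    for (i = from; text[i] != '\0'; i++) {
        int d;
        for (d = 0; d < 7; d++) {
            /* Prefer the long spelling, then fall back to the short one. */
            size_t n = strlen(k_long[d]);
            if (strncmp(text + i, k_long[d], n) != 0) {
                n = strlen(k_short[d]);
                if (strncmp(text + i, k_short[d], n) != 0)
                    continue;
            }
            *match_index = d;   /* 0 = Sunday/Sun, 1 = Monday/Mon, ... */
            *len = n;
            *pos = i;
            return text + i;
        }
    }
    return NULL;
}
```

Calling it in a loop with from set to pos + len after each hit yields the rows of the table above, starting with Tue at position 31 and Thursday at position 39.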


Nopius commented:
1) bzero is non-standard (from mks?)
Yes, it's BSD-specific; bzero(buf, n) does the same as memset(buf, 0, n).
2) Needs .h files for printf, memcpy, and memset (to replace bzero)
of course :)
3) redefinition of NULL gives warning
it's very very annoying :)

4) YYFILL(n) is used when the whole input is not in the buffer at once and you have a stream of characters instead (so you may need to feed in a further portion of characters on each call). Read the manual here: http://www.re2c.org/manual.html section INTERFACE CODE

YYFILL(n)
    The generated code "calls" YYFILL when the buffer needs (re)filling: at least n additional characters should be provided. YYFILL should adjust YYCURSOR, YYLIMIT, YYMARKER and YYCTXMARKER as needed. Note that for typical programming languages n will be the length of the longest keyword plus one.

5) 'Also, I think this #define has something to do with the result always being three letters long instead of the complete token:'
Really, regex 'matching' should just give 'YES' or 'NO'. I didn't look deeper.
There are many equivalent forms of an expression that will match the same string.
The result will not always be 3 letters (try removing 'Tue' from the regex string in the example).

6) 'And finally (outside of the original question) ... can the scanner figure out and make known with "out" reference variables the "match-index", position/offset, and length of the token that was matched'

On entry, YYCURSOR should point to the first character of the token to be scanned; on completion of scan() it points just past the matched token (for example, after processing 'Tuesday lalala' it will point to ' lalala').
Pos is the original value of YYCURSOR; Length is the final value of YYCURSOR minus its original value. (YYMARKER only records a backtracking position, so it can stop short of the full token.)
MatchIndex is not provided by the scanner itself, BUT you can do more complex parsing and return '1' for Mon or Monday, '2' for Tue or Tuesday. This technique is demonstrated in the 'complex' example at the end of the manual. So you can return any value you like for each alternative inside the () of your regex.
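
On point 4: the code re2c generates typically guards its reads with a check along the lines of "if ((YYLIMIT - YYCURSOR) < n) YYFILL(n);". With YYFILL(n) defined as nothing, that if has an empty body, which would explain the compiler warning. Later re2c releases also document a "re2c:yyfill:enable = 0;" configuration for input that is entirely in memory; check the manual for your version. For streamed input, a refill routine could look roughly like the sketch below. The Input struct, BUF_SIZE, and the function names are hypothetical, and an in-memory string stands in for the stream.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16   /* deliberately tiny so refills happen often */

typedef struct {
    const char *src;       /* stand-in for a stream: unread source data */
    size_t      src_left;  /* characters left in the source */
    char        buf[BUF_SIZE];
    char       *token;     /* start of the token being scanned */
    char       *cursor;    /* plays the role of YYCURSOR */
    char       *marker;    /* plays the role of YYMARKER */
    char       *limit;     /* plays the role of YYLIMIT */
} Input;

void input_init(Input *in, const char *src)
{
    in->src = src;
    in->src_left = strlen(src);
    in->token = in->cursor = in->marker = in->limit = in->buf;
}

/* Tries to make at least need characters available at cursor.
   Assumes token <= marker and token <= cursor <= limit.
   Returns 1 on success, 0 if the source ran dry or the buffer is too small. */
int input_fill(Input *in, size_t need)
{
    size_t shift = (size_t)(in->token - in->buf);   /* already-consumed prefix */
    size_t room, n;

    /* Drop everything before the current token and slide the rest down. */
    memmove(in->buf, in->token, (size_t)(in->limit - in->token));
    in->token  -= shift;
    in->cursor -= shift;
    in->marker -= shift;
    in->limit  -= shift;

    /* Top the buffer up from the source. */
    room = (size_t)(in->buf + BUF_SIZE - in->limit);
    n = in->src_left < room ? in->src_left : room;
    memcpy(in->limit, in->src, n);
    in->src      += n;
    in->src_left -= n;
    in->limit    += n;

    return (size_t)(in->limit - in->cursor) >= need;
}
```

The key point is that every pointer into the buffer moves by the same amount when the data is shifted, which is what the manual means by adjusting YYCURSOR, YYLIMIT and YYMARKER.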

newton-allan (Author) commented:
Terrific. Thanks for your help. I'm going to post a very similar question to get an example program that illustrates their -f flag (which involves YYGETSTATE and YYSETSTATE).

newton-allan (Author) commented:
I stared at your suggested code (which I'll call DayOfWeekRecognizer.re) some more, and revised/simplified it to this based on your very valuable recommendations. It provides the length of the matching string as an "out" variable, and returns a bool indicating whether a match was found.

// File DayOfWeekRecognizer.re
// re2c command line to generate
// re2c -s -i -b -oDayOfWeekRecognizer.cpp DayOfWeekRecognizer.re
// vc7.1 command line to compile/link:
// cl -O2 /DNDEBUG /D_CONSOLE /DWIN32 /D_MBCS DayOfWeekRecognizer.cpp
#include <stdio.h>
#include <string.h>

static char *pBacktrackInfo;

#define YYCTYPE         char
#define YYCURSOR        pStrToScan
#define YYLIMIT         pStrToScan
#define YYMARKER        pBacktrackInfo
#define YYFILL(n)

bool RecognizeDayOfWeek(char *pStrToScan, int* pLen)
{
   char* pOrigStr = pStrToScan;
/*!re2c
  (("Sunday"|"Sun")|
   ("Monday"|"Mon")|
   ("Tuesday"|"Tues")|
   ("Wednesday"|"Wed")|
   ("Thursday"|"Thu")|
   ("Friday"|"Fri")|
   ("Saturday"|"Sat"))
  {
     *pLen = (int)(YYCURSOR - pOrigStr);
     return true;
  }
  [\000-\377]     {return false;}
*/
}

int main(void)
{
   char  testStr[] =
          "Alternate days of the week are Tues and Thursday and Sat and Monday. "
          "And then Monday and Wed and Friday and Sun. ";
  bool  bMatch;
  char *pCurTestStrPos = testStr;
  int  len;

  while (*pCurTestStrPos != '\0') {
     bMatch = RecognizeDayOfWeek(pCurTestStrPos, &len);
     if (bMatch)
     {
        printf("Day=%.*s  len: %d\n", len, pCurTestStrPos, len);
     }
     pCurTestStrPos++;
  }
}
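
A plausible explanation for the earlier three-letter results is that the first program measured the token with YYMARKER (q - curr), while this version uses YYCURSOR - pOrigStr. The marker is only a backup position, saved when a shorter alternative such as "Tue" has already matched and the scanner goes on to try for a longer one. The hand-written sketch below imitates that behaviour for the single rule "Tuesday"|"Tue"; it is not generated code, and ScanResult and scan_tue are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    int    matched;     /* 1 if "Tue" or "Tuesday" starts at the given position */
    size_t cursor_len;  /* final cursor minus token start */
    size_t marker_len;  /* marker minus token start */
} ScanResult;

/* Hand-written stand-in for a scanner with the rule "Tuesday" | "Tue". */
ScanResult scan_tue(const char *start)
{
    ScanResult r = { 0, 0, 0 };
    const char *cursor = start;
    const char *marker = start;
    const char *tail = "sday";

    if (strncmp(cursor, "Tue", 3) != 0)
        return r;              /* no match at this position */
    cursor += 3;
    marker = cursor;           /* "Tue" is acceptable: remember where it ended */
    r.matched = 1;

    /* Keep going in the hope of the longer alternative. */
    while (*tail != '\0' && *cursor == *tail) {
        cursor++;
        tail++;
    }
    if (*tail != '\0')
        cursor = marker;       /* longer match failed part-way: back up */

    r.cursor_len = (size_t)(cursor - start);
    r.marker_len = (size_t)(marker - start);
    return r;
}
```

For the input "Tuesday" the cursor ends seven characters in while the marker still sits after "Tue", so only the cursor gives the full token length.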

Nopius commented:
Thank you.
As for the stateful algorithm and the -f flag, I recommend looking inside the sources of projects
that already use re2c (links to some of them are listed on the re2c site).