Solved

Regex Library (free)

Posted on 2006-07-24
Last Modified: 2013-12-14
I'm looking for a good regex library that has the features of the boost::regex library, but that is easier to add to an existing application, and doesn't require a large library like boost.

I've tried using the CRegex class listed on CodeGuru, but that doesn't have modern regex feature support.
I've also tried MFC 7 CRegEx class, but that's also missing modern features.

Does anyone know of a good free regex library that has modern regex feature support, and that can be added standalone?

Please, no Google answers! I know how to Google!
Question by:Axter
24 Comments
 
LVL 86

Assisted Solution

by:jkr
jkr earned 100 total points
ID: 17170792
Have you tried GRETA (http://research.microsoft.com/projects/greta/)?
0
 
LVL 30

Author Comment

by:Axter
ID: 17170873
I haven't tried that one yet.
I'll give it a test tonight.

The following are some of the regex features I'm looking for:
Word Boundaries: \bfoo\b
Repeating Character Classes: ([0-9])\1+
Limiting Repetition: [0-9]{2,4}
Shorthand Character Classes: \d\d\s\d\d

In reading the GRETA link, it does look like it supports \d, but so did CRegEx, and yet CRegEx didn't support other shorthand character classes, like \s and \w.
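A quick way to check a candidate library against the four features listed above is a handful of tiny match tests. The sketch below uses std::regex (standardized in C++11, after this thread) purely as a stand-in; it is not PCRE, GRETA, or CRegEx, and the helper name `found` and the sample strings are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
// Uses std::regex with its default ECMAScript grammar.
bool found(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}

// Feature checks, one per item in the list:
//   word boundaries:        found(R"(\bfoo\b)", "a foo b")
//   repeated capture:       found(R"(([0-9])\1+)", "a 777 b")
//   limited repetition:     found(R"([0-9]{2,4})", "x42y")
//   shorthand classes:      found(R"(\d\d\s\d\d)", "12 34")
```

The same four patterns and inputs can be fed to any other engine under test; an engine that lacks \s or \w will typically fail only the last check.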
0
 
LVL 17

Accepted Solution

by:rstaveley
rstaveley earned 300 total points
ID: 17177072
Check out http://www.pcre.org/ too. I can vouch for it. We're using it in some production code that ports on Win32 and Linux.
0
 
LVL 30

Author Comment

by:Axter
ID: 17177089
rstaveley,
Does it support the features I posted?
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 17177108
Yes, note the comments about Unicode though at http://www.pcre.org/pcre.txt
0
 
LVL 30

Author Comment

by:Axter
ID: 17177115
jkr,
>>Have you tried GRETA (http://research.microsoft.com/projects/greta/)?

This implementation is pretty good, but I can't use it because the license is not compatible with a GPL-licensed application.

I'm working on the WinMerge project, which is distributed under GPL license.
We're trying to find a good replacement for current regex code.
0
 
LVL 30

Author Comment

by:Axter
ID: 17177136
>>Yes, note the comments about Unicode though at http://www.pcre.org/pcre.txt

There's a lot of info under the Unicode section.
What specifically should I be looking for?
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 17177202
Point 7:

The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly test characters of any code value, but the characters that PCRE recognizes as digits, spaces, or word characters remain the same set as before, all with values less than 256. This remains true even when PCRE includes Unicode property support, because to do otherwise would slow down PCRE in many common cases. If you really want to test for a wider sense of, say, "digit", you must use Unicode property tests such as \p{Nd}.
0
 
LVL 30

Author Comment

by:Axter
ID: 17177221
Thanks.

I'll test it out tonight, and see what the WinMerge group thinks.
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 17177237
BTW... PCRE is the same library that PHP programmers use (http://uk.php.net/pcre)
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 17177265
WinMerge is a great tool. Good luck with the project.
0
 
LVL 30

Author Comment

by:Axter
ID: 17180970
I'm not having much luck with PCRE.
The documentation is very poor, so I'm not sure how to set some of the arguments.
I'm trying to create a wrapper class that is similar to the CodeGuru CRegExp.
I'm only doing this so I can test existing code with different regex implementation.
Here's what I have so far, but it's not working:
#include "config.h"
#include "pcrecpp.h"

using pcrecpp::StringPiece;
using pcrecpp::RE;
using pcrecpp::RE_Options;
using pcrecpp::Hex;
using pcrecpp::Octal;
using pcrecpp::CRadix;
class CRegExp
{
public:
      CRegExp():m_re(NULL), m_len_last_match(0){}
      ~CRegExp(){delete m_re;}

      void RegComp( const TCHAR *re )
      {
            delete m_re;
            m_re = new pcrecpp::RE(re);
      }

      int RegFind(const TCHAR *str)
      {
            std::string Str = str;
            pcrecpp::StringPiece input(Str);
            std::string var;
            if (!m_re->Consume(&input, &var, &m_len_last_match))
                  return -1;
            return  input.data() - Str.c_str();
      }

      TCHAR* GetReplaceString( const TCHAR* sReplaceExp ){return 0;} //Caller is responsible for deleting return buffer

      template<class T>
            int ReplaceAll(const TCHAR* sSearchExp, const TCHAR* sReplaceExp, T& String)
      {
            return  0;
      }

      int GetFindLen()
      {
            return  m_len_last_match;
      }
private:
      pcrecpp::RE *m_re;
      int m_len_last_match;
};
0
 
LVL 30

Author Comment

by:Axter
ID: 17180974
Consume continues to return false, and it's not finding the pattern.
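
A possible explanation, going by my reading of the pcrecpp documentation and not verified here: Consume only matches at the current start of the input, and each output pointer passed to it must correspond to a parenthesized group in the pattern, so a pattern without two capture groups would fail. The int argument would also receive a parsed capture, not a match length. The sketch below shows the search behaviour the wrapper seems to be after (offset of the first match anywhere, plus its length). It uses std::regex as a stand-in instead of pcrecpp, and the class name RegExpSketch is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the CRegExp wrapper, built on std::regex
// rather than pcrecpp, to illustrate the intended search semantics.
class RegExpSketch
{
public:
    // Compile the pattern (ECMAScript grammar by default).
    void RegComp(const std::string& pattern)
    {
        m_re = std::regex(pattern);
    }

    // Return the offset of the first match anywhere in `text`, or -1.
    int RegFind(const std::string& text)
    {
        std::smatch m;
        if (!std::regex_search(text, m, m_re))
        {
            m_len_last_match = 0;
            return -1;
        }
        m_len_last_match = static_cast<int>(m.length(0));
        return static_cast<int>(m.position(0));
    }

    // Length of the most recent successful match, 0 if the last search failed.
    int GetFindLen() const { return m_len_last_match; }

private:
    std::regex m_re;
    int m_len_last_match = 0;
};
```

The key design point is that the search is unanchored (regex_search here); in pcrecpp the unanchored counterparts would be PartialMatch or FindAndConsume, if I read its header correctly.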
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 17185601
Sorry for the late response. I'm under the cosh today. Have you tried looking at pcredemo.c in the source - i.e. http://prdownloads.sourceforge.net/gnuwin32/pcre-6.4-1-src.exe?download ? I've not tried the distributed C++ class.
0
 
LVL 30

Author Comment

by:Axter
ID: 17185777
I was trying to use the C++ class, but looking at the pcredemo.c file, it looks like the C code is far easier to use than the C++ class.
I'm not sure who put the C++ class together.  It has a very poor interface, and bad documentation.

I'll try out the C code tonight.

When I call pcre_exec, does it populate the ovector variable with all the matching locations?

How would I call a regex replace pattern?
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 17186196
> When I call pcre_exec, does it populate the ovector variable with all the matching locations?

Yes, vector = array in this library :-)

> How would I call a regex replace pattern?

You need to call the pcre substring functions and do it yourself.
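
To make the ovector answer concrete: as documented for PCRE, a single pcre_exec call reports one match. ovector[0] and ovector[1] hold the start and end offsets of the whole match, each following pair holds one capture group, and an unset group is reported as -1. Finding further matches means calling pcre_exec again with a later start offset. The sketch below does not call PCRE at all; the helper `group_text` is hypothetical and simply reads offsets laid out that way, much as pcre_get_substring would.

```cpp
#include <string>

// Hypothetical helper: text of capture group `n`, given offsets laid out
// the way pcre_exec reports them (pairs of start/end offsets, group 0
// being the whole match). Returns an empty string for an unset group.
std::string group_text(const std::string& subject, const int* ovector, int n)
{
    const int start = ovector[2 * n];
    const int end   = ovector[2 * n + 1];
    if (start < 0 || end < start)
        return std::string();
    return subject.substr(start, end - start);
}
```

For example, for the pattern (\d+)-(\d+) on the subject "tel 555-1234", the offsets would be expected to be {4, 12, 4, 7, 8, 12}, so group 1 is "555" and group 2 is "1234". These values are hand-written here, not produced by a PCRE run.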
0
 
LVL 30

Author Comment

by:Axter
ID: 17186770
>>You need to call the pcre substring functions and do it yourself.

I'm not sure what you mean by do it myself.  Are you saying this library doesn't have regex replace pattern logic?

The regex replace pattern logic would be complex to implement.
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 17186975
You get the substrings and you put them together. The library avoids mutating strings.

> The regex replace pattern logic would be complex to implement.

Not as bad as you think. Take a look at the replace implementation in AFC for an example - http://www.koders.com/c/fid68FD24B5B8A620DBC0030A374BFD6A8B633DF196.aspx .
0
 
LVL 30

Author Comment

by:Axter
ID: 17187052
>>Take a look at the replace implementation in AFC for an example

It doesn't look like they're applying a pattern to the replace string.  It looks like regex is only being performed in the search implementation, not on the replace string.

A regex replace string looks like the following:
\1foofoo\2

See following link under table 2:
http://alkaline.vestris.com/docs/alkaline/acnf-regexp.html
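
For reference, expanding a replace string such as \1foofoo\2 by hand is mostly a matter of walking the template and splicing in the captured substrings. The sketch below is one minimal way to do it, under stated assumptions: it uses std::regex as a stand-in for the matching step (not PCRE or the WinMerge code), it only understands single-digit \N references with no escaping rules, and the names `expand_template` and `replace_first` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: expand "\N" references (N = 0..9) in `tmpl` with the
// captures of a single match `m`; every other character is copied as-is.
std::string expand_template(const std::string& tmpl, const std::smatch& m)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        const bool is_ref = tmpl[i] == '\\' && i + 1 < tmpl.size() &&
                            std::isdigit(static_cast<unsigned char>(tmpl[i + 1]));
        if (is_ref)
        {
            const std::size_t group = static_cast<std::size_t>(tmpl[i + 1] - '0');
            if (group < m.size())
                out += m[group].str();  // groups that did not match yield ""
            ++i;                        // skip the digit as well
        }
        else
        {
            out += tmpl[i];
        }
    }
    return out;
}

// Replace the first match of `pattern` in `text`, expanding `tmpl`.
// Returns `text` unchanged when there is no match.
std::string replace_first(const std::string& text, const std::string& pattern,
                          const std::string& tmpl)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return text;
    return m.prefix().str() + expand_template(tmpl, m) + m.suffix().str();
}
```

With a C library that only reports match offsets, the same loop applies: the captures come from the offset pairs instead of from std::smatch.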
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 17189140
> A regex replace string looks like...

I don't believe that regex replace strings have a standard to conform to. If we are all dancing to Perl's tune, we should be using operator =~, which is a challenge for C++ programmers. Once TR1 is supported by compilers, I guess we ought to expect regex_replace to function according to the next C++ standard.
0
 
LVL 30

Author Comment

by:Axter
ID: 17189161
>>I don't believe that regex replace strings have a standard to conform to

Neither does the regex search string.  Currently there is no ANSI/ISO type standard for regex.
However, the other regex implementations I tested did support the common regex replace strings, and I need to use an implementation that supports them, since the current WinMerge regex code does have limited support for them.
0
 
LVL 37

Assisted Solution

by:Harisha M G
Harisha M G earned 100 total points
ID: 17189596
Axter, did you try using $1foofoo$2 instead of \1foofoo\2
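
Whether $1 or \1 is accepted depends on the engine, so it is worth testing both forms. As an illustration only, the sketch below uses std::regex_replace from the C++11 standard library (the descendant of the TR1 regex_replace mentioned earlier), not PCRE: its default replacement format uses $N references, while the format_sed flag switches to \N references. The wrapper names `replace_dollar` and `replace_backslash` are hypothetical.

```cpp
#include <regex>
#include <string>

// "$N" references: the default (ECMAScript) replacement format.
std::string replace_dollar(const std::string& text, const std::string& pattern,
                           const std::string& fmt)
{
    return std::regex_replace(text, std::regex(pattern), fmt);
}

// "\N" references: the sed-style replacement format.
std::string replace_backslash(const std::string& text, const std::string& pattern,
                              const std::string& fmt)
{
    return std::regex_replace(text, std::regex(pattern), fmt,
                              std::regex_constants::format_sed);
}
```

With the pattern (\w+)-(\w+), calling replace_dollar with "$1foofoo$2" and replace_backslash with "\1foofoo\2" should produce the same text, whereas a $1 passed to the sed-style variant is copied through literally.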
0
 
LVL 30

Author Comment

by:Axter
ID: 17189672
>>Axter, did you try using $1foofoo$2 instead of \1foofoo\2

No, I haven't tried that.

I'll try it tonight.
0
