Regex Library (free)

I'm looking for a good regex library that has the features of the boost::regex library, but that is easier to add to an existing application, and doesn't require a large library like boost.

I've tried using the CRegex class listed on CodeGuru, but that doesn't have modern regex feature support.
I've also tried the MFC 7 CRegEx class, but that's also missing modern features.

Does anyone know of a good free regex library that has modern regex feature support, and that can be added standalone?

Please, no Google answers! I know how to Google.
AxterAsked:
 
rstaveleyConnect With a Mentor Commented:
Check out http://www.pcre.org/ too. I can vouch for it. We're using it in production code that runs on both Win32 and Linux.
0
 
jkrConnect With a Mentor Commented:
Have you tried GRETA (http://research.microsoft.com/projects/greta/)?
0
 
AxterAuthor Commented:
I haven't tried that one yet.
I'll give it a test tonight.

The following are some of the regex features I'm looking for:
Word Boundaries: \bfoo\b
Repeating Character Classes: ([0-9])\1+
Limiting Repetition: [0-9]{2,4}
Shorthand Character Classes: \d\d\s\d\d

In reading the GRETA link, it does look like it supports \d, but so did CRegEx, and yet CRegEx didn't support other shorthand character classes, like \s and \w.
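
A checklist like this can be turned into a small probe that is run against each candidate library. The sketch below is only an illustration: it uses std::regex (standardized later, in C++11) as a stand-in engine, and the helper names (found, supports_*) are made up for this example. To evaluate PCRE, GRETA, or another library, the body of found() would have to be swapped for that library's search call.

```cpp
#include <regex>
#include <string>

// Returns true if `pattern` is found anywhere in `text`.
// std::regex (ECMAScript grammar) is a stand-in here, not PCRE or GRETA.
bool found(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}

// One probe per feature from the list above. Each probe pairs a string
// that should match with one that should not.
bool supports_word_boundaries()
{
    return found(R"(\bfoo\b)", "a foo b") && !found(R"(\bfoo\b)", "food");
}

bool supports_backreferences()
{
    return found(R"(([0-9])\1+)", "a77b") && !found(R"(([0-9])\1+)", "a78b");
}

bool supports_bounded_repetition()
{
    return found(R"(^[0-9]{2,4}$)", "123") && !found(R"(^[0-9]{2,4}$)", "12345");
}

bool supports_shorthand_classes()
{
    return found(R"(\d\d\s\d\d)", "12 34") && found(R"(\w+)", "abc");
}
```

Pairing a positive and a negative case matters: CRegEx accepting \d says nothing about \s or \w, so each class needs its own probe.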
0
 
AxterAuthor Commented:
rstaveley,
Does it support the features I posted?
0
 
rstaveleyCommented:
Yes, note the comments about Unicode though at http://www.pcre.org/pcre.txt
0
 
AxterAuthor Commented:
jkr,
>>Have you tried GRETA (http://research.microsoft.com/projects/greta/)?

This implementation is pretty good, but I can't use it because its license is not compatible with a GPL-licensed application.

I'm working on the WinMerge project, which is distributed under the GPL.
We're trying to find a good replacement for the current regex code.
0
 
AxterAuthor Commented:
>>Yes, note the comments about Unicode though at http://www.pcre.org/pcre.txt

There's a lot of info under the Unicode section.
What specifically should I be looking for?
0
 
rstaveleyCommented:
Point 7:

       The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly test characters of any code value, but the characters that PCRE recognizes as digits, spaces, or word characters remain the same set as before, all with values less than 256. This remains true even when PCRE includes Unicode property support, because to do otherwise would slow down PCRE in many common cases. If you really want to test for a wider sense of, say, "digit", you must use Unicode property tests such as \p{Nd}.
0
 
AxterAuthor Commented:
Thanks.

I'll test it out tonight, and see what the WinMerge group thinks.
0
 
rstaveleyCommented:
BTW... PCRE is the same library that PHP programmers use (http://uk.php.net/pcre)
0
 
rstaveleyCommented:
WinMerge is a great tool. Good luck with the project.
0
 
AxterAuthor Commented:
I'm not having much luck with PCRE.
The documentation is very poor, so I'm not sure how to set some of the arguments.
I'm trying to create a wrapper class that is similar to the CodeGuru CRegExp.
I'm only doing this so I can test existing code with different regex implementations.
Here's what I have so far, but it's not working:
#include <string>
#include "config.h"
#include "pcrecpp.h"

using pcrecpp::StringPiece;
using pcrecpp::RE;
using pcrecpp::RE_Options;
using pcrecpp::Hex;
using pcrecpp::Octal;
using pcrecpp::CRadix;
class CRegExp
{
public:
      CRegExp():m_re(NULL), m_len_last_match(0){}
      ~CRegExp(){delete m_re;}

      void RegComp( const TCHAR *re )
      {
            delete m_re;
            m_re = new pcrecpp::RE(re);
      }

      int RegFind(const TCHAR *str)
      {
            std::string Str = str;
            pcrecpp::StringPiece input(Str);
            std::string var;
            if (!m_re->Consume(&input, &var, &m_len_last_match))
                  return -1;
            return  input.data() - Str.c_str();
      }

      TCHAR* GetReplaceString( const TCHAR* sReplaceExp ){return 0;} //Caller is responsible for deleting return buffer

      template<class T>
            int ReplaceAll(const TCHAR* sSearchExp, const TCHAR* sReplaceExp, T& String)
      {
            return  0;
      }

      int GetFindLen()
      {
            return  m_len_last_match;
      }
private:
      pcrecpp::RE *m_re;
      int m_len_last_match;
};
0
 
AxterAuthor Commented:
Consume continues to return false, and it's not finding the pattern.
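
A possible explanation, going by the comments in pcrecpp.h: Consume only matches at the current start of the input, while FindAndConsume searches forward; and every extra output argument is expected to correspond to a capturing group in the pattern, with an int argument filled by parsing the captured text as a number rather than with the match length. If that reading is right, passing &var and &m_len_last_match to a pattern that lacks two suitable groups would make the call return false. The sketch below is not a pcrecpp fix. It only shows the interface the wrapper is aiming for (match offset from RegFind, match length from GetFindLen), using std::regex as a stand-in so that it builds with the standard library alone. The class name RegExpSketch is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the wrapper above, built on std::regex
// instead of pcrecpp so that it compiles with only the standard library.
class RegExpSketch
{
public:
    void RegComp(const std::string& pattern) { m_re = std::regex(pattern); }

    // Returns the offset of the first match in `text`, or -1 if none.
    // An unanchored search is used, so the match may start anywhere.
    int RegFind(const std::string& text)
    {
        std::smatch m;
        if (!std::regex_search(text, m, m_re)) {
            m_len_last_match = 0;
            return -1;
        }
        m_len_last_match = static_cast<int>(m.length(0));
        return static_cast<int>(m.position(0));
    }

    // Length of the most recent successful match, 0 if the last search failed.
    int GetFindLen() const { return m_len_last_match; }

private:
    std::regex m_re;
    int m_len_last_match = 0;
};
```

Taking the offset and length from the whole match (group 0) avoids needing any capturing group in the caller's pattern.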
0
 
rstaveleyCommented:
Sorry for the late response. I'm under the cosh today. Have you tried looking at pcredemo.c in the source - i.e. http://prdownloads.sourceforge.net/gnuwin32/pcre-6.4-1-src.exe?download ? I've not tried the distributed C++ class.
0
 
AxterAuthor Commented:
I was trying to use the C++ class, but looking at the pcredemo.c file, it looks like the C code is far easier to use than the C++ class.
I'm not sure who put the C++ class together.  It has a very poor interface, and bad documentation.

I'll try out the C code tonight.

When I call pcre_exec, does it populate the ovector variable with all the matching locations?

How would I call a regex replace pattern?
0
 
rstaveleyCommented:
> When I call pcre_exec, does it populate the ovector variable with all the matching locations?

Yes, vector = array in this library :-)

> How would I call a regex replace pattern?

You need to call the pcre substring functions and do it yourself.
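
To make the ovector layout concrete: each pcre_exec call reports one match, and the array holds pairs of offsets into the subject. Pair 0 is the whole match and pair n is capturing group n, with the second offset pointing one past the end. A positive return value gives the number of pairs that were filled in. Finding every match in a string means calling pcre_exec again with a later start offset. The sketch below calls no PCRE function; substrings_from_ovector is a hypothetical helper that shows how the pairs map back to substrings, which is the job the library's own substring functions do.

```cpp
#include <string>
#include <vector>

// Copies each captured substring out of `subject`, given offsets laid out
// the way pcre_exec fills its ovector: ovector[2*n] is the start of group n
// and ovector[2*n + 1] is one past its end; group 0 is the whole match.
// `pair_count` plays the role of a positive pcre_exec return value.
// Groups that did not take part in the match (offset -1) become "".
std::vector<std::string> substrings_from_ovector(const std::string& subject,
                                                 const std::vector<int>& ovector,
                                                 int pair_count)
{
    std::vector<std::string> groups;
    for (int n = 0; n < pair_count; ++n) {
        const int start = ovector[2 * n];
        const int end = ovector[2 * n + 1];
        if (start < 0 || end < start)
            groups.push_back(std::string());
        else
            groups.push_back(subject.substr(start, end - start));
    }
    return groups;
}
```

For example, matching (\w+)=(\d+) against "id=42;" would be expected to yield the offsets {0,5, 0,2, 3,5}, which this helper turns into "id=42", "id" and "42".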
0
 
AxterAuthor Commented:
>>You need to call the pcre substring functions and do it yourself.

I'm not sure what you mean by do it myself.  Are you saying this library doesn't have regex replace pattern logic?

The regex replace pattern logic would be complex to implement.
0
 
rstaveleyCommented:
You get the substrings and you put them together. The library avoids mutating strings.

> The regex replace pattern logic would be complex to implement.

Not as bad as you think. Take a look at the replace implementation in AFC for an example - http://www.koders.com/c/fid68FD24B5B8A620DBC0030A374BFD6A8B633DF196.aspx .
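
As a rough idea of how much code "putting the substrings together" takes, here is a minimal sketch. It assumes the match offsets are already available in the ovector layout (pairs of start and one-past-end offsets, pair 0 being the whole match) and that group references in the replace string are a single digit written as \1 or $1. The function names expand_template and replace_match are made up for this example and are not part of PCRE or AFC.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Builds the replacement text for one match. `templ` may refer to captured
// groups as \0..\9 or $0..$9 (single digit only in this sketch); any other
// character is copied through unchanged. References to groups that are
// missing or unset expand to nothing.
std::string expand_template(const std::string& templ,
                            const std::string& subject,
                            const std::vector<int>& ovector,
                            int pair_count)
{
    std::string out;
    for (std::size_t i = 0; i < templ.size(); ++i) {
        const char c = templ[i];
        const bool is_ref = (c == '\\' || c == '$') && i + 1 < templ.size()
            && std::isdigit(static_cast<unsigned char>(templ[i + 1]));
        if (!is_ref) {
            out += c;
            continue;
        }
        const int n = templ[++i] - '0';  // group number; skips the digit
        if (n < pair_count && ovector[2 * n] >= 0)
            out.append(subject, ovector[2 * n],
                       ovector[2 * n + 1] - ovector[2 * n]);
    }
    return out;
}

// Returns a new string in which the matched span (pair 0) is replaced by
// the expanded template; `subject` itself is not modified.
std::string replace_match(const std::string& subject,
                          const std::vector<int>& ovector,
                          int pair_count,
                          const std::string& templ)
{
    return subject.substr(0, ovector[0])
         + expand_template(templ, subject, ovector, pair_count)
         + subject.substr(ovector[1]);
}
```

A replace-all would loop: search, append the text before the match plus the expansion, then search again from the end of the match.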
0
 
AxterAuthor Commented:
>>Take a look at the replace implementation in AFC for an example

It doesn't look like they're applying a pattern to the replace string.  It looks like regex is only being performed on the search side, not on the replace string.

A regex replace string looks like the following:
\1foofoo\2

See following link under table 2:
http://alkaline.vestris.com/docs/alkaline/acnf-regexp.html 
0
 
rstaveleyCommented:
> A regex replace string looks like...

I don't believe that regex replace strings have a standard to conform to. If we are all dancing to Perl's tune, we should be using operator =~, which is a challenge for C++ programmers. Once TR1 is supported by compilers, I guess we ought to expect regex_replace to function according to the next C++ standard.
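
For reference, regex_replace did end up in the standard library as std::regex_replace (C++11). Its default format rules use $1-style group references, and the std::regex_constants::format_sed flag switches to \1-style ones. The sketch below only illustrates those two spellings. The pattern and the function names are arbitrary choices for this example, and it says nothing about how PCRE or the WinMerge code behaves.

```cpp
#include <regex>
#include <string>

// Default (ECMAScript) format rules: groups are written $1, $2, ...
std::string replace_dollar_style(const std::string& text)
{
    const std::regex re(R"((\w+)=(\d+))");
    return std::regex_replace(text, re, "$1foofoo$2");
}

// sed format rules: groups are written \1, \2, ...
std::string replace_backslash_style(const std::string& text)
{
    const std::regex re(R"((\w+)=(\d+))");
    return std::regex_replace(text, re, R"(\1foofoo\2)",
                              std::regex_constants::format_sed);
}
```

Both functions replace every match in the input, not just the first one.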
0
 
AxterAuthor Commented:
>>I don't believe that regex replace strings have a standard to conform to

Neither do regex search strings.  Currently there is no ANSI/ISO type standard for regex.
However, the other regex implementations I tested did support the common regex replace strings, and I need to use an implementation that supports them, since the current WinMerge regex code does have limited support for them.
0
 
Harisha M GConnect With a Mentor Commented:
Axter, did you try using $1foofoo$2 instead of \1foofoo\2?
0
 
AxterAuthor Commented:
>>Axter, did you try using $1foofoo$2 instead of \1foofoo\2

No, I haven't tried that.

I'll try it tonight.
0