sjcu

Pattern matching using boost::regex

I need to match some file names from a list. The names I need to match have the following format:
Number1_Number2.cas and Number1_Number2.das
Examples:
8_78990.cas
8_98878.das

The known quantities in each name are Number1 (the code) and the file extension. In the examples above the code is 8 and the extensions are cas and das.

I need some help matching names like these using `boost::regex`. I would like to use `BOOST_FOREACH` to iterate over the collection and apply the match criteria to each file name, pushing the names that match into a list so that all the results end up in one list.

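My matching code looks like this (`col` is the collection of file objects, each with a `getName()` member):

    boost::wregex fileFormat(L"(\\w+)_(\\w+)\\.(\\w+)");
    boost::wsmatch result;
    std::list<std::wstring> matches;
    for (auto it = col.begin(); it != col.end(); ++it)
    {
        // Copy the name first: result refers into fileName, so it must not be a temporary
        std::wstring fileName = it->getName();
        if (boost::regex_match(fileName, result, fileFormat))
        {
            // Store the matched name in a container
            matches.push_back(fileName);
        }
    }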



This pattern is too generic: it does not filter on the known quantities, i.e. Number1 (the code) and the file extension.

Thanks in advance for all your help.
SOLUTION
evilrix

sjcu

ASKER

Thank you very much, this is what I was looking for. I looked at regex_iterator, but I am confused about where the result is stored. Can you please give me an example?
Dereferencing the iterator gives you the result (a match_results object) for the current match. Each time the iterator is incremented, it moves on to the result for the next match. The example given in the Boost documentation shows how this works.

In the documentation example (reproduced below), the std::for_each algorithm is used to call regex_callback once for each match. What is actually passed to the callback is just the dereferenced iterator, i.e. a const boost::match_results<std::string::const_iterator>&.

#include <algorithm>   // std::for_each
#include <string>
#include <map>
#include <fstream>
#include <iostream>
#include <boost/regex.hpp>

using namespace std;

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's

typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
   // possibly leading whitespace:   
   "^[[:space:]]*"
   // possible template declaration:
   "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
   // class or struct:
   "(class|struct)[[:space:]]*"
   // leading declspec macros etc:
   "("
      "\\<\\w+\\>"
      "("
         "[[:blank:]]*\\([^)]*\\)"
      ")?"
      "[[:space:]]*"
   ")*"
   // the class name
   "(\\<\\w*\\>)[[:space:]]*"
   // template specialisation parameters
   "(<[^;:{]+>)?[[:space:]]*"
   // terminate in { or :
   "(\\{|:[^;\\{()]*\\{)";


boost::regex expression(re);
map_type class_index;

bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
{
   // what[0] contains the whole string
   // what[5] contains the class name.
   // what[6] contains the template specialisation if any.
   // add class name and position to map:
   class_index[what[5].str() + what[6].str()] = what.position(5);
   return true;
}

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   s.reserve(is.rdbuf()->in_avail());
   char c;
   while(is.get(c))
   {
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, const char** argv)
{
   std::string text;
   for(int i = 1; i < argc; ++i)
   {
      cout << "Processing file " << argv[i] << endl;
      std::ifstream fs(argv[i]);
      load_file(text, fs);
      // construct our iterators:
      boost::sregex_iterator m1(text.begin(), text.end(), expression);
      boost::sregex_iterator m2;
      std::for_each(m1, m2, &regex_callback);
      // copy results:
      cout << class_index.size() << " matches found" << endl;
      map_type::iterator c, d;
      c = class_index.begin();
      d = class_index.end();
      while(c != d)
      {
         cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
         ++c;
      }
      class_index.erase(class_index.begin(), class_index.end());
   }
   return 0;
}


ASKER

I had already looked at this; I just wasn't sure how to retrieve the data from the iterator

    boost::sregex_iterator m1(text.begin(), text.end(), expression);

so I asked for an example.
ASKER CERTIFIED SOLUTION