MarFarMa

asked on

regular expression in c# - capture pattern jan10 or jan08 but not janis

I have descriptions that contain strings like:

"Jan09 blah blah blah"

or

"blah blah feb12 blah"

Basically the string may contain a token in the form MMMYY, where MMM is a three-letter month abbreviation and YY is a two-digit year. This token form may appear 0-2 times in the string (if it appears twice, there will be two months listed, e.g. apr07-oct07).

I need, for each string, to determine which month appears in the string, if any. I want to write a C# method that will take the string and return either null or the three-character month code.

So, if jan07 appears, or jan14 appears, I want it to return jan. But if some other value that is missing the year digits appears, such as "janis feb08", the test should return feb, not jan. If two months appear, I want the method to return the first month code: "blah apr07-oct07 blah blah" should have a value of "apr".

I figure I need a sequence of regular expressions that test for each month code in sequence - and, if found, note the index location it was found at.  If more than one was found, return the one that has the smaller index location value.

So, I need a c# regular expression test that will return true for both "jan08" and "jan12" , but false for "janis" -- and some way to determine the index it was found at, so I'd get an index location value of 11 and not 0 for this string: "janis blah jan08"  (if I counted it right).

Or any other code that will get the job done.

Thanks!
ddrudik

(?<!-)(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?=\d\d)

"blah apr07-oct07 blah blah":

Array
(
    [0] => Array
        (
            [0] => apr
        )

)
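
For readers working in C#, here is a minimal sketch of how the pattern above might be applied with System.Text.RegularExpressions. The class and method names (MonthTokenDemo, ContainsMonth, IndexOfFirstMonth) are made up for illustration, and RegexOptions.IgnoreCase is an assumption added so that mixed-case input such as "Jan09" also matches.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class MonthTokenDemo
{
    // Month abbreviation not preceded by '-' and immediately followed by two digits.
    // IgnoreCase is an assumption so that "Jan09" matches as well as "jan09".
    private static readonly Regex MonthToken = new Regex(
        @"(?<!-)(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?=\d\d)",
        RegexOptions.IgnoreCase);

    // True when the text contains at least one MMMYY-style token.
    public static bool ContainsMonth(string text)
    {
        return MonthToken.IsMatch(text);
    }

    // Index of the first month token in the text, or -1 when there is none.
    public static int IndexOfFirstMonth(string text)
    {
        Match m = MonthToken.Match(text);
        return m.Success ? m.Index : -1;
    }
}
```

With this sketch, ContainsMonth("jan08") is true and ContainsMonth("janis") is false, while IndexOfFirstMonth("janis blah jan08") is 11 because the leading "janis" is not followed by two digits.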

MarFarMa

ASKER

I don't get it.  Is (?<!-) C# syntax?  It seems more like Perl.  Same with the Array construct.  I don't begin to understand what it's doing, or how it's related to the first line of code.

If it is C#, then I need baby steps, because I've never seen a code like this before, and I don't know how to use it.  If it's not, I need help to translate it into C#.

Thanks.
ASKER CERTIFIED SOLUTION
ddrudik
This solution is only available to members.
I just ran it in the debugger - works a treat.  How is it that it only matches the first occurrence in the string?
The array construct was only to show the matches received with your sample text and my regex pattern; it is not C# code.
(?<!-)

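This regex construct is a negative lookbehind: a zero-width assertion that succeeds only when the preceding character is not '-'.  It consumes and captures nothing.  Anything following it (our date construct) would fail to match if it followed a '-', which is why oct in "apr07-oct07" is skipped.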
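
To make the effect of the lookbehind concrete, the sketch below runs the same month pattern with and without the (?<!-) prefix. LookbehindDemo, MonthsIn, and the skipAfterDash flag are hypothetical names used only for this comparison, and IgnoreCase is again an assumption.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical comparison helper; not part of the original answer.
public static class LookbehindDemo
{
    private const string Months =
        "(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)";

    // Lists every month code the pattern matches, in string order.
    // When skipAfterDash is true, the negative lookbehind (?<!-) is prepended,
    // so a month directly preceded by '-' is not matched.
    public static List<string> MonthsIn(string text, bool skipAfterDash)
    {
        string pattern = (skipAfterDash ? "(?<!-)" : "") + Months + @"(?=\d\d)";
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            found.Add(m.Value); // the lookahead is zero-width, so Value is just the month
        }
        return found;
    }
}
```

For "blah apr07-oct07 blah blah", MonthsIn(text, true) yields only apr, whereas MonthsIn(text, false) yields apr and then oct.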
Thanks for the question and the points.
Ok - just tested this:

MatchCollection matchColl = reg.Matches("janis oct07apr07 blah blah");

returns oct and apr - but since oct is first in the array, I'm still OK if I just take the first element.  Was that a coincidence, or can I rely on it?

i.e. - if I have multiple matches, will the first one in the array always be the first in order in the string?
If the source could vary, you might be best off matching on:
Regex reg = new Regex(@"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?=\d\d)", RegexOptions.IgnoreCase);

Then use just m[0].Value (equivalently m[0].Captures[0].Value), where m is the MatchCollection returned by reg.Matches, ignoring the remaining matches, if any.  The matches will always be in the order found in the string, from start to end.
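
Putting the thread together, the method the question asks for could look roughly like the sketch below. MonthCodeFinder and GetFirstMonth are hypothetical names, and returning the code in lower case is an assumption based on the question's examples. It uses Regex.Match, which returns only the leftmost match, instead of taking the first element of Matches.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern suggested above.
public static class MonthCodeFinder
{
    private static readonly Regex MonthToken = new Regex(
        @"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?=\d\d)",
        RegexOptions.IgnoreCase);

    // Returns the first month code found (lower-cased, by assumption),
    // or null when the description contains no MMMYY-style token.
    public static string GetFirstMonth(string description)
    {
        if (description == null)
        {
            return null;
        }
        Match first = MonthToken.Match(description); // leftmost match only
        return first.Success ? first.Value.ToLowerInvariant() : null;
    }
}
```

Here GetFirstMonth("janis oct07apr07 blah blah") gives oct, GetFirstMonth("blah apr07-oct07 blah blah") gives apr, and GetFirstMonth("janis") gives null.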