Solved

Regular expression in C#: capture pattern jan10 or jan08 but not janis

Posted on 2007-11-27
Last Modified: 2010-04-15
I have descriptions that contain strings like:

"Jan09 blah blah blah"

or

"blah blah feb12 blah"

Basically, the string may contain a token in the form MMMYY, where MMM is a three-letter month abbreviation and YY is a two-digit year. This token may appear 0-2 times in the string (if it appears twice, two months will be listed, e.g. apr07-oct07).

I need to determine, for each string, which month appears in it, if any. I want to write a C# method that takes the string and returns either null or the three-character month code.

So, if jan07 or jan14 appears, I want it to return jan. But if a value that is missing the year digits appears, such as "janis" in "janis feb08", the test should not return jan for it (here it should return feb). If two months appear, I want the method to return the first month code: "blah apr07-oct07 blah blah" should give "apr".

I figure I need a sequence of regular expressions that test for each month code in sequence - and, if found, note the index location it was found at.  If more than one was found, return the one that has the smaller index location value.

So, I need a C# regular expression test that will return true for both "jan08" and "jan12", but false for "janis", and some way to determine the index at which it was found, so I'd get an index value of 11 and not 0 for this string: "janis blah jan08" (if I counted it right).

Or any other code that will get the job done.

Thanks!
Question by:MarFarMa
 
LVL 27

Expert Comment

by:ddrudik
ID: 20360777
(?<!-)(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?=\d\d)

"blah apr07-oct07 blah blah":

Array
(
    [0] => Array
        (
            [0] => apr
        )

)

 
LVL 1

Author Comment

by:MarFarMa
ID: 20360874
I don't get it.  Is (?<!-) C# syntax?  It seems more like Perl.  Same with the Array construct.  I don't begin to understand what it's doing, or how it's related to the first line of code.

If it is C#, then I need baby steps, because I've never seen a code like this before, and I don't know how to use it.  If it's not, I need help to translate it into C#.

Thanks.
 
LVL 27

Accepted Solution

by:
ddrudik earned 500 total points
ID: 20360936
It's a regex pattern; you need to wrap it in C# code to use it.

Something like:
// Requires: using System; using System.Text.RegularExpressions;
Regex reg = new Regex(@"(?<!-)(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?=\d\d)", RegexOptions.IgnoreCase);
MatchCollection matchColl = reg.Matches("blah apr07-oct07 blah blah");
foreach (Match m in matchColl)
{
    // m.Value is the matched month code; the (?=\d\d) lookahead checks for the two digits without including them.
    Console.WriteLine(m.Value);
}
 
LVL 1

Author Comment

by:MarFarMa
ID: 20361119
I just ran it in the debugger - works a treat.  How is it that it only matches the first occurrence in the string?
 
LVL 27

Expert Comment

by:ddrudik
ID: 20361123
The array construct was only there to show the matches returned for your sample text with my regex pattern; it is not C# code.
 
LVL 27

Expert Comment

by:ddrudik
ID: 20361145
(?<!-)

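This regex construct is a negative lookbehind. It asserts that the character immediately before the current position is not '-', without consuming or capturing anything. Anything following it (our date construct) fails to match if it comes directly after a '-', which is why oct07 in "apr07-oct07" is skipped.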
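
To see the effect of the lookbehind, here is a minimal sketch that runs the same sample string through the pattern with and without (?<!-). The class name LookbehindDemo and the helper MonthCodes are made up for illustration; only Regex.Matches and Match.Value come from the framework.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Shared month alternation, as used in the pattern above.
    const string Months = "(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)";

    // Hypothetical helper: returns every month code the pattern matches, in string order.
    public static string[] MonthCodes(string pattern, string input)
    {
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "blah apr07-oct07 blah blah";
        string withLookbehind = @"(?<!-)" + Months + @"(?=\d\d)";
        string withoutLookbehind = Months + @"(?=\d\d)";

        // oct07 directly follows '-', so the lookbehind rejects it.
        Console.WriteLine(string.Join(",", MonthCodes(withLookbehind, input)));    // apr
        // Without the lookbehind both tokens match.
        Console.WriteLine(string.Join(",", MonthCodes(withoutLookbehind, input))); // apr,oct
    }
}
```

So the pattern does not stop after the first match; it simply refuses any month that directly follows a hyphen.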
 
LVL 27

Expert Comment

by:ddrudik
ID: 20361148
Thanks for the question and the points.
 
LVL 1

Author Comment

by:MarFarMa
ID: 20361210
Ok - just tested this:

MatchCollection matchColl = reg.Matches("janis oct07apr07 blah blah");

It returns oct and apr, but since oct is first in the array, I'm still OK if I just take the first element.  Was that coincidence, or can I rely on it?

I.e., if I have multiple matches, will the first one in the array always be the first one in the string?
 
LVL 27

Expert Comment

by:ddrudik
ID: 20361323
If the source could vary, you might be best off matching on:
Regex reg = new Regex(@"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?=\d\d)", RegexOptions.IgnoreCase);

Then just use matchColl[0].Value (after checking that matchColl.Count > 0), ignoring the remaining matches, if any.  The matches will always be in the order found in the string, from start to end.
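
Putting that together, the method asked for in the question could look roughly like the sketch below. The names MonthFinder, GetFirstMonth and GetFirstMonthIndex are my own, and lower-casing the result is an assumption based on the "jan"/"apr" examples in the question. It uses Regex.Match, which returns only the first match, and Match.Index for the position.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MonthFinder
{
    // Month code followed by two digits; the digits are checked but not included in the match.
    static readonly Regex MonthToken = new Regex(
        @"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?=\d\d)",
        RegexOptions.IgnoreCase);

    // Returns the month code of the first MMMYY token, or null if there is none.
    // Lower-casing the result is an assumption, not something the thread specifies.
    public static string GetFirstMonth(string description)
    {
        if (description == null) return null;
        Match m = MonthToken.Match(description);
        return m.Success ? m.Value.ToLowerInvariant() : null;
    }

    // Returns the zero-based index of the first MMMYY token, or -1 if there is none.
    public static int GetFirstMonthIndex(string description)
    {
        if (description == null) return -1;
        Match m = MonthToken.Match(description);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        Console.WriteLine(GetFirstMonth("janis oct07apr07 blah blah")); // oct
        Console.WriteLine(GetFirstMonthIndex("janis blah jan08"));      // 11
        Console.WriteLine(GetFirstMonth("janis blah") == null);         // True
    }
}
```

Note that this version has no word-boundary check, so a month code embedded in a longer word and followed by two digits would also match; add one if your data needs it.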