• Status: Solved

Regular expression - Multiple occurrences

Hi,

I am trying to create a regular expression that results in a "match" only if at least N (3, for example) instances of the matched string are present. The expression I have so far is:

00:0[0-5]:

In the example below, I would like a match to occur (using at least 3 instances as the occurrence criterion). There are actually 8 matches for the regular expression above.

  4 41230    5557    6719 55547594    0    0 00:04:08         10
  4 29668    5558    6706 55547594    0    0 00:04:03          8
  4  6667   11453   12707 55547594    0    0 00:06:05        265
  4 15395    5563    6709 55547594    0    0 00:02:06         8
  4 12513    5556    6688 55547594    0    0 00:03:03          13
  4  5631    5557    6720 55547594    0    0 00:01:03           4
  4 14076   11695   12668 55547594    0    0 00:04:03          3
  4  6656    5561    6721 55547594    0    0 00:04:03          15
  4 10310   11383   12712 55547594    0    0 00:04:03          29

Thanks...

Asked by: PhilMacavity
1 Solution
 
mrjoltcolaCommented:
Hi Phil, you will get a better response if you give more than 50 points to your question; some experts have their question filters set to values well above this. Maybe it was an error?

I will try to help. Please clarify what you mean by the Nth occurrence. In your test sample, do you mean you do not want to match the first 2 lines that have 00:04 and 00:04, but want to match 00:06?
 
mrjoltcolaCommented:
What is the host language you are using to run the regex? You might want to add that to your zones to get more experts next time (Perl or Python for example).

 
PhilMacavityAuthor Commented:
Hi,

Basically, I want to be able to identify whether there are at least X occurrences (3, for example) of a particular string. In the sample text above, there are 8 matches for the regular expression 00:0[0-5]:
The particular column that contains the values of interest is (in the sample above) the one which begins 00:04:08; it holds a session uptime. I only want a match to occur if there are at least 3 of these occurrences.
This regular expression will eventually be run in a monitoring package called Sitescope.

Cheers,
phil

P.S. I've modified the number of points associated with this question
 
Hube02Commented:
I'm going to start this off by saying that I don't know anything about Sitescope, but looking at the page I found about regular expressions in this application (http://schist.und.nodak.edu:8888/SiteScope/docs/regexp.htm), it appears to be compatible with Perl regular expressions.

But I am unable to find any information on regular expression functions. For instance, in PHP I would use preg_match_all and then count how many matches were found. Are there different types of functions that can be used with Sitescope? If so I may be able to come up with a different, shorter regex than the one that follows.

Short of some type of match-all function, the following will only match if there are at least 5 occurrences of 00:0 in a string. Well, I think it will match: it matches for me, and I believe it follows the syntax described on the page I mentioned above. Try it and let me know.

/(00:0.*){5}/s
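
For comparison, here is a minimal Python sketch of both approaches mentioned above: counting all matches (the equivalent of PHP's preg_match_all) and the single-expression form. Python's re module is only a stand-in here; whether Sitescope behaves the same way is an assumption, and the helper names count_matches and has_at_least are made up for illustration.

```python
import re

# Sample rows copied from the question; the uptime column is the one of interest.
SAMPLE = """\
  4 41230    5557    6719 55547594    0    0 00:04:08         10
  4 29668    5558    6706 55547594    0    0 00:04:03          8
  4  6667   11453   12707 55547594    0    0 00:06:05        265
  4 15395    5563    6709 55547594    0    0 00:02:06         8
  4 12513    5556    6688 55547594    0    0 00:03:03          13
  4  5631    5557    6720 55547594    0    0 00:01:03           4
  4 14076   11695   12668 55547594    0    0 00:04:03          3
  4  6656    5561    6721 55547594    0    0 00:04:03          15
  4 10310   11383   12712 55547594    0    0 00:04:03          29
"""


def count_matches(text, pattern=r"00:0[0-5]:"):
    """Count non-overlapping matches, much like counting preg_match_all results."""
    return len(re.findall(pattern, text))


def has_at_least(text, n, pattern=r"00:0[0-5]:"):
    """True when the pattern occurs at least n times in the text."""
    return count_matches(text, pattern) >= n


# Single-expression variant of /(00:0.*){5}/s. re.DOTALL plays the role of /s,
# letting the dot cross line breaks so one match can span several rows.
single = re.compile(r"(00:0.*){5}", re.DOTALL)
single_without_s = re.compile(r"(00:0.*){5}")

print(count_matches(SAMPLE))                   # matches of 00:0[0-5]: in the sample
print(has_at_least(SAMPLE, 3))                 # threshold check by counting
print(single.search(SAMPLE) is not None)       # threshold check in one regex
print(single_without_s.search(SAMPLE) is not None)
```

Counting is easier to read and to adjust, but it needs a host language around the regex. The single expression is the fallback when the tool only accepts one pattern.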
 
PhilMacavityAuthor Commented:
Hi,

This is almost working with the following slightly modified expression:

(00:0[0-5]:)\d{2,}

This is meant to search for at least two instances of the string 00:0[0-5]:, with \d for digits (although \w for alphanumeric characters would also be OK).
The remaining problem is this: if I have a file (the one in the original posting, for example) with 8 strings that should match the above expression, the results are correct if I use {1,} or {2,} (at least one and at least two matches). If I use {3,} (at least three matches), a negative result (no matches) is returned. I'm not sure why this is the case.
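
A quick check in Python (used here only as a stand-in for a Perl-compatible engine, which is an assumption about Sitescope) suggests why {3,} fails: the quantifier attaches to \d alone, so it counts digits after the prefix rather than repetitions of the whole string.

```python
import re

# One row from the sample; the seconds field after "00:04:" has exactly two digits.
line = "  4 41230    5557    6719 55547594    0    0 00:04:08         10"

one_plus = re.search(r"(00:0[0-5]:)\d{1,}", line)    # prefix + at least 1 digit
two_plus = re.search(r"(00:0[0-5]:)\d{2,}", line)    # prefix + at least 2 digits
three_plus = re.search(r"(00:0[0-5]:)\d{3,}", line)  # prefix + at least 3 digits

print(one_plus.group(0) if one_plus else None)
print(two_plus.group(0) if two_plus else None)
print(three_plus)  # no three-digit run follows the prefix, so this finds nothing
```

To count occurrences of the whole string, the quantifier has to wrap a group that contains the string plus whatever separates one occurrence from the next.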

Thanks,

Phil...
 
Hube02Commented:
There are 2 things that I can think of:

The first is that you need to match all of the characters between the occurrences. That is what the .* in my example does. Added to your expression, it might look something like:

/((00:0[0-5]:\d{2}).*){2,}/s

And I'm pretty sure that it follows the syntax guidelines on that page I mentioned.

What it does:

match 00:0
followed by a 0, 1, 2, 3, 4, or 5
followed by :
followed by any 2 digits
followed by any number of other characters of any type
repeated at least 2 times
The /s at the end lets the . (dot) also match line breaks (\r\n or \n), which it does not match by default. Without it, the match could not span more than one line.
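
The breakdown above can be tried out with a small Python sketch. It assumes Python's re.DOTALL is equivalent to /s, and the function name matches_at_least is hypothetical; the minimum count is passed in so the same pattern can be tested with 2, 3, or any other threshold.

```python
import re


def matches_at_least(text, n, dot_matches_newline=True):
    """Apply (00:0[0-5]:\\d{2}.*){n,} to text and report whether it matches."""
    # The group holds one timestamp plus everything up to the next one;
    # {n,} then demands at least n repetitions of that group.
    pattern = r"(?:00:0[0-5]:\d{2}.*){" + str(n) + r",}"
    flags = re.DOTALL if dot_matches_newline else 0
    return re.search(pattern, text, flags) is not None


# Three rows from the original sample; the middle one (00:06:05) is out of range.
rows = (
    "  4 41230    5557    6719 55547594    0    0 00:04:08         10\n"
    "  4  6667   11453   12707 55547594    0    0 00:06:05        265\n"
    "  4 15395    5563    6709 55547594    0    0 00:02:06         8\n"
)

print(matches_at_least(rows, 2))         # two qualifying timestamps are present
print(matches_at_least(rows, 3))         # only two qualify, so three cannot match
print(matches_at_least(rows, 2, False))  # without /s the dot stops at each line end
```

A non-capturing group (?:...) is used here only to keep the sketch tidy; the capturing form from the post should accept and reject the same inputs.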

The second thing that could be a problem is this paragraph from the document:

"Related to this pitfall is the fact that the content buffer used for the URL monitor types is limited to 50,000 bytes of data. Depending on the number of characters in the URL or web page and the character encoding of the content, the complete content of the URL may be truncated in the Match Content buffer. This may cause the content match to fail even though the target content is present in the full URL. You may need to increase the size of the Match Content buffer by editing the _urlContentMatchMax setting in the master.config file. See the section on SiteScope Configuration Settings in the SiteScope Reference Guide for more information."
 
PhilMacavityAuthor Commented:
Hi,

Thanks - this seems to have done the trick.

Cheers,

Phil...
Question has a verified solution.