Solved

Regular expression - Multiple occurrences

Posted on 2009-03-30
Last Modified: 2012-05-06
Hi,

I am trying to create a regular expression that produces a match only if at least N (3, for example) instances of the matched string are present. My current expression is:

00:0[0-5]:

In the example below, I would like a match to occur (using at least 3 instances as the occurrence criterion). My original regular expression actually matches 8 times in this text.

  4 41230    5557    6719 55547594    0    0 00:04:08         10
  4 29668    5558    6706 55547594    0    0 00:04:03          8
  4  6667   11453   12707 55547594    0    0 00:06:05        265
  4 15395    5563    6709 55547594    0    0 00:02:06         8
  4 12513    5556    6688 55547594    0    0 00:03:03          13
  4  5631    5557    6720 55547594    0    0 00:01:03           4
  4 14076   11695   12668 55547594    0    0 00:04:03          3
  4  6656    5561    6721 55547594    0    0 00:04:03          15
  4 10310   11383   12712 55547594    0    0 00:04:03          29

Thanks...

Question by:PhilMacavity
 
LVL 40

Expert Comment

by:mrjoltcola
Hi Phil, you will get a better response if you assign more than 50 points to your question; some experts have their question filters set to values well above this. Maybe it was an error?

I will try to help. Please clarify what you mean by the Nth occurrence. In your test sample, do you mean that you do not want to match the first 2 lines that have 00:04 and 00:04, but want to match 00:06?
 
LVL 40

Expert Comment

by:mrjoltcola
What is the host language you are using to run the regex? You might want to add that to your zones to get more experts next time (Perl or Python for example).

 
LVL 1

Author Comment

by:PhilMacavity
Hi,

Basically, I want to be able to identify whether there are more than X occurrences (at least 3, for example) of a particular string. In the sample text above, there are 8 matches using the regular expression 00:0[0-5]:
The particular column which contains the values of interest is (in the sample above) the one which begins 00:04:08; it holds a session uptime. I only want a match to occur if there are more than 3 of these occurrences.
This regular expression will eventually be run in a monitoring package called Sitescope.

Cheers,
phil

P.S. I've modified the number of points associated with this question

 
LVL 18

Expert Comment

by:Hube02
I'll start by saying that I don't know anything about Sitescope, but judging from the page I found about regular expressions in this application (http://schist.und.nodak.edu:8888/SiteScope/docs/regexp.htm), it appears to be compatible with Perl regular expressions.

But I am unable to find any information on regular expression functions. For instance, in PHP I would use preg_match_all and then count how many matches were found. Are there different types of functions that can be used with Sitescope? If so I may be able to come up with a different, shorter regex than the one that follows.
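
For anyone who does have a host language available, the count-the-matches approach described above can be sketched in Python instead of PHP. This is only an illustration: `count_matches` and `has_at_least` are made-up helper names, the sample is the text from the original post, and nothing here implies that Sitescope exposes an equivalent function.

```python
import re

# Sample text from the original post (nine sessions, one per line).
SAMPLE = """\
  4 41230    5557    6719 55547594    0    0 00:04:08         10
  4 29668    5558    6706 55547594    0    0 00:04:03          8
  4  6667   11453   12707 55547594    0    0 00:06:05        265
  4 15395    5563    6709 55547594    0    0 00:02:06         8
  4 12513    5556    6688 55547594    0    0 00:03:03          13
  4  5631    5557    6720 55547594    0    0 00:01:03           4
  4 14076   11695   12668 55547594    0    0 00:04:03          3
  4  6656    5561    6721 55547594    0    0 00:04:03          15
  4 10310   11383   12712 55547594    0    0 00:04:03          29
"""


def count_matches(pattern, text):
    """Return the number of non-overlapping matches of pattern in text."""
    return len(re.findall(pattern, text))


def has_at_least(pattern, text, minimum):
    """True if pattern matches at least `minimum` times in text."""
    return count_matches(pattern, text) >= minimum


print(count_matches(r"00:0[0-5]:", SAMPLE))    # 8
print(has_at_least(r"00:0[0-5]:", SAMPLE, 3))  # True
```

Counting in the host language keeps the regex itself simple; the single-expression approach is only needed when the tool accepts nothing but a pattern.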

Short of some type of match-all function, the following will match only if there are at least 5 occurrences of 00:0 in a string. At least, I think it will: it matches for me, and I believe it follows the syntax described on the page I mentioned above. Try it and let me know.

/(00:0.*){5}/s
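
One way to sanity-check this pattern outside Sitescope is to run it in Python, where the `re.DOTALL` flag plays the role of the trailing /s. The sketch below is a test harness under stated assumptions, not a description of Sitescope behaviour: the helper name `repeats_at_least` and the cut-down sample text are invented for the illustration, and a non-capturing group `(?:...)` is swapped in for the plain parentheses.

```python
import re

# Cut-down stand-in for the sample: six lines, one uptime value per line.
times = ["00:04:08", "00:04:03", "00:06:05", "00:02:06", "00:03:03", "00:01:03"]
text = "\n".join("    0    0 %s         10" % t for t in times)


def repeats_at_least(fragment, text, n):
    """True if `fragment` occurs at least n times, with anything in between."""
    # (?:fragment.*){n} : each repetition is the fragment plus filler text.
    pattern = "(?:%s.*){%d}" % (fragment, n)
    # re.DOTALL lets the dot cross line breaks, like /s in Perl.
    return re.search(pattern, text, re.DOTALL) is not None


print(repeats_at_least("00:0", text, 5))  # True: six occurrences are present
print(repeats_at_least("00:0", text, 7))  # False: only six occurrences exist
```

Without the DOTALL flag the dot stops at each line break, so the same pattern would need all five occurrences on a single line.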
 
LVL 1

Author Comment

by:PhilMacavity
Hi,

This is almost working with the following slightly modified expression:

(00:0[0-5]:)\d{2,}

This is searching for at least two instances of the string 00:0[0-5]: with the \d for digits (although \w for alphanumeric characters would also be OK).
The final problem is that if I have a file (the one in the original posting, for example) with 8 strings which should match the above expression, the results are correct if I use 1, or 2, (at least one and at least two matches). If I use 3, (at least three matches), a negative result (no matches) is returned. I'm not sure why this is the case.

Thanks,

Phil...
 
LVL 18

Accepted Solution

by:
Hube02 earned 150 total points
There are 2 things that I can think of:

The first is that you need to match all of the characters between the occurrences; that is what the .* in my example does. Added to yours, it might look something like:

/((00:0[0-5]:\d{2}).*){2,}/s

And I'm pretty sure that it follows the syntax guidelines on that page I mentioned.

What does it do?

match 00:0
followed by a 0, 1, 2, 3, 4, or 5
followed by :
followed by any 2 digits
followed by any number of other characters of any type
the whole group repeated at least 2 times
The /s at the end makes the . (dot) match newline characters (\r\n or \n) as well, which it does not match by default.
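
The difference between the two patterns comes down to what the quantifier is attached to. In (00:0[0-5]:)\d{2,} the {2,} applies only to \d, so it asks for two or more digits after a single occurrence, which would explain why 3, fails against two-digit seconds. A minimal Python sketch of both behaviours, assuming `re.DOTALL` as the equivalent of /s; the helper names and the shortened sample are hypothetical:

```python
import re

# Shortened stand-in for the sample: four values match 00:0[0-5]:, one does not.
times = ["00:04:08", "00:04:03", "00:06:05", "00:02:06", "00:03:03"]
text = "\n".join("0    0 %s    10" % t for t in times)


def digits_quantified(text, n):
    """Quantifier on \\d only: one occurrence followed by n or more digits."""
    return re.search(r"(00:0[0-5]:)\d{%d,}" % n, text) is not None


def occurrences_quantified(text, n):
    """Quantifier on the whole group: n or more occurrences, filler allowed."""
    pattern = r"(?:00:0[0-5]:\d{2}.*){%d,}" % n
    return re.search(pattern, text, re.DOTALL) is not None


print(digits_quantified(text, 2))       # True: the seconds field has two digits
print(digits_quantified(text, 3))       # False: no value has three digits here
print(occurrences_quantified(text, 3))  # True: four matching values are present
print(occurrences_quantified(text, 5))  # False: only four values match
```

To require at least three occurrences, as in the original question, the outer quantifier would be {3,} rather than {2,}.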

The second thing that could be a problem is this paragraph from the document:

"Related to this pitfall is the fact that the content buffer used for the URL monitor types is limited to 50,000 bytes of data. Depending on the number of characters in the URL or web page and the character encoding of the content, the complete content of the URL may be truncated in the Match Content buffer. This may cause the content match to fail even though the target content is present in the full URL. You may need to increase the size of the Match Content buffer by editing the _urlContentMatchMax setting in the master.config file. See the section on SiteScope Configuration Settings in the SiteScope Reference Guide for more information."
 
LVL 1

Author Comment

by:PhilMacavity
Hi,

Thanks - this seems to have done the trick.

Cheers,

Phil...