Solved

Regular expression - Multiple occurrences

Posted on 2009-03-30
Last Modified: 2012-05-06
Hi,

I am currently trying to create a regular expression that will only result in a "match" if at least N (3, for example) instances of the matched string are present. The current expression I have created is as follows:

00:0[0-5]:

In the example below, I would like a match to occur (using at least 3 instances as the occurrence criterion). There are actually 8 matches based on the original regular expression I have created.

  4 41230    5557    6719 55547594    0    0 00:04:08         10
  4 29668    5558    6706 55547594    0    0 00:04:03          8
  4  6667   11453   12707 55547594    0    0 00:06:05        265
  4 15395    5563    6709 55547594    0    0 00:02:06         8
  4 12513    5556    6688 55547594    0    0 00:03:03          13
  4  5631    5557    6720 55547594    0    0 00:01:03           4
  4 14076   11695   12668 55547594    0    0 00:04:03          3
  4  6656    5561    6721 55547594    0    0 00:04:03          15
  4 10310   11383   12712 55547594    0    0 00:04:03          29

Thanks...

Question by:PhilMacavity
LVL 40

Expert Comment

by:mrjoltcola
ID: 24019828
Hi Phil, you will get a better response if you assign more than 50 points to your question; some experts have their question filters set to values well above this. Maybe it was an error?

I will try to help; please clarify what you mean by the Nth occurrence. In your test sample, do you mean you do not want to match the first 2 lines that have 00:04 and 00:04, but want to match 00:06?
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24019839
What is the host language you are using to run the regex? You might want to add that to your zones to get more experts next time (Perl or Python for example).

 
LVL 1

Author Comment

by:PhilMacavity
ID: 24027038
Hi,

Basically, I want to be able to identify whether there are more than X occurrences (at least 3, for example) of a particular string. In the sample text above, there are 8 matches using the regular expression 00:0[0-5]:
The particular column that contains the values of interest is (in the sample above) the one that begins 00:04:08; this relates to a session uptime. I only want a match to occur if there are more than 3 of these occurrences.
This regular expression will eventually be run in a monitoring package called Sitescope.

Cheers,
phil

P.S. I've modified the number of points associated with this question
 
LVL 18

Expert Comment

by:Hube02
ID: 24045608
I'm going to start this off by saying that I don't know anything about Sitescope, but judging by the page I found about regular expressions in this application (http://schist.und.nodak.edu:8888/SiteScope/docs/regexp.htm), it appears to be compatible with Perl regular expressions.

But I am unable to find any information on regular expression functions. For instance, in PHP I would use preg_match_all and then count how many matches were found. Are there different types of functions that can be used with Sitescope? If so I may be able to come up with a different, shorter regex than the one that follows.
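For illustration, the "find all matches, then count them" idea can be sketched in Python rather than PHP. This is only a sketch under assumptions: the names SAMPLE, PATTERN and has_at_least are made up for the example, and it says nothing about what functions Sitescope itself offers.

```python
import re

# Sample rows copied from the original question.
SAMPLE = """\
  4 41230    5557    6719 55547594    0    0 00:04:08         10
  4 29668    5558    6706 55547594    0    0 00:04:03          8
  4  6667   11453   12707 55547594    0    0 00:06:05        265
  4 15395    5563    6709 55547594    0    0 00:02:06         8
  4 12513    5556    6688 55547594    0    0 00:03:03          13
  4  5631    5557    6720 55547594    0    0 00:01:03           4
  4 14076   11695   12668 55547594    0    0 00:04:03          3
  4  6656    5561    6721 55547594    0    0 00:04:03          15
  4 10310   11383   12712 55547594    0    0 00:04:03          29
"""

# The expression from the question, unchanged.
PATTERN = re.compile(r"00:0[0-5]:")


def has_at_least(text, minimum):
    """Return True if PATTERN occurs at least `minimum` times in text."""
    return len(PATTERN.findall(text)) >= minimum


count = len(PATTERN.findall(SAMPLE))
print(count)                     # 8 (the 00:06:05 row does not match)
print(has_at_least(SAMPLE, 3))   # True
```

Counting keeps the regex itself simple, but it only helps where the host tool lets you post-process the matches.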

Short of some type of match-all function, the following will only match if there are at least 5 occurrences of 00:0 in a string. At least, I think it will: it matches for me, and I believe it follows the syntax described on the page I mentioned above. Try it and let me know.

/(00:0.*){5}/s
 
LVL 1

Author Comment

by:PhilMacavity
ID: 24048810
Hi,

This is almost working with the following slightly modified expression:

(00:0[0-5]:)\d{2,}

This is searching for at least two instances of the string 00:0[0-5]:, with \d for digits (although \w for alphanumeric characters would also be OK).
The final problem is that if I have a file (the one in the original posting, for example) with 8 strings that should match the above expression, the results are correct if I use 1, or 2, (at least one and at least two matches). If I use 3, (at least three matches), a negative result (no matches) is returned. I'm not sure why this is the case.

Thanks,

Phil...
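
A possible explanation for the result described above, sketched in Python. It assumes the "1,", "2," and three-match variants refer to the quantifiers {1,}, {2,} and {3,} placed after \d. In that position the quantifier counts digits following 00:0[0-5]:, not occurrences of the whole string, and the seconds field has only two digits.

```python
import re

# One row from the sample in the original question.
line = "  4 41230    5557    6719 55547594    0    0 00:04:08         10"

results = {}
for quantifier in ("{1,}", "{2,}", "{3,}"):
    # The quantifier binds to \d only: "00:04:" followed by N or more digits.
    pattern = r"(00:0[0-5]:)\d" + quantifier
    results[quantifier] = bool(re.search(pattern, line))

print(results)
# {1,} and {2,} succeed because "08" supplies two digits;
# {3,} fails because a space follows "08", so there is no third digit.
```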
 
LVL 18

Accepted Solution

by:
Hube02 earned 450 total points
ID: 24049219
There are 2 things that I can think of:

The first is that you need to match all of the characters between the occurrences; that is what the .* in my example does. Added to yours, it might look something like:

/((00:0[0-5]:\d{2}).*){2,}/s

And I'm pretty sure that it follows the syntax guidelines on that page I mentioned.

What does it do?

match 00:0
followed by a 0, 1, 2, 3, 4, or 5
followed by :
followed by any 2 digits
followed by any number of other characters of any type
repeated at least 2 times
the /s at the end lets the . (dot) match line breaks (\r\n or \n), which it does not match by default.
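
The breakdown above can be checked with a short Python sketch. Assumptions: Python's re.DOTALL flag stands in for the /s modifier, the quantifier is raised to {3,} to match the "at least 3" requirement from the question, and the names ROWS, three_hits and two_hits are made up for the example. Sitescope's own regex engine may behave differently.

```python
import re

# Four rows from the sample in the original question;
# the 00:06:05 row should not count as an occurrence.
ROWS = [
    "  4 41230    5557    6719 55547594    0    0 00:04:08         10",
    "  4 29668    5558    6706 55547594    0    0 00:04:03          8",
    "  4  6667   11453   12707 55547594    0    0 00:06:05        265",
    "  4 15395    5563    6709 55547594    0    0 00:02:06         8",
]
three_hits = "\n".join(ROWS)      # three matching rows
two_hits = "\n".join(ROWS[:3])    # only two matching rows

AT_LEAST_3 = r"((00:0[0-5]:\d{2}).*){3,}"

with_s = re.compile(AT_LEAST_3, re.DOTALL)   # dot also matches "\n"
without_s = re.compile(AT_LEAST_3)           # dot stops at line breaks

print(bool(with_s.search(three_hits)))     # True
print(bool(with_s.search(two_hits)))       # False
print(bool(without_s.search(three_hits)))  # False: .* cannot span lines
```

The last line shows why the /s modifier matters here: each occurrence sits on its own line, so without it the .* can never reach the next one.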

The second thing that could be a problem is this paragraph from the document:

"Related to this pitfall is the fact that the content buffer used for the URL monitor types is limited to 50,000 bytes of data. Depending on the number of characters in the URL or web page and the character encoding of the content, the complete content of the URL may be truncated in the Match Content buffer. This may cause the content match to fail even though the target content is present in the full URL. You may need to increase the size of the Match Content buffer by editing the _urlContentMatchMax setting in the master.config file. See the section on SiteScope Configuration Settings in the SiteScope Reference Guide for more information."
 
LVL 1

Author Comment

by:PhilMacavity
ID: 24105852
Hi,

Thanks - this seems to have done the trick.

Cheers,

Phil...