psyfect

4 specific recursive RegEx searches in VB.net

Find and return all instances of random (unknown) strings RECURSIVELY via surrounding expressions (returning the entire expression is fine, I can strip the data):

The asterisk represents the random string that I am searching for, everything else is a constant expectation when searching the surrounding context.

Here are the expressions:
%26t=*&";                            (* represents anything)
&n=*"                            (* represents anything)
<span>* / * / *</span>        (HERE THE * represents any number)
<span>*</span>                 (HERE THE * represents a number larger than 100)

I want to search each line for each type of case. I think that would be faster than searching the whole page for one expression and then searching it again for the next one, but I'll take any form of recursive search using regex for these patterns.

Example:
%26t=THISSTRING&";

skip 20 lines of text

%26t=THISSTRING&";

skip 30 lines of text

&amp;n=ISTHESTRINGTHAT"

skip 50 lines of text searching until

<span>1 / 22 / 90</span>
<span>5545</span>

Should return:
THISSTRING
THISSTRING
ISTHESTRINGTHAT
12290
5545

Thanks!
Edit: assume the string in question is simply stored as a string variable in vb.net
dim str as string
str = wall of text
recursively search wall of text for expressions above.
kaufmed

From the way you started your post, this sounds like homework. Are you asking for guidance, or for someone to do your work for you?
psyfect

ASKER

I don't understand regular expressions.  It's not homework; I'm making a program and I'm stuck.  If you could put it in the form of guidance, it'd be appreciated, so long as I can come to a resolution.  I've researched many pages on regular expressions and I just don't understand the formatting.
psyfect

ASKER

I see what you're saying, the first statement sounds like a textbook problem.  No, I just wrote it that way.
Maybe I'm just being dense, but what do you mean by searching it recursively? If you search using RegEx, you should be able to pull out each occurrence as the regex searches.
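
To illustrate the point about pulling out each occurrence, here is a minimal sketch assuming the text sits in a single string. It uses only the first expression from the question; the module name, function name, and sample text are made up for illustration. Regex.Matches walks the whole input and returns every non-overlapping match, so no recursion is involved.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module AllOccurrencesSketch
    ' Returns the text between "%26t=" and "&"";" for every occurrence in input.
    ' (Hypothetical helper, not taken from the thread.)
    Public Function FindTValues(ByVal input As String) As List(Of String)
        Dim results As New List(Of String)()
        ' ([^&]*) captures everything up to the next "&".
        Dim pattern As String = "%26t=([^&]*)&"";"
        For Each m As Match In Regex.Matches(input, pattern)
            results.Add(m.Groups(1).Value)
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up sample with two occurrences on different lines.
        Dim sample As String = "x %26t=FIRST&""; junk" & Environment.NewLine &
                               "more %26t=SECOND&""; end"
        For Each value As String In FindTValues(sample)
            Console.WriteLine(value)
        Next
    End Sub
End Module
```

With the sample above, the loop should print FIRST and then SECOND, one per line.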
psyfect

ASKER

Ok, that's cool.  It's just that there's an unspecified number of each individual expression, so there could be 100 of the first and 2 of the second, or vice versa. As long as it extracts every expression found in the string (which is a really long string with an undefined number of occurrences of each expression), that works.
It looks like you are capturing data from a website. If so, is there an example website you can provide?
psyfect

ASKER

I suppose I could fashion a sample or something if it were indeed a necessity.  Is there some place you could refer me to that explains how to create regex conditionals in layman's terms?  If I can figure out an answer from your reference I'll gladly assign points; I don't have to have the code handed to me (though I shan't refuse that).

Regardless of surrounding text I need to find:
%26t=    (doesn't matter what goes here)      &";
&amp;n=    (doesn't matter)      "
 (anything) / (goes) / (here)
    (random text)      
and extract the:
   (doesn't matter what goes here)      
    (doesn't matter)      
(anything)  (goes)  (here)
    (random text)      

The spaces are not present, I just added them for sake of visibility.  See OP for asterisk placement.
psyfect

ASKER

That last post didn't come out right because of html.  But the original post really has the best synopsis.  I feel surrounding the example in verbose html code would only further confuse things.
ASKER CERTIFIED SOLUTION
kaufmed
Correction to the last posted regex:
(?:\%26t\=[^\&]*\&\"";)|(?:\&amp;n\=[^\""]*"")|(?:\<span\>\d+/\d+/\d+\</span\>)|(?:\<span\>[1-9]+[0-9]{2}\</span\>)

An example of how to use in code:
        ' Same pattern as the corrected regex above, written as a VB.NET string literal.
        Dim regex As New System.Text.RegularExpressions.Regex("(?:\%26t\=[^\&]*\&\"";)|(?:\&amp;n\=[^\""]*"")|(?:\<span\>\d+/\d+/\d+\</span\>)|(?:\<span\>[1-9]+[0-9]{2}\</span\>)")
        Dim matches As System.Text.RegularExpressions.MatchCollection

        ' Matches returns every occurrence found in string_to_search.
        matches = regex.Matches(string_to_search)

        For Each match As System.Text.RegularExpressions.Match In matches
            Me.ListBox1.Items.Add(match.Value)
        Next
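
The code above returns each whole expression. If only the inner values are wanted, as in the "Should return" list in the question, one possible sketch uses named capture groups. This is not the posted solution: the module, function, and group names are invented, the spaces around the slashes are treated as optional, and "larger than 100" is checked numerically after matching. All of those are assumptions about the real page.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ExtractValuesSketch
    ' One alternative per expression; the named groups hold only the wanted part.
    Private ReadOnly Pattern As String = _
        "%26t=(?<t>[^&]*)&"";" & _
        "|&amp;n=(?<n>[^""]*)""" & _
        "|<span>(?<a>\d+)\s*/\s*(?<b>\d+)\s*/\s*(?<c>\d+)</span>" & _
        "|<span>(?<big>\d+)</span>"

    ' Hypothetical helper: returns the extracted values in document order.
    Public Function ExtractValues(ByVal input As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In Regex.Matches(input, Pattern)
            If m.Groups("t").Success Then
                results.Add(m.Groups("t").Value)
            ElseIf m.Groups("n").Success Then
                results.Add(m.Groups("n").Value)
            ElseIf m.Groups("a").Success Then
                ' Join the three numbers, e.g. "1 / 22 / 90" becomes "12290".
                results.Add(m.Groups("a").Value & m.Groups("b").Value & m.Groups("c").Value)
            ElseIf m.Groups("big").Success Then
                ' Keep the number only when it is larger than 100.
                Dim number As Long
                If Long.TryParse(m.Groups("big").Value, number) AndAlso number > 100 Then
                    results.Add(m.Groups("big").Value)
                End If
            End If
        Next
        Return results
    End Function

    Sub Main()
        ' The example lines from the question, without the skipped filler text.
        Dim sample As String = String.Join(Environment.NewLine, New String() { _
            "%26t=THISSTRING&"";", _
            "%26t=THISSTRING&"";", _
            "&amp;n=ISTHESTRINGTHAT""", _
            "<span>1 / 22 / 90</span>", _
            "<span>5545</span>"})
        For Each value As String In ExtractValues(sample)
            Console.WriteLine(value)
        Next
    End Sub
End Module
```

Run against the sample, this should print THISSTRING, THISSTRING, ISTHESTRINGTHAT, 12290, and 5545, matching the list in the question. Doing the numeric comparison in code avoids encoding "larger than 100" as a digit pattern, which is easy to get subtly wrong for values such as 1000.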

psyfect

ASKER

Thanks a lot!  Really informative.  It really helped me understand how it works.  If you've got a moment, I'm looking at another string but haven't figured it out yet.  It's fl=1 followed by any string of characters and ending with </span>

So: fl=1randomdataandhtmltags</span>

Either way, thanks for the insightful help!
psyfect

ASKER

This is what I have for my last request (if you have the time), so far:
?:\fl\=1[^]\

find fl=1, then find anything, then find </span>.  It's not working though...any thoughts as to why?
You can take the \ out before "fl", as there is no need to escape "f". The [^] is not doing anything as written. Try [^(?:\</span>)]*\</span> instead (I'm not sure it will work, but you can try it).
I don't believe the last post will work, however this one should. It reads:

Find "fl" followed by an equals sign and "1", followed by any number of characters that do not start with "</span>", then find "</span>".
(?:fl\=1(?!\</span>).*\</span>)
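
As a hedged alternative to the two attempts above: square brackets build a character class, so [^(?:\</span>)] excludes individual characters rather than the whole tag, and the lookahead (?!\</span>) is only tested once, directly after fl=1. A lazy quantifier is one common way to stop at the first closing tag. The sketch below assumes the block sits on one line; the module and function names are invented, and an extra <span>Other</span> is appended to the sample from this thread to show where the match stops.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyMatchSketch
    ' Returns the shortest run that starts at "fl=1" and ends at the next "</span>",
    ' or an empty string when there is no such run. (Hypothetical helper.)
    Public Function FirstFlBlock(ByVal input As String) As String
        ' .*? is lazy: it grows one character at a time until "</span>" can match.
        Dim m As Match = Regex.Match(input, "fl=1.*?</span>")
        If m.Success Then Return m.Value
        Return String.Empty
    End Function

    Sub Main()
        Dim html As String = "fl=1""><strong>Super happy Fun</strong></a></q></td><td><q>" &
                             "<span>Superb</span><span>Other</span>"
        Console.WriteLine(FirstFlBlock(html))
    End Sub
End Module
```

With a greedy .* the match would run on to the last </span> in the line and include the "Other" span; the lazy form should stop right after "Superb</span>".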

psyfect

ASKER

Hmm, can't seem to get that one to work.  I was trying something like this:
Super happy FunSuperb

I was trying to extract:
fl=1">Super happy FunSuperb

by finding fl=1 followed by anything, ending with </span>.

I'm still playing around with it, hopefully I'll figure it out.  You've been a great help, thanks a lot!
psyfect

ASKER

Keep forgetting about the formatting:
fl=1"><strong>Super happy Fun</strong></a></q></td><td><q><span>Superb</span>

psyfect

ASKER

I think my problem might be this:
. = Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.

I think there might be a line break in the code...how do I make it include line breaks as it says?
psyfect

ASKER

Figured out how to change it (singleline), still not working though.  Here's what I have:
Dim mc As MatchCollection = Regex.Matches(tmp, "(?:fl\=1.*\", RegexOptions.Singleline & RegexOptions.IgnoreCase)

What's wrong with the expression, and is it OK to have "RegexOptions.Singleline & RegexOptions.IgnoreCase", or can you only use one?
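
On the options question, a minimal sketch under stated assumptions: RegexOptions is a flags enumeration, and in VB.NET flag values are combined with the bitwise Or operator, whereas & is the string concatenation operator. The pattern "fl=1.*?</span>" is an assumed stand-in for the one being built in this thread, and the module name, function name, and sample text are invented.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexOptionsSketch
    ' Combine flags with Or. Singleline lets "." match line breaks;
    ' IgnoreCase makes the match case-insensitive.
    Private ReadOnly CombinedOptions As RegexOptions = _
        RegexOptions.Singleline Or RegexOptions.IgnoreCase

    ' Hypothetical helper: counts blocks that start at "fl=1" and end at "</span>".
    Public Function CountFlBlocks(ByVal input As String) As Integer
        Return Regex.Matches(input, "fl=1.*?</span>", CombinedOptions).Count
    End Function

    Sub Main()
        ' The first block spans a line break and uses upper case.
        Dim tmp As String = "FL=1"">first" & Environment.NewLine &
                            "line</SPAN> fl=1"">second</span>"
        Console.WriteLine(CountFlBlocks(tmp))
    End Sub
End Module
```

For this sample the count should be 2. Without Singleline the first block would not match, because the dot would stop at the line break.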
psyfect

ASKER

I put 50 points up for the resolution to this one in a new question:
https://www.experts-exchange.com/questions/24026301/Regular-Expression-VB-Net.html

I appreciate your time and would like to give additional points if you respond in time.
psyfect

ASKER

Wow, never mind.  I think I'm figuring it out.  I missed a closing ")" at the end of my code.  I'll post back if I have further questions.  I would like to get you the points as well, so I closed that question and will recreate it if need be.

Thanks!