psyfect
asked on
4 specific recursive RegEx searches in VB.net
Find and return all instances of random (unknown) strings RECURSIVELY via their surrounding expressions (returning the entire expression is fine; I can strip the data):
The asterisk represents the random string that I am searching for; everything else is constant text that I expect to surround it.
Here are the expressions:
%26t=*&"; (* represents anything)
&n=*" (* represents anything)
<span>* / * / *</span> (HERE THE * represents any number)
<span>*</span> (HERE THE * represents a number larger than 100)
I want to search each line for each type of case. I think that would be faster than searching the whole page for one expression and then searching it again for the next. That said, I'll take any form of recursive search using regex for these patterns.
Example:
%26t=THISSTRING&";
skip 20 lines of text
%26t=THISSTRING&";
skip 30 lines of text
&n=ISTHESTRINGTHAT"
skip 50 lines of text searching until
<span>1 / 22 / 90</span>
<span>5545</span>
Should return:
THISSTRING
THISSTRING
ISTHESTRINGTHAT
12290
5545
Thanks!
Edit: assume the string in question is simply stored as a string variable in vb.net
Dim str As String
str = "wall of text"
' recursively search str for the expressions above
From the way you started your post, this sounds like homework. Are you asking for guidance, or for someone to do your work for you?
ASKER
I don't understand regular expressions. It's not homework; I'm making a program and I'm stuck. If you could put it in the form of guidance, I'd appreciate it, so long as I can come to a resolution. I've researched many pages on regular expressions and I just don't understand the formatting.
ASKER
I see what you're saying, the first statement sounds like a textbook problem. No, I just wrote it that way.
Maybe I'm just being dense, but what do you mean by searching it recursively? If you search using RegEx, you should be able to pull out each occurrence as the regex searches.
ASKER
Ok, that's cool. It's just that there's an unspecified number of each individual expression, so there could be 100 of the first and 2 of the second, or vice versa. As long as it extracts every expression found in the string (a really long string with an undefined number of occurrences of each expression), that works.
It looks like you are capturing data from a website. If so, is there an example website you can provide?
ASKER
I suppose I could fashion a sample or something if it were indeed a necessity. Is there some place you could refer me to that explains how to create regex conditionals in layman's terms? If I can figure out an answer from your reference I'll gladly assign points. I don't have to have the code handed to me (though I shan't refuse that).
Regardless of surrounding text I need to find:
%26t= (doesn't matter what goes here) &";
&n= (doesn't matter) "
(anything) / (goes) / (here)
(random text)
and extract the:
(doesn't matter what goes here)
(doesn't matter)
(anything) (goes) (here)
(random text)
The spaces are not present, I just added them for sake of visibility. See OP for asterisk placement.
ASKER
That last post didn't come out right because of html. But the original post really has the best synopsis. I feel surrounding the example in verbose html code would only further confuse things.
ASKER CERTIFIED SOLUTION
This solution is only available to members of Experts Exchange.
Correction to the last posted regex:
(?:\%26t\=[^\&]*\&\"";)|(?:\&n\=[^\""]*"")|(?:\<span\>\d+/\d+/\d+\</span\>)|(?:\<span\>[1-9][0-9]{2,}\</span\>)
An example of how to use it in code (string_to_search is the String variable holding the text):
' Inside a VB string literal, "" stands for a single " character.
Dim regex As New System.Text.RegularExpressions.Regex("(?:\%26t\=[^\&]*\&\"";)|(?:\&n\=[^\""]*"")|(?:\<span\>\d+/\d+/\d+\</span\>)|(?:\<span\>[1-9][0-9]{2,}\</span\>)")
Dim matches As System.Text.RegularExpressions.MatchCollection
' Matches returns every occurrence in the string in a single pass.
matches = regex.Matches(string_to_search)
For Each match As System.Text.RegularExpressions.Match In matches
    Me.ListBox1.Items.Add(match.Value)
Next
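Since the original question asks for only the unknown part of each expression, here is a minimal sketch of one way to get that with named capture groups, so nothing has to be stripped afterwards. The module name ExtractDemo, the function ExtractAll and the group names are hypothetical, chosen for illustration. It also assumes that whitespace may surround the slashes (as in the sample from the question) and reads "larger than 100" as "three or more digits with no leading zero".

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ExtractDemo
    ' Hypothetical helper: returns only the captured data for all four
    ' patterns, in the order the matches occur in the input text.
    Function ExtractAll(ByVal text As String) As List(Of String)
        ' One alternation with one set of named groups per pattern.
        ' Inside a VB string literal, "" stands for a single " character.
        Dim pattern As String = _
            "%26t=(?<t>[^&]*)&"";" & _
            "|&n=(?<n>[^""]*)""" & _
            "|<span>(?<a>\d+)\s*/\s*(?<b>\d+)\s*/\s*(?<c>\d+)</span>" & _
            "|<span>(?<num>[1-9]\d{2,})</span>"

        Dim results As New List(Of String)
        ' Regex.Matches finds every occurrence; no manual recursion is needed.
        For Each m As Match In Regex.Matches(text, pattern)
            If m.Groups("t").Success Then
                results.Add(m.Groups("t").Value)
            ElseIf m.Groups("n").Success Then
                results.Add(m.Groups("n").Value)
            ElseIf m.Groups("a").Success Then
                ' Join the three numbers, e.g. 1 / 22 / 90 becomes 12290.
                results.Add(m.Groups("a").Value & m.Groups("b").Value & m.Groups("c").Value)
            Else
                results.Add(m.Groups("num").Value)
            End If
        Next
        Return results
    End Function

    Sub Main()
        ' Sample input modelled on the example in the question.
        Dim sample As String = _
            "%26t=THISSTRING&"";" & Environment.NewLine & _
            "&n=ISTHESTRINGTHAT""" & Environment.NewLine & _
            "<span>1 / 22 / 90</span>" & Environment.NewLine & _
            "<span>5545</span>"
        For Each s As String In ExtractAll(sample)
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

For the sample above this should print THISSTRING, ISTHESTRINGTHAT, 12290 and 5545, one per line. Note that the last pattern as written also accepts exactly 100; if the value must be strictly larger, compare the parsed number in code instead.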
ASKER
Thanks a lot! Really informative. It really helped me understand how it works. If you've got a moment, I'm looking at another string but haven't figured it out yet. It's fl=1 followed by any string of characters and ending with
So: fl=1randomdataandhtmltags.
Either way, thanks for the insightful help!
ASKER
This is what I have for my last request (if you have the time), so far:
?:\fl\=1[^]\
find fl=1, then find anything, then find . It's not working though...any thoughts as to why?
You can take the \ out before "fl", as there is no need to escape "f". The [^] is not doing anything as written. Try [^(?:\</span>)]*\</span> instead (I'm not sure it will work, but you can try it).
I don't believe the last post will work, however this one should. It reads:
Find "fl=1" not immediately followed by "</span>", then any number of characters, then find "</span>".
(?:fl\=1(?!\</span>).*\</span>)
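One thing to watch with the pattern above: the lookahead is checked only once, right after fl=1, and .* is greedy, so the match runs to the last </span> rather than the first. A rough sketch of two common alternatives follows, a lazy quantifier and a lookahead repeated before every character. The module name, constant names and sample string are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyDemo
    ' Greedy form from the post above: runs to the LAST </span>.
    Public Const GreedyPattern As String = "fl=1(?!</span>).*</span>"
    ' Lazy quantifier: .*? grows one character at a time and stops
    ' at the FIRST </span>.
    Public Const LazyPattern As String = "fl=1.*?</span>"
    ' Lookahead re-checked before each character that is consumed,
    ' so the match cannot run past the first </span>.
    Public Const TemperedPattern As String = "fl=1(?:(?!</span>).)*</span>"

    Sub Main()
        ' Hypothetical sample containing two closing span tags.
        Dim sample As String = "fl=1"">A<span>one</span> and <span>two</span>"
        Console.WriteLine(Regex.Match(sample, GreedyPattern).Value)
        Console.WriteLine(Regex.Match(sample, LazyPattern).Value)
        Console.WriteLine(Regex.Match(sample, TemperedPattern).Value)
    End Sub
End Module
```

With this sample the greedy pattern should return the whole string, while the other two stop after the first </span>. The lazy form is usually the simpler choice; the repeated lookahead is mainly useful when the pattern continues after the closing tag.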
ASKER
Hmm, can't seem to get that one to work. I was trying something like this:
Super happy FunSuperb
I was trying to extract:
fl=1">Super happy FunSuperb
by finding fl=1 followed by anything, ending with .
I'm still playing around with it, hopefully I'll figure it out. You've been a great help, thanks a lot!
ASKER
Keep forgetting about the formatting:
fl=1"><strong>Super happy Fun</strong></a></q></td><td><q><span>Superb</span>
ASKER
I think my problem might be this:
. = Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.
I think there might be a line break in the code...how do I make it include line breaks as it says?
ASKER
Figured out how to change it (singleline), still not working though. Here's what I have:
Dim mc As MatchCollection = Regex.Matches(tmp, "(?:fl\=1.*\", RegexOptions.Singleline & RegexOptions.IgnoreCase)
What's wrong with the expression and Is it ok to have "RegexOptions.Singleline & RegexOptions.IgnoreCase" or can you only put one?
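On the options question: in VB.NET the & operator is string concatenation, so it does not combine enum flags. RegexOptions is a flags enumeration and its values are combined with Or. Below is a small sketch, assuming the pattern was meant to end at </span> (that part is not visible in the post above); the module and function names are hypothetical. Also note that any group opened with (?: has to be closed with a matching ).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module OptionsDemo
    ' Hypothetical helper: finds every fl=1 ... </span> block in tmp.
    Function FindBlocks(ByVal tmp As String) As MatchCollection
        ' Combine flags with Or. Singleline lets . match line breaks;
        ' IgnoreCase makes the literal text case-insensitive.
        Dim opts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
        ' .*? is lazy, so each match ends at the nearest </span>.
        Return Regex.Matches(tmp, "fl=1.*?</span>", opts)
    End Function

    Sub Main()
        ' Sample with a line break between fl=1 and the closing tag.
        Dim tmp As String = "FL=1""><strong>Super happy Fun</strong>" & _
            Environment.NewLine & "<span>Superb</span>"
        For Each m As Match In FindBlocks(tmp)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

The same two options can also be switched on inside the pattern itself with the inline form (?si) at the start, which .NET supports.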
ASKER
I put 50 points up for the resolution to this one in a new question:
https://www.experts-exchange.com/questions/24026301/Regular-Expression-VB-Net.html
I appreciate your time and would like to give additional points if you respond in time.
ASKER
Wow, nevermind. I think I'm figuring it out. I missed a closing ")" at the end of my code. I'll post back if I have further questions. I would like to get you the points as well, so I closed that question and will recreate if need be.
Thanks!