aanuncio

Vexing regex in VB, Pt. 2

I need to find table names that are located between two markers, "Begin InputTables" and "Begin OutputColumns". The table names are in turn located between double quotes. In the following example, I am looking for "AGCOUNTY", "ELIGIBILITY" and "X":

Begin InputTables
    Name ="AGCOUNTY"
    Name ="ELIGIBILITY"
    Name ="X"
End
Begin OutputColumns
Since I'm looping through the text, I don't need or want to find these strings using a regex MatchCollection.

My question: why doesn't the following regex...

(?<=^Begin InputTables\r*.*)X
... find the X in singleline mode?

This regex...

(?<=^Begin InputTables\r\n.*)AGCOUNTY
... will find AGCOUNTY in multiline mode, and this regex...

(?<=^Begin InputTables\r\n.*\r\n.*)ELIGIBILITY
... will find ELIGIBILITY in multiline mode.

Put another way, what regex can I use to allow for an unknown number of carriage returns in a lookbehind in .NET?
Terry Woods

It worked for me on myregextester.com with the .NET mode turned on. The wildcard "." matches \r (outside singleline mode it excludes only \n), so the pattern can be simplified to:
(?<=^Begin InputTables.*)X


but it won't work if the word Begin is not at the very start of the input text, because without multiline mode that is the only position ^ matches.
Are you wanting to find those 3 tables by their exact names, or would you prefer to grab the values matching something like:
Name ="[^"]*"


Multiline mode will make ^ match the beginning of any line, rather than just the beginning of the input string.
\s matches any whitespace character, which can be a space, tab, carriage return or line feed. That might be what you're looking for?
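To make the difference between these modes concrete, here is a minimal VB.NET sketch. The sample text and the module name AnchorAndDotDemo are made up for illustration; only the standard Regex.IsMatch and RegexOptions members are used.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module AnchorAndDotDemo
    Sub Main()
        ' Hypothetical sample: the marker is NOT on the first line of the input.
        Dim sample As String = "Header" & vbCrLf & "Begin InputTables" & vbCrLf & "End"

        ' Default options: ^ matches only at the very start of the input.
        Console.WriteLine(Regex.IsMatch(sample, "^Begin InputTables"))                           ' False
        ' Multiline: ^ also matches right after each line feed.
        Console.WriteLine(Regex.IsMatch(sample, "^Begin InputTables", RegexOptions.Multiline))   ' True

        ' Default options: "." matches \r but stops at \n, so .* cannot cross a line break.
        Console.WriteLine(Regex.IsMatch(sample, "Begin InputTables.*End"))                       ' False
        ' Singleline: "." matches every character, including \n.
        Console.WriteLine(Regex.IsMatch(sample, "Begin InputTables.*End", RegexOptions.Singleline)) ' True

        ' \s matches both \r and \n, so \s+ spans the CRLF without any option.
        Console.WriteLine(Regex.IsMatch(sample, "Begin InputTables\s+End"))                      ' True
    End Sub
End Module
```

Note that Multiline and Singleline are independent: one changes what ^ and $ match, the other changes what "." matches, and both can be set at once.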
This pattern might do the trick? It will capture the table names in group 1 of each match; you just need to extract them from the resulting match collection.
(?s)(?<=^Begin InputTables(?:(?!Begin OutputColumns).)*)Name ="([^"]*)"


The (?s) at the start indicates singleline mode (it can be activated that way).
The (?:(?!Begin OutputColumns).)* part of the pattern says don't go past "Begin OutputColumns" trying to find a match.
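As a rough illustration of how that pattern could be used to pull out every table name at once, here is a small VB.NET sketch. The module name, the helper FindInputTableNames, and the extra "NOTATABLE" line after "Begin OutputColumns" are hypothetical additions; it also assumes "Begin InputTables" is the very first text in the input, since ^ is used without multiline mode.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module InputTableNames
    ' Pattern from the post above; quotes are doubled for the VB string literal.
    Private Const TablePattern As String = "(?s)(?<=^Begin InputTables(?:(?!Begin OutputColumns).)*)Name =""([^""]*)"""

    ' Hypothetical helper: returns the quoted names found between the two markers.
    Public Function FindInputTableNames(input As String) As List(Of String)
        Dim names As New List(Of String)
        For Each m As Match In Regex.Matches(input, TablePattern)
            names.Add(m.Groups(1).Value) ' group 1 holds the text between the quotes
        Next
        Return names
    End Function

    Sub Main()
        Dim lines As String() = {
            "Begin InputTables",
            "    Name =""AGCOUNTY""",
            "    Name =""ELIGIBILITY""",
            "    Name =""X""",
            "End",
            "Begin OutputColumns",
            "    Name =""NOTATABLE"""
        }
        Dim sample As String = String.Join(vbCrLf, lines)

        ' The name after "Begin OutputColumns" should be skipped.
        Console.WriteLine(String.Join(", ", FindInputTableNames(sample)))
    End Sub
End Module
```

The lookbehind is what keeps "NOTATABLE" out: to reach it from "Begin InputTables" the repeated group would have to step over the start of "Begin OutputColumns", which the negative lookahead forbids.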
aanuncio

ASKER

TerryAtOpus: eventually I'll need to match anything between double quotes that are in turn between "Begin InputTables" and "Begin OutputColumns", but for now I want to search for the tables by their exact names.
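For the exact-name case, one possible sketch is to splice the name into the pattern suggested earlier in the thread. The function IsInputTable and the sample text are hypothetical; Regex.Escape is used as a precaution in case a table name ever contains regex metacharacters, and the input is assumed to start with "Begin InputTables".

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module ExactTableSearch
    ' Hypothetical helper: True if tableName appears as Name ="..." between the two markers.
    Public Function IsInputTable(input As String, tableName As String) As Boolean
        Dim pattern As String =
            "(?s)(?<=^Begin InputTables(?:(?!Begin OutputColumns).)*)Name =""" &
            Regex.Escape(tableName) & """"
        Return Regex.IsMatch(input, pattern)
    End Function

    Sub Main()
        Dim lines As String() = {
            "Begin InputTables",
            "    Name =""AGCOUNTY""",
            "    Name =""ELIGIBILITY""",
            "    Name =""X""",
            "End",
            "Begin OutputColumns",
            "    Name =""NOTATABLE"""
        }
        Dim sample As String = String.Join(vbCrLf, lines)

        For Each name As String In {"AGCOUNTY", "ELIGIBILITY", "X", "NOTATABLE"}
            Console.WriteLine(name & ": " & IsInputTable(sample, name))
        Next
    End Sub
End Module
```

Because the whole input is searched on every call, each name is found regardless of which line it sits on within the block.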
ASKER CERTIFIED SOLUTION
Terry Woods
This solution is only available to members.
Should, but doesn't. It only finds the first match. On subsequent iterations, it misses "ELIGIBILITY" and "X".

Here's the code:

Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
Dim pattern As String = "(?ms)(?<=^Begin InputTables(?:(?!Begin OutputColumns).)*)Name =""" & findText & """"
Dim m As Match = Regex.Match(inputLines, pattern, options)


I can make it match the first, second or third lines between "Begin InputTables" and "Begin OutputColumns", but I can't make it find all of them using a single pattern. That wouldn't be a problem if I had a finite number of lines, but I don't.
By the way, what's the "(?ms)"?
TerryAtOpus: I got it. You were right. I am unworthy.
How do people get so smart?