aanuncio
asked on
Vexing regex in VB, Pt. 2
I need to find table names that are located between two markers, "Begin InputTables" and "Begin OutputColumns". The table names are in turn located between double quotes. In the following example, I am looking for "AGCOUNTY", "ELIGIBILITY" and "X":
Begin InputTables
Name ="AGCOUNTY"
Name ="ELIGIBILITY"
Name ="X"
End
Begin OutputColumns
Since I'm looping through the text, I don't need or want to find these strings using a regex MatchCollection.
My question: why doesn't the following regex...
(?<=^Begin InputTables\r*.*)X
... find the X in singleline mode?
This regex...
(?<=^Begin InputTables\r\n.*)AGCOUNTY
... will find AGCOUNTY in multiline mode, and this regex...
(?<=^Begin InputTables\r\n.*\r\n.*)ELIGIBILITY
... will find ELIGIBILITY in multiline mode.
Put another way, what regex can I use to place an unknown number of carriage returns in a lookbehind in .NET?
Are you wanting to find those 3 tables by their exact names, or would you prefer to grab the values matching something like:
Name ="[^"]*"
Multiline mode will make ^ match the beginning of any line, rather than just the beginning of the input string.
\s matches a whitespace character, which can be a space, tab, carriage return or line feed. That might be what you're looking for?
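To make the two points above concrete, here is a minimal sketch. The sample text and the module name MultilineDemo are made up for illustration; the marker is deliberately not on the first line, so the effect of Multiline on ^ is visible.

```vb
Imports System
Imports System.Text.RegularExpressions

Module MultilineDemo
    ' Hypothetical sample: "Begin InputTables" is on the second line of the input.
    Public ReadOnly Sample As String =
        "Header" & vbCrLf &
        "Begin InputTables" & vbCrLf &
        "Name =""AGCOUNTY""" & vbCrLf

    Sub Main()
        ' Without Multiline, ^ matches only at the very start of the input.
        Console.WriteLine(Regex.IsMatch(Sample, "^Begin InputTables"))
        ' With Multiline, ^ also matches right after each line feed.
        Console.WriteLine(Regex.IsMatch(Sample, "^Begin InputTables", RegexOptions.Multiline))
        ' \s+ consumes the CR/LF pair (and any other whitespace) between the lines.
        Console.WriteLine(Regex.IsMatch(Sample, "Begin InputTables\s+Name"))
    End Sub
End Module
```

With this sample, the first check should print False and the other two True.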
This pattern might do the trick. It captures each table name in group 1; you just need to extract that group from each match in the resulting collection.
(?s)(?<=^Begin InputTables(?:(?!Begin OutputColumns).)*)Name ="([^"]*)"
The (?s) at the start indicates singleline mode (it can be activated that way).
The (?:(?!Begin OutputColumns).)* part of the pattern says don't go past "Begin OutputColumns" trying to find a match.
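As a rough sketch of how that pattern could be used from VB: the helper name FindTableNames, the sample text, and the extra Name ="OUT" line are all made up for illustration. The sketch also adds (?m) to the pattern, which is an assumption on my part so that ^ can match "Begin InputTables" even when it is not the first line of the input.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TableNameDemo
    ' The pattern from above, with (?m) added (an assumption) so ^ matches at any line start.
    Private Const Pattern As String = "(?ms)(?<=^Begin InputTables(?:(?!Begin OutputColumns).)*)Name =""([^""]*)"""

    ' Hypothetical helper: returns every quoted name found between the two markers.
    Public Function FindTableNames(source As String) As List(Of String)
        Dim names As New List(Of String)()
        For Each m As Match In Regex.Matches(source, Pattern)
            ' Group 1 holds the text between the double quotes.
            names.Add(m.Groups(1).Value)
        Next
        Return names
    End Function

    Sub Main()
        Dim sample As String =
            "Begin InputTables" & vbCrLf &
            "Name =""AGCOUNTY""" & vbCrLf &
            "Name =""ELIGIBILITY""" & vbCrLf &
            "Name =""X""" & vbCrLf &
            "End" & vbCrLf &
            "Begin OutputColumns" & vbCrLf &
            "Name =""OUT""" & vbCrLf
        ' The name after "Begin OutputColumns" should be skipped.
        Console.WriteLine(String.Join(", ", FindTableNames(sample)))
    End Sub
End Module
```

For this sample the output should be AGCOUNTY, ELIGIBILITY, X. The variable-length lookbehind is a .NET feature, so the pattern may not carry over to other regex engines unchanged.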
ASKER
TerryAtOpus: eventually I'll need to match anything between double quotes that are in turn between "Begin InputTables" and "Begin OutputColumns", but for now I want to search for the tables by their exact names.
ASKER CERTIFIED SOLUTION
ASKER
Should, but doesn't. It only finds the first match. On subsequent iterations, it misses "ELIGIBILITY" and "X".
Here's the code:
Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
Dim pattern As String = "(?ms)(?<=^Begin InputTables(?:(?!Begin OutputColumns).)*)Name =""" & findText & """"
Dim m As Match = Regex.Match(inputLines, pattern, options)
I can make it match the first, second or third lines between "Begin InputTables" and "Begin OutputColumns", but I can't make it find all of them using a single pattern. That wouldn't be a problem if I had a finite number of lines, but I don't.
ASKER
By the way, what's the "(?ms)"?
ASKER
TerryAtOpus: I got it. You were right. I am unworthy.
ASKER
How do people get so smart?
but it won't work if the word Begin is not at the very start of the input text, which is the only thing that ^ matches.