Solved

Vexing regex in VB

Posted on 2014-02-22
Last Modified: 2014-02-23
I've been wrestling with this for days. I'm trying to find a string between two markers in a text file. The markers are "Begin InputTables" and "Begin OutputColumns". The string I need to find is the text between double quotes, i.e. "PCODE" and "CLIENTSX" in:

Begin InputTables
    Name ="PCODE"
    Name ="CLIENTSX"
End
Begin OutputColumns

I can get the regex to find the string if I move the string up to a point immediately after the look-behind...

"Begin InputTables PCODE"

... but of course that does me no good since what I have are hundreds of files where the string is in a line after "Begin InputTables".

I'm searching for the string in For... Each iterations, so I only need to find one at a time.

Here's the code:

                For Each tableInQueryToFind In txtInputTablesInDir
                    If textIn.Peek <> -1 Then
                        Dim tableNm As String = tableInQueryToFind.Name
                        Dim findText As String = tableNm
                        Dim strLength As Integer = Len(findText)
                        findText = findText.Substring(0, strLength - 4)
                        Dim pattern As String = "(?<=^Begin InputTables$)" & findText & "\b" ' & "+(?<=End)??$"
                        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
                        Dim m As Match = Regex.Match(line, pattern, options)
                        If Not IsNothing(line) Then
                            If m.Success Then
                                My.Computer.FileSystem.WriteAllText(tableInQueriesOutputFullPath, "<query>" & queriesDirFileNm.Name & "</query>" & "<tableinquery>" & findText & "</tableinquery>" & vbNewLine, True)
                                Exit For
                            End If
                        Else
                            Exit For
                        End If
                    End If
                Next

Question by:aanuncio
6 Comments
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 166 total points
ID: 39879954
This:
@"InputTables.*?""(\w+)"".*?""(\w+)"".*?OutputColumns"

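will capture your required text in groups 1 and 2 ($1 and $2). Since . does not match a newline by default, use RegexOptions.Singleline so that the .*? parts can span lines.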

HTH,
Dan
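
For anyone trying the capture-group pattern from VB, here is a minimal sketch. It assumes the whole block is held in one string; the module name, the function name FindBothNames, and the sample text are made up for illustration. Note the doubled quotes, which is how VB escapes a quote inside a string literal, and RegexOptions.Singleline, which lets . cross line breaks.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module CaptureGroupsSketch
    ' Hypothetical helper: returns the two quoted names found between
    ' the markers, or an empty array when the pattern does not match.
    Function FindBothNames(text As String) As String()
        Dim pattern As String = "InputTables.*?""(\w+)"".*?""(\w+)"".*?OutputColumns"
        ' Singleline makes "." match newlines, so ".*?" can span several lines.
        Dim m As Match = Regex.Match(text, pattern, RegexOptions.Singleline)
        If Not m.Success Then Return New String() {}
        Return New String() {m.Groups(1).Value, m.Groups(2).Value}
    End Function

    Sub Main()
        ' Sample text modelled on the block in the question.
        Dim sample As String = String.Join(vbCrLf, {
            "Begin InputTables",
            "    Name =""PCODE""",
            "    Name =""CLIENTSX""",
            "End",
            "Begin OutputColumns"})
        Console.WriteLine(String.Join(", ", FindBothNames(sample)))
    End Sub
End Module
```

This only handles a block with exactly two names, because the pattern has exactly two capture groups. A file with one or three tables would need a different approach.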
 
LVL 2

Assisted Solution

by:RannyMeier
RannyMeier earned 334 total points
ID: 39880001
Have you considered using MatchCollection?
We can collect all of the InputTables names into a Regex MatchCollection, then use the LINQ Any() method to test whether the one you want is present.

Regex rx = new Regex(@"(?<Records>\s*Name =""(?<Name>[A-Z]+)""\s+)", RegexOptions.IgnoreCase);

MatchCollection matches = rx.Matches(text);
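
Since the question is in VB, here is a rough VB sketch of the same MatchCollection idea. The names GetTableNames and ContainsTable are hypothetical, and the pattern is simplified: it uses \w+ instead of [A-Z]+ on the assumption that table names may contain digits or underscores. Like the C# pattern above, it picks up every Name ="..." line in the text, not only those inside the InputTables block.

```vb
Imports System
Imports System.Linq
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module MatchCollectionSketch
    ' Hypothetical helper: collects every Name ="..." value in the text.
    Function GetTableNames(text As String) As String()
        Dim rx As New Regex("Name\s*=\s*""(?<Name>\w+)""", RegexOptions.IgnoreCase)
        Dim matches As MatchCollection = rx.Matches(text)
        ' Cast(Of Match) is needed where MatchCollection only implements
        ' the non-generic IEnumerable (older .NET Framework versions).
        Return matches.Cast(Of Match)().Select(Function(m As Match) m.Groups("Name").Value).ToArray()
    End Function

    ' True when the wanted table is among the captured names (case-insensitive).
    Function ContainsTable(text As String, wanted As String) As Boolean
        Return GetTableNames(text).Any(
            Function(n As String) String.Equals(n, wanted, StringComparison.OrdinalIgnoreCase))
    End Function

    Sub Main()
        Dim sample As String = String.Join(vbCrLf, {
            "Begin InputTables",
            "    Name =""PCODE""",
            "    Name =""CLIENTSX""",
            "End",
            "Begin OutputColumns"})
        Console.WriteLine(ContainsTable(sample, "CLIENTSX"))
        Console.WriteLine(ContainsTable(sample, "ORDERS"))
    End Sub
End Module
```

Extracting the names once and then testing each candidate with Any() avoids building a new regex for every table name in the For Each loop.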
 

Author Comment

by:aanuncio
ID: 39880105
I see that I need to simplify the question. What I need to know is why my look-behind isn't finding anything beyond the first line.

I've tried every combination of line endings to get past the carriage return, but nothing seems to work.

I'll also try the MatchCollection approach, but now that I've started down this road, I'd really like to know why the regex doesn't seem to work following normal conventions.

Author Comment

by:aanuncio
ID: 39880148
This...

(?<=^Begin InputTables\r\n.*)PCODE(?=.*\r\n.*\r\nEnd\r\n^Begin OutputColumns)
... should work to find "PCODE". So why doesn't it?
 
LVL 2

Accepted Solution

by:RannyMeier
RannyMeier earned 334 total points
ID: 39880174
I believe that
(?<=^Begin InputTables\r\n.*)PCODE(?=.*\r\n.*\r\nEnd\r\n^Begin OutputColumns)
does find the word PCODE.

Does the code in your question actually produce this same pattern string? I did not see it in the original post.
 

Author Closing Comment

by:aanuncio
ID: 39880809
I found the problem, and it was me.

It turns out that no multiline regex (including the syntactically correct one verified by RannyMeier) could possibly work, because the input string was a single line. Doh!

Thank you all for putting time into this.
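
For later readers, a small sketch of the pitfall. RegexOptions.Multiline only changes where ^ and $ may match; it cannot make a pattern see text that is not in the input string. The sketch assumes the variable line in the question held one line at a time (for example from ReadLine), and the module and function names are invented for illustration. The pattern is the one from the thread.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module WholeTextSketch
    ' Look-around pattern quoted in the thread, searching for PCODE.
    Const Pattern As String =
        "(?<=^Begin InputTables\r\n.*)PCODE(?=.*\r\n.*\r\nEnd\r\n^Begin OutputColumns)"

    ' Hypothetical helper: True when the pattern matches the given text.
    Function FindsTable(text As String) As Boolean
        Return Regex.IsMatch(text, Pattern, RegexOptions.IgnoreCase Or RegexOptions.Multiline)
    End Function

    Sub Main()
        Dim lines As String() = {
            "Begin InputTables",
            "    Name =""PCODE""",
            "    Name =""CLIENTSX""",
            "End",
            "Begin OutputColumns"}
        Dim wholeText As String = String.Join(vbCrLf, lines)

        ' One line at a time: the look-behind has no previous line to inspect.
        Console.WriteLine(FindsTable(lines(1)))
        ' Whole text with CR/LF line endings: the markers are visible.
        Console.WriteLine(FindsTable(wholeText))
    End Sub
End Module
```

In a real program the whole text would typically come from something like File.ReadAllText rather than a line-by-line read. The pattern also hard-codes \r\n, so it assumes Windows line endings.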