Solved

Vexing regex in VB

Posted on 2014-02-22
Last Modified: 2014-02-23
I've been wrestling with this for days. I'm trying to find a string between two markers in a text file. The markers are "Begin InputTables" and "Begin OutputColumns". The string I need to find is the text between double quotes, i.e. "PCODE" and "CLIENTSX" in:

Begin InputTables
    Name ="PCODE"
    Name ="CLIENTSX"
End
Begin OutputColumns

I can get the regex to find the string if I move the string up to a point immediately after the look-behind...

"Begin InputTables PCODE"

... but of course that does me no good since what I have are hundreds of files where the string is in a line after "Begin InputTables".

I'm searching for the string in For... Each iterations, so I only need to find one at a time.

Here's the code:

                For Each tableInQueryToFind In txtInputTablesInDir
                    If textIn.Peek <> -1 Then
                        Dim tableNm As String = tableInQueryToFind.Name
                        Dim findText As String = tableNm
                        Dim strLength As Integer = Len(findText)
                        findText = findText.Substring(0, strLength - 4)
                        Dim pattern As String = "(?<=^Begin InputTables$)" & findText & "\b" ' & "+(?<=End)??$"
                        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
                        Dim m As Match = Regex.Match(line, pattern, options)
                        If Not IsNothing(line) Then
                            If m.Success Then
                                My.Computer.FileSystem.WriteAllText(tableInQueriesOutputFullPath, "<query>" & queriesDirFileNm.Name & "</query>" & "<tableinquery>" & findText & "</tableinquery>" & vbNewLine, True)
                                Exit For
                            End If
                        Else
                            Exit For
                        End If
                    End If
                Next

Question by:aanuncio

Assisted Solution

by:Dan Craciun
Dan Craciun earned 166 total points
ID: 39879954
This:
@"InputTables.*?""(\w+)"".*?""(\w+)"".*?OutputColumns"

will store your required text in $1 and $2.

HTH,
Dan
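
For the VB case in the question, the pattern above could be applied roughly as follows. This is only a sketch: the module and function names are made up for illustration, the sample text is copied from the question, and RegexOptions.Singleline is an assumption added here so that "." can cross line breaks (without it, "." in .NET stops at a line feed). In VB the string takes no @ prefix; a double quote inside the literal is still written as "".

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module InputTablesCaptureDemo
    ' Hypothetical helper: returns the two quoted names found between
    ' the markers, or an empty array when the pattern does not match.
    Public Function CaptureTwoNames(text As String) As String()
        Dim pattern As String = "InputTables.*?""(\w+)"".*?""(\w+)"".*?OutputColumns"
        ' Singleline lets "." match line feeds, so ".*?" can span several lines.
        Dim m As Match = Regex.Match(text, pattern, RegexOptions.Singleline)
        If Not m.Success Then Return New String() {}
        Return New String() {m.Groups(1).Value, m.Groups(2).Value}
    End Function

    Sub Main()
        ' Sample block taken from the question, joined with CR/LF line endings.
        Dim sample As String = String.Join(vbCrLf, New String() {
            "Begin InputTables",
            "    Name =""PCODE""",
            "    Name =""CLIENTSX""",
            "End",
            "Begin OutputColumns"})
        For Each tableName As String In CaptureTwoNames(sample)
            Console.WriteLine(tableName)
        Next
    End Sub
End Module
```

Note that this shape only fits blocks with exactly two Name lines; a block with one name, or with three or more, would need a different pattern.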

Assisted Solution

by:RannyMeier
RannyMeier earned 334 total points
ID: 39880001
Have you considered using MatchCollection?
We can get all of the InputTables names into a Regex MatchCollection. Then we can use the LINQ Any() method to test whether a given name is among them.

Regex rx = new Regex(@"(?<Records>\s*Name =""(?<Name>[A-Z]+)""\s+)", RegexOptions.IgnoreCase);

MatchCollection matches = rx.Matches(text);
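
Since the question is in VB, here is a rough VB equivalent of the MatchCollection idea. It is a sketch under assumptions: the module and function names are hypothetical, and the pattern is a simplified variant of the one above (it uses \w+ for the name and drops the outer Records group). It also collects every Name ="..." line in the text it is given, not only those between the two markers.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module InputTableNamesDemo
    ' Hypothetical helper: collects every quoted value that follows "Name =".
    Public Function GetTableNames(text As String) As String()
        Dim rx As New Regex("Name\s*=\s*""(?<Name>\w+)""", RegexOptions.IgnoreCase)
        Dim matches As MatchCollection = rx.Matches(text)
        ' Cast(Of Match) turns the collection into a sequence that LINQ can query.
        Return matches.Cast(Of Match)().
            Select(Function(m As Match) m.Groups("Name").Value).
            ToArray()
    End Function

    ' Hypothetical helper: True when tableName is one of the collected names.
    Public Function ContainsTable(text As String, tableName As String) As Boolean
        Return GetTableNames(text).Any(
            Function(n As String) String.Equals(n, tableName, StringComparison.OrdinalIgnoreCase))
    End Function

    Sub Main()
        Dim sample As String = String.Join(vbCrLf, New String() {
            "Begin InputTables",
            "    Name =""PCODE""",
            "    Name =""CLIENTSX""",
            "End",
            "Begin OutputColumns"})
        Console.WriteLine(ContainsTable(sample, "PCODE"))
        Console.WriteLine(ContainsTable(sample, "ORDERS"))
    End Sub
End Module
```

Collecting the names once per file and then testing each table name against the array avoids building a new pattern for every table in the For Each loop.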

Author Comment

by:aanuncio
ID: 39880105
I see that I need to simplify the question. What I need to know is why my look-behind isn't finding anything beyond the first line.

I've tried every combination of line endings to get past the carriage return, but nothing seems to work.

I'll also try the MatchCollection approach, but now that I've started down this road, I'd really like to know why the regex doesn't seem to work following normal conventions.

Author Comment

by:aanuncio
ID: 39880148
This...

(?<=^Begin InputTables\r\n.*)PCODE(?=.*\r\n.*\r\nEnd\r\n^Begin OutputColumns)
... should work to find "PCODE". So why doesn't it?

Accepted Solution

by:RannyMeier
RannyMeier earned 334 total points
ID: 39880174
I believe that
(?<=^Begin InputTables\r\n.*)PCODE(?=.*\r\n.*\r\nEnd\r\n^Begin OutputColumns)
does find the word PCODE.

Does the code in your question actually build this same pattern string? I did not see this pattern in the original question post.

Author Closing Comment

by:aanuncio
ID: 39880809
I found the problem, and it was me.

It turns out that no multiline regex (including the syntactically correct one verified by RannyMeier) could possibly work because the input string was a single line. Doh!

Thank you all for putting time into this.
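
The cause described above can be reproduced in a few lines. The sketch below is an illustration under assumptions, not the original program: the module and function names are hypothetical, the sample text is the block from the question, and the line-by-line check assumes the file was being read one line at a time. It applies the look-behind pattern from the thread first to each line on its own and then to the whole text.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SingleLineVersusWholeTextDemo
    ' The look-behind pattern quoted earlier in the thread.
    Private Const Pattern As String = "(?<=^Begin InputTables\r\n.*)PCODE(?=.*\r\n.*\r\nEnd\r\n^Begin OutputColumns)"

    ' Hypothetical helper: True when the pattern matches somewhere in source.
    Public Function FoundIn(source As String) As Boolean
        ' Multiline makes "^" match at the start of every line, not only at
        ' the start of the string.
        Return Regex.IsMatch(source, Pattern, RegexOptions.IgnoreCase Or RegexOptions.Multiline)
    End Function

    Sub Main()
        Dim lines As String() = New String() {
            "Begin InputTables",
            "    Name =""PCODE""",
            "    Name =""CLIENTSX""",
            "End",
            "Begin OutputColumns"}

        ' One line at a time: the look-behind can never see the previous line.
        Dim foundPerLine As Boolean = lines.Any(Function(l As String) FoundIn(l))

        ' Whole text with CR/LF line endings: the surrounding lines are visible.
        Dim foundInWholeText As Boolean = FoundIn(String.Join(vbCrLf, lines))

        Console.WriteLine(foundPerLine)      ' False
        Console.WriteLine(foundInWholeText)  ' True
    End Sub
End Module
```

In practice this suggests reading the whole file into one string (for example with File.ReadAllText) before matching, provided the line endings in the file are the \r\n pairs the pattern expects.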