Solved

Vexing regex in VB

Posted on 2014-02-22
Last Modified: 2014-02-23
I've been wrestling with this for days. I'm trying to find a string between two markers in a text file. The markers are "Begin InputTables" and "Begin OutputColumns". The string I need to find is the text between double quotes, i.e. "PCODE" and "CLIENTSX" in:

Begin InputTables
    Name ="PCODE"
    Name ="CLIENTSX"
End
Begin OutputColumns

I can get the regex to find the string if I move the string up to a point immediately after the look-behind...

"Begin InputTables PCODE"

... but of course that does me no good since what I have are hundreds of files where the string is in a line after "Begin InputTables".

I'm searching for the string in For... Each iterations, so I only need to find one at a time.

Here's the code:

                For Each tableInQueryToFind In txtInputTablesInDir
                    If textIn.Peek <> -1 Then
                        Dim tableNm As String = tableInQueryToFind.Name
                        Dim findText As String = tableNm
                        Dim strLength As Integer = Len(findText)
                        findText = findText.Substring(0, strLength - 4)
                        Dim pattern As String = "(?<=^Begin InputTables$)" & findText & "\b" ' & "+(?<=End)??$"
                        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
                        Dim m As Match = Regex.Match(line, pattern, options)
                        If Not IsNothing(line) Then
                            If m.Success Then
                                My.Computer.FileSystem.WriteAllText(tableInQueriesOutputFullPath, "<query>" & queriesDirFileNm.Name & "</query>" & "<tableinquery>" & findText & "</tableinquery>" & vbNewLine, True)
                                Exit For
                            End If
                        Else
                            Exit For
                        End If
                    End If
                Next

Question by:aanuncio
6 Comments
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 166 total points
ID: 39879954
This:
@"InputTables.*?""(\w+)"".*?""(\w+)"".*?OutputColumns"

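will capture the two names you need in groups $1 and $2, provided RegexOptions.Singleline is set so that "." can also match the line breaks between the markers.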

HTH,
Dan
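
For anyone who wants to check that pattern quickly, here is a minimal sketch in Python rather than .NET. It assumes the file uses \r\n line endings and looks like the sample in the question. Python's re.DOTALL plays the role of .NET's RegexOptions.Singleline; the variable names are made up for illustration.

```python
import re

# Sample text modelled on the snippet in the question (assumed \r\n endings).
text = (
    "Begin InputTables\r\n"
    '    Name ="PCODE"\r\n'
    '    Name ="CLIENTSX"\r\n'
    "End\r\n"
    "Begin OutputColumns\r\n"
)

# Same pattern as in the post, written as a Python raw string.
pattern = r'InputTables.*?"(\w+)".*?"(\w+)".*?OutputColumns'

# Without the dot-matches-newline flag, "." stops at each "\n",
# so the pattern cannot span the lines between the two markers.
without_dotall = re.search(pattern, text)

# With re.DOTALL (the counterpart of RegexOptions.Singleline) it can.
with_dotall = re.search(pattern, text, re.DOTALL)
names = with_dotall.groups()

print(without_dotall)  # None
print(names)           # ('PCODE', 'CLIENTSX')
```

Note that this pattern expects exactly two quoted names; a file with one or three tables in the block would need a different approach.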
 
LVL 2

Assisted Solution

by:RannyMeier
RannyMeier earned 334 total points
ID: 39880001
Have you considered using a MatchCollection?
You can collect all of the InputTables names into a Regex MatchCollection and then use LINQ's Any() method to test for the one you need.

// Quotes inside a verbatim (@"...") string must be doubled.
Regex rx = new Regex(@"(?<Records>\s*Name =""(?<Name>[A-Z]+)""\s+)", RegexOptions.IgnoreCase);

MatchCollection matches = rx.Matches(text);
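
The same collect-then-test idea can be sketched in Python, where finditer is roughly what Regex.Matches is in .NET. This is only an illustration under assumptions: the sample text copies the question, the search is narrowed to the block between the two markers first, and names such as block_rx and has_pcode are hypothetical.

```python
import re

# Sample text modelled on the snippet in the question (assumed \r\n endings).
text = (
    "Begin InputTables\r\n"
    '    Name ="PCODE"\r\n'
    '    Name ="CLIENTSX"\r\n'
    "End\r\n"
    "Begin OutputColumns\r\n"
)

# Step 1: isolate everything between the two markers.
block_rx = re.compile(r"Begin InputTables(.*?)Begin OutputColumns", re.DOTALL)
block_match = block_rx.search(text)
block = block_match.group(1) if block_match else ""

# Step 2: collect every quoted Name value inside that block.
name_rx = re.compile(r'Name\s*="(?P<Name>[A-Z]+)"', re.IGNORECASE)
names = [m.group("Name") for m in name_rx.finditer(block)]

# Step 3: test for one table name, much like LINQ's Any().
has_pcode = any(n == "PCODE" for n in names)
has_orders = any(n == "ORDERS" for n in names)

print(names)       # ['PCODE', 'CLIENTSX']
print(has_pcode)   # True
print(has_orders)  # False
```

Collecting the names once per file also avoids re-running a separate regex for every table name in the For Each loop.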
 

Author Comment

by:aanuncio
ID: 39880105
I see that I need to simplify the question. What I need to know is why my look-behind isn't finding anything beyond the first line.

I've tried every combination of line endings to get past the carriage return, but nothing seems to work.

I'll also try the MatchCollection approach, but now that I've started down this road, I'd really like to know why the regex doesn't seem to work following normal conventions.

Author Comment

by:aanuncio
ID: 39880148
This...

(?<=^Begin InputTables\r\n.*)PCODE(?=.*\r\n.*\r\nEnd\r\n^Begin OutputColumns)
... should work to find "PCODE". So why doesn't it?
 
LVL 2

Accepted Solution

by:RannyMeier
RannyMeier earned 334 total points
ID: 39880174
I believe that
(?<=^Begin InputTables\r\n.*)PCODE(?=.*\r\n.*\r\nEnd\r\n^Begin OutputColumns)
does find the word PCODE.

Does the code in your original post actually build this same pattern string? I did not see it there.
 

Author Closing Comment

by:aanuncio
ID: 39880809
I found the problem, and it was me.

It turns out that no multiline regex (including the syntactically correct one verified by RannyMeier) could possibly work, because the input string was a single line. Doh!

Thank you all for putting time into this.
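
For later readers, the pitfall can be reproduced with a small Python sketch (not the original VB). It assumes the file looks like the sample in the question; find_table is a hypothetical helper, and the pattern uses a capture group rather than the look-behind from the thread because Python's re module does not accept variable-length look-behinds. re.MULTILINE and re.DOTALL correspond to RegexOptions.Multiline and RegexOptions.Singleline.

```python
import re

# Sample text modelled on the snippet in the question (assumed \r\n endings).
text = (
    "Begin InputTables\r\n"
    '    Name ="PCODE"\r\n'
    '    Name ="CLIENTSX"\r\n'
    "End\r\n"
    "Begin OutputColumns\r\n"
)

def find_table(source, name):
    """Look for a quoted table name between the two markers (hypothetical helper)."""
    pattern = (
        r"^Begin InputTables\b.*?"
        + '"(' + re.escape(name) + ')"'
        + r".*?^Begin OutputColumns\b"
    )
    flags = re.IGNORECASE | re.MULTILINE | re.DOTALL
    return re.search(pattern, source, flags)

# Feeding the regex one line at a time: no single line holds both markers
# and the name, so nothing can ever match.
per_line_hits = [ln for ln in text.splitlines() if find_table(ln, "PCODE")]

# Feeding the regex the whole file contents: the match succeeds.
whole = find_table(text, "PCODE")

print(per_line_hits)   # []
print(whole.group(1))  # PCODE
```

In VB the equivalent change would be to pass the whole file contents to Regex.Match, for example a string read with System.IO.File.ReadAllText, instead of one line at a time.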