Solved

Regex Balancing Group

Posted on 2016-07-28
66 Views
Last Modified: 2016-08-12
I'm trying to parse an Oracle TNS Names file. It looks something like this:
# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NorthWind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
  
WestWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = WestWind)
    )
  )

What I need to be able to do is come up with a regular expression that will allow me to identify the TNS Name and then capture everything inside the parentheses that follow. I've been looking into using a Regex balancing group but haven't quite got the hang of it. Here's what I have so far:
[\n][\s]*[^\(]SouthWind[\s]*=[\s]*((?<Begin>[(]).*(?<End-Begin>[)]))

This isn't working. The capture group overruns the closing parenthesis. There seems to be little good documentation on balancing groups in Regex. They seem like the bastard child that nobody wants to talk about.

This is what I want to get:
(DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )


Can anyone help?
Question by:Russ Suter
30 Comments
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
If you don't mind adding a couple of line feeds (LF) at the end of the file, this should work:
^\w+ =\n(.*?\)\n\s*\))\n\s*\n


HTH,
Dan
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
I just tried that. I added the LFs at the end as you suggested. It didn't match on anything.
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
OK. What do you use to test?

[Screenshot: matches in RegexBuddy]
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
I use Expresso
I also tried this online Regex tester: https://regex101.com/
And Visual Studio

None of them worked
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
You forgot the modifiers:
1. Dot matches line breaks.
  The s flag on regex101.com, RegexOptions.Singleline in VS
2. ^ and $ match at line breaks.
  The m flag on regex101.com, RegexOptions.Multiline in VS

Here is the link: https://regex101.com/r/vT4gC5/1
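
To make the effect of the two modifiers concrete, here is a small C# sketch. The class name, method names and sample string are made up for illustration; only `Regex` and `RegexOptions` come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ModifierDemo
{
    // Three-line sample with LF line endings (illustrative data only).
    public const string Sample = "first (a\nb)\nsecond";

    // With Singleline, '.' also matches '\n', so "(a\nb)" can match.
    public static bool DotCrossesLines(RegexOptions options) =>
        Regex.IsMatch(Sample, @"\(a.b\)", options);

    // With Multiline, '^' matches after every '\n', not only at the start of the input.
    public static int CountLineStarts(RegexOptions options) =>
        Regex.Matches(Sample, @"^\w+", options).Count;

    public static void Main()
    {
        Console.WriteLine(DotCrossesLines(RegexOptions.None));       // False
        Console.WriteLine(DotCrossesLines(RegexOptions.Singleline)); // True
        Console.WriteLine(CountLineStarts(RegexOptions.None));       // 1
        Console.WriteLine(CountLineStarts(RegexOptions.Multiline));  // 3
    }
}
```

The pattern above needs both options at once, which in C# means combining them with `|`.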
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
Didn't forget those. Both are enabled. It still doesn't work.
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
Did you click on the link? It shows the first match.

Add g (global) to see all matches.
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
It doesn't work in C#. That's where I need it.
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
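// Requires: using System; using System.Text.RegularExpressions;
// A verbatim string (@"...") is needed for a literal that spans several lines.
string subjectString = @"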
# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NorthWind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
  
WestWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = WestWind)
    )
  )

";

MatchCollection allMatchResults = null;
try {
	Regex regexObj = new Regex(@"^\w+ =\n(.*?\)\n\s*\))\n\s*\n", RegexOptions.Singleline | RegexOptions.Multiline);
	allMatchResults = regexObj.Matches(subjectString);
	if (allMatchResults.Count > 0) {
		// Access individual matches using allMatchResults[i]; Groups[1] holds the parenthesized block
	} else {
		// Match attempt failed
	} 
} catch (ArgumentException ex) {
	// Syntax error in the regular expression
}


If it does not work, make sure the line endings are <LF> (like in the sample you provided), not <CR><LF>.

If the file has Windows line endings, then use this (basically replace \n with \r\n):
^\w+ =\r\n(.*?\)\r\n\s*\))\r\n\s*\r\n

0
 
LVL 11

Expert Comment

by:louisfr
Comment Utility
If you want documentation on balancing groups, you can check this: http://www.regular-expressions.info/balancing.html
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
I've already been there and read through it.

Ultimately I ended up not using Regex to capture the text. I use it only to find my start point, and a quick program then parses the following text character by character and keeps track of the parentheses. I guess sometimes the brute-force approach is the best.
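
A minimal sketch of what such a character-by-character scan could look like in C#. This is not the code actually used. `TnsScanner` and `ExtractBlock` are hypothetical names, and the sketch assumes each entry name starts a line and that no parentheses appear inside quoted values.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TnsScanner
{
    // Returns the balanced parenthesized block that follows "<name> =",
    // or null if the name is not found or the parentheses never balance.
    public static string ExtractBlock(string text, string name)
    {
        // The regex only locates the start: the identifier at the beginning
        // of a line, then '=', then the first '('.
        Match start = Regex.Match(
            text,
            @"^\s*" + Regex.Escape(name) + @"\s*=\s*\(",
            RegexOptions.Multiline);
        if (!start.Success) return null;

        int open = start.Index + start.Length - 1; // index of the first '('
        int depth = 0;
        for (int i = open; i < text.Length; i++)
        {
            if (text[i] == '(') depth++;
            else if (text[i] == ')' && --depth == 0)
                return text.Substring(open, i - open + 1); // balanced: done
        }
        return null; // ran out of input before the parentheses balanced
    }

    public static void Main()
    {
        string text = "A =\n  (X = (Y = 1)(Z = 2))\n\nB =\n  (X = (Y = 3))\ntrailing";
        Console.WriteLine(ExtractBlock(text, "B")); // (X = (Y = 3))
    }
}
```

Anything after the closing parenthesis of the requested entry is never read, so trailing characters in the file do not matter.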
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
I've requested that this question be closed as follows:

Accepted answer: 0 points for Russ Suter's comment #a41742550

for the following reason:

None of the above offered solutions actually worked.
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
The regular expression provided works. Proof: https://regex101.com/r/vT4gC5/1 . Add g to see all matches.

The first time the author mentioned C# is comment ID: 41735968.

After that I provided sample code in C#.

I only tested with (and provided a solution for) the sample data provided by the author in the question. If it's different from the live data... I can't test on something I can't see.
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
Your protestation that it works doesn't actually make it work.
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
So... ignore a working solution (you can protest, but the link on regex101.com proves the regular expression works on the sample you provided), then accept a semi-blind link just out of spite.
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
It doesn't work. It may work on some website but it doesn't work in a real world application. I gave the other solution a C grade because it offered some (but not enough) information. It wasn't personal or spiteful.
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
You're joking, right?

I provided the actual data (names and IP addresses changed) and the language, admittedly not at first but in a future post I did.

The solution DOES NOT work on what I provided. It may work in your test case but not in my real-world case. I'm really not sure why you fail to understand this.

Furthermore, I'm not the one getting upset. I solved my issue and moved on by using a different solution. You objected so I revisited and gave some (but not full) credit to the only link that offered anything actually useful.

There are plenty of times I answered a question and someone else's answer was accepted even though I thought mine was perfectly valid. I just moved along. I suggest you do the same.

I don't think I'll be able to add anything to this discussion. I'll not reply again. Feel free to get the last word if you wish.
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
I don't have access to your real world case. I only have access to what you provided on your question.

My regular expression works on your (not my) test data. I only copy/pasted from your post.
Verified in RegexBuddy, regex101.com and Visual Studio.

If it is different from the real world data, how can I (or anyone) provide a solution???

On your next questions, please read up on and try to provide an SSCCE (a short, self-contained, correct example).

Thank you.
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
Let's set aside for a moment that the proposed solution didn't work. It actually didn't even properly address the question which involved finding a specific block of text following an identifier. An alternate solution was found. However, I'm happy to wait a while longer for a more complete Regex based solution. Allow me to specify the requirements more fully. I've attached the actual TNS names file (appended with a .txt extension which normally isn't there).

Here are the requirements and restrictions:

1. I cannot in any way modify the file. I must read it as-is on the computer.
2. The file may or may not have additional characters following the last entry. These characters should be irrelevant.
3. I need to be able to extract the text within a balanced block of parentheses following an identifier. In the attached file there are 3 entries and the identifiers are:
    NorthWind =
    SouthWind =
    WestWind =
  The parenthesized block following any one of these (specified by user input) must be extracted. The outer parentheses are optional since I know I can add them back in if they are omitted.

If a C# code block produces the desired result on this website: http://rextester.com/ I will consider it a success.
tnsnames.ora.txt
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
As I said in my answer above, if the pattern does not work with \n, it simply means that you have Windows line endings (\r\n).

Here is the link to working proof: http://rextester.com/ADXFEV61390

I'm not a C# programmer, so you'll need to write the loop that finds the rest of the matches yourself.
If you can't do that, please post a new question in the appropriate TA.

Thank you.
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
Comment Utility
Dan's suggestion appears to work in regexhero.net, which is a Silverlight app (hence it uses .NET's regex engine). You could make the carriage returns optional ( \r? ) to account for either style of line ending:

^\w+ =\r?\n(.*?\)\r?\n\s*\))\r?\n\s*\r?\n


Expresso should work as well if you account for the line ending issue that Dan mentioned.
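
As a rough check of the line-ending point, the sketch below runs the LF-only pattern and the `\r?` variant against the same text with both styles of line ending. The class name, method name and shortened sample data are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndingDemo
{
    // Dan's original pattern (LF only) and the variant with optional carriage returns.
    public const string LfOnly = @"^\w+ =\n(.*?\)\n\s*\))\n\s*\n";
    public const string Either = @"^\w+ =\r?\n(.*?\)\r?\n\s*\))\r?\n\s*\r?\n";

    public static int CountEntries(string text, string pattern) =>
        Regex.Matches(text, pattern, RegexOptions.Singleline | RegexOptions.Multiline).Count;

    public static void Main()
    {
        // Two shortened entries, each followed by a blank line (illustrative data).
        string lf = "Alpha =\n  (D =\n    (A = 1)\n  )\n\nBeta =\n  (D =\n    (B = 2)\n  )\n\n";
        string crlf = lf.Replace("\n", "\r\n");

        Console.WriteLine(CountEntries(lf, LfOnly));   // 2
        Console.WriteLine(CountEntries(crlf, LfOnly)); // 0
        Console.WriteLine(CountEntries(lf, Either));   // 2
        Console.WriteLine(CountEntries(crlf, Either)); // 2
    }
}
```

Note that both patterns still rely on a blank line after each entry, which is why the earlier suggestion was to append line feeds to the file.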
0
 
LVL 11

Accepted Solution

by:
louisfr earned 500 total points
Comment Utility
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace Rextester
{
    public class Program
    {
        public static void Main(string[] args)
        {
            string text = @"# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = Northwind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
 
WestWind = 
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = WestWind)
    )
  )
";
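            // \w+\s+=\s+            entry name, '=', whitespace
            // \([^()]+              outer '(' and the text up to the first nested parenthesis
            // (?'open'\()[^()]*     push each nested '(' onto the "open" stack
            // (?'-open'\))[^()]*    pop the stack for each nested ')'
            // (?(open)(?!))         fail if any nested '(' is still unmatched
            // \)                    outer ')'
            Regex rx = new Regex(@"\w+\s+=\s+\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\)", RegexOptions.Singleline);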
            foreach(Match m in rx.Matches(text)){
                Console.WriteLine("-----");
                Console.WriteLine(m.Value);
            }
        }
    }
}

0
 
LVL 20

Author Closing Comment

by:Russ Suter
Comment Utility
This is almost perfect. By switching out the leading \w+ with the keyword identifier (SouthWind for example) I was able to match exactly the text I needed AND it uses balancing groups as requested. I modified it slightly to use capture groups. The final product looks like this:

(?:SouthWind\s+=\s+)(\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\))

I'll programmatically drop in the appropriate identifier in place of SouthWind as needed using a simple string concatenation.
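
A sketch of how that concatenation might look. The helper names are hypothetical, and `Regex.Escape` is an addition not mentioned above. It is there in case an identifier contains regex metacharacters such as `.`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TnsRegex
{
    // The balanced-parentheses part of the pattern above, captured as group 1.
    private const string BalancedBlock =
        @"(\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\))";

    // Returns the parenthesized block following "<name> =", or null if there is no match.
    public static string ExtractBlock(string text, string name)
    {
        string pattern = @"(?:" + Regex.Escape(name) + @"\s+=\s+)" + BalancedBlock;
        Match m = Regex.Match(text, pattern, RegexOptions.Singleline);
        // In .NET, unnamed groups are numbered before named ones, so group 1 is the block.
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Shortened sample in the shape of the file above (illustrative data).
        string text =
            "NorthWind =\n  (DESCRIPTION =\n    (CONNECT_DATA =\n      (SERVICE_NAME = NorthWind)\n    )\n  )\n\n" +
            "SouthWind =\n  (DESCRIPTION =\n    (ADDRESS = (PROTOCOL = TCP)(PORT = 1521))\n" +
            "    (CONNECT_DATA =\n      (SERVICE_NAME = SouthWind)\n    )\n  )\n";
        Console.WriteLine(ExtractBlock(text, "SouthWind"));
    }
}
```

One caveat of the pattern as written: it expects at least one nested pair of parentheses inside the outer block, which holds for any entry containing DESCRIPTION sub-clauses.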

I'm normally pretty good with Regex but this one is a doozy.
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
Comment Utility
"this one is a doozy"
Which means that if you're using this in production code, then it is probably not the best approach. Regex is a very good and powerful tool, but that doesn't always mean it's the best tool. Can you really say that in six months you'll be able to digest that regex and know what it does or means? What about people coming behind you? Will they understand what it does?

If this is for some one-off, potentially throw-away utility, then it's of less consequence.
2
 
LVL 11

Expert Comment

by:louisfr
Comment Utility
I think balancing groups are easy in theory, but getting the details right is tricky.
I always start from the same pre-made regex from the page I linked to earlier.
0
 
LVL 20

Author Comment

by:Russ Suter
Comment Utility
Agreed. While I'm generally quite adept with Regex, I found exactly what you said: getting the details right is tricky. And since balancing groups aren't supported by most Regex flavors, it's a bit of a specialized art.

I had a perfectly working piece of code that knew where to start based on a simple Regex and then just read each character until the parentheses balanced out. It's a simple loop operation in C#. Now that I also have a viable Regex sample my next step is to consider performance and try to throw a few curve balls at the solution to see how it behaves.

@käµfm³d 👽
"Can you really say that in six months you'll be able to digest that regex and know what it does or means? What about people coming behind you? Will they understand what it does?"
That's what code comments are for. ;)
0
 
LVL 34

Expert Comment

by:Dan Craciun
Comment Utility
This is a post that I quote increasingly often: stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Regex is a beautiful tool. Just don't use it for everything, as it becomes clunky very quickly.
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
Comment Utility
Good programmers comment why something is done, not what something is doing

= )
0
