Solved

Regex Balancing Group

Posted on 2016-07-28
Last Modified: 2016-08-12
I'm trying to parse an Oracle TNS Names file. It looks something like this:
# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NorthWind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
  
WestWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = WestWind)
    )
  )


What I need to be able to do is come up with a regular expression that will allow me to identify the TNS Name and then capture everything inside the parentheses that follow. I've been looking into using a Regex balancing group but haven't quite got the hang of it. Here's what I have so far:
[\n][\s]*[^\(]SouthWind[\s]*=[\s]*((?<Begin>[(]).*(?<End-Begin>[)]))


This isn't working. The capture group overruns the closing parenthesis. There seems to be little good documentation on balancing groups in Regex. They seem like the bastard child that nobody wants to talk about.

This is what I want to get:
(DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )



Can anyone help?
Question by:Russ Suter
30 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41733764
If you don't mind adding a couple of LF at the end of the file, this should work:
^\w+ =\n(.*?\)\n\s*\))\n\s*\n



HTH,
Dan
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41733911
I just tried that. I added the LFs at the end as you suggested. It didn't match on anything.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41735398
OK. What do you use to test?

Matches in RegexBuddy
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41735400
I use Expresso
I also tried this online Regex tester: https://regex101.com/
And Visual Studio

None of them worked
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41735425
You forgot the modifiers:
1. Dot matches line breaks.
  The s flag on regex101.com, RegexOptions.Singleline in VS
2. ^ and $ match at line breaks.
  The m flag on regex101.com, RegexOptions.Multiline in VS

Here is the link: https://regex101.com/r/vT4gC5/1
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41735888
Didn't forget those. Both are enabled. It still doesn't work.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41735950
Did you click on the link? It shows the first match.

Add g (global) to see all matches.
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41735968
It doesn't work in C#. that's where I need it.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41736030
string subjectString = @"
# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NorthWind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
  
WestWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = WestWind)
    )
  )

";

MatchCollection allMatchResults = null;
try {
	Regex regexObj = new Regex(@"^\w+ =\n(.*?\)\n\s*\))\n\s*\n", RegexOptions.Singleline | RegexOptions.Multiline);
	allMatchResults = regexObj.Matches(subjectString);
	if (allMatchResults.Count > 0) {
		// Access individual matches with the indexer: allMatchResults[i]
	} else {
		// Match attempt failed
	} 
} catch (ArgumentException ex) {
	// Syntax error in the regular expression
}



If it does not work, make sure the line ending is <LF> (like in the sample you provided), not <CR><LF>.

If the file has Windows line endings, then use this (basically replace \n with \r\n):
^\w+ =\r\n(.*?\)\r\n\s*\))\r\n\s*\r\n


0
 
LVL 11

Expert Comment

by:louisfr
ID: 41737054
If you want documentation on balancing groups, you can check this: http://www.regular-expressions.info/balancing.html
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41742550
I've already been there and read through it.

Ultimately I just ended up not using Regex for capturing the text. I decided to just use it to determine my start point then just wrote a quick program that parses the following text character by character and keeps track of the parentheses. Sometimes I guess the brute force approach is the best.
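The approach described here (a regex to find the start point, then a character-by-character scan that counts parentheses) might look roughly like the sketch below. This is not the author's actual program, which was never posted; the class and method names are hypothetical, and it assumes parentheses never appear inside quoted values or comments.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not the code from the thread.
public static class TnsBlockScanner
{
    // Returns the balanced parenthesized block that follows "<name> =",
    // or null if the name is not found or the parentheses never balance.
    public static string ExtractBlock(string text, string name)
    {
        // The regex only locates the identifier at the start of a line,
        // up to and including the first opening parenthesis.
        Match start = Regex.Match(
            text,
            @"^\s*" + Regex.Escape(name) + @"\s*=\s*\(",
            RegexOptions.Multiline);
        if (!start.Success)
        {
            return null;
        }

        int open = start.Index + start.Length - 1; // index of the first '('
        int depth = 0;
        for (int i = open; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                depth++;
            }
            else if (text[i] == ')' && --depth == 0)
            {
                // Depth is back to zero: this ')' closes the outer block.
                return text.Substring(open, i - open + 1);
            }
        }
        return null; // ran off the end with unbalanced parentheses
    }
}
```

Calling TnsBlockScanner.ExtractBlock(fileContents, "SouthWind") would return the block including its outer parentheses. Line endings do not matter to this scan, since only the parentheses are counted.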
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41742906
I've requested that this question be closed as follows:

Accepted answer: 0 points for Russ Suter's comment #a41742550

for the following reason:

None of the above offered solutions actually worked.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41742907
The regular expression provided works. Proof: https://regex101.com/r/vT4gC5/1 . Add g to see all matches.

The first time the author mentioned C# is comment ID: 41735968.

After that I provided sample code in C#.

I only tested with (and provided a solution for) the sample data provided by the author in the question. If it's different from the live data... I can't test on something I can't see.
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41742939
Your protestation that it works doesn't actually make it work.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41743037
So... ignore a working solution (you can protest, but the link on regex101.com proves the regular expression works on the sample you provided), then accept a semi-blind link just out of spite.
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41743064
It doesn't work. It may work on some website but it doesn't work in a real world application. I gave the other solution a C grade because it offered some (but not enough) information. It wasn't personal or spiteful.
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41743122
You're joking, right?

I provided the actual data (names and IP addresses changed) and the language, admittedly not at first but in a future post I did.

The solution DOES NOT work on what I provided. It may work in your test case but not in my real-world case. I'm really not sure why you fail to understand this.

Furthermore, I'm not the one getting upset. I solved my issue and moved on by using a different solution. You objected so I revisited and gave some (but not full) credit to the only link that offered anything actually useful.

There are plenty of times I answered a question and someone else's answer was accepted even though I thought mine was perfectly valid. I just moved along. I suggest you do the same.

I don't think I'll be able to add anything to this discussion. I'll not reply again. Feel free to get the last word if you wish.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41743131
I don't have access to your real world case. I only have access to what you provided on your question.

My regular expression works on your (not my) test data. I only copy/pasted from your post.
Verified in RegexBuddy, regex101.com and Visual Studio.

If it is different from the real world data, how can I (or anyone) provide a solution???

On your next questions, please read and try to provide an SSCCE.

Thank you.
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41748978
Let's set aside for a moment that the proposed solution didn't work. It actually didn't even properly address the question which involved finding a specific block of text following an identifier. An alternate solution was found. However, I'm happy to wait a while longer for a more complete Regex based solution. Allow me to specify the requirements more fully. I've attached the actual TNS names file (appended with a .txt extension which normally isn't there).

Here are the requirements and restrictions:

1. I cannot in any way modify the file. I must read it as-is on the computer.
2. The file may or may not have additional characters following the last entry. These characters should be irrelevant.
3. I need to be able to extract the text within a balanced block of parentheses following an identifier. In the attached file there are 3 entries and the identifiers are:
    NorthWind =
    SouthWind =
    WestWind =
  The parenthesized block following any one of these (specified by user input) must be extracted. The outer parentheses are optional since I know I can add them back in if they are omitted.

If a C# code block produces the desired result on this website: http://rextester.com/ I will consider it a success.
tnsnames.ora.txt
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41749182
As I said in my answer above if the pattern does not work with \n it simply means that you have Windows line endings (\r\n).

Here is the link to working proof: http://rextester.com/ADXFEV61390

I'm not a C# programmer, so you'll need to write the loop to find the rest of the matches yourself.
If you can't do that, please post a new question in the appropriate TA.

Thank you.
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 41749488
Dan's suggestion appears to work in regexhero.net, which is a Silverlight app (hence it uses .NET's regex engine). You could make the carriage returns optional ( \r? ) to account for either style of line ending:

^\w+ =\r?\n(.*?\)\r?\n\s*\))\r?\n\s*\r?\n



Expresso should work as well if you account for the line ending issue that Dan mentioned.
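For reference, a minimal sketch of the loop over all matches, using the pattern with optional carriage returns exactly as given above. The class and method names are hypothetical. The sketch inherits the pattern's assumptions: each entry must end with two closing parentheses on consecutive lines and be followed by a blank line, so a final entry with no trailing blank line is not matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper showing one way to iterate the matches.
public static class TnsEntryLister
{
    // Pattern from the comment above; \r? accepts both LF and CRLF files.
    private static readonly Regex EntryPattern = new Regex(
        @"^\w+ =\r?\n(.*?\)\r?\n\s*\))\r?\n\s*\r?\n",
        RegexOptions.Singleline | RegexOptions.Multiline);

    // Returns the parenthesized block (capture group 1) of every entry found.
    public static List<string> ListBlocks(string text)
    {
        var blocks = new List<string>();
        foreach (Match m in EntryPattern.Matches(text))
        {
            // Group 1 starts with the indentation of the first line; trim it.
            blocks.Add(m.Groups[1].Value.Trim());
        }
        return blocks;
    }
}
```

Note that this pattern relies on layout (line breaks and blank lines) rather than on counting parentheses, so it depends on the file being formatted like the sample.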
0
 
LVL 11

Accepted Solution

by:
louisfr earned 2000 total points
ID: 41751639
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace Rextester
{
    public class Program
    {
        public static void Main(string[] args)
        {
            string text = @"# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = Northwind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
 
WestWind = 
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = WestWind)
    )
  )
";
            Regex rx = new Regex(@"\w+\s+=\s+\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\)", RegexOptions.Singleline);
            foreach(Match m in rx.Matches(text)){
                Console.WriteLine("-----");
                Console.WriteLine(m.Value);
            }
        }
    }
}


0
 
LVL 20

Author Closing Comment

by:Russ Suter
ID: 41752423
This is almost perfect. By switching out the leading \w+ with the keyword identifier (SouthWind for example) I was able to match exactly the text I needed AND it uses balancing groups as requested. I modified it slightly to use capture groups. The final product looks like this:

(?:SouthWind\s+=\s+)(\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\))

I'll programmatically drop in the appropriate identifier in place of SouthWind as needed using a simple string concatenation.

I'm normally pretty good with Regex but this one is a doozy.
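The identifier substitution described above could be sketched as follows. The class and method names are hypothetical, and the use of Regex.Escape is an addition not mentioned in the thread; it is there on the assumption that an identifier might contain regex metacharacters such as a dot.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the final pattern from this comment.
public static class TnsBalancedRegex
{
    // Balanced-parentheses part of the pattern, unchanged from above.
    // It is the only unnamed capturing group, so it is group 1.
    private const string BalancedBlock =
        @"(\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\))";

    // Returns the balanced block following "<name> =", or null if no match.
    public static string ExtractBlock(string text, string name)
    {
        // Escape the identifier so it is matched literally.
        string pattern = @"(?:" + Regex.Escape(name) + @"\s+=\s+)" + BalancedBlock;
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

As written, the pattern is not anchored to the start of a line, so an identifier that is a suffix of another entry name (for example "Wind") would match the first entry ending in that text. It also requires at least one nested pair of parentheses inside the outer block.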
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 41753284
this one is a doozy
Which means that if you're using this in production code, then it is probably not the best approach. Regex is a very good and powerful tool, but that doesn't always mean it's the best tool. Can you really say that in six months you'll be able to digest that regex and know what it does or means? What about people coming behind you? Will they understand what it does?

If this is for some one-off, potentially throw-away utility, then it's of less consequence.
2
 
LVL 11

Expert Comment

by:louisfr
ID: 41753996
I think balancing groups are easy in theory, but getting the details right is tricky.
I always start from the same pre-made regex from the page I linked to earlier.
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41754013
Agreed. While I'm generally quite Regex adept I found exactly as you said, getting the details right is tricky. And since balancing groups aren't supported by most Regex flavors it's a bit of a specialized art.

I had a perfectly working piece of code that knew where to start based on a simple Regex and then just read each character until the parentheses balanced out. It's a simple loop operation in C#. Now that I also have a viable Regex sample my next step is to consider performance and try to throw a few curve balls at the solution to see how it behaves.

@käµfm³d 👽
Can you really say that in six months you'll be able to digest that regex and know what it does or means? What about people coming behind you? Will they understand what it does?
That's what code comments are for. ;)
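One way to keep such comments next to the pattern itself is .NET's RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. The sketch below lays out only the balanced-parentheses part of the accepted pattern this way; the class name is hypothetical and the comments are one reading of what each piece does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical holder for an annotated version of the balanced pattern.
public static class AnnotatedBalancedPattern
{
    public static readonly Regex Balanced = new Regex(@"
        \(                                 # opening parenthesis of the outer block
        [^()]+                             # text before the first nested parenthesis
        (?>
            (?> (?'open' \( ) [^()]* )+    # each opening parenthesis is pushed onto the open stack
            (?> (?'-open' \) ) [^()]* )+   # each closing parenthesis pops one entry off that stack
        )+
        (?(open)(?!))                      # fail if any opening parenthesis is still unmatched
        \)                                 # closing parenthesis of the outer block
        ", RegexOptions.IgnorePatternWhitespace);
}
```

For example, AnnotatedBalancedPattern.Balanced.Match("x (a (b) (c (d))) y").Value yields "(a (b) (c (d)))".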
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41754133
This is a post that I quote increasingly often: stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Regex is a beautiful tool. Just don't use it for everything, as it becomes clunky very quickly.
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 41754466
Good programmers comment why something is done, not what something is doing

= )
0
