• Status: Solved
• Priority: Medium
• Security: Public

Regex Balancing Group

I'm trying to parse an Oracle TNS Names file. It looks something like this:
# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NorthWind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
  
WestWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = WestWind)
    )
  )


What I need is a regular expression that identifies a TNS name and then captures everything inside the balanced parentheses that follow it. I've been looking into using a .NET regex balancing group but haven't quite got the hang of it. Here's what I have so far:
[\n][\s]*[^\(]SouthWind[\s]*=[\s]*((?<Begin>[(]).*(?<End-Begin>[)]))


This isn't working. The capture group overruns the closing parenthesis. There seems to be little good documentation on balancing groups in Regex. They seem like the bastard child that nobody wants to talk about.

This is what I want to get:
(DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )



Can anyone help?
Asked by: Russ Suter
 
Dan Craciun (IT Consultant) commented:
If you don't mind adding a couple of LF at the end of the file, this should work:
^\w+ =\n(.*?\)\n\s*\))\n\s*\n



HTH,
Dan
 
Russ Suter (Author) commented:
I just tried that. I added the LFs at the end as you suggested. It didn't match on anything.
 
Dan Craciun (IT Consultant) commented:
OK. What do you use to test?

[Screenshot: matches in RegexBuddy]
 
Russ Suter (Author) commented:
I use Expresso. I also tried this online regex tester: https://regex101.com/ as well as Visual Studio.

None of them worked.
 
Dan Craciun (IT Consultant) commented:
You forgot the modifiers:
1. Dot matches line breaks:
   the s flag on regex101.com, RegexOptions.Singleline in Visual Studio (.NET).
2. ^ and $ match at line breaks:
   the m flag on regex101.com, RegexOptions.Multiline in Visual Studio (.NET).

Here is the link: https://regex101.com/r/vT4gC5/1
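In .NET the same two modifiers can be passed either as RegexOptions flags or inline at the front of the pattern as (?s) and (?m). A minimal sketch showing both spellings with Dan's LF-only pattern (the class and method names are illustrative, and the input is assumed to use LF line endings with a blank line after every entry):

```csharp
using System.Text.RegularExpressions;

public static class InlineFlagsDemo
{
    // Dan's pattern; assumes LF-only line endings and a blank line after each entry.
    const string Pattern = @"^\w+ =\n(.*?\)\n\s*\))\n\s*\n";

    // Options supplied through the RegexOptions enum.
    public static int CountWithOptions(string text) =>
        Regex.Matches(text, Pattern, RegexOptions.Singleline | RegexOptions.Multiline).Count;

    // The same options written inline: (?s) = Singleline, (?m) = Multiline.
    public static int CountWithInlineFlags(string text) =>
        Regex.Matches(text, "(?sm)" + Pattern).Count;
}
```

The inline form is handy in tools such as Expresso or regex101.com, where the options may otherwise live in a separate checkbox or flag field that is easy to overlook.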
 
Russ Suter (Author) commented:
Didn't forget those. Both are enabled. It still doesn't work.
 
Dan Craciun (IT Consultant) commented:
Did you click on the link? It shows the first match.

Add g (global) to see all matches.
 
Russ Suter (Author) commented:
It doesn't work in C#. That's where I need it.
 
Dan Craciun (IT Consultant) commented:
using System.Text.RegularExpressions;

string subjectString = @"
# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NorthWind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
  
WestWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = WestWind)
    )
  )

";

MatchCollection allMatchResults = null;
try {
	Regex regexObj = new Regex(@"^\w+ =\n(.*?\)\n\s*\))\n\s*\n", RegexOptions.Singleline | RegexOptions.Multiline);
	allMatchResults = regexObj.Matches(subjectString);
	if (allMatchResults.Count > 0) {
		// Access individual matches using allMatchResults[i]
	} else {
		// Match attempt failed
	} 
} catch (ArgumentException ex) {
	// Syntax error in the regular expression
}



If it does not work, make sure the line endings are LF (as in the sample you provided), not CR LF.

If the file has Windows line endings, then use this (basically replace \n with \r\n):
^\w+ =\r\n(.*?\)\r\n\s*\))\r\n\s*\r\n
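Instead of keeping two versions of the pattern, another option is to normalize the line endings of the in-memory copy of the text before matching; the file on disk is never modified. A sketch under that assumption (LineEndings, NormalizeToLf and FindEntries are illustrative names, and the text still needs a blank line after the last entry):

```csharp
using System.Text.RegularExpressions;

public static class LineEndings
{
    // Converts CR LF (Windows) and lone CR line endings to LF.
    // Only the string in memory changes; the file itself is untouched.
    public static string NormalizeToLf(string text) =>
        text.Replace("\r\n", "\n").Replace("\r", "\n");

    // Dan's LF-only pattern, applied after normalization.
    // Group 1 holds the parenthesized block.
    public static MatchCollection FindEntries(string text) =>
        Regex.Matches(NormalizeToLf(text),
            @"^\w+ =\n(.*?\)\n\s*\))\n\s*\n",
            RegexOptions.Singleline | RegexOptions.Multiline);
}
```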

 
louisfr commented:
If you want documentation on balancing groups, you can check this: http://www.regular-expressions.info/balancing.html
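The usual .NET idiom for a single balanced pair of parentheses pushes an empty capture onto a named group for every '(' and pops it for every ')', then uses a conditional to fail if anything is left on the stack. A minimal sketch (BalancedParens, FirstBlock and the group name depth are illustrative, not taken from the thread):

```csharp
using System.Text.RegularExpressions;

public static class BalancedParens
{
    // \(                    outer opening parenthesis
    // (?>                   atomic alternation, repeated:
    //    [^()]+             a run of non-parenthesis characters
    //  | \((?<depth>)       '(' : push an empty capture onto "depth"
    //  | \)(?<-depth>)      ')' : pop "depth"; fails if the stack is empty
    // )*
    // (?(depth)(?!))        fail if any '(' is still unmatched
    // \)                    outer closing parenthesis
    static readonly Regex Balanced = new Regex(
        @"\((?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!))\)");

    // Returns the first balanced (...) block at or after startIndex, or null.
    public static string FirstBlock(string text, int startIndex = 0)
    {
        Match m = Balanced.Match(text, startIndex);
        return m.Success ? m.Value : null;
    }
}
```

The pop fails as soon as a ')' has no matching '(' on the stack, which is what stops the match at the block's own closing parenthesis instead of running on to the last ')' in the text.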
 
Russ Suter (Author) commented:
I've already been there and read through it.

Ultimately I just ended up not using Regex for capturing the text. I decided to just use it to determine my start point then just wrote a quick program that parses the following text character by character and keeps track of the parentheses. Sometimes I guess the brute force approach is the best.
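The approach described here (a simple regex only to find the starting point, then a character-by-character walk that counts parentheses) might look roughly like the sketch below. It assumes entries of the form NAME = (...) with the name at the start of a line; TnsScanner and ExtractEntry are hypothetical names, and parentheses inside quoted values are not handled.

```csharp
using System.Text.RegularExpressions;

public static class TnsScanner
{
    // Hypothetical helper: returns the balanced (...) block that follows
    // "name =" at the start of a line, or null if none is found.
    public static string ExtractEntry(string text, string name)
    {
        // Use a simple regex only to find the starting point.
        Match start = Regex.Match(text,
            @"^[ \t]*" + Regex.Escape(name) + @"[ \t]*=",
            RegexOptions.Multiline);
        if (!start.Success) return null;

        // The block begins at the first '(' after the '='.
        int open = text.IndexOf('(', start.Index + start.Length);
        if (open < 0) return null;

        // Walk forward one character at a time, tracking nesting depth.
        int depth = 0;
        for (int i = open; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                depth++;
            }
            else if (text[i] == ')')
            {
                depth--;
                if (depth == 0)
                    return text.Substring(open, i - open + 1);
            }
        }
        return null; // ran off the end: parentheses never balanced
    }
}
```

For example, TnsScanner.ExtractEntry(File.ReadAllText(path), "SouthWind") would return the SouthWind block with its outer parentheses. It works the same with LF or CR LF line endings and does not care what follows the last entry.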
 
Russ Suter (Author) commented:
I've requested that this question be closed as follows:

Accepted answer: 0 points for Russ Suter's comment #a41742550

for the following reason:

None of the above offered solutions actually worked.
 
Dan Craciun (IT Consultant) commented:
The regular expression provided works. Proof: https://regex101.com/r/vT4gC5/1 . Add g to see all matches.

The first time the author mentioned C# is comment ID: 41735968.

After that I provided sample code in C#.

I only tested with (and provided a solution for) the sample data provided by the author in the question. If it's different from the live data... I can't test on something I can't see.
 
Russ Suter (Author) commented:
Your protestation that it works doesn't actually make it work.
 
Dan Craciun (IT Consultant) commented:
So... ignore a working solution (you can protest, but the link on regex101.com proves the regular expression works on the sample you provided), then accept a semi-blind link just out of spite.
 
Russ Suter (Author) commented:
It doesn't work. It may work on some website but it doesn't work in a real world application. I gave the other solution a C grade because it offered some (but not enough) information. It wasn't personal or spiteful.
 
Russ Suter (Author) commented:
You're joking, right?

I provided the actual data (names and IP addresses changed) and the language, admittedly not at first but in a future post I did.

The solution DOES NOT work on what I provided. It may work in your test case but not in my real-world case. I'm really not sure why you fail to understand this.

Furthermore, I'm not the one getting upset. I solved my issue and moved on by using a different solution. You objected so I revisited and gave some (but not full) credit to the only link that offered anything actually useful.

There are plenty of times I answered a question and someone else's answer was accepted even though I thought mine was perfectly valid. I just moved along. I suggest you do the same.

I don't think I'll be able to add anything to this discussion. I'll not reply again. Feel free to get the last word if you wish.
 
Dan Craciun (IT Consultant) commented:
I don't have access to your real world case. I only have access to what you provided on your question.

My regular expression works on your (not my) test data. I only copy/pasted from your post.
Verified in RegexBuddy, regex101.com and Visual Studio.

If it is different from the real world data, how can I (or anyone) provide a solution???

On your next questions, please read about and try to provide an SSCCE.

Thank you.
 
Russ Suter (Author) commented:
Let's set aside for a moment that the proposed solution didn't work. It didn't even properly address the question, which involved finding a specific block of text following an identifier. An alternate solution was found. However, I'm happy to wait a while longer for a more complete Regex-based solution. Allow me to specify the requirements more fully. I've attached the actual TNS names file (with a .txt extension appended, which normally isn't there).

Here are the requirements and restrictions:

1. I cannot in any way modify the file. I must read it as-is on the computer.
2. The file may or may not have additional characters following the last entry. These characters should be irrelevant.
3. I need to be able to extract the text within a balanced block of parentheses following an identifier. In the attached file there are 3 entries and the identifiers are:
    NorthWind =
    SouthWind =
    WestWind =
  The parenthesized block following any one of these (specified by user input) must be extracted. The outer parentheses are optional since I know I can add them back in if they are omitted.

If a C# code block produces the desired result on this website: http://rextester.com/ I will consider it a success.
tnsnames.ora.txt
 
Dan Craciun (IT Consultant) commented:
As I said in my answer above, if the pattern does not work with \n, it simply means that you have Windows line endings (\r\n).

Here is the link to working proof: http://rextester.com/ADXFEV61390

I'm not a C# programmer so you'll need to write yourself the loop to find the rest of the matches.
If you can't do that, please post a new question in the appropriate TA.

Thank you.
 
käµfm³d 👽 commented:
Dan's suggestion appears to work in regexhero.net, which is a Silverlight app (hence it uses .NET's regex engine). You could make the carriage returns optional ( \r? ) to account for either style of line ending:

^\w+ =\r?\n(.*?\)\r?\n\s*\))\r?\n\s*\r?\n



Expresso should work as well if you account for the line ending issue that Dan mentioned.
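The loop over the remaining matches that Dan left as an exercise could look something like this sketch, using the \r?-tolerant pattern above. Here the alias is wrapped in a second capture group so each block can be keyed by its TNS name; TnsEntries and ReadAll are illustrative names. Like the original pattern, it still assumes each entry ends with a line holding only ')' followed by a blank line, and that there is no trailing space after the '='.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TnsEntries
{
    // The \r?-tolerant pattern with the alias in its own group:
    // group 1 = alias, group 2 = the parenthesized block.
    static readonly Regex Entry = new Regex(
        @"^(\w+) =\r?\n(.*?\)\r?\n\s*\))\r?\n\s*\r?\n",
        RegexOptions.Singleline | RegexOptions.Multiline);

    // Collects every alias -> block pair found in the text.
    public static Dictionary<string, string> ReadAll(string text)
    {
        var entries = new Dictionary<string, string>();
        foreach (Match m in Entry.Matches(text))
            entries[m.Groups[1].Value] = m.Groups[2].Value;
        return entries;
    }
}
```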
 
louisfr commented:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace Rextester
{
    public class Program
    {
        public static void Main(string[] args)
        {
            string text = @"# Generated by Oracle configuration tools.

NorthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = Northwind)
    )
  )

SouthWind =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = SouthWind)
    )
  )
 
WestWind = 
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = WestWind)
    )
  )
";
            Regex rx = new Regex(@"\w+\s+=\s+\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\)", RegexOptions.Singleline);
            foreach(Match m in rx.Matches(text)){
                Console.WriteLine("-----");
                Console.WriteLine(m.Value);
            }
        }
    }
}

 
Russ Suter (Author) commented:
This is almost perfect. By switching out the leading \w+ with the keyword identifier (SouthWind for example) I was able to match exactly the text I needed AND it uses balancing groups as requested. I modified it slightly to use capture groups. The final product looks like this:

(?:SouthWind\s+=\s+)(\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\))

I'll programmatically drop in the appropriate identifier in place of SouthWind as needed using a simple string concatenation.

I'm normally pretty good with Regex but this one is a doozy.
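When the identifier is dropped into the pattern by string concatenation, passing it through Regex.Escape keeps regex metacharacters in a name from changing the pattern (a net service name such as sales.example.com contains dots, for example). Anchoring the name to the start of a line also stops SouthWind from matching inside a longer name such as MySouthWind. A sketch wrapping the final pattern above; TnsLookup and FindBlock are hypothetical names, and the ^[ \t]* anchor is an addition, not part of the pattern in the thread:

```csharp
using System.Text.RegularExpressions;

public static class TnsLookup
{
    // The final balancing-group pattern with the alias spliced in.
    // Group 1 is the parenthesized block; the ^[ \t]* prefix (with
    // RegexOptions.Multiline) anchors the alias to the start of a line.
    public static string FindBlock(string text, string alias)
    {
        string pattern =
            @"^[ \t]*" + Regex.Escape(alias) +
            @"\s+=\s+(\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\))";
        Match m = Regex.Match(text, pattern, RegexOptions.Multiline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Because the pattern contains no '.', RegexOptions.Singleline makes no difference here; Multiline is needed only for the ^ anchor.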
 
käµfm³d 👽 commented:
this one is a doozy
Which means that if you're using this in production code, then it is probably not the best approach. Regex is a very good and powerful tool, but that doesn't always mean it's the best tool. Can you really say that in six months you'll be able to digest that regex and know what it does or means? What about people coming behind you? Will they understand what it does?

If this is for some one-off, potentially throw-away utility, then it's of less consequence.
 
louisfr commented:
I think balancing groups are easy in theory, but getting the details right is tricky.
I always start from the same pre-made regex from the page I linked to earlier.
 
Russ Suter (Author) commented:
Agreed. While I'm generally quite Regex adept I found exactly as you said, getting the details right is tricky. And since balancing groups aren't supported by most Regex flavors it's a bit of a specialized art.

I had a perfectly working piece of code that knew where to start based on a simple Regex and then just read each character until the parentheses balanced out. It's a simple loop operation in C#. Now that I also have a viable Regex sample my next step is to consider performance and try to throw a few curve balls at the solution to see how it behaves.

@käµfm³d 👽
Can you really say that in six months you'll be able to digest that regex and know what it does or means? What about people coming behind you? Will they understand what it does?
That's what code comments are for. ;)
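In .NET the comments can also live inside the pattern itself. With RegexOptions.IgnorePatternWhitespace, unescaped whitespace in the pattern is ignored and # starts a comment, so louisfr's pattern can be laid out one piece per line. A sketch (CommentedPattern is an illustrative name; the two numbered groups are additions for the alias and the block, otherwise the pattern is unchanged):

```csharp
using System.Text.RegularExpressions;

public static class CommentedPattern
{
    // louisfr's balancing-group pattern, laid out with
    // RegexOptions.IgnorePatternWhitespace so each piece can carry a comment.
    public static readonly Regex Entry = new Regex(@"
        (\w+) \s+ = \s+                 # alias and '=' (group 1)
        (                               # group 2: the whole block
          \( [^()]+                     # outer '(' and leading text
          (?>                           # one or more runs of:
            (?> (?'open'\() [^()]* )+   #   '(' : push onto 'open'
            (?> (?'-open'\)) [^()]* )+  #   ')' : pop 'open' (fails if empty)
          )+
          (?(open)(?!))                 # fail if any '(' is left unmatched
          \)                            # outer ')'
        )",
        RegexOptions.IgnorePatternWhitespace);
}
```

The comments here still describe what each piece does; a note on why the regex exists at all would sit naturally above the declaration.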
 
Dan Craciun (IT Consultant) commented:
This is a post that I quote increasingly often: stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Regex is a beautiful tool. Just don't use it for everything, as it becomes clunky very quickly.
 
käµfm³d 👽 commented:
Good programmers comment why something is done, not what something is doing

= )