Russ Suter
asked on
Regex Balancing Group
I'm trying to parse an Oracle TNS Names file. It looks something like this:
# Generated by Oracle configuration tools.
NorthWind =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = NorthWind)
)
)
SouthWind =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = SouthWind)
)
)
WestWind =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = WestWind)
)
)
What I need to be able to do is come up with a regular expression that will allow me to identify the TNS Name and then capture everything inside the parentheses that follow. I've been looking into using a Regex balancing group but haven't quite got the hang of it. Here's what I have so far:
[\n][\s]*[^\(]SouthWind[\s]*=[\s]*((?<Begin>[(]).*(?<End-Begin>[)]))
This isn't working. The capture group overruns the closing parenthesis. There seems to be little good documentation on balancing groups in Regex. They seem like the bastard child that nobody wants to talk about.
This is what I want to get:
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = SouthWind)
)
)
Can anyone help?
ASKER
I just tried that. I added the LFs at the end as you suggested. It didn't match on anything.
ASKER
I use Expresso
I also tried this online Regex tester: https://regex101.com/
And Visual Studio
None of them worked
You forgot the modifiers:
1. Dot matches line breaks.
The s flag on regex101.com, RegexOptions.Singleline in Visual Studio
2. ^ and $ match at line breaks.
The m flag on regex101.com, RegexOptions.Multiline in Visual Studio
Here is the link: https://regex101.com/r/vT4gC5/1
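For the .NET side, here is a minimal sketch of what those two options change. The class name ModifierDemo and the tiny sample strings are made up for illustration; only RegexOptions.Singleline and RegexOptions.Multiline come from the discussion.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ModifierDemo
{
    // True when '.' is able to match the line break between "a" and "b".
    public static bool DotCrossesLineBreak(RegexOptions options)
    {
        return Regex.IsMatch("a\nb", "a.b", options);
    }

    // Number of lines in "x\ny" that consist of exactly one word character.
    public static int AnchoredLineCount(RegexOptions options)
    {
        return Regex.Matches("x\ny", @"^\w$", options).Count;
    }

    public static void Main()
    {
        Console.WriteLine(DotCrossesLineBreak(RegexOptions.None));       // False
        Console.WriteLine(DotCrossesLineBreak(RegexOptions.Singleline)); // True
        Console.WriteLine(AnchoredLineCount(RegexOptions.None));         // 0
        Console.WriteLine(AnchoredLineCount(RegexOptions.Multiline));    // 2
    }
}
```

Without Singleline the dot stops at every line break, and without Multiline the ^ anchor only matches at the very start of the input, so a multi-line entry pattern needs both.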
ASKER
Didn't forget those. Both are enabled. It still doesn't work.
Did you click on the link? It shows the first match.
Add g (global) to see all matches.
ASKER
It doesn't work in C#. That's where I need it.
// Verbatim string so the literal can span lines; its line endings follow the source file.
string subjectString = @"
# Generated by Oracle configuration tools.
NorthWind =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = NorthWind)
)
)
SouthWind =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = SouthWind)
)
)
WestWind =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.1.44)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = WestWind)
)
)
";
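// Requires: using System; using System.Text.RegularExpressions;
MatchCollection allMatchResults = null;
try {
    // Singleline lets '.' cross line breaks; Multiline lets '^' match at each line start.
    // The pattern expects LF line endings and a blank line after each entry.
    Regex regexObj = new Regex(@"^\w+ =\n(.*?\)\n\s*\))\n\s*\n", RegexOptions.Singleline | RegexOptions.Multiline);
    allMatchResults = regexObj.Matches(subjectString);
    if (allMatchResults.Count > 0) {
        // Access individual matches with allMatchResults[i]; Groups[1] holds the parenthesized block
    } else {
        // Match attempt failed
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}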
If it does not work, make sure the line endings are <LF> (as in the sample you provided), not <CR><LF>.
If the file has Windows line endings, then use this (basically replace \n with \r\n):
^\w+ =\r\n(.*?\)\r\n\s*\))\r\n\s*\r\n
If you want documentation on balancing groups, you can check this: http://www.regular-expressions.info/balancing.html
ASKER
I've already been there and read through it.
Ultimately I just ended up not using Regex for capturing the text. I decided to just use it to determine my start point then just wrote a quick program that parses the following text character by character and keeps track of the parentheses. Sometimes I guess the brute force approach is the best.
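The code for this approach was not posted, so the following is only a minimal sketch of what it might look like. The names TnsScanner and ExtractBlock are hypothetical, and the start-point regex is an assumption about how the identifier could be located.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TnsScanner
{
    // Returns the balanced "( ... )" block that follows "<name> =", or null if none is found.
    public static string ExtractBlock(string text, string name)
    {
        // Regex is used only to find the start point: the identifier, '=', and the first '('.
        Match start = Regex.Match(
            text,
            @"^\s*" + Regex.Escape(name) + @"\s*=\s*\(",
            RegexOptions.Multiline);
        if (!start.Success) return null;

        int open = start.Index + start.Length - 1; // index of the first '('
        int depth = 0;
        for (int i = open; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                depth++;
            }
            else if (text[i] == ')' && --depth == 0)
            {
                // Depth is back to zero: this ')' closes the outermost '('.
                return text.Substring(open, i - open + 1);
            }
        }
        return null; // ran out of text before the parentheses balanced
    }

    public static void Main()
    {
        string text =
            "NorthWind =\n  (DESCRIPTION =\n    (CONNECT_DATA = (SERVICE_NAME = NorthWind))\n  )\n" +
            "SouthWind =\n  (DESCRIPTION =\n    (CONNECT_DATA = (SERVICE_NAME = SouthWind))\n  )\n";
        Console.WriteLine(ExtractBlock(text, "SouthWind"));
    }
}
```

Because the loop only counts parentheses, it does not care about line endings, indentation, or whatever follows the last entry in the file.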
ASKER
I've requested that this question be closed as follows:
Accepted answer: 0 points for Russ Suter's comment #a41742550
for the following reason:
None of the above offered solutions actually worked.
The regular expression provided works. Proof: https://regex101.com/r/vT4gC5/1 . Add g to see all matches.
The first time the author mentioned C# is comment ID: 41735968.
After that I provided sample code in C#.
I only tested with (and provided a solution for) the sample data provided by the author in the question. If it's different from the live data... I can't test on something I can't see.
ASKER
Your protestation that it works doesn't actually make it work.
So... ignore a working solution (you can protest, but the link on regex101.com proves the regular expression works on the sample you provided), then accept a semi-blind link just out of spite.
ASKER
It doesn't work. It may work on some website but it doesn't work in a real world application. I gave the other solution a C grade because it offered some (but not enough) information. It wasn't personal or spiteful.
ASKER
You're joking, right?
I provided the actual data (names and IP addresses changed) and the language, admittedly not at first, but I did in a later post.
The solution DOES NOT work on what I provided. It may work in your test case but not in my real-world case. I'm really not sure why you fail to understand this.
Furthermore, I'm not the one getting upset. I solved my issue and moved on by using a different solution. You objected so I revisited and gave some (but not full) credit to the only link that offered anything actually useful.
There are plenty of times I answered a question and someone else's answer was accepted even though I thought mine was perfectly valid. I just moved along. I suggest you do the same.
I don't think I'll be able to add anything to this discussion. I'll not reply again. Feel free to get the last word if you wish.
I don't have access to your real world case. I only have access to what you provided on your question.
My regular expression works on your (not my) test data. I only copy/pasted from your post.
Verified in RegexBuddy, regex101.com and Visual Studio.
If it is different from the real world data, how can I (or anyone) provide a solution???
For your next questions, please read up on and try to provide an SSCCE (Short, Self-Contained, Correct Example).
Thank you.
ASKER
Let's set aside for a moment that the proposed solution didn't work. It actually didn't even properly address the question which involved finding a specific block of text following an identifier. An alternate solution was found. However, I'm happy to wait a while longer for a more complete Regex based solution. Allow me to specify the requirements more fully. I've attached the actual TNS names file (appended with a .txt extension which normally isn't there).
Here are the requirements and restrictions:
1. I cannot in any way modify the file. I must read it as-is on the computer.
2. The file may or may not have additional characters following the last entry. These characters should be irrelevant.
3. I need to be able to extract the text within a balanced block of parentheses following an identifier. In the attached file there are 3 entries and the identifiers are:
NorthWind =
SouthWind =
WestWind =
The parenthesized block following any one of these (specified by user input) must be extracted. The outer parentheses are optional since I know I can add them back in if they are omitted.
If a C# code block produces the desired result on this website: http://rextester.com/ I will consider it a success.
tnsnames.ora.txt
As I said in my answer above if the pattern does not work with \n it simply means that you have Windows line endings (\r\n).
Here is the link to working proof: http://rextester.com/ADXFEV61390
I'm not a C# programmer, so you'll need to write the loop yourself to find the rest of the matches.
If you can't do that, please post a new question in the appropriate TA.
Thank you.
Dan's suggestion appears to work in regexhero.net, which is a Silverlight app (hence it uses .NET's regex engine). You could make the carriage returns optional ( \r? ) to account for either style of line ending:
^\w+ =\r?\n(.*?\)\r?\n\s*\))\r?\n\s*\r?\n
Expresso should work as well if you account for the line ending issue that Dan mentioned.
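To illustrate the optional carriage return, here is a minimal sketch that runs the \r? version of the pattern against the same entries built once with LF and once with CRLF. The class LineEndingDemo, the BuildSample helper, and the sample data (indentation, a blank line after each entry, one shared host) are assumptions for the demonstration, not the asker's actual file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineEndingDemo
{
    // The pattern from this thread with every \n written as \r?\n.
    private static readonly Regex EntryPattern = new Regex(
        @"^\w+ =\r?\n(.*?\)\r?\n\s*\))\r?\n\s*\r?\n",
        RegexOptions.Singleline | RegexOptions.Multiline);

    // Builds a three-entry sample joined with the given line ending (made-up test data).
    public static string BuildSample(string newline)
    {
        var lines = new List<string>();
        lines.Add("# Generated by Oracle configuration tools.");
        lines.Add("");
        foreach (string name in new[] { "NorthWind", "SouthWind", "WestWind" })
        {
            lines.Add(name + " =");
            lines.Add("  (DESCRIPTION =");
            lines.Add("    (ADDRESS_LIST =");
            lines.Add("      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1521))");
            lines.Add("    )");
            lines.Add("    (CONNECT_DATA =");
            lines.Add("      (SERVICE_NAME = " + name + ")");
            lines.Add("    )");
            lines.Add("  )");
            lines.Add(""); // the blank line this pattern relies on after each entry
        }
        return string.Join(newline, lines) + newline;
    }

    // Number of entries the pattern finds in the given text.
    public static int CountEntries(string text)
    {
        return EntryPattern.Matches(text).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountEntries(BuildSample("\n")));   // 3
        Console.WriteLine(CountEntries(BuildSample("\r\n"))); // 3
    }
}
```

Note that this pattern still depends on a blank line after each entry, including the last one, which is a separate limitation from the line-ending issue.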
ASKER CERTIFIED SOLUTION (content available only to Experts Exchange members)
ASKER
This is almost perfect. By switching out the leading \w+ with the keyword identifier (SouthWind for example) I was able to match exactly the text I needed AND it uses balancing groups as requested. I modified it slightly to use capture groups. The final product looks like this:
(?:SouthWind\s+=\s+)(\([^()]+(?>(?>(?'open'\()[^()]*)+(?>(?'-open'\))[^()]*)+)+(?(open)(?!))\))
I'll programmatically drop in the appropriate identifier in place of SouthWind as needed using a simple string concatenation.
I'm normally pretty good with Regex but this one is a doozy.
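A minimal sketch of dropping the identifier into the balancing-group pattern might look like the following. The names BalancedTns and Extract are hypothetical, and Regex.Escape is an added precaution (an assumption, not part of the original pattern) in case the identifier contains regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedTns
{
    // Returns the balanced block following "<name> =", or null when nothing matches.
    public static string Extract(string text, string name)
    {
        string pattern =
            @"(?:" + Regex.Escape(name) + @"\s+=\s+)" +  // identifier and '=' (not captured)
            @"(\(" +                                      // group 1 starts at the outer '('
            @"[^()]+" +                                   // text before the first nested '('
            @"(?>(?>(?'open'\()[^()]*)+" +                // push one 'open' per '('
            @"(?>(?'-open'\))[^()]*)+)+" +                // pop one 'open' per ')'
            @"(?(open)(?!))" +                            // fail if any '(' is still unclosed
            @"\))";                                       // the outer ')' ends group 1
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string text =
            "NorthWind =\n  (DESCRIPTION =\n    (CONNECT_DATA =\n      (SERVICE_NAME = NorthWind)\n    )\n  )\n\n" +
            "SouthWind =\n  (DESCRIPTION =\n    (CONNECT_DATA =\n      (SERVICE_NAME = SouthWind)\n    )\n  )\n";
        Console.WriteLine(Extract(text, "SouthWind"));
    }
}
```

The pop group cannot match a closing parenthesis when the 'open' stack is empty, which is what stops the match at the outer closing parenthesis instead of running into the next entry.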
"this one is a doozy"
Which means that if you're using this in production code, then it is probably not the best approach. Regex is a very good and powerful tool, but that doesn't always mean it's the best tool. Can you really say that in six months you'll be able to digest that regex and know what it does or means? What about people coming behind you? Will they understand what it does?
If this is for some one-off, potentially throw-away utility, then it's of less consequence.
I think balancing groups are easy in theory, but getting the details right is tricky.
I always start from the same pre-made regex from the page I linked to earlier.
ASKER
Agreed. While I'm generally quite Regex adept I found exactly as you said, getting the details right is tricky. And since balancing groups aren't supported by most Regex flavors it's a bit of a specialized art.
I had a perfectly working piece of code that knew where to start based on a simple Regex and then just read each character until the parentheses balanced out. It's a simple loop operation in C#. Now that I also have a viable Regex sample my next step is to consider performance and try to throw a few curve balls at the solution to see how it behaves.
@käµfm³d 👽
"Can you really say that in six months you'll be able to digest that regex and know what it does or means? What about people coming behind you? Will they understand what it does?"
That's what code comments are for. ;)
This is a post that I quote increasingly often: stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Regex is a beautiful tool. Just don't use it for everything, as it becomes clunky very quickly.
Good programmers comment why something is done, not what something is doing
= )
HTH,
Dan