fischermx
asked on
Regex to get groups of quoted words within parentheses.
Someone suggested this regex:
@"(([^""(]*?(""([^""]+)"". *?)+)+)";
But it produces too many groups.
I reduced it to:
@"([^""(]*?(""[^""]+"".*?) +)+";
But it still produces two groups for each successful match.
Maybe I'm not using the right C# code.
This is the code I have so far:
using System;
using System.Text.RegularExpressions;

class Program
{
    private static string str = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
    // private static string regPattern = @"(([^""(]*?(""([^""]+)"". *?)+)+)";
    private static string regPattern = @"([^""(]*?(""[^""]+"".*?) +)+";

    private static void Test1()
    {
        string text = str;
        string pat = regPattern;
        Regex r = new Regex(pat);
        // get the list of group numbers
        int[] gnums = r.GetGroupNumbers();
        // get first match
        Match m = r.Match(text);
        while (m.Success)
        {
            Console.WriteLine("------- start success --------------");
            // start at group 1 (group 0 is the whole match)
            for (int i = 1; i < gnums.Length; i++)
            {
                // get the group for this match
                Group g = m.Groups[gnums[i]];
                Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
                // get the captures for this group
                CaptureCollection cc = g.Captures;
                for (int j = 0; j < cc.Count; j++)
                {
                    Capture c = cc[j];
                    Console.WriteLine(" Capture" + j + "├" + c.ToString()
                        + "┤ Index=" + c.Index + " Length=" + c.Length);
                }
            }
            // get next match
            m = m.NextMatch();
        }
    }
}
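A likely reason for the two groups per match: in .NET, group 0 is always the whole match, and every unescaped `(` outside a character class opens one more numbered group (the `(` inside `[^""(]` is literal). The reduced pattern therefore has groups 1 and 2, and the loop in `Test1`, which starts at index 1, prints both. The sketch below checks this with `Regex.GetGroupNumbers()`; the class name and the `(?:...)` variant, which makes the inner group non-capturing, are illustrative and not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberDemo
{
    // The reduced pattern exactly as posted in the question.
    public const string Reduced = @"([^""(]*?(""[^""]+"".*?) +)+";

    // Illustrative variant: the inner group is made non-capturing with (?: ... ).
    public const string NonCapturingInner = @"([^""(]*?(?:""[^""]+"".*?) +)+";

    public static int[] GroupNumbers(string pattern) => new Regex(pattern).GetGroupNumbers();

    public static void Main()
    {
        // Group 0 is the whole match; each capturing "(" adds one more number.
        Console.WriteLine(string.Join(", ", GroupNumbers(Reduced)));           // 0, 1, 2
        Console.WriteLine(string.Join(", ", GroupNumbers(NonCapturingInner))); // 0, 1
    }
}
```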
I need to have "word1", "word2" and "word3" in one group and "word4", "word5" and "word6" in another.
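One possible C# approach to that requirement, sketched under assumptions the thread does not state: each list is written as `new Array(...)`, items are double-quoted strings with no escaped quotes, and the word text is wanted without its quotes. A named group (`item`, a hypothetical name) inside a repeated non-capturing group makes .NET record one `Capture` per word, so each `Match` corresponds to one array and its `Groups["item"].Captures` hold that array's words. Quoted text outside the declarations is not matched because the pattern requires `new Array(`. `ArrayExtractor` and `Extract` are also hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ArrayExtractor
{
    // Matches one "new Array( ... )" declaration. The named group "item" sits
    // inside a repeated (?: ... )* group, so .NET keeps one Capture per word.
    private static readonly Regex ArrayPattern =
        new Regex(@"new\s+Array\(\s*(?:""(?<item>[^""]*)""\s*,?\s*)*\)");

    public static List<List<string>> Extract(string text)
    {
        var arrays = new List<List<string>>();
        foreach (Match m in ArrayPattern.Matches(text))
        {
            var items = new List<string>();
            foreach (Capture c in m.Groups["item"].Captures)
                items.Add(c.Value);   // word text without the surrounding quotes
            arrays.Add(items);
        }
        return arrays;
    }

    public static void Main()
    {
        string str = "blah, \"quoted\", new Array(\"word1\", \"word2\", \"word3\"), " +
                     "new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
        List<List<string>> arrays = Extract(str);
        for (int i = 0; i < arrays.Count; i++)
            Console.WriteLine("Array " + i + ": " + string.Join(" | ", arrays[i]));
        // Array 0: word1 | word2 | word3
        // Array 1: word4 | word5 | word6
    }
}
```

Each inner list can then be indexed on its own, which keeps the two arrays apart for later processing.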
If so, this pattern worked in my testing:
\((.*?)\)
Matches(0) = ("word1", "word2", "word3")
Matches(0).SubMatches(0) = "word1", "word2", "word3"
Matches(1) = ("word4", "word5", "word6")
Matches(1).SubMatches(0) = "word4", "word5", "word6"
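For readers working in C# rather than ASP, a rough translation of the output above: `Regex.Matches` returns a `MatchCollection` indexed as `matches[0]` instead of `Matches(0)`, and the first parenthesized group is `Groups[1]` rather than `SubMatches(0)`, because `Groups[0]` holds the whole match. This is a sketch; `MatchesDemo` and `FindLists` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchesDemo
{
    public const string Text =
        "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), " +
        "new Array(\"word4\", \"word5\", \"word6\")), blah, blah";

    public static MatchCollection FindLists(string text) =>
        Regex.Matches(text, @"\((.*?)\)");

    public static void Main()
    {
        MatchCollection matches = FindLists(Text);   // ASP: Matches
        for (int i = 0; i < matches.Count; i++)
        {
            // matches[i].Value      corresponds to Matches(i)
            // matches[i].Groups[1]  corresponds to Matches(i).SubMatches(0)
            Console.WriteLine("Match " + i + " = " + matches[i].Value);
            Console.WriteLine("  Group 1 = " + matches[i].Groups[1].Value);
        }
    }
}
```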
ASKER
How did you get that? I mean, how do I do it in C# code?
I tested that in ASP; you can test it online:
http://regexlib.com/RETester.aspx
Or you can download an app to test it offline:
http://www.ultrapico.com/ExpressoDownload.htm
The general syntax rules are here:
http://regexlib.com/CheatSheet.aspx
\( = match a literal opening parenthesis
( = start of the capturing group
.* = any characters, 0 or more times
? = make the preceding * lazy, so it matches as few characters as possible
) = end of the capturing group
\) = match a literal closing parenthesis
Note that the literal parentheses are escaped with \ because parentheses are special characters in a regex.
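The `?` after `.*` is what keeps each match inside one pair of parentheses. Without it, `.*` is greedy and runs to the last `)` in the input, so both lists end up in a single match. A small comparison on a made-up sample string (`LazyDemo` and `Inner` are illustrative names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Returns group 1 of every match of pattern in text.
    public static string[] Inner(string text, string pattern)
    {
        MatchCollection ms = Regex.Matches(text, pattern);
        var result = new string[ms.Count];
        for (int i = 0; i < ms.Count; i++)
            result[i] = ms[i].Groups[1].Value;
        return result;
    }

    public static void Main()
    {
        const string sample = "f(a) g(b)";
        // Greedy: .* runs to the end, then backs off to the LAST ")".
        Console.WriteLine(string.Join(" | ", Inner(sample, @"\((.*)\)")));   // a) g(b
        // Lazy: .*? stops at the FIRST ")" that lets the match succeed.
        Console.WriteLine(string.Join(" | ", Inner(sample, @"\((.*?)\)")));  // a | b
    }
}
```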
ASKER
OK, but I still don't know how to test that.
The expression "Matches(0)" is not valid in C#.
ASKER
In my code, I just see those long strings:
"word1", "word2", "word3"
and
"word4", "word5", "word6"
I don't see how to access each individual item. Please show how to do that in C#.
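One way to reach the individual items in C#, building on the `\((.*?)\)` pattern from the earlier answer, is a second pass over each captured list with a pattern for a single quoted word. This is a sketch, not the thread's accepted solution; `TwoPassSplit` and `Split` are hypothetical names, and it assumes the words contain no escaped quotes. Note that `\((.*?)\)` matches any parenthesized text, not only `Array(...)` declarations, so quoted words inside other parentheses would also be picked up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TwoPassSplit
{
    public static List<List<string>> Split(string text)
    {
        var lists = new List<List<string>>();
        // Pass 1: group 1 holds the text between "(" and the next ")".
        foreach (Match list in Regex.Matches(text, @"\((.*?)\)"))
        {
            var words = new List<string>();
            // Pass 2: pull each double-quoted word out of that inner text.
            foreach (Match word in Regex.Matches(list.Groups[1].Value, @"""([^""]*)"""))
                words.Add(word.Groups[1].Value);
            lists.Add(words);
        }
        return lists;
    }

    public static void Main()
    {
        string str = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), " +
                     "new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
        foreach (List<string> words in Split(str))
        {
            Console.WriteLine("-- list --");
            foreach (string w in words)
                Console.WriteLine(w);   // each item on its own line
        }
    }
}
```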
Matches(x) is just the name of the match collection in my ASP code; each element represents one match.
Here's a regex code routine in C# that looks much like the code you are using already:
http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html
I thought you wanted to match on:
"word1", "word2", "word3"
and
"word4", "word5", "word6"
Now I understand that you instead want to match on:
"word1"
"word2"
"word3"
and
"word4"
"word5"
"word6"
ASKER
Yes, that's the code I'm using; I took it from there.
As I said, I want them in groups. A single long string is not a group.
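On the point that a single string is not a group: in .NET, a capturing group repeated by a quantifier keeps every repetition, not just the last one. `Group.Value` returns only the last capture, but `Group.Captures` lists all of them, which is what the `CaptureCollection` loop in the question's code prints. A small illustration with a hypothetical class name and a sample list taken from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Group 1 captures one quoted word per repetition of (?: ... )+.
    public static Group WordGroup(string list) =>
        Regex.Match(list, @"(?:""(\w+)"",?\s*)+").Groups[1];

    public static void Main()
    {
        Group g = WordGroup("\"word1\", \"word2\", \"word3\"");
        Console.WriteLine(g.Value);            // word3 (only the last repetition)
        foreach (Capture c in g.Captures)      // word1, word2, word3
            Console.WriteLine("  " + c.Value);
    }
}
```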
SOLUTION
ASKER
That's not what I want at all. There's text outside the array declarations that might also be quoted, and I don't want that text to match.
Also, the two array declarations (there are only two) are almost always the same size, and I need them treated separately so that later I can make pairs by taking one item from each. If one array is bigger than the other, the extra items must be left out. But if I get all the quoted words in one single group, I can't do the pairings.
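The pairing step described above (one item from each array, dropping the surplus from the longer one) maps directly onto LINQ's `Enumerable.Zip`, which stops at the end of the shorter sequence. A minimal sketch, assuming each array's items have already been extracted into a list of strings; `ArrayPairing` and `Pair` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArrayPairing
{
    // Enumerable.Zip stops at the end of the shorter sequence, so surplus
    // items in the longer array are left out of the result.
    public static List<(string, string)> Pair(IEnumerable<string> first, IEnumerable<string> second) =>
        first.Zip(second, (a, b) => (a, b)).ToList();

    public static void Main()
    {
        var left = new List<string> { "word1", "word2", "word3" };
        var right = new List<string> { "word4", "word5" };   // one item shorter
        foreach (var (a, b) in Pair(left, right))
            Console.WriteLine(a + " <-> " + b);
        // word1 <-> word4
        // word2 <-> word5
    }
}
```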
>That's not what I want at all.
OK.
>There's text outside the array declarations that might also be quoted, and I don't want that text to match.
That is what my comment in my last post addressed: "As you see from the submatches, if you loop through the matches starting at a match equal to "Array(", all of your words in the array will be in submatches(1) until the match equal to ")"."
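The loop described in that comment (start collecting at the match equal to "Array(", stop at the match equal to ")") could look roughly like this in C#. It is a sketch under assumptions not stated in the thread: arrays are written as `Array(...)`, items are double-quoted with no escaped quotes, and no unquoted `)` appears inside an array. `TokenScanner` and `Scan` are hypothetical names. Quoted text outside any array is skipped because no list is open when it is seen.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenScanner
{
    // One alternation that yields three kinds of tokens, in order:
    // "Array(", a double-quoted word (group 1), or ")".
    private static readonly Regex Token = new Regex(@"Array\(|""([^""]*)""|\)");

    public static List<List<string>> Scan(string text)
    {
        var arrays = new List<List<string>>();
        List<string> current = null;   // non-null only between "Array(" and ")"
        foreach (Match t in Token.Matches(text))
        {
            if (t.Value == "Array(")
            {
                current = new List<string>();
            }
            else if (t.Value == ")")
            {
                if (current != null) arrays.Add(current);
                current = null;
            }
            else if (current != null)
            {
                current.Add(t.Groups[1].Value);   // quoted word inside an array
            }
            // quoted words seen while current == null are outside any array
        }
        return arrays;
    }

    public static void Main()
    {
        string str = "blah, \"quoted\", new Array(\"word1\", \"word2\", \"word3\"), " +
                     "new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
        List<List<string>> arrays = Scan(str);
        for (int i = 0; i < arrays.Count; i++)
            Console.WriteLine("Array " + i + ": " + string.Join(", ", arrays[i]));
        // Array 0: word1, word2, word3
        // Array 1: word4, word5, word6
    }
}
```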
>Also, the two array declarations (there are only two) are almost always the same size, and I need them treated separately so that later I can make pairs by taking one item from each. If one array is bigger than the other, the extra items must be left out. But if I get all the quoted words in one single group, I can't do the pairings.
Please explain your specific needs so that someone can complete the pattern for you. For example, will there always be two arrays in the string you test, and if not, how would one know which arrays to use for the matches, which elements to drop from the submatches, etc.?
ASKER CERTIFIED SOLUTION
blah, blah, new Array("word1", "word2", "word3"), new Array("word4", "word5", "word6"), blah, blah
From which you want:
groups(0):"word1", "word2", "word3"
groups(1):"word4", "word5", "word6"