fischermx

Regex to get groups of quoted words within parentheses.

Someone suggested this regex:
@"(([^""(]*?(""([^""]+)"".*?)+)+)";
But it produces too many groups.

I reduced it to:
@"([^""(]*?(""[^""]+"".*?)+)+";
But it still produces two groups for each successful match.

Maybe I'm not using the right C# code.
This is what I have so far:

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        private static string str = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
        // private static string regPattern = @"(([^""(]*?(""([^""]+)"".*?)+)+)";
        private static string regPattern = @"([^""(]*?(""[^""]+"".*?)+)+";

        static void Main()
        {
            Test1();
        }

        private static void Test1()
        {
            string text = str;
            string pat = regPattern;

            Regex r = new Regex(pat);
            // get the list of group numbers (index 0 is the whole match)
            int[] gnums = r.GetGroupNumbers();
            // get first match
            Match m = r.Match(text);
            while (m.Success)
            {
                Console.WriteLine("------- start success --------------");
                // start at group 1, skipping group 0 (the whole match)
                for (int i = 1; i < gnums.Length; i++)
                {
                    // get the group for this match
                    Group g = m.Groups[gnums[i]];
                    Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
                    // get every capture recorded for this group
                    CaptureCollection cc = g.Captures;
                    for (int j = 0; j < cc.Count; j++)
                    {
                        Capture c = cc[j];
                        Console.WriteLine("      Capture" + j + "├" + c.ToString()
                           + "┤ Index=" + c.Index + " Length=" + c.Length);
                    }
                }
                // get next match
                m = m.NextMatch();
            }
        }
    }

I need to have "word1", "word2" and "word3" in one group and "word4", "word5" and "word6"  in another.
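
On why the reduced pattern still reports two groups per match: it contains two pairs of capturing parentheses, and the loop above starts at group 1, so groups 1 and 2 are printed each time. A minimal sketch of the difference, assuming the grouping is only needed for repetition and can therefore be made non-capturing with `(?: )` (the class and method names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    // Number of groups the regex reports, including group 0 (the whole match).
    public static int CountGroups(string pattern)
    {
        return new Regex(pattern).GetGroupNumbers().Length;
    }

    public static void Main()
    {
        // Reduced pattern from the question: two pairs of capturing parentheses.
        string capturing = @"([^""(]*?(""[^""]+"".*?)+)+";
        // Same pattern with (?: ) so the parentheses group without capturing.
        string nonCapturing = @"(?:[^""(]*?(?:""[^""]+"".*?)+)+";

        Console.WriteLine(CountGroups(capturing));    // 3: groups 0, 1 and 2
        Console.WriteLine(CountGroups(nonCapturing)); // 1: group 0 only
    }
}
```

This only explains the group count; it does not by itself split the words into one group per array.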

ddrudik

To confirm, your starting string is:
blah, blah, new Array("word1", "word2", "word3"), new Array("word4", "word5", "word6"), blah, blah
From which you want:
groups(0):"word1", "word2", "word3"
groups(1):"word4", "word5", "word6"


If that's the case, this pattern worked in my testing:
\((.*?)\)

Matches(0) = ("word1", "word2", "word3")
Matches(0).SubMatches(0) = "word1", "word2", "word3"
Matches(1) = ("word4", "word5", "word6")
Matches(1).SubMatches(0) = "word4", "word5", "word6"
fischermx

ASKER

How did you get that? I mean, how do I do it in C# code?

ddrudik

I tested that in ASP; you can test it online:
http://regexlib.com/RETester.aspx

Or you can download an app to test it offline:
http://www.ultrapico.com/ExpressoDownload.htm

The pattern follows the general syntax rules here:
http://regexlib.com/CheatSheet.aspx

\( = match a literal opening parenthesis
( = start of capturing group
.* = any character, 0 or more times
? = make the preceding quantifier lazy (match as few characters as possible)
) = end of capturing group
\) = match a literal closing parenthesis

Note the parens are escaped with \ since they are special characters.
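
For reference, a rough C# equivalent of the ASP test might look like the sketch below. It assumes the starting string quoted earlier in this thread; `m.Value` plays the role of `Matches(i)` and `m.Groups[1].Value` the role of `Matches(i).SubMatches(0)`. The class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParenMatchDemo
{
    // Returns the text captured between each pair of parentheses.
    public static List<string> InnerTexts(string text)
    {
        var inner = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\((.*?)\)"))
        {
            // Group 0 is the whole match, parentheses included;
            // group 1 is the lazily matched text between them.
            inner.Add(m.Groups[1].Value);
        }
        return inner;
    }

    public static void Main()
    {
        string text = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), "
                    + "new Array(\"word4\", \"word5\", \"word6\"), blah, blah";
        foreach (string s in InnerTexts(text))
            Console.WriteLine(s);
        // "word1", "word2", "word3"
        // "word4", "word5", "word6"
    }
}
```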

fischermx

ASKER

OK, but I still don't know how to test that.
The expression "Matches(0)" is invalid in C#.
In my code, I just see these long strings:
"word1", "word2", "word3"
and
"word4", "word5", "word6"

I don't see how to access each individual item. Please show how to do that in C#.

ddrudik

Matches(x) is the name of the array in my ASP code; each element represents a match.
Here's a regex code routine in C# that looks much like the code you are using already:
http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

I thought you wanted to match on:
"word1", "word2", "word3"
and
"word4", "word5", "word6"

Instead I understand now that you want to match on
"word1"
"word2"
"word3"
and
"word4"
"word5"
"word6"

fischermx

ASKER

Yes, that's the code I'm using; I took it from there.
I said I want them in groups. A single whole string is not a group.
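
One possible way to get a group per array with the individual words inside it is to let a capturing group repeat and then read its `Captures` collection, which is the same `CaptureCollection` the test routine above already prints. The sketch below is only an assumption about the input: it expects each array to be written as `new Array(` followed by comma-separated double-quoted words, and it returns the words without their quotes. The pattern and all names are illustrative, not taken from the hidden solution.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ArrayCapturesDemo
{
    // Assumed pattern: "new Array(" followed by comma-separated quoted words.
    // Group 1 sits inside a repeated group, so it records one capture per word.
    const string Pattern = @"new Array\(\s*(?:""([^""]*)""\s*,?\s*)+\)";

    // One string[] per array declaration found in the text.
    public static List<string[]> WordsPerArray(string text)
    {
        var result = new List<string[]>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            CaptureCollection caps = m.Groups[1].Captures;
            var words = new string[caps.Count];
            for (int i = 0; i < caps.Count; i++) words[i] = caps[i].Value;
            result.Add(words);
        }
        return result;
    }

    public static void Main()
    {
        string text = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), "
                    + "new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
        foreach (string[] words in WordsPerArray(text))
            Console.WriteLine(string.Join(" | ", words));
        // word1 | word2 | word3
        // word4 | word5 | word6
    }
}
```

Each `Match` corresponds to one array, and `Groups[1].Captures` holds that array's items in order, so the two arrays stay separate.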
SOLUTION
ddrudik

This solution is only available to members.

fischermx

ASKER

That's not what I want at all. There's text outside the array declarations that might be quoted, and I don't want that text to match.
Also, the two array declarations (there are just two) are almost always the same size, and I need them to be kept apart so that I can later make pairs by taking one item from each. If one array is bigger than the other, the extra items must be left out. But if I take all the quoted words as one single group, I can't do the pairing.
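
The pairing requirement described here could be sketched as follows, under the stated assumption that the text contains exactly two `new Array(...)` declarations. Anchoring the pattern on `new Array(` keeps quoted text outside the declarations from matching, and `Enumerable.Zip` stops at the shorter sequence, so surplus items are dropped. The helper names and the sample input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PairingDemo
{
    // Hypothetical helper: one list of words per "new Array(...)" declaration.
    public static List<List<string>> ExtractArrays(string text)
    {
        string pattern = @"new Array\(\s*(?:""([^""]*)""\s*,?\s*)+\)";
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Captures.Cast<Capture>()
                                  .Select(c => c.Value).ToList())
                    .ToList();
    }

    // Pairs item i of the first array with item i of the second.
    public static List<string> Pair(string text)
    {
        List<List<string>> arrays = ExtractArrays(text);
        // Assumption: exactly two arrays are present.
        return arrays[0].Zip(arrays[1], (l, r) => l + "=" + r).ToList();
    }

    public static void Main()
    {
        string text = "say \"ignored\", new Array(\"a\", \"b\", \"c\"), "
                    + "new Array(\"x\", \"y\"), \"also ignored\"";
        foreach (string pair in Pair(text))
            Console.WriteLine(pair);
        // a=x
        // b=y   ("c" has no partner and is left out)
    }
}
```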
>That's not what I want at all.
OK.

>There's text outside the array declarations that might be quoted, and I don't want that text to match.
See my comment from my last post: "As you see from the submatches, looping through the matches, starting at a match equal to "Array(" all of your words in the array will be the submatches(1) until the match equal to ")"."

>Also, the two array declarations (there are just two) are almost always the same size, and I need them to be kept apart so that I can later make pairs by taking one item from each. If one array is bigger than the other, the extra items must be left out. But if I take all the quoted words as one single group, I can't do the pairing.
Please explain your specific needs so that someone can complete the pattern for you. For example, will there always be two arrays in the string to test, and if not, how should one decide which arrays to use for the matches, which elements to drop from the submatches, etc.?
ASKER CERTIFIED SOLUTION
This solution is only available to members.