Regex to get groups of quoted words within parentheses.

fischermx
Someone suggested this regex:
@"(([^""(]*?(""([^""]+)"".*?)+)+)";
But it produces too many groups.

I reduced it to:
@"([^""(]*?(""[^""]+"".*?)+)+";
But it still produces two groups for each successful match.

Maybe I'm not using the right C# code.
I have this code so far:

        private static string str = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
        // private static string regPattern = @"(([^""(]*?(""([^""]+)"".*?)+)+)";
        private static string regPattern = @"([^""(]*?(""[^""]+"".*?)+)+";

        private static void Test1()
        {
            string text = str;
            string pat = regPattern;

            Regex r = new Regex(pat);
            // get the list of group numbers
            int[] gnums = r.GetGroupNumbers();
            // get first match
            Match m = r.Match(text);
            while (m.Success)
            {
                Console.WriteLine("------- start success --------------");
                // start at group 1
                for (int i = 1; i < gnums.Length; i++)
                {
                    Group g = m.Groups[gnums[i]];
                    // get the group for this match
                    Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
                    // get caps for this group
                    CaptureCollection cc = g.Captures;
                    for (int j = 0; j < cc.Count; j++)
                    {
                        Capture c = cc[j];
                        Console.WriteLine("      Capture" + j + "├" + c.ToString()
                           + "┤ Index=" + c.Index + " Length=" + c.Length);
                    }
                }
                // get next match
                m = m.NextMatch();
            }
        }
    }

I need to have "word1", "word2" and "word3" in one group and "word4", "word5" and "word6"  in another.

To confirm, your starting string is:
blah, blah, new Array("word1", "word2", "word3"), new Array("word4", "word5", "word6"), blah, blah
From which you want:
groups(0):"word1", "word2", "word3"
groups(1):"word4", "word5", "word6"


If that's the case, in testing this worked:
\((.*?)\)

Matches(0) = ("word1", "word2", "word3")
Matches(0).SubMatches(0) = "word1", "word2", "word3"
Matches(1) = ("word4", "word5", "word6")
Matches(1).SubMatches(0) = "word4", "word5", "word6"

Author

Commented:
How did you get that? I mean, in code in C#... how?
I tested that in ASP; you could test it online:
http://regexlib.com/RETester.aspx

Or you could download an app to test it offline:
http://www.ultrapico.com/ExpressoDownload.htm

Given the general syntax rules here:
http://regexlib.com/CheatSheet.aspx

\( = match starting with parens
( = start of group
.* = any characters 0 or more times
? = minimally match
) = end of group
\) = match ending with parens

Note the parens are escaped with \ since they are special characters.
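
For C#, a minimal sketch of running this pattern might look like the following. The class and method names (ParenGroupsDemo, ExtractParenContents) are made up for illustration. Regex.Matches and Match.Groups play the role of the Matches(x) and SubMatches(x) shown in the ASP output above, with one difference: Groups[0] is the whole match, so the first captured group is Groups[1].

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParenGroupsDemo
{
    // Returns the text found between each pair of parentheses, one entry per match.
    public static string[] ExtractParenContents(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\((.*?)\)");
        string[] results = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            // Groups[0] is the whole match, parentheses included;
            // Groups[1] is the first capturing group, i.e. the text inside them.
            results[i] = matches[i].Groups[1].Value;
        }
        return results;
    }

    public static void Main()
    {
        string input = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), "
                     + "new Array(\"word4\", \"word5\", \"word6\"), blah, blah";
        foreach (string contents in ExtractParenContents(input))
        {
            Console.WriteLine(contents);
        }
    }
}
```

Each printed line is still one long string of quoted words, so this only separates the two argument lists; it does not split out the individual words.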

Author

Commented:
Ok, but I still don't know how to test that.
The expression "Matches(0)" is invalid in C#.

Author

Commented:
In my code, I just see those long strings:
"word1", "word2", "word3"
and
"word4", "word5", "word6"

I don't see how to access each individual item. Please show how to do that in C#.
Matches(x) is my named array in my ASP code, used to represent a match.
Here's a regex code routine in C# that looks much like the code you are using already:
http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

I thought you wanted to match on:
"word1", "word2", "word3"
and
"word4", "word5", "word6"

Instead I understand now that you want to match on
"word1"
"word2"
"word3"
and
"word4"
"word5"
"word6"

Author

Commented:
Yes, that's the code I'm using; I took it from there.
I said I want them in groups. One long string containing all the words is not a group.
There are a number of ways to slice this; here's one pattern:
(Array\(|\))|\"(.*?)\"

With this test string:
blah, blah, new Array("word1", "word2"), new Array("word4", "word5", "word6"), blah, blah, new Array("word7", "word8", "word9", "word10"), blah, blah, "wordx"

The pattern produced these matches/submatches for me:
Matches(0) = "Array("
Matches(0).SubMatches(0) = "Array("
Matches(0).SubMatches(1) = ""
Matches(1) = ""word1""
Matches(1).SubMatches(0) = ""
Matches(1).SubMatches(1) = "word1"
Matches(2) = ""word2""
Matches(2).SubMatches(0) = ""
Matches(2).SubMatches(1) = "word2"
Matches(3) = ")"
Matches(3).SubMatches(0) = ")"
Matches(3).SubMatches(1) = ""
Matches(4) = "Array("
Matches(4).SubMatches(0) = "Array("
Matches(4).SubMatches(1) = ""
Matches(5) = ""word4""
Matches(5).SubMatches(0) = ""
Matches(5).SubMatches(1) = "word4"
Matches(6) = ""word5""
Matches(6).SubMatches(0) = ""
Matches(6).SubMatches(1) = "word5"
Matches(7) = ""word6""
Matches(7).SubMatches(0) = ""
Matches(7).SubMatches(1) = "word6"
Matches(8) = ")"
Matches(8).SubMatches(0) = ")"
Matches(8).SubMatches(1) = ""
Matches(9) = "Array("
Matches(9).SubMatches(0) = "Array("
Matches(9).SubMatches(1) = ""
Matches(10) = ""word7""
Matches(10).SubMatches(0) = ""
Matches(10).SubMatches(1) = "word7"
Matches(11) = ""word8""
Matches(11).SubMatches(0) = ""
Matches(11).SubMatches(1) = "word8"
Matches(12) = ""word9""
Matches(12).SubMatches(0) = ""
Matches(12).SubMatches(1) = "word9"
Matches(13) = ""word10""
Matches(13).SubMatches(0) = ""
Matches(13).SubMatches(1) = "word10"
Matches(14) = ")"
Matches(14).SubMatches(0) = ")"
Matches(14).SubMatches(1) = ""
Matches(15) = ""wordx""
Matches(15).SubMatches(0) = ""
Matches(15).SubMatches(1) = "wordx"

As you see from the submatches, looping through the matches, starting at a match equal to "Array(" all of your words in the array will be the submatches(1) until the match equal to ")".
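
In C#, that loop could be sketched roughly as below. This is only an illustration of the idea, not tested against real input beyond the sample string; the names ArrayWordsDemo and CollectArrayWords are hypothetical. Group 1 of the pattern holds "Array(" or ")", and group 2 holds a quoted word. Words are kept only while the loop is between an "Array(" and the next ")", so quoted text outside the arrays (such as "wordx") is skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ArrayWordsDemo
{
    // Returns one list of words per "Array( ... )" found in the input.
    public static List<List<string>> CollectArrayWords(string input)
    {
        List<List<string>> arrays = new List<List<string>>();
        List<string> current = null; // non-null only while inside "Array(" ... ")"

        foreach (Match m in Regex.Matches(input, @"(Array\(|\))|""(.*?)"""))
        {
            if (m.Groups[1].Success)
            {
                if (m.Groups[1].Value == "Array(")
                {
                    current = new List<string>(); // an array starts here
                }
                else if (current != null)
                {
                    arrays.Add(current);          // ")" closes the current array
                    current = null;
                }
            }
            else if (m.Groups[2].Success && current != null)
            {
                current.Add(m.Groups[2].Value);   // a quoted word inside an array
            }
        }
        return arrays;
    }

    public static void Main()
    {
        string input = "blah, new Array(\"word1\", \"word2\"), "
                     + "new Array(\"word4\", \"word5\", \"word6\"), blah, \"wordx\"";
        foreach (List<string> words in CollectArrayWords(input))
        {
            Console.WriteLine(string.Join(" | ", words.ToArray()));
        }
    }
}
```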


Author

Commented:
That's not what I want at all. There's text outside the array declarations that might be quoted, and I don't want that text to match.
Also, the two array declarations (there are just two) are almost always the same size, and I need them to be treated separately so that later I can make pairs taking an item from each of them. If one array is bigger than the other, then some items must be left out. But if I take all the quoted words in one single group, I can't do the pairings.
>That's not what I want at all.
OK.

>There's text outside the array declarations that might be quoted, and I don't want that text to match.
i.e. my comment from my last post: "As you see from the submatches, looping through the matches, starting at a match equal to "Array(" all of your words in the array will be the submatches(1) until the match equal to ")"."

>Also, the two array declarations (there are just two) are almost always the same size, and I need them to be treated separately so that later I can make pairs taking an item from each of them. If one array is bigger than the other, then some items must be left out. But if I take all the quoted words in one single group, I can't do the pairings.
Please explain your needs specifically so that someone can complete the pattern for you. For example, will there always be two arrays in the string to test, and if not, how will one know which arrays to use for the matches, which elements to drop from the submatches, etc.?
Commented:
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        private static string str = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
        // private static string regPattern = @"(([^""(]*?(""([^""]+)"".*?)+)+)";
        private static string regPattern = "\\((?:[^\"(]*?\"([^\"]+?)\")+\\)";

        private static void Test1()
        {
            string text = str;
            string pat = regPattern;

            Regex r = new Regex(pat);
            // get the list of group numbers
            int[] gnums = r.GetGroupNumbers();
            // get first match
            Match m = r.Match(text);
            while (m.Success)
            {
                Console.WriteLine("------- start success --------------");
                // start at group 0 (the entire match); group 1 holds the quoted words
                for (int i = 0; i < gnums.Length; i++)
                {
                    Group g = m.Groups[gnums[i]];
                    // get the group for this match
                    Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
                    // get caps for this group
                    CaptureCollection cc = g.Captures;
                    for (int j = 0; j < cc.Count; j++)
                    {
                        Capture c = cc[j];
                        Console.WriteLine("      Capture" + j + "├" + c.ToString()
                           + "┤ Index=" + c.Index + " Length=" + c.Length);
                    }
                }
                // get next match
                m = m.NextMatch();
            }
        }

        static void Main(string[] args)
        {
            Test1();
            Console.Read();
        }
    }
}
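
With this pattern each match covers one parenthesized list, and group 1 is captured once per quoted word, so the individual words are in m.Groups[1].Captures. As a rough sketch of the pairing described earlier in the thread, the captures of each match can be copied into a list and the first two lists paired up to the length of the shorter one. The names PairingDemo, ExtractWordGroups and Pair are invented for illustration, and the code assumes the first two matches are the two arrays of interest.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairingDemo
{
    // One list of words per parenthesized list matched by the pattern above.
    public static List<List<string>> ExtractWordGroups(string input)
    {
        Regex r = new Regex("\\((?:[^\"(]*?\"([^\"]+?)\")+\\)");
        List<List<string>> groups = new List<List<string>>();
        foreach (Match m in r.Matches(input))
        {
            List<string> words = new List<string>();
            // Group 1 sits inside a repeated group, so it has one capture per word.
            foreach (Capture c in m.Groups[1].Captures)
            {
                words.Add(c.Value);
            }
            groups.Add(words);
        }
        return groups;
    }

    // Pairs items by position; extra items in the longer list are left out.
    public static List<KeyValuePair<string, string>> Pair(List<string> first, List<string> second)
    {
        List<KeyValuePair<string, string>> pairs = new List<KeyValuePair<string, string>>();
        int count = Math.Min(first.Count, second.Count);
        for (int i = 0; i < count; i++)
        {
            pairs.Add(new KeyValuePair<string, string>(first[i], second[i]));
        }
        return pairs;
    }

    public static void Main()
    {
        string input = "blah, blah, new Array(\"word1\", \"word2\", \"word3\"), "
                     + "new Array(\"word4\", \"word5\", \"word6\")), blah, blah";
        List<List<string>> groups = ExtractWordGroups(input);
        if (groups.Count >= 2)
        {
            foreach (KeyValuePair<string, string> p in Pair(groups[0], groups[1]))
            {
                Console.WriteLine(p.Key + " <-> " + p.Value);
            }
        }
    }
}
```

Note that the pattern matches any parenthesized list of quoted strings, not only those preceded by "new Array", so other such lists in the input would also show up as groups.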
