CSV Regular Expression with Curly Brace Groups

RyanAndres
Hi, I need help splitting a comma-delimited string in which some fields are grouped by { and }. Commas inside a { } group should not split.

My regex for splitting is: ,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))

Sample source string is: "String",12345,{1,$2.00},"LastName, FirstName"

Currently it splits into 5 items: [ ""String"" , "12345" , "{1" , "$2.00}" , "LastName, FirstName" ]

I want it to split into 4 items: [ ""String"" , "12345" , "{1,$2.00}" , "LastName, FirstName" ]
Terry Woods, IT Guru
Most Valuable Expert 2011

Commented:
Try this, using logic similar to what you've got for the quotes.
,(?=(?:[^"]*"[^"]*")*(?![^"]*"))(?=(?:[^{]*\{[^}]*\})*(?![^}]*\}))


Terry Woods

Commented:
With the double quotes escaped, as in your example:
,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))(?=(?:[^{]*\{[^}]*\})*(?![^}]*\}))


Author

Commented:
That did not work. I tested it with Expresso and it yields the same results.

kaufmed
Glanced up at my screen and thought I had coded the Matrix...  Turns out, I just fell asleep on the keyboard.
Most Valuable Expert 2011
Top Expert 2015

Commented:
>>  I tested it with Expresso and it yields the same results.

In my awe of TerryAtOpus' pattern, I tested it in VS and it returned the data as requested. But of course, a picture is worth a thousand words (attached). The only problem I found in TerryAtOpus' pattern was in the escaping of the double-quote (C#-related). I modified it as below to yield the results demonstrated in the image.
string pattern = @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))(?=(?:[^{]*\{[^}]*\})*(?![^}]*\}))";


untitled.JPG

Author

Commented:
Your example works with your test string. However, it does not work with mine:


Raw:
"A",1,"Bagels","Bagels","","Bagels",{1,$0.99},5,1,1,0,0,0,0,"",,$0.00,0,0,1,1,1,0,0,1,"Bagels",1,0,{},{},0,0

-----------------------------------------------------------------------------

Test Code:
[Test]
public void RegexPattern()
{
    string pattern = @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))(?=(?:[^{]*\{[^}]*\})*(?![^}]*\}))";

    string string1 = @"""String"",12345,{1,$2.00},""LastName, FirstName""";
    string string2 = "\"A\",1,\"Bagels\",\"Bagels\",\"\",\"Bagels\",{1,$0.99},5,1,1,0,0,0,0,\"\",,$0.00,0,0,1,1,1,0,0,1,\"Bagels\",1,0,{},{},0,0";
    
    string[] parts1 = Regex.Split(string1, pattern);
    string[] parts2 = Regex.Split(string2, pattern);

    Assert.AreEqual(4, parts1.Length); // Passes
    Assert.AreEqual(32, parts2.Length); // Fails. Length is 33. '{1,$0.99}' is split into 2.
}
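
A note on why the second string fails, offered as a reading of the pattern rather than anything stated in the thread: in the brace lookahead, [^{]* is free to run past a closing }. For the comma inside {1,$0.99}, the lookahead can therefore skip over $0.99}, consume the two later {} pairs, and reach the end of the line with no } left, so the comma is accepted as a separator. The first string has no brace pair after {1,$2.00}, which is why it passes. The sketch below swaps the brace lookahead for a negative lookahead, (?![^{}]*\}), which rejects a comma whenever a } appears before any {. It assumes braces are never nested and never appear inside quoted fields; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the thread: a regex-only split that keeps
// quoted fields and single-level { } groups intact.
public static class BraceAwareCsv
{
    // First lookahead (unchanged from the thread): an even number of quotes
    // must remain, so the comma is not inside a quoted field.
    // Second lookahead (changed): fail if a '}' shows up before any '{',
    // which means the comma sits inside a { } group.
    public const string Pattern =
        @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))(?![^{}]*\})";

    public static string[] SplitFields(string line)
    {
        // The pattern has no capturing groups, so Regex.Split returns only the fields.
        return Regex.Split(line, Pattern);
    }
}
```

Under those assumptions, SplitFields should return 4 fields for the first sample string and 32 for the Bagels line, with {1,$0.99} kept whole.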


kaufmed
Commented:
So what about something like this?
class Program
{
    static void Main(string[] args)
    {
        string test = "\"A\",1,\"Bagels\",\"Bagels\",\"\",\"Bagels\",{1,$0.99},5,1,1,0,0,0,0,\"\",,$0.00,0,0,1,1,1,0,0,1,\"Bagels\",1,0,{},{},0,0";
        Dictionary<char, char> delimiters = new Dictionary<char, char>();

        delimiters.Add('"', '"');   // " matches "
        delimiters.Add('}', '{');   // } matches {
        delimiters.Add('{', '}');   // { matches }

        string[] results = Split(test, ',', delimiters);

        foreach (string result in results)
        {
            Console.WriteLine(result);
        }

        Console.ReadKey();
    }

    static string[] Split(string source, char splitChar, Dictionary<char, char> delimiters)
    {
        List<string> temp = new List<string>();
        Stack<char> delimStack = new Stack<char>();
        int substIndex = 0;

        for (int i = 0; i < source.Length; i++)
        {
            bool isDelimiter = delimiters.ContainsKey(source[i]);   // direct key lookup, no LINQ needed

            if (isDelimiter && delimStack.Count == 0)
            {
                delimStack.Push(source[i]);
            }
            else if (isDelimiter && delimStack.Peek() != delimiters[source[i]])
            {
                delimStack.Push(source[i]);
            }
            else if (isDelimiter)
            {
                delimStack.Pop();
            }
            else if (source[i] == splitChar && delimStack.Count == 0)
            {
                temp.Add(source.Substring(substIndex, i - substIndex));
                substIndex = i + 1;
            }
        }

        return temp.ToArray();
    }
}


Author

Commented:
Wow kaufmed! That does it! Only one thing though, why won't it include the last item?

ie.

"Test",12345,$2.00,2

'2' is excluded, and the array returned from Split has a Length of 3.
kaufmed

Commented:
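Add the following line just before return temp.ToArray(); (line 50 of the listing above). The loop only adds a field when it reaches a comma, so the text after the last comma never gets added: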
temp.Add(source.Substring(substIndex));
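
For reference, here is a sketch of how the Split method might look with that line in place. It follows the same stack logic as the listing above, with the two push branches folded into one; the class name is hypothetical, and the using directives are the ones the snippet needs to compile on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper class; the thread's version lives in Program.
public static class DelimiterAwareSplitter
{
    // delimiters maps each delimiter character to the character it pairs with,
    // as in the thread: '"' -> '"', '}' -> '{', '{' -> '}'.
    public static string[] Split(string source, char splitChar, Dictionary<char, char> delimiters)
    {
        List<string> parts = new List<string>();
        Stack<char> open = new Stack<char>();
        int start = 0;

        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];

            if (delimiters.ContainsKey(c))
            {
                // Pop when c pairs with the delimiter on top of the stack;
                // otherwise treat c as a newly opened delimiter.
                if (open.Count > 0 && open.Peek() == delimiters[c])
                    open.Pop();
                else
                    open.Push(c);
            }
            else if (c == splitChar && open.Count == 0)
            {
                // Separator outside every delimiter pair: emit the field so far.
                parts.Add(source.Substring(start, i - start));
                start = i + 1;
            }
        }

        // Fields are only emitted at separators, so the text after the last
        // separator has to be added once the loop ends.
        parts.Add(source.Substring(start));
        return parts.ToArray();
    }
}
```

With the three delimiter pairs from the listing above, "Test",12345,$2.00,2 should now come back as four fields, and a nested input such as a,{1,{2,3}},b as three. One limitation carried over from the original logic: a brace inside a quoted field is still pushed onto the stack, so input like that is not handled.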


Author

Commented:
You're a rock star. I love your implementation. I need to study it more!

Your solution is working perfectly and I think it's faster than the regex!
kaufmed

Commented:
:D

Believe me, I love using regexes, but when it comes to parsing against balanced brackets/delimiters I sometimes find it easier to code a stack-based solution.

Glad to help  :)


@TerryAtOpus

That pattern was sick. I was too tired to wrap my head around the internals last night when I read your post, but the result was impressive  :)
