Split string on commas but not when enclosed in parentheses

Posted on 2016-11-28
Last Modified: 2016-11-29
Given the following input:
	[COMPANY] [VARCHAR](64) NULL,
	[BANKCODE] [CHAR](4) NULL,
	[OPENDATE] [DATETIME] NULL,
	[REFERENCENUMBER] [NUMERIC](13, 0) NULL,
	[VENDORNAME] [CHAR](45) NULL,
	[AMOUNT] [NUMERIC](13, 2) NULL,
	[EMAILADDRESS] [VARCHAR](256) NULL

A basic split would be like this:
            string[] columnDefinitions = inputString.Split(new char[] {','});
            foreach (string s in columnDefinitions)
            {
                Console.WriteLine(s.Trim());
            }

But that splits on every comma and produces the following output (each line represents a single item in the array):
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC](13
0) NULL
[VENDORNAME] [CHAR](45) NULL
[AMOUNT] [NUMERIC](13
2) NULL
[EMAILADDRESS] [VARCHAR](256) NULL

What I really want is for it to only split on commas that are not inside parentheses. I'm happy to use Regex for this. I'm sure there's a fairly easy pattern for it. The desired output would be a string array containing this:
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC](13, 0) NULL
[VENDORNAME] [CHAR](45) NULL
[AMOUNT] [NUMERIC](13, 2) NULL
[EMAILADDRESS] [VARCHAR](256) NULL

Question by:Russ Suter
7 Comments
 
Expert Comment

by:Dan Craciun
ID: 41904879
I'm sure I'm missing something.

Why don't you just split on \n (the newline character)?

HTH,
Dan

Author Comment

by:Russ Suter
ID: 41904921
Indeed you are missing something. SQL works perfectly well if all statements are on a single line. The following 3 statements are identical as far as SQL is concerned:
CREATE TABLE FOO(
	[COMPANY] [VARCHAR](64) NULL,
	[BANKCODE] [CHAR](4) NULL
)

CREATE TABLE FOO(	[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL)

CREATE TABLE FOO(	[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL)

Note in the 3rd example there need not even be whitespace between the column definitions. I have no way of guaranteeing that the input will be formatted in any specific way so the split needs to strictly adhere to SQL parsing rules.
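
To illustrate the point, here is a minimal sketch of what a newline split does with the formats above. The class and method names are made up for this example and are not part of any posted solution.

```csharp
using System;

public static class NewlineSplitDemo
{
    // Hypothetical helper: splits on line breaks only, as suggested earlier in the thread.
    public static string[] SplitOnNewlines(string sql)
    {
        return sql.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Called on a column list with one definition per line, this returns one element per column. Called on the same list written on a single line, it returns a single element holding every column, which is why the split has to key on commas rather than line breaks.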

Expert Comment

by:Dan Craciun
ID: 41904930
Try this:
(\[.*?\)\s?(?:NULL)?),
The results should be in group 1.
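
A minimal sketch of how group 1 could be read in C#, using the pattern exactly as posted. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupOneDemo
{
    // The pattern as posted in this comment; group 1 holds one column definition.
    private static readonly Regex ColumnPattern =
        new Regex(@"(\[.*?\)\s?(?:NULL)?),");

    // Collects the text captured by group 1 for every match.
    public static List<string> ExtractColumns(string sql)
    {
        var columns = new List<string>();
        foreach (Match m in ColumnPattern.Matches(sql))
        {
            columns.Add(m.Groups[1].Value);
        }
        return columns;
    }
}
```

As written, the pattern only matches a definition that contains a closing parenthesis and is followed by a comma, so a column without a length (such as DATETIME) or the last column in the list would need extra handling.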

Author Comment

by:Russ Suter
ID: 41904954
There's no guarantee that the word NULL will be there either.

Expert Comment

by:Gustav Brock
ID: 41905339
You could just parse the SQL skipping the pertinent commas:
string sql = "[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL,[OPENDATE] [DATETIME] NULL, 	[REFERENCENUMBER] [NUMERIC] (13, 0) NULL, 	[VENDORNAME] [CHAR] (45) NULL, 	[AMOUNT] [NUMERIC] (13, 2) NULL, 	[EMAILADDRESS] [VARCHAR] (256) NULL";

// Requires: using System; using System.Collections.Generic;
List<string> sqlParts = new List<string>();
int part = 0;
bool skipComma = false;

foreach (char c in sql)
{
	if (sqlParts.Count != part + 1)
	{
		sqlParts.Add(string.Empty);
	}
	// Skip splitting by comma if we are inside a set of parentheses.
	if (c == '(')
	{
		skipComma = true;
	}
	else if (c == ')')
	{
		skipComma = false;
	}

	// Cut the comma and start a new line of SQL,
	// or append the char to the current line of SQL.
	if (!skipComma && c == ',')
	{
		part++;
	}
	else
	{
		sqlParts[part] = (sqlParts[part] + c).TrimStart();
	}
}

// List the SQL lines (in LINQPad, Dump() can be used instead).
foreach (string sqlPart in sqlParts)
{
	Console.WriteLine(sqlPart);
}

This will produce:
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC] (13, 0) NULL
[VENDORNAME] [CHAR] (45) NULL
[AMOUNT] [NUMERIC] (13, 2) NULL
[EMAILADDRESS] [VARCHAR] (256) NULL

/gustav

Accepted Solution

by:Fernando Soto
ID: 41905828
Hi Russ;

This code snippet should do what you need.
// Input string
var input = "[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL, [OPENDATE] [DATETIME] NULL, [REFERENCENUMBER] [NUMERIC](13, 0) NULL, [VENDORNAME] [CHAR](45) NULL, [AMOUNT] [NUMERIC](13, 2) NULL, [EMAILADDRESS] [VARCHAR](256) NULL";
// Build new string
StringBuilder sb = new StringBuilder();
// Used to bypass the , inside of ( ... )
bool bypass = false;

foreach (var c in input) {
	// Toggle bypass on every ( or ) so that commas between them are kept
	if (c == '(' || c == ')') {
		bypass = !bypass;
		sb.Append(c);
		continue;
	}

	if (c == ',' && !bypass)
		// Comma outside parentheses: replace it with a space
		sb.Append(' ');
	else
		// Any other character, including a comma inside parentheses: copy as is
		sb.Append(c);
}

// Display the new string
Console.WriteLine(sb.ToString());


Author Closing Comment

by:Russ Suter
ID: 41905921
I went with a modified version of Fernando's approach. It looks like this:

private List<string> ExtractIndividualColumnDefinitions(string columnDefinitions)
{
    int parenLevel = 0;
    List<string> resultSet = new List<string>();
    string currentColumnDefinition = string.Empty;
    // The loop skips the first and last characters, presumably because the input
    // is still wrapped in the outer parentheses of CREATE TABLE FOO( ... ).
    for (int i = 1; i < columnDefinitions.Length - 1; ++i)
    {
        if (columnDefinitions[i] == ',' && parenLevel == 0)
        {
            // Top-level comma: the current column definition is complete.
            resultSet.Add(currentColumnDefinition.Trim());
            currentColumnDefinition = string.Empty;
        }
        else
        {
            // Track nesting depth so commas inside parentheses are kept.
            if (columnDefinitions[i] == '(')
            {
                ++parenLevel;
            }
            else if (columnDefinitions[i] == ')')
            {
                --parenLevel;
            }
            currentColumnDefinition += columnDefinitions[i];
        }
    }
    // Add the final column definition, which has no trailing comma.
    if (!string.IsNullOrEmpty(currentColumnDefinition))
    {
        resultSet.Add(currentColumnDefinition.Trim());
    }
    return resultSet;
}

I'm still fairly sure there's a workable Regex solution to this problem but the Regex is probably more complex than my knowledge level and I just need to move on.
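
For reference, a regex along those lines can be sketched as follows. This is a minimal sketch, not a full SQL parser: it assumes parentheses are never nested and that no commas or parentheses appear inside string literals or quoted identifiers. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ColumnSplitter
{
    // Matches a comma that is NOT followed by "non-parenthesis characters, then ')'",
    // i.e. a comma that does not sit inside an un-nested ( ... ) group.
    private static readonly Regex TopLevelComma = new Regex(@",(?![^()]*\))");

    // Splits on top-level commas, trims each part, and drops empty parts.
    public static string[] SplitColumns(string columnDefinitions)
    {
        return TopLevelComma.Split(columnDefinitions)
                            .Select(part => part.Trim())
                            .Where(part => part.Length > 0)
                            .ToArray();
    }
}
```

Calling ColumnSplitter.SplitColumns on the input from the question keeps (13, 0) and (13, 2) intact. With nested parentheses the lookahead can no longer tell whether a comma is inside a group, so a depth counter like parenLevel above is the safer choice in that case.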