Solved

Split string on commas but not when enclosed in parentheses

Posted on 2016-11-28
Last Modified: 2016-11-29
Given the following input:
	[COMPANY] [VARCHAR](64) NULL,
	[BANKCODE] [CHAR](4) NULL,
	[OPENDATE] [DATETIME] NULL,
	[REFERENCENUMBER] [NUMERIC](13, 0) NULL,
	[VENDORNAME] [CHAR](45) NULL,
	[AMOUNT] [NUMERIC](13, 2) NULL,
	[EMAILADDRESS] [VARCHAR](256) NULL

A basic split would be like this:
string[] columnDefinitions = inputString.Split(new char[] {','});
foreach (string s in columnDefinitions)
{
    Console.WriteLine(s.Trim());
}

But that splits on every comma and produces the following output (each line represents a single item in the array):
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC](13
0) NULL
[VENDORNAME] [CHAR](45) NULL
[AMOUNT] [NUMERIC](13
2) NULL
[EMAILADDRESS] [VARCHAR](256) NULL

What I really want is for it to only split on commas that are not inside parentheses. I'm happy to use Regex for this. I'm sure there's a fairly easy pattern for it. The desired output would be a string array containing this:
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC](13, 0) NULL
[VENDORNAME] [CHAR](45) NULL
[AMOUNT] [NUMERIC](13, 2) NULL
[EMAILADDRESS] [VARCHAR](256) NULL

Question by:Russ Suter

Expert Comment

by:Dan Craciun
ID: 41904879
I'm sure I'm missing something.

Why don't you just split on \n (the newline character)?

HTH,
Dan

Author Comment

by:Russ Suter
ID: 41904921
Indeed you are missing something. SQL works perfectly well if all statements are on a single line. The following 3 statements are identical as far as SQL is concerned:
CREATE TABLE FOO(
	[COMPANY] [VARCHAR](64) NULL,
	[BANKCODE] [CHAR](4) NULL
)

CREATE TABLE FOO(	[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL)

CREATE TABLE FOO(	[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL)

Note in the 3rd example there need not even be whitespace between the column definitions. I have no way of guaranteeing that the input will be formatted in any specific way so the split needs to strictly adhere to SQL parsing rules.
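
To make the point concrete, here is a minimal sketch (the class and method names are made up for illustration) showing that a newline split sees the same two column definitions differently depending only on formatting:

```csharp
using System;

public static class NewlineSplitDemo
{
    // Hypothetical helper: counts the pieces produced by splitting on line breaks only.
    public static int CountLines(string text)
    {
        return text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        // The same two column definitions, formatted two different ways.
        string multiLine = "[COMPANY] [VARCHAR](64) NULL,\n[BANKCODE] [CHAR](4) NULL";
        string singleLine = "[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL";

        Console.WriteLine(CountLines(multiLine));  // 2
        Console.WriteLine(CountLines(singleLine)); // 1
    }
}
```

The single-line form comes back as one piece, so a newline split only works when the input happens to be formatted one column per line.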

Expert Comment

by:Dan Craciun
ID: 41904930
Try this:
(\[.*?\)\s?(?:NULL)?),
The results should be in group 1.

Author Comment

by:Russ Suter
ID: 41904954
There's no guarantee that the word NULL will be there either.

Expert Comment

by:Gustav Brock
ID: 41905339
You could just parse the SQL skipping the pertinent commas:
string sql = "[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL,[OPENDATE] [DATETIME] NULL, 	[REFERENCENUMBER] [NUMERIC] (13, 0) NULL, 	[VENDORNAME] [CHAR] (45) NULL, 	[AMOUNT] [NUMERIC] (13, 2) NULL, 	[EMAILADDRESS] [VARCHAR] (256) NULL";

System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] bytes = encoding.GetBytes(sql);
int part = 0;
List<string> sqlParts = new List<string>();
bool skipComma = false;

foreach (byte b in bytes)
{
    // Make sure the current part exists before appending to it.
    if (sqlParts.Count != part + 1)
    {
        sqlParts.Add(string.Empty);
    }

    // Skip splitting by comma while we are inside a set of parentheses.
    if (b == 40)        // '('
    {
        skipComma = true;
    }
    else if (b == 41)   // ')'
    {
        skipComma = false;
    }

    // Cut at the comma and start a new line of SQL,
    // or append the char to the current line of SQL.
    if (!skipComma && b == 44)  // ','
    {
        part++;
    }
    else
    {
        sqlParts[part] = (sqlParts[part] + Convert.ToChar(b)).TrimStart();
    }
}

// List the SQL lines (in LINQPad, line.Dump() works as well).
foreach (string line in sqlParts)
{
    Console.WriteLine(line);
}

This will produce:
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC] (13, 0) NULL
[VENDORNAME] [CHAR] (45) NULL
[AMOUNT] [NUMERIC] (13, 2) NULL
[EMAILADDRESS] [VARCHAR] (256) NULL

/gustav

Accepted Solution

by:Fernando Soto
ID: 41905828
Hi Russ;

This code snippet should do what you need.
// Input string
var input = "[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL, [OPENDATE] [DATETIME] NULL, [REFERENCENUMBER] [NUMERIC](13, 0) NULL, [VENDORNAME] [CHAR](45) NULL, [AMOUNT] [NUMERIC](13, 2) NULL, [EMAILADDRESS] [VARCHAR](256) NULL";
// Collects the output string
StringBuilder sb = new StringBuilder();
// True while inside ( ... ), where commas must be left alone
bool bypass = false;

foreach (var c in input) {
    // Toggle bypass on every ( or ); this assumes parentheses are not nested
    if (c == '(' || c == ')') {
        bypass = !bypass;
        sb.Append(c);
        continue;
    }

    if (c == ',' && !bypass)
        // Comma outside parentheses: replace it with a space
        sb.Append(' ');
    else
        // Any other character, including a comma inside parentheses: keep it
        sb.Append(c);
}

// Display the new string
Console.WriteLine(sb.ToString());

Author Closing Comment

by:Russ Suter
ID: 41905921
I went with a modified version of Fernando's approach. It looks like this:

private List<string> ExtractIndividualColumnDefinitions(string columnDefinitions)
{
    // Depth counter rather than a flag, so nested parentheses are handled too.
    int parenLevel = 0;
    List<string> resultSet = new List<string>();
    string currentColumnDefinition = string.Empty;
    // The loop skips the first and last characters, presumably the outer
    // parentheses that wrap the column list in a CREATE TABLE statement.
    for (int i = 1; i < columnDefinitions.Length - 1; ++i)
    {
        if (columnDefinitions[i] == ',' && parenLevel == 0)
        {
            // Top-level comma: the current column definition is complete.
            resultSet.Add(currentColumnDefinition.Trim());
            currentColumnDefinition = string.Empty;
        }
        else
        {
            if (columnDefinitions[i] == '(')
            {
                ++parenLevel;
            }
            else if (columnDefinitions[i] == ')')
            {
                --parenLevel;
            }
            currentColumnDefinition += columnDefinitions[i];
        }
    }
    // Add whatever follows the last top-level comma.
    if (!string.IsNullOrEmpty(currentColumnDefinition))
    {
        resultSet.Add(currentColumnDefinition.Trim());
    }
    return resultSet;
}

I'm still fairly sure there's a workable Regex solution to this problem but the Regex is probably more complex than my knowledge level and I just need to move on.
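
For anyone who does want the Regex route, one possible sketch follows. It is not taken from any of the answers above, and the class and method names are made up for illustration. It splits on a comma only when that comma is not followed by a closing parenthesis before any other parenthesis. It assumes parentheses are never nested and that the outer parentheses of the CREATE TABLE statement have already been removed. It does not depend on the word NULL being present.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ColumnSplitter
{
    // A comma that is NOT followed by ")" before any other parenthesis,
    // i.e. a comma outside a single, non-nested pair of parentheses.
    private static readonly Regex TopLevelComma = new Regex(@",(?![^()]*\))");

    // Hypothetical helper: splits a column list into trimmed, non-empty definitions.
    public static string[] SplitColumns(string columnList)
    {
        return TopLevelComma.Split(columnList)
                            .Select(s => s.Trim())
                            .Where(s => s.Length > 0)
                            .ToArray();
    }

    public static void Main()
    {
        string input = "[COMPANY] [VARCHAR](64) NULL,[REFERENCENUMBER] [NUMERIC](13, 0) NULL, [AMOUNT] [NUMERIC](13, 2)";
        foreach (string column in SplitColumns(input))
        {
            Console.WriteLine(column);
        }
    }
}
```

With nested parentheses the lookahead gives wrong results, so the character loop with a depth counter remains the safer choice when the input is not under your control.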