Split string on commas but not when enclosed in parentheses

Posted on 2016-11-28
Last Modified: 2016-11-29
Given the following input:
	[COMPANY] [VARCHAR](64) NULL,
	[BANKCODE] [CHAR](4) NULL,
	[OPENDATE] [DATETIME] NULL,
	[REFERENCENUMBER] [NUMERIC](13, 0) NULL,
	[VENDORNAME] [CHAR](45) NULL,
	[AMOUNT] [NUMERIC](13, 2) NULL,
	[EMAILADDRESS] [VARCHAR](256) NULL

A basic split would be like this:
            string[] columnDefinitions = inputString.Split(new char[] {','});
            foreach (string s in columnDefinitions)
            {
                Console.WriteLine(s.Trim());
            }

But that splits on every comma and produces the following output (each line represents a single item in the array):
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC](13
0) NULL
[VENDORNAME] [CHAR](45) NULL
[AMOUNT] [NUMERIC](13
2) NULL
[EMAILADDRESS] [VARCHAR](256) NULL

What I really want is for it to only split on commas that are not inside parentheses. I'm happy to use Regex for this. I'm sure there's a fairly easy pattern for it. The desired output would be a string array containing this:
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC](13, 0) NULL
[VENDORNAME] [CHAR](45) NULL
[AMOUNT] [NUMERIC](13, 2) NULL
[EMAILADDRESS] [VARCHAR](256) NULL

Question by:Russ Suter
7 Comments

Expert Comment

by:Dan Craciun
ID: 41904879
I'm sure I'm missing something.

Why don't you just split on \n (the newline character)?

HTH,
Dan

Author Comment

by:Russ Suter
ID: 41904921
Indeed you are missing something. SQL works perfectly well if all statements are on a single line. The following 3 statements are identical as far as SQL is concerned:
CREATE TABLE FOO(
	[COMPANY] [VARCHAR](64) NULL,
	[BANKCODE] [CHAR](4) NULL
)

CREATE TABLE FOO(	[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL)

CREATE TABLE FOO(	[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL)

Note in the 3rd example there need not even be whitespace between the column definitions. I have no way of guaranteeing that the input will be formatted in any specific way so the split needs to strictly adhere to SQL parsing rules.
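
To make that point concrete, here is a minimal sketch that splits the three statements above on line breaks and counts the pieces. The class and method names (NewlineSplitDemo, CountLines) are made up for this illustration and are not part of any posted solution.

```csharp
using System;

public static class NewlineSplitDemo
{
    // Hypothetical helper: number of non-empty pieces produced by splitting on line breaks.
    public static int CountLines(string sql)
    {
        return sql.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        string multiLine = "CREATE TABLE FOO(\n\t[COMPANY] [VARCHAR](64) NULL,\n\t[BANKCODE] [CHAR](4) NULL\n)";
        string oneLine = "CREATE TABLE FOO(\t[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL)";
        string noSpace = "CREATE TABLE FOO(\t[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL)";

        // Only the first form has one column per line; the other two come back as a single piece.
        Console.WriteLine(CountLines(multiLine));
        Console.WriteLine(CountLines(oneLine));
        Console.WriteLine(CountLines(noSpace));
    }
}
```

The same SQL gives a different number of pieces depending only on its formatting, which is why a newline split cannot be relied on here.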

Expert Comment

by:Dan Craciun
ID: 41904930
Try this:
(\[.*?\)\s?(?:NULL)?),
The results should be in group 1.
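
For reference, a minimal sketch of how the pattern could be applied in C# with Regex.Matches, reading capture group 1 of each match. The class and helper names (GroupOneDemo, Extract) are hypothetical. As far as I can tell, the pattern needs both a closing parenthesis and a trailing comma, so the last column (which has no trailing comma) is not returned, and a column without parentheses such as [OPENDATE] [DATETIME] NULL would run on into the column that follows it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupOneDemo
{
    // The pattern suggested above; each column definition is expected in capture group 1.
    private static readonly Regex ColumnPattern = new Regex(@"(\[.*?\)\s?(?:NULL)?),");

    // Hypothetical helper: collects group 1 of every match.
    public static List<string> Extract(string columnDefinitions)
    {
        var columns = new List<string>();
        foreach (Match m in ColumnPattern.Matches(columnDefinitions))
        {
            columns.Add(m.Groups[1].Value);
        }
        return columns;
    }

    public static void Main()
    {
        string input = "[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL, [EMAILADDRESS] [VARCHAR](256) NULL";

        // Prints the first two columns; the last one is skipped because no comma follows it.
        foreach (string column in Extract(input))
        {
            Console.WriteLine(column);
        }
    }
}
```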

Author Comment

by:Russ Suter
ID: 41904954
There's no guarantee that the word NULL will be there either.

Expert Comment

by:Gustav Brock
ID: 41905339
You could just parse the SQL skipping the pertinent commas:
string sql = "[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL,[OPENDATE] [DATETIME] NULL, 	[REFERENCENUMBER] [NUMERIC] (13, 0) NULL, 	[VENDORNAME] [CHAR] (45) NULL, 	[AMOUNT] [NUMERIC] (13, 2) NULL, 	[EMAILADDRESS] [VARCHAR] (256) NULL";

// ASCII encoding turns any non-ASCII character into '?'; that is fine for this sample input.
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] bytes = encoding.GetBytes(sql);
int part = 0;
List<string> sqlParts = new List<string>();
bool skipComma = false;

foreach (byte b in bytes)
{
	// Start a new, empty part after each split.
	if (sqlParts.Count != part + 1)
	{
		sqlParts.Add(string.Empty);
	}

	// Skip splitting by comma if we are inside a set of parentheses.
	// 40 is '(' and 41 is ')'. A single flag does not track nested parentheses.
	if (b == 40)
	{
		skipComma = true;
	}
	else if (b == 41)
	{
		skipComma = false;
	}

	// 44 is ','. Cut the comma and add a new line of SQL
	// or append the char to the current line of SQL.
	if (skipComma == false && b == 44)
	{
		part++;
	}
	else
	{
		sqlParts[part] = (sqlParts[part] + Convert.ToChar(b)).TrimStart();
	}
}

// List the SQL lines. Dump() is a LINQPad extension; use Console.WriteLine elsewhere.
for (int i = 0; i <= part; i++)
{
	sqlParts[i].Dump();
}

This will produce:
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC] (13, 0) NULL
[VENDORNAME] [CHAR] (45) NULL
[AMOUNT] [NUMERIC] (13, 2) NULL
[EMAILADDRESS] [VARCHAR] (256) NULL

/gustav

Accepted Solution

by: Fernando Soto
ID: 41905828
Hi Russ;

This code snippet should do what you need.
// Input string
var input = "[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL, [OPENDATE] [DATETIME] NULL, [REFERENCENUMBER] [NUMERIC](13, 0) NULL, [VENDORNAME] [CHAR](45) NULL, [AMOUNT] [NUMERIC](13, 2) NULL, [EMAILADDRESS] [VARCHAR](256) NULL";
// Builds the new string
StringBuilder sb = new StringBuilder();
// True while inside ( ... ), where commas must be left alone
bool bypass = false;

foreach (var c in input) {
	// Toggle bypass on every ( or ); this assumes parentheses are never nested
	if (c == '(' || c == ')') {
		bypass = !bypass;
		sb.Append(c);
		continue;
	}

	if (c == ',' && bypass == false)
		// Comma outside parentheses: a column separator, replaced with a space
		sb.Append(' ');
	else
		// Any other character, including a comma inside parentheses, is kept
		sb.Append(c);
}

// Display the new string
Console.WriteLine(sb.ToString());


Author Closing Comment

by:Russ Suter
ID: 41905921
I went with a modified version of Fernando's approach. It looks like this:

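// Splits a column list on commas that are not inside parentheses.
// The loop skips the first and last characters, so the input is presumably
// still wrapped in the outer parentheses of the CREATE TABLE statement.
private List<string> ExtractIndividualColumnDefinitions(string columnDefinitions)
{
    int parenLevel = 0; // current nesting depth; 0 means outside any parentheses
    List<string> resultSet = new List<string>();
    string currentColumnDefinition = string.Empty;
    for (int i = 1; i < columnDefinitions.Length - 1; ++i)
    {
        if (columnDefinitions[i] == ',' && parenLevel == 0)
        {
            // Top-level comma: the current column definition is complete.
            resultSet.Add(currentColumnDefinition.Trim());
            currentColumnDefinition = string.Empty;
        }
        else
        {
            if (columnDefinitions[i] == '(')
            {
                ++parenLevel;
            }
            else if (columnDefinitions[i] == ')')
            {
                --parenLevel;
            }
            currentColumnDefinition += columnDefinitions[i];
        }
    }
    // Add the last column definition, which has no trailing comma.
    if (!string.IsNullOrEmpty(currentColumnDefinition))
    {
        resultSet.Add(currentColumnDefinition.Trim());
    }
    return resultSet;
}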

I'm still fairly sure there's a workable Regex solution to this problem, but the Regex is probably more complex than my knowledge level and I just need to move on.
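
For readers looking for the Regex route, one possible sketch (not something posted in the thread) is to split on a comma only when it is not followed by a ")" that arrives before any "(". The class name RegexColumnSplitter and its Split method are hypothetical. The sketch assumes that parentheses are not nested, that no commas or parentheses appear inside string literals or bracketed identifiers, and that the outer parentheses of the CREATE TABLE statement have already been removed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexColumnSplitter
{
    // A comma followed by ")" with no "(" in between sits inside parentheses,
    // so the negative lookahead rejects it as a split point.
    private static readonly Regex TopLevelComma = new Regex(@",(?![^()]*\))");

    // Hypothetical helper: splits a column list and trims each definition.
    public static string[] Split(string columnDefinitions)
    {
        return TopLevelComma.Split(columnDefinitions)
                            .Select(s => s.Trim())
                            .Where(s => s.Length > 0)
                            .ToArray();
    }

    public static void Main()
    {
        string input = "[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL, [REFERENCENUMBER] [NUMERIC](13, 0) NULL";
        foreach (string column in Split(input))
        {
            Console.WriteLine(column);
        }
    }
}
```

The character-by-character loop with a depth counter remains the safer choice when nested parentheses are possible, since this lookahead only checks the nearest parenthesis after each comma.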