Solved

Split string on commas but not when enclosed in parentheses

Posted on 2016-11-28
7
89 Views
Last Modified: 2016-11-29
Given the following input:
	[COMPANY] [VARCHAR](64) NULL,
	[BANKCODE] [CHAR](4) NULL,
	[OPENDATE] [DATETIME] NULL,
	[REFERENCENUMBER] [NUMERIC](13, 0) NULL,
	[VENDORNAME] [CHAR](45) NULL,
	[AMOUNT] [NUMERIC](13, 2) NULL,
	[EMAILADDRESS] [VARCHAR](256) NULL

Open in new window

A basic split would be like this:
            string[] columnDefinitions = inputString.Split(new char[] {','});
            foreach (string s in columnDefinitions)
            {
                Console.WriteLine(s.Trim());
            }

Open in new window

But that splits on every comma and produces the following output (each line represents a single item in the array):
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC](13
0) NULL
[VENDORNAME] [CHAR](45) NULL
[AMOUNT] [NUMERIC](13
2) NULL
[EMAILADDRESS] [VARCHAR](256) NULL

Open in new window

What I really want is for it to only split on commas that are not inside parentheses. I'm happy to use Regex for this. I'm sure there's a fairly easy pattern for it. The desired output would be a string array containing this:
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC](13, 0) NULL
[VENDORNAME] [CHAR](45) NULL
[AMOUNT] [NUMERIC](13, 2) NULL
[EMAILADDRESS] [VARCHAR](256) NULL

Open in new window

0
Comment
Question by:Russ Suter
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
7 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41904879
I'm sure I'm missing something.

Why don't you just split on \n? (endline character).

HTH,
Dan
0
 
LVL 20

Author Comment

by:Russ Suter
ID: 41904921
Indeed you are missing something. SQL works perfectly well if all statements are on a single line. The following 3 statements are identical as far as SQL is concerned:
CREATE TABLE FOO(
	[COMPANY] [VARCHAR](64) NULL,
	[BANKCODE] [CHAR](4) NULL
)

Open in new window

CREATE TABLE FOO(	[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL)

Open in new window

CREATE TABLE FOO(	[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL)

Open in new window

Note in the 3rd example there need not even be whitespace between the column definitions. I have no way of guaranteeing that the input will be formatted in any specific way so the split needs to strictly adhere to SQL parsing rules.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 41904930
Try this:
(\[.*?\)\s?(?:NULL)?),
The results should be in group 1.
regex
0
PeopleSoft Has Never Been Easier

PeopleSoft Adoption Made Smooth & Simple!

On-The-Job Training Is made Intuitive & Easy With WalkMe's On-Screen Guidance Tool.  Claim Your Free WalkMe Account Now

 
LVL 20

Author Comment

by:Russ Suter
ID: 41904954
There's no guarantee that the word NULL will be there either.
0
 
LVL 51

Expert Comment

by:Gustav Brock
ID: 41905339
You could just parse the SQL skipping the pertinent commas:
string sql = "[COMPANY] [VARCHAR](64) NULL,[BANKCODE] [CHAR](4) NULL,[OPENDATE] [DATETIME] NULL, 	[REFERENCENUMBER] [NUMERIC] (13, 0) NULL, 	[VENDORNAME] [CHAR] (45) NULL, 	[AMOUNT] [NUMERIC] (13, 2) NULL, 	[EMAILADDRESS] [VARCHAR] (256) NULL";

System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] bytes = encoding.GetBytes(sql);
int part = 0;
List<string> sqlParts = new List<string>();
bool skipComma = false;

foreach (byte b in bytes)
{
	if (sqlParts.Count != part+1)
	{
		sqlParts.Add(string.Empty);		
	}
        // Skip splitting by comma if we are inside a set of parenthesis.
	if (b == 40)
	{
		skipComma = true;
	}
	else if (b == 41)
	{
		skipComma=false;
	}

        // Cut the comma and add a new line of SQL 
        // or append the char to the current line of SQL.
	if (skipComma == false && b == 44)
	{
		part++;
	}
	else
	{
		sqlParts[part] = (sqlParts[part] += Convert.ToChar(b)).TrimStart();
	}
}

// List the SQL lines.
for (int i = 0; i <= part; i++)
{
	sqlParts[i].Dump();
} 

Open in new window

This will produce:
[COMPANY] [VARCHAR](64) NULL
[BANKCODE] [CHAR](4) NULL
[OPENDATE] [DATETIME] NULL
[REFERENCENUMBER] [NUMERIC] (13, 0) NULL
[VENDORNAME] [CHAR] (45) NULL
[AMOUNT] [NUMERIC] (13, 2) NULL
[EMAILADDRESS] [VARCHAR] (256) NULL

Open in new window

/gustav
0
 
LVL 63

Accepted Solution

by:
Fernando Soto earned 500 total points
ID: 41905828
Hi Russ;

This code snippet should do what you need.
// Input string
var input = "[COMPANY] [VARCHAR](64) NULL, [BANKCODE] [CHAR](4) NULL, [OPENDATE] [DATETIME] NULL, [REFERENCENUMBER] [NUMERIC](13, 0) NULL, [VENDORNAME] [CHAR](45) NULL, [AMOUNT] [NUMERIC](13, 2) NULL, [EMAILADDRESS] [VARCHAR](256) NULL";
// Build new string
StringBuilder sb = new StringBuilder();
// Used to bypass the , inside of ( ... )
bool bypass = false;

foreach(var c in input) {
  // Switch on or off bypass depending on character ( or )
	if (c == '(' || c == ')') {
    bypass = !bypass;
    sb.Append(c);
    continue;
	}
	
	if (c == ',' && bypass == false)
	  // bypass ,
    sb.Append(' ');
	else
	  // Don't bypass ,
    sb.Append(c);	
}

// Display the new string
Console.WriteLine(sb.ToString());

Open in new window

0
 
LVL 20

Author Closing Comment

by:Russ Suter
ID: 41905921
I went with a modified version of Fernando's approach. It looks like this:

private List<string> ExtractIndividualColumnDefinitions(string columnDefinitions)
        {
            int parenLevel = 0;
            List<string> resultSet = new List<string>();
            string currentColumnDefinition = string.Empty;
            for (int i = 1; i < columnDefinitions.Length - 1; ++i)
            {
                if (columnDefinitions[i] == ',' && parenLevel == 0)
                {
                    resultSet.Add(currentColumnDefinition.Trim());
                    currentColumnDefinition = string.Empty;
                }
                else
                {
                    if (columnDefinitions[i] == '(')
                    {
                        ++parenLevel;
                    }
                    else if (columnDefinitions[i] == ')')
                    {
                        --parenLevel;
                    }
                    currentColumnDefinition += columnDefinitions[i];
                }
            }
            if (!string.IsNullOrEmpty(currentColumnDefinition))
            {
                resultSet.Add(currentColumnDefinition.Trim());
            }
            return resultSet;
        }

I'm still fairly sure there's a workable Regex solution to this problem but the Regex is probably more complex than my knowledge level and I just need to move on.
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction Hi all and welcome to my first article on Experts Exchange. A while ago, someone asked me if i could do some tutorials on object oriented programming. I decided to do them on C#. Now you may ask me, why's that? Well, one of the re…
More often than not, we developers are confronted with a need: a need to make some kind of magic happen via code. Whether it is for a client, for the boss, or for our own personal projects, the need must be satisfied. Most of the time, the Framework…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

632 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question