Link to home
Start Free TrialLog in
Avatar of suresh pondicherry
suresh pondicherryFlag for United States of America

asked on

regular expression to ignore comma inside the double quoted strings in CSV file

Hi,
I need the regular expression to ignore comma inside the double quotes in CSV file. I am using this expression and it is not working for one particular row in the CSV file.
 am not able to continue after 38th row.

Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);

Kind regards,
Pooja


Avatar of nepaluz
nepaluz
Flag of United Kingdom of Great Britain and Northern Ireland image

I have an article on parising a CSV file using regex and LINQ here:
http://www.experts-exchange.com/Programming/Languages/.NET/Visual_Basic.NET/A_7329-Parsing-a-CSV-file-into-a-datatable-using-LINQ-and-Regex.html
The regex pattern used is:
Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"

Open in new window

Avatar of suresh pondicherry

ASKER

hi nepaluz,
The expression is not working for me.

Kind regards,
pooja
Avatar of kaufmed
Why not use TextFieldParser?
give us an example of the file you are trying to parse
Hi Kaufmed,
How to use this in C#.
My csv file has comma inside the double quotes . I need to  ignore comma inside the double quotes.

please find the sample csv file.


Kind regards,
Pooja
test.csv
I posted an example here:  https://www.experts-exchange.com/questions/26390451/replace-commas-in-the-double-quotes-so-that-I-am-able-to-use-String-Split.html#33395609 . Please make note of the reference mentioned as well. You will need that in order to use the class.
Hi Kaufmed,
How to use Microsoft.VisualBasic.FileIO in this wpf project. Can you send me link to get the dll so that i can reference it in my project.

Kind regards,
Pooja
Do Project->Add Reference. Go to .NET tab and scroll to Microsoft.VisualBasic.dll.
Hi Kaufmed,
My csv file has comma inside the double quotes . I need to  ignore comma inside the double quotes.

please find the sample csv file.


Kind regards,
Pooja
Sample.csv
Hi Kaufmed,
I need to store the csv into C# object. I can send my code if you need it.

Kind regards,
Pooja
Your CSV isn't quite what I am accustomed to seeing in a CSV. Firstly, you have quotes at the start and end of each line (except for the header). This basically makes the line one field. Secondly, you have double-quotes internal to the line. I have a feeling this would screw up CSV parsing.

Try this logic for dealing with the file as you have posted it:

using System;
using System.Data;
using Microsoft.VisualBasic.FileIO;
using System.IO;

namespace _27527330
{
    class Program
    {
        static void Main(string[] args)
        {
            DataTable dt = new DataTable();
            string tempFile = Path.GetTempFileName();

            using (StreamReader reader = new StreamReader("Sample.csv"))
            {
                using (StreamWriter writer = new StreamWriter(tempFile))
                {
                    while (!reader.EndOfStream)
                    {
                        string line = reader.ReadLine();

                        line = line.Replace("\"\"", "\"");
                        line = line.Substring(1, line.Length - 2);
                        writer.WriteLine(line);
                    }
                }
            }

            using (TextFieldParser parser = new TextFieldParser(tempFile))
            {
                parser.TextFieldType = FieldType.Delimited;
                parser.Delimiters = new string[] { "," };
                parser.HasFieldsEnclosedInQuotes = true;

                string[] columns = parser.ReadFields();

                for (int i = 0; i < columns.Length; i++)
                {
                    dt.Columns.Add(columns[i], typeof(string));
                }

                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields();
                    DataRow newRow = dt.NewRow();

                    for (int i = 0; i < fields.Length; i++)
                    {
                        newRow[i] = fields[i];
                    }

                    dt.Rows.Add(newRow);
                }
            }
        }
    }
}

Open in new window

P.S.

You mentioned putting this data into a C# object. I used a DataTable above. You can use what you like. The above should give you a good start.
Hi Kaufmed,
I am working on TextFieldParser . So far it is not working for  me (for this CSV file). I need some time and in the mean time ,if you able to get  working code , please help me,

Kind regards,
Pooja
So far it is not working for  me (for this CSV file).
That's because of the leading and trailing quotes I mentioned above. Please see http:#37411237 .
Hi Kaufmed,
Also there are multiple comma within the same quoted string. Please find the sample csv reader.


Kind regards,
Pooja
sample.csv
Also there are multiple comma within the same quoted string.
That shouldn't matter.

The newest file you posted does not match the format as your previous submission. How are you generating these files?
Hi Kaufmed,
I am generating CSV file from Java code. The newest file is the correct one. We can see comma inside the double quotes.


Kind regards,
Pooja
But the newest file just isn't properly formatted for a CSV file. Take line 39 for example. In the very last field, you begin a quoted field, but you don't close it. Line 41 appears to begin with a quoted field, but there is no opening quote. You CSV will need to be valid in order to use TextFieldParser or even the previously mentioned regex approach.
Hi Kaufmed,
This file is used in all the applications. Java applications able to parse it and create objects. We don't think there is any wrong in this csv file.

Also i could see the last field is quoted well. ("User=root (0), Permissions=-rw-rw-r-- (100664)")

Please find the last line..
"172.19.254.27",AIX Server,"IBM AIX 5.3 High Security Control Catalog - NIST SP 800-53 Rev 3 - FISMA","AC-3.53 Verify inetd.conf File Permissions","Verify inetd.conf File Permissions","The inetd, called also the super server, will load a network program based upon a request from the network. The inetd.conf file tells inetd which ports to listen to and what server to start for each port. This test verifies that the /etc/inetd.conf file has permissions set to 440 or more restrictive and that it is owned by 'root' or 'bin'.

Proper permissions help to prevent unauthorized access or modification of the file.","/etc/inetd.conf",8/22/11 5:28 PM,failed,"User=root (0), Permissions=-rw-rw-r-- (100664)"


I am glad that you are helping me.

Kind regards,
Pooja
OK. I see what's going on. You have embedded newlines in those fields. That is rather peculiar for a CSV file--at least from what I am accustomed to seeing. I'm not sure TextFieldParser will be able to handle this. I'll give it some thought.
ok Kaufmed.

Thanks,
Pooja
ASKER CERTIFIED SOLUTION
Avatar of kaufmed
kaufmed
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks Kaufmed