Solved

C# remove trailing commas from csv file

Posted on 2016-10-26
67 Views
Last Modified: 2016-10-27
We get CSV files that have been written by macros in Excel, and we have no control over the content, so some rows have trailing commas, some have leading commas, and some have both.

My goal is to remove these, but I am having an issue either reading or writing the file and then applying TrimEnd, TrimStart, etc.

So I get the file as bytes, but you cannot perform those functions on a byte.  I can File.ReadAllLines into a string array (string[] mystring), but you cannot perform them there either.  StringBuilder has no "row", so it isn't like a DataTable where I can do foreach (DataRow DR in DT.Rows) and then DR[0]...TrimEnd(char) etc.

There HAS to be an easy way to do this, right?  I am using the file I/O native to .NET Framework 4.5.1.

Any help would be greatly appreciated.

Thanks
Question by:cyimxtck
10 Comments
 
LVL 40

Expert Comment

by:Kyle Abrahams
ID: 41861240
using (var reader = new StreamReader("input.csv"))   // placeholder input path
using (var writer = new StreamWriter("output.csv"))  // placeholder output path
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        line = line.Trim(',');      // strips leading AND trailing commas
        writer.WriteLine(line);     // output line to a NEW file
    }
}


 
LVL 12

Expert Comment

by:Dustin Saunders
ID: 41861243
Can you give an example of a file and the expected output?

I'm not entirely sure I understand the reasoning: the commas indicate vacant spaces in a table structure, and removing them destroys the ability to associate the values with a specific column.  Maybe there is a different way to accomplish your objective; with an example it might be clearer.
 

Author Comment

by:cyimxtck
ID: 41861257
I get what you are saying about the integrity of the file, but the macro creates it with a header, which is correct, and then a bunch of "phantom" columns that are nothing more than commas and do not exist in the header.

I just left the office so don't have a sample handy.

Kyle I will try your solution tomorrow morning to see what happens.

Thanks for the speedy replies
 
LVL 40

Assisted Solution

by:Kyle Abrahams
Kyle Abrahams earned 125 total points
ID: 41861272
Saunders brings up a good point though.  Trimming won't be enough.

After you trim, you should get a count of commas and ensure that you have the same count as in the header row.  If not, you'll need to determine which commas to re-add (probably the ones taken away from the end first).

E.g., a header with 3 columns:
A,B,C,
1,2,3,
,1,2,3,,
,,,3,,,,,

etc.
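A minimal sketch of that idea (mine, not Kyle's exact code; the file paths and the "pad the row back up to the header's field count with trailing commas" rule are assumptions):

using System.IO;
using System.Linq;

class NormalizeCsv
{
    static void Main()
    {
        // Placeholder paths -- substitute the real input/output files.
        var lines = File.ReadAllLines("input.csv");
        int headerFields = lines[0].Split(',').Length;    // column count taken from the header row

        var fixedLines = lines.Select(line =>
        {
            string trimmed = line.TrimEnd(',');           // drop only the trailing commas
            int fields = trimmed.Split(',').Length;
            // Re-pad with trailing commas if the row now has fewer fields than the header.
            return fields < headerFields
                ? trimmed + new string(',', headerFields - fields)
                : trimmed;
        });

        File.WriteAllLines("output.csv", fixedLines);
    }
}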
 
LVL 29

Assisted Solution

by:anarki_jimbel
anarki_jimbel earned 125 total points
ID: 41861463
I believe that there is some chance that commas at the end and at the start are valid, if "empty" values are valid. E.g.:
lastname, firstname, middlename
John, Smith,

Above, the middlename is an empty string. This may also happen for "inside" values:
firstname, middlename, lastname
John, ,Smith

If "empty" values are not allowed, the task becomes easier. Maybe something like below:

using System;

class Program
{
    static void Main(string[] args)
    {
        string[] csvLines = new string[] { "name,price,quantity", "plate,5,6", ",spoon,2,6", "kettle,33,1,", ",knife,4,2,", ",bowl,,1,,7,," };
        printSplit(csvLines);
        Console.ReadLine();
    }

    // Note: RemoveEmptyEntries drops ALL empty fields, including "inside" ones,
    // so column alignment is lost if empty values are meaningful.
    private static void printSplit(string[] csvLines)
    {
        foreach (string s in csvLines)
        {
            string[] values = s.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string value in values)
            {
                Console.Write(value + "\t");
            }
            Console.WriteLine();
        }
    }
}




 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41861584
I agree with anarki_jimbel.
If you think about it, empty cells still have to be exported to the CSV file.
Consider the following:
A,B,C,D,E
2,3
p,q,r,s

What is the first column in the Excel sheet?
A  2  p
A     p
A
   2  p
The correct answer from the Excel sheet I envisage is in fact:
      p

but all of the others were valid possibilities.
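A quick illustration of that ambiguity (my own example, not from the comment above): once the leading commas are dropped, the same value appears to belong to a different column.

using System;

class Ambiguity
{
    static void Main()
    {
        // With the empty cells preserved, "2" is clearly in the third column (index 2).
        string exported = ",,2,3,";
        Console.WriteLine(Array.IndexOf(exported.Split(','), "2"));   // prints 2

        // With the leading/trailing commas stripped, "2" looks like it is in the first column.
        string stripped = "2,3";
        Console.WriteLine(Array.IndexOf(stripped.Split(','), "2"));   // prints 0
    }
}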
 

Author Comment

by:cyimxtck
ID: 41862253
I totally get what you guys are saying, and you have seemingly valid concerns, BUT the array is jagged and the software we are handing this off to doesn't take trailing commas; the process fails.

Here is the file, which when opened in Excel looks fine.  Open it in Notepad and you can see these trailing commas.  The number of trailing commas can vary, but the issue is that if any trailing commas exist the entire process fails and managers are alerted.

We just need to remove however many trailing commas there are from this CSV file and write it out "corrected" so we can process it.

PROP TYPE,TICKET,AUDIT,AUDIT TYPE,METERED ID,MONTH,TOTAL COST,STARTED,ENDED,FINALIZED
FUEL,ARK-2112,BYRON,EDIT,SWPA512,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,SWPA513,2/1/2017,91899,,,
,,,,SWPA514,3/1/2017,91899,,,
,,,,SWPA515,4/1/2017,91899,,,
,,,,SWPA516,5/1/2017,91899,,,
,,,,SWPA517,6/1/2017,91899,,,
,,,,SWPA518,7/1/2017,91899,,,
,,,,SWPA519,8/1/2017,91899,,,
,,,,SWPA520,9/1/2017,91899,,,
,,,,SWPA521,10/1/2017,91899,,,
,,,,SWPA522,11/1/2017,91899,,,
,,,,SWPA523,12/1/2017,91899,,,
FUEL,ARK-2112,ARLEEN,INSERT,NEPA12,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,NEPA13,2/1/2017,91899,,,
,,,,NEPA14,3/1/2017,91899,,,
,,,,NEPA15,4/1/2017,91899,,,
,,,,NEPA16,5/1/2017,91899,,,
,,,,NEPA17,6/1/2017,91899,,,
,,,,NEPA18,7/1/2017,91899,,,
,,,,NEPA19,8/1/2017,91899,,,
,,,,NEPA20,9/1/2017,91899,,,
,,,,NEPA21,10/1/2017,91899,,,
,,,,NEPA22,11/1/2017,91899,,,
,,,,NEPA23,12/1/2017,91899,,,

Leading commas are fine, after further investigation (the point to which you folks alluded, just in a different way).

This upload can have 20 rows or 1K rows and is very similar to the above.

Thanks for all the help so far!
 
LVL 12

Accepted Solution

by:Dustin Saunders
Dustin Saunders earned 250 total points
ID: 41862325
So it sounds like there are a certain number of columns that come with every record, and then some have multiple start/end/finalized dates that others don't?  If that's the case, then it may be that the trailing commas technically aren't needed.  But that seems like some bad code you might want to address in the macro; you should push back on those coders.

Removing only the trailing commas is pretty simple; just use TrimEnd.

string line = ",,,,abc, def, efg, hik, jkl, mno, pqr,,,,";
line = line.TrimEnd(',');



Result:
,,,,abc, def, efg, hik, jkl, mno, pqr



If you need to know how many commas were trimmed for whatever reason, you could do something like:
string line = ",,,,abc, def, efg, hik, jkl, mno, pqr,,,,";
int i = line.Length;
line = line.TrimEnd(',');
i = i - line.Length;   // i now holds the number of trailing commas removed



or another way:
int x = 0;
while (line.Length > 0 && line.Substring(line.Length - 1) == ",")
{
    line = line.Substring(0, line.Length - 1);
    x++;   // x counts the trailing commas removed
}

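Putting that together with the File.ReadAllLines approach mentioned in the question, a minimal end-to-end sketch (the file paths are placeholders, not something from this thread) could be:

using System.IO;
using System.Linq;

class FixTrailingCommas
{
    static void Main()
    {
        // Placeholder paths -- point these at the real CSV files.
        string[] lines = File.ReadAllLines(@"C:\temp\input.csv");

        // Strip only the trailing commas from every row; leading commas are kept,
        // since they mark legitimately empty leading columns.
        var cleaned = lines.Select(l => l.TrimEnd(','));

        File.WriteAllLines(@"C:\temp\output.csv", cleaned);
    }
}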

 
LVL 29

Expert Comment

by:anarki_jimbel
ID: 41862946
PROP TYPE,TICKET,AUDIT,AUDIT TYPE,METERED ID,MONTH,TOTAL COST,STARTED,ENDED,FINALIZED
FUEL,ARK-2112,BYRON,EDIT,SWPA512,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,SWPA513,2/1/2017,91899,,,
,,,,SWPA514,3/1/2017,91899,,,
,,,,SWPA518,7/1/2017,91899,,,
,,,,SWPA519,8/1/2017,91899,,,
,,,,SWPA520,9/1/2017,91899,,,
,,,,SWPA521,10/1/2017,91899,,,
,,,,SWPA522,11/1/2017,91899,,,
,,,,SWPA523,12/1/2017,91899,,,
FUEL,ARK-2112,ARLEEN,INSERT,NEPA12,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,NEPA13,2/1/2017,91899,,,
,,,,NEPA14,3/1/2017,91899,,,

Hmmmmm...

Maybe I'm missing something, but the above looks like a valid CSV with 10 columns, and the trailing commas just show empty values...
 
LVL 12

Expert Comment

by:Dustin Saunders
ID: 41862978
@anarki I agree with you; the example provided is a normal CSV structure, and I think it would be better to pull these into a DataTable or an array of classes.  The only reason to trim would be if there are extra commas at the end that get added beyond the normal amount by the macro.

The good thing about CSV, at least, is that if only trailing commas are lost, you don't lose any information in terms of column correlation (any trailing columns would simply be NULL).
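A hedged sketch of that DataTable suggestion (a hand-rolled parse of my own; it assumes no quoted fields containing commas):

using System.Data;
using System.IO;

class CsvToDataTable
{
    static DataTable Load(string path)
    {
        var lines = File.ReadAllLines(path);
        var table = new DataTable();

        // The header row defines the column names and the expected column count.
        foreach (string name in lines[0].Split(','))
            table.Columns.Add(name.Trim());

        for (int i = 1; i < lines.Length; i++)
        {
            string[] fields = lines[i].Split(',');
            var row = table.NewRow();
            // Copy only as many fields as there are columns; any extra trailing
            // fields (stray commas) are ignored, and missing fields stay DBNull.
            for (int c = 0; c < table.Columns.Count && c < fields.Length; c++)
                row[c] = fields[c];
            table.Rows.Add(row);
        }
        return table;
    }
}

With this shape, extra trailing commas in a row simply become extra empty fields that the column-count guard ignores.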
