C# remove trailing commas from csv file

We get CSV files that have been written by macros in Excel, and we have no control over the content. Some rows have trailing commas, some have leading commas, and some have both.

My goal is to remove these, but I am having trouble reading the file, applying TrimEnd, TrimStart, etc., and then writing it back out.

I can get the file as bytes, but you cannot perform those functions on a byte. I can get it into a string array with File.ReadAllLines, but I couldn't get the trimming to work there either. StringBuilder has no concept of a "row", so it isn't like a DataTable where I can do foreach (DataRow DR in DT.Rows) and then DR[0].ToString().TrimEnd(char) etc.

There HAS to be an easy way to do this, right? I am using file I/O native to .NET Framework 4.5.1.

Any help would be greatly appreciated.

Thanks
cyimxtck asked:
 
Dustin Saunders (Director of Operations) commented:
So it sounds like there are a certain number of columns that come with every record, and then some records have multiple start/end/finalized dates that others don't? If that's the case, the trailing commas technically aren't needed, but that seems like bad code in the macro that you might want to address. You should push back on those coders.

Removing only the trailing commas is pretty simple: just use TrimEnd.

string line = ",,,,abc, def, efg, hik, jkl, mno, pqr,,,,";
line = line.TrimEnd(',');

Result:
,,,,abc, def, efg, hik, jkl, mno, pqr

If you need to know how many commas were trimmed for whatever reason, you could do something like:
string line = ",,,,abc, def, efg, hik, jkl, mno, pqr,,,,";
int i = line.Length;
line = line.TrimEnd(',');
i = i - line.Length;

Or another way:
int x = 0;
while (line.Length > 0 && line.EndsWith(","))
{
    line = line.Substring(0, line.Length - 1);
    x++;
}

Kyle Abrahams (Senior .NET Developer) commented:
string line;
while ((line = reader.ReadLine()) != null)
{
    line = line.Trim(new char[] { ',' });
    // output line to a NEW file.
}
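If the whole file fits comfortably in memory, the same idea can be condensed with File.ReadAllLines/File.WriteAllLines. A minimal sketch (the paths are placeholders, not the real file names); note that TrimEnd, unlike Trim, leaves any leading commas alone:

```csharp
using System.IO;
using System.Linq;

class TrimTrailingCommas
{
    static void Main()
    {
        // Placeholder paths -- substitute the real input/output files.
        var cleaned = File.ReadAllLines("input.csv")
                          .Select(line => line.TrimEnd(','));   // trailing commas only
        File.WriteAllLines("output.csv", cleaned);
    }
}
```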

Dustin Saunders (Director of Operations) commented:
Can you give an example of a file and the expected output?

I'm not entirely sure I understand the reasoning. The commas indicate vacant spaces in a table structure; removing them destroys the ability to associate values with a specific column. Maybe there is a different way to accomplish your objective; with an example it might be clearer.
cyimxtck (author) commented:
I get what you are saying about the integrity of the file, but the macro creates it with a header, which is correct, and then a bunch of "phantom" columns that are nothing more than commas with no corresponding entry in the header.

I just left the office, so I don't have a sample handy.

Kyle, I will try your solution tomorrow morning to see what happens.

Thanks for the speedy replies
Kyle Abrahams (Senior .NET Developer) commented:
Saunders brings up a good point though.  Trimming won't be enough.

After you trim, get a count of commas and ensure you have the same count as in the header row. If not, you'll need to determine which commas to re-add (probably taking away from the end first).

E.g., with a header of 3 columns:
A,          B,       C,
1,          2,       3,
,1,          2,       3,,
,,            ,          3,,,,,

ETC.
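That re-padding idea can be sketched as follows (an illustration only, assuming simple fields with no quoted commas; the paths are placeholders): count the header's columns, trim each line, then pad it back out with commas until the column counts match.

```csharp
using System;
using System.IO;
using System.Linq;

class PadToHeader
{
    static void Main()
    {
        string[] lines = File.ReadAllLines("input.csv");   // placeholder path
        int headerCols = lines[0].TrimEnd(',').Split(',').Length;

        var normalized = lines.Select(line =>
        {
            string trimmed = line.TrimEnd(',');
            int missing = headerCols - trimmed.Split(',').Length;
            // Re-add commas at the end for genuinely empty trailing columns.
            return trimmed + new string(',', Math.Max(0, missing));
        });

        File.WriteAllLines("output.csv", normalized);
    }
}
```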
anarki_jimbel commented:
I believe there is some chance that commas at the end and at the start are valid, if "empty" values are valid. E.g.:
lastname, firstname, middlename
John, Smith,

Above, the middlename is an empty string. The same may happen for "inside" values:
firstname, middlename, lastname
John, ,Smith

If "empty" values are not allowed, the task becomes easier. Maybe something like below:

static void Main(string[] args)
{
    string[] csvLines = new string[] { "name,price,quantity", "plate,5,6", ",spoon,2,6", "kettle,33,1,", ",knife,4,2,", ",bowl,,1,,7,," };
    printSplit(csvLines);
    Console.ReadLine();
}

private static void printSplit(string[] csvLines)
{
    foreach (string s in csvLines)
    {
        string[] values = s.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string value in values)
        {
            Console.Write(value + "\t");
        }
        Console.WriteLine();
    }
}

AndyAinscow (Freelance programmer / Consultant) commented:
I agree with anarki_jimbel.
If you think about it, empty cells still have to be exported to the CSV file.
Consider the following:
A,B,C,D,E
2;3
p,q,r,s

What is the first column in the Excel sheet?
A  2  p
A      p
A
   2  p
The correct answer from the Excel sheet I envisage is in fact:
       p

but all of the others were valid possibilities
cyimxtck (author) commented:
I totally get what you guys are saying, and those are valid concerns, BUT the array is jagged, and the software we hand this to doesn't accept trailing commas; the process fails.

Here is the file, which looks fine when opened in Excel. Open it in Notepad and you can see the trailing commas. The number of trailing commas can vary, but if any exist the entire process fails and managers are alerted.

We just need to remove the trailing commas from this CSV file and write it out "corrected" so we can process it.

PROP TYPE,TICKET,AUDIT,AUDIT TYPE,METERED ID,MONTH,TOTAL COST,STARTED,ENDED,FINALIZED
FUEL,ARK-2112,BYRON,EDIT,SWPA512,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,SWPA513,2/1/2017,91899,,,
,,,,SWPA514,3/1/2017,91899,,,
,,,,SWPA515,4/1/2017,91899,,,
,,,,SWPA516,5/1/2017,91899,,,
,,,,SWPA517,6/1/2017,91899,,,
,,,,SWPA518,7/1/2017,91899,,,
,,,,SWPA519,8/1/2017,91899,,,
,,,,SWPA520,9/1/2017,91899,,,
,,,,SWPA521,10/1/2017,91899,,,
,,,,SWPA522,11/1/2017,91899,,,
,,,,SWPA523,12/1/2017,91899,,,
FUEL,ARK-2112,ARLEEN,INSERT,NEPA12,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,NEPA13,2/1/2017,91899,,,
,,,,NEPA14,3/1/2017,91899,,,
,,,,NEPA15,4/1/2017,91899,,,
,,,,NEPA16,5/1/2017,91899,,,
,,,,NEPA17,6/1/2017,91899,,,
,,,,NEPA18,7/1/2017,91899,,,
,,,,NEPA19,8/1/2017,91899,,,
,,,,NEPA20,9/1/2017,91899,,,
,,,,NEPA21,10/1/2017,91899,,,
,,,,NEPA22,11/1/2017,91899,,,
,,,,NEPA23,12/1/2017,91899,,,

Leading commas are fine after further investigation (which is the point you folks alluded to, just in a different way).

This upload can have 20 rows or 1K rows and is always very similar to the above.

Thanks for all the help so far!
anarki_jimbel commented:
PROP TYPE,TICKET,AUDIT,AUDIT TYPE,METERED ID,MONTH,TOTAL COST,STARTED,ENDED,FINALIZED
FUEL,ARK-2112,BYRON,EDIT,SWPA512,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,SWPA513,2/1/2017,91899,,,
,,,,SWPA514,3/1/2017,91899,,,
,,,,SWPA518,7/1/2017,91899,,,
,,,,SWPA519,8/1/2017,91899,,,
,,,,SWPA520,9/1/2017,91899,,,
,,,,SWPA521,10/1/2017,91899,,,
,,,,SWPA522,11/1/2017,91899,,,
,,,,SWPA523,12/1/2017,91899,,,
FUEL,ARK-2112,ARLEEN,INSERT,NEPA12,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,NEPA13,2/1/2017,91899,,,
,,,,NEPA14,3/1/2017,91899,,,

Hmmmmm...

Maybe I'm missing something, but the above looks like a valid CSV with 10 columns; the trailing commas just show empty values...
Dustin Saunders (Director of Operations) commented:
@anarki I agree with you. The example provided is a normal CSV structure, and I think it would be better to pull these into a DataTable or an array of classes. The only reason to trim would be if there are extra commas at the end, beyond the normal amount, added by the macro.

The good thing about CSV, at least, is that if only trailing commas are lost you don't lose any information in terms of column correlation (any missing trailing columns would just be NULL).
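A sketch of that DataTable approach (an illustration only, assuming no quoted fields containing commas; the path is a placeholder). Loading into a table makes jagged rows harmless, since short rows are simply padded with empty values:

```csharp
using System.Data;
using System.IO;

class CsvToTable
{
    static void Main()
    {
        string[] lines = File.ReadAllLines("input.csv");   // placeholder path

        var table = new DataTable();
        foreach (string header in lines[0].TrimEnd(',').Split(','))
            table.Columns.Add(header.Trim());

        for (int i = 1; i < lines.Length; i++)
        {
            string[] fields = lines[i].Split(',');
            DataRow row = table.NewRow();
            // Copy the fields that exist; any columns past the row's
            // length stay empty, and any extra fields are ignored.
            for (int c = 0; c < table.Columns.Count; c++)
                row[c] = c < fields.Length ? fields[c] : "";
            table.Rows.Add(row);
        }
    }
}
```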