Solved

C# remove trailing commas from csv file

Posted on 2016-10-26
10
44 Views
Last Modified: 2016-10-27
We get csv files that have been written by macros in Excel and have no control over the content.  so some rows have trailing commas, some have leading commas and some have both.

My goal is to remove these but I am having an issue either reading or writing the file and then applying the trimend, trimstart, etc.

So I get the file as a byte but you cannot perform those functions on a byte.  I get the file to a String array String mystring[] but you cannot perform that either once you File.ReadAllLines into the String array....StringBuilder has no "row" so it isn't like a DataTable where I can do a foreach (DataRow DR in DT.Rows.Count)
DR[0]....TrimEnd(Char) etc.

There HAS to be an easy way to do this right?  I am using file I/O native to the .NET framework version 4.51

Any help would be greatly appreciated.

Thanks
0
Comment
Question by:cyimxtck
  • 3
  • 2
  • 2
  • +2
10 Comments
 
LVL 39

Expert Comment

by:Kyle Abrahams
Comment Utility
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        line = line.Trim(new char[] {','});
        //output line to a NEW file.
    }

Open in new window

1
 
LVL 12

Expert Comment

by:Dustin Saunders
Comment Utility
Can you give an example of a file and the expected output?

I'm not entirely sure I understand the reasoning, the commas indicate vacant spaces in a table structure, removing them destroys the ability to associate them with a specific column value.  Maybe there is a different way to accomplish your objective, with an example it might be more clear.
1
 

Author Comment

by:cyimxtck
Comment Utility
I get what you are saying about the integrity of the file but the macro creates it with a header which is correct and then a bunch of "phantom" columns that are nothing more than commas that do not exist in header.

I just left the office so don't have a sample handy.

Kyle I will try your solution tomorrow morning to see what happens.

Thanks for the speedy replies
0
 
LVL 39

Assisted Solution

by:Kyle Abrahams
Kyle Abrahams earned 125 total points
Comment Utility
Saunders brings up a good point though.  Trimming won't be enough.

After you trim you should get a count of commas, and ensure that you have the same count in the header row.  If not, you'll need to determine which commas to re-add.  (Probably taking away from the end first).

EG:
Header with 3 columns
A,          B,       C,
1,          2,       3,
,1,          2,       3,,
,,            ,          3,,,,,

ETC.
1
 
LVL 29

Assisted Solution

by:anarki_jimbel
anarki_jimbel earned 125 total points
Comment Utility
I believe that there is some chance that commas at the end and at the start are valid - if "empty" values are valid. E.g.:
lastname, firstname, middlename
John, Smith,

Above the middlename is an empty string. This may also happen for "inside" values:
 firstname, middlename,lastname
John, ,Smith

If "empty" values are not allowed the task becomes easier. May be something like below:

        static void Main(string[] args)
        {
             string[] csvLines = new string[]{"name,price,quantity","plate,5,6",",spoon,2,6","kettle,33,1,",",knife,4,2,", ",bowl,,1,,7,,"};
            printSplit(csvLines);
            Console.ReadLine();

        }
        private static void printSplit(string[] csvLines)
        {
            foreach (string s in csvLines)
            {
                string[] values = s.Split(new char[]{','}, StringSplitOptions.RemoveEmptyEntries);
                foreach (string value in values)
                {
                    Console.Write(value + "\t");
                }
                Console.WriteLine();
            }
        }

Open in new window


0
Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

 
LVL 44

Expert Comment

by:AndyAinscow
Comment Utility
I agree with anarki_jimbel.
If you think about it empty cells still have to be exported to the csv file.
consider the following
A,B,C,D,E
2;3
p,q,r,s

What is the first column in the excel sheet?
A  2  p
A      p
A
   2  p
The correct answer from the excel sheet I envisage is in fact
       p

but all of the others were valid possibilities
0
 

Author Comment

by:cyimxtck
Comment Utility
I totally get what you guys are saying and you have seemingly valid concerns BUT the array is jagged and the software we are handing this into doesn't take trailing commas; the process fails.

Here is the file which when opened in Excel looks fine.  Open it in notepad and you can see these trailing commas.  The number of trailing commas can vary but the issue is that if any trailing commas exist the entire process fails and managers are alerted.

We just need to remove the nth number of commas from this csv file and write it out "corrected" so we can process it.

PROP TYPE,TICKET,AUDIT,AUDIT TYPE,METERED ID,MONTH,TOTAL COST,STARTED,ENDED,FINALIZED
FUEL,ARK-2112,BYRON,EDIT,SWPA512,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,SWPA513,2/1/2017,91899,,,
,,,,SWPA514,3/1/2017,91899,,,
,,,,SWPA515,4/1/2017,91899,,,
,,,,SWPA516,5/1/2017,91899,,,
,,,,SWPA517,6/1/2017,91899,,,
,,,,SWPA518,7/1/2017,91899,,,
,,,,SWPA519,8/1/2017,91899,,,
,,,,SWPA520,9/1/2017,91899,,,
,,,,SWPA521,10/1/2017,91899,,,
,,,,SWPA522,11/1/2017,91899,,,
,,,,SWPA523,12/1/2017,91899,,,
FUEL,ARK-2112,ARLEEN,INSERT,NEPA12,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,NEPA13,2/1/2017,91899,,,
,,,,NEPA14,3/1/2017,91899,,,
,,,,NEPA15,4/1/2017,91899,,,
,,,,NEPA16,5/1/2017,91899,,,
,,,,NEPA17,6/1/2017,91899,,,
,,,,NEPA18,7/1/2017,91899,,,
,,,,NEPA19,8/1/2017,91899,,,
,,,,NEPA20,9/1/2017,91899,,,
,,,,NEPA21,10/1/2017,91899,,,
,,,,NEPA22,11/1/2017,91899,,,
,,,,NEPA23,12/1/2017,91899,,,

Leading commas are fine after further investigation (and the point to which you folks alluded just in a different way).

This upload can have 20 rows or 1K rows and is very similar as above.

Thanks for all the help so far!
0
 
LVL 12

Accepted Solution

by:
Dustin Saunders earned 250 total points
Comment Utility
So it sounds like there are a certain number of columns that come with every record, and then some have multiple start/end/finalized dates that others don't?  If that's the case, then it can be that the trailing commas technically aren't needed.  But that seems like some bad code you might want to address in the macro.  You should push back on those coders.

Removing only the trailing commas is pretty simple, just use TrimEnd.

string line = ",,,,abc, def, efg, hik, jkl, mno, pqr,,,,";
            line = line.TrimEnd(',');

Open in new window


Result:
,,,,abc, def, efg, hik, jkl, mno, pqr

Open in new window


If you need to know how many commas were trimmed for whatever reason, you could do something like:
string line = ",,,,abc, def, efg, hik, jkl, mno, pqr,,,,";
            int i = line.Length;
            line = line.TrimEnd(',');
            i = i - line.Length;

Open in new window


or another way:
 int x = 0;
            while (line.Substring(line.Length - 1) == ",")
            {
                line = line.Substring(0, line.Length - 1);
                x++;
            }

Open in new window

0
 
LVL 29

Expert Comment

by:anarki_jimbel
Comment Utility
PROP TYPE,TICKET,AUDIT,AUDIT TYPE,METERED ID,MONTH,TOTAL COST,STARTED,ENDED,FINALIZED
FUEL,ARK-2112,BYRON,EDIT,SWPA512,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,SWPA513,2/1/2017,91899,,,
,,,,SWPA514,3/1/2017,91899,,,
,,,,SWPA518,7/1/2017,91899,,,
,,,,SWPA519,8/1/2017,91899,,,
,,,,SWPA520,9/1/2017,91899,,,
,,,,SWPA521,10/1/2017,91899,,,
,,,,SWPA522,11/1/2017,91899,,,
,,,,SWPA523,12/1/2017,91899,,,
FUEL,ARK-2112,ARLEEN,INSERT,NEPA12,1/1/2017,91899,1/1/2017,12/31/2017,1102788
,,,,NEPA13,2/1/2017,91899,,,
,,,,NEPA14,3/1/2017,91899,,,

Hmmmmm...

May be I'm missing something but the above looks as a valid csv with 10 columns, and trailing commas   just show empty values...
1
 
LVL 12

Expert Comment

by:Dustin Saunders
Comment Utility
@anarki I agree with you, the example provided is a normal CSV structure, and I think it would be better to pull these into a datatable or array of classes.  The only reason to trim would be if there are extra commas at the end that get added beyond the normal amount by the macro.

Good thing about CSV at least is that if only trailing commas are lost, you don't lose any information in terms of column correlation (any following columns would be NULL)
1

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Wpf develop 5 29
Refresh Datagridview from another form 1 25
Hide Tab Page 3 18
Why don't I see this table in EDMX file? 2 19
This article describes relatively difficult and non-obvious issues that are likely to arise when creating COM class in Visual Studio and deploying it by professional MSI-authoring tools. It is assumed that the reader is already familiar with the cla…
A long time ago (May 2011), I have written an article showing you how to create a DLL using Visual Studio 2005 to be hosted in SQL Server 2005. That was valid at that time and it is still valid if you are still using these versions. You can still re…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now