Solved

C# -- remove comma when reading TXT file ?

Posted on 2011-03-04
17
788 Views
Last Modified: 2013-12-17
The below code reads in my below "John Doe" variance.txt correctly.

NAME,FILE#,SSN,CHECKVIEWPAYDATE,Total_Hrs,CHECKVIEWGROSSPAYA,HOMEDEPARTMENT
"John Doe",229999,9263,12/06/2010,56,278.72,104002
"Jane Doe",181,6331,12/06/2010,40,74,101301
"John Smith",1041,1207,12/06/2010,40,12,101001
----------------------------------------------------------------
How can I change the code to remove the , or replacing it with
a space or something so it reads in the below "John, Doe"
as one column ?

NAME,FILE#,SSN,CHECKVIEWPAYDATE,Total_Hrs,CHECKVIEWGROSSPAYA,HOMEDEPARTMENT
"John, Doe",229999,9263,12/06/2010,56,278.72,104002
"Jane, Doe",181,6331,12/06/2010,40,74,101301
"John, Smith",1041,1207,12/06/2010,40,12,101001

----------------------------------------------------------------
        private DataSet ReadFile(string filePath)
        {
            OleDbConnection conn = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + System.IO.Path.GetDirectoryName(filePath) + "; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"");
            conn.Open();
            OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(filePath), conn);
            DataSet ds = new DataSet("Temp");
            adapter.Fill(ds);

            return ds;
        }
----------------------------------------------------------------
        public bool ImportFileVariance()
        {
            string filePath = @"C:\import\variance.txt";

            DataSet ds = ReadFile(filePath);

            return dbAccess.ImportVarianceRecords(ds.Tables[0]);
        }
----------------------------------------------------------------
0
Comment
Question by:finance_teacher
  • 8
  • 3
  • 2
  • +3
17 Comments
 
LVL 9

Expert Comment

by:rawinnlnx9
ID: 35038313
Read the text into a string.


string txtFileContents = "";

// execute the code to open the file and serialize it to a string.

//Now use .Replace()

txtFileContents = txtFileContents.Replace(",", "_"); // the underscore can be a space or anything you want.

Open in new window

0
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 35038376
Hi finance_teacher;

This code snippet should give you the results you want.

// Reading data from a text file.
string data = System.IO.File.ReadAllText( "TestData.txt" );
// Remove all commas that are in between quotes
data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" );

Open in new window


Fernando
0
 
LVL 7

Expert Comment

by:dimaj
ID: 35038409
You could also use a Split() function.

Basically, do something like this:
string[] parts = curLine.Split(',');

That should give you an array of every field. Then you can manipulate your fields the way you want to.
0
 

Author Comment

by:finance_teacher
ID: 35038598
How can I get the below "data" string into my "ds" DataSet ?

        public bool ImportFileVariance()
        {
            string filePath = @"C:\import\variance.txt";
            string data = System.IO.File.ReadAllText(filePath);

           //below works, replacing , with space
            data = System.Text.RegularExpressions.Regex.Replace(data, @"(\x22.*)(,)(.*\x22)", "$1$3");
   
            //errors when running below step
            DataSet ds = ReadFile(data);
            return dbAccess.ImportVarianceRecords(ds.Tables[0]);
        }
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35038937
What does your original CSV text file look like (your question is a little unclear):
Doe,John,1234,123-45-6789, etc
John,Doe,1234,123-45-6789, etc
John Doe,1234,123-45-6789, etc
"Doe", "John",1234,123-45-6789, etc
"Doe, John",1234,123-45-6789, etc
"John", "Doe",1234,123-45-6789, etc
"John, Doe",1234,123-45-6789, etc

And to be clear you want to "transform" it like below?
"John Doe",1234,123-45-6789, etc

0
 

Author Comment

by:finance_teacher
ID: 35039194
tgerbert, that is correct
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039195
Ahh, sorry - didn't notice you were using the OleDb provider to read the source file.  Try this ReadFile() method:

static DataSet ReadFile(string filePath)
{
	string connectionString = String.Format(
		@"Provider=Microsoft.Jet.OleDb.4.0;Data Source={0};Extended Properties=""Text;HDR=Yes;FMT=Delimited""",
		System.IO.Path.GetDirectoryName(filePath));

	using (OleDbConnection conn = new OleDbConnection(connectionString))
	{
		string command = String.Format(
			@"SELECT * FROM {0}", System.IO.Path.GetFileName(filePath));
		using (OleDbDataAdapter adapter = new OleDbDataAdapter(command, conn))
		{
			DataSet ds = new DataSet("Temp");
			adapter.Fill(ds);

			// Loop through the data table and change names like "Doe,John" to "John Doe"
			foreach (DataRow row in ds.Tables[0].Rows)
				row[0] = Regex.Replace((string)row[0], @"([^,]*),([^,]*)", "$2 $1");

			return ds;
		}
	}

Open in new window

0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039216
In my quick test I was using it as a static method, you don't necessarily have to do that in your project - change "static" to "private" to keep it the way you had it (I also missed the closing curly-brace when I copy/pasted).
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039244
And I didn't show it, but you'll need a using System.Text.RegularExpressions; at the top of your code file.
0
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 35039349
Hi finance_teacher;

To your last question. You have a couple of options, 1 save the modified data to the file system and then use your code. 2 create the data table in code as shown in the code snippet below.

private void button1_Click( object sender, EventArgs e )
{
    string data = System.IO.File.ReadAllText("TestData.txt" );
    data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" );
    DataTable dt = CtreateTable( data );
    dataGridView1.DataSource = dt;
}

private DataTable CtreateTable( string data )
{
    string[] dataRows = data.Split( "\r\n".ToCharArray( ), StringSplitOptions.RemoveEmptyEntries );
    DataTable dt = new DataTable( );

    // First line has the column headders
    foreach( string headder in dataRows[ 0 ].Split( ",".ToCharArray( ) ) )
    {
        dt.Columns.Add( new DataColumn( headder ) );
    }

    // Load the data
    for( int idx = 1; idx < dataRows.Length; idx++)
    {
        string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );
        int column = 0;
        DataRow dr = dt.NewRow( );
        foreach( string field in fields )
        {
            dr[ column ] = field.Trim("\"".ToCharArray());
            column++;
        }
        dt.Rows.Add( dr );
    }
    return dt;
}

Open in new window


Fernando
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039407
I think the below line would incorrectly split "Doe,John" into two fields:
string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );

Open in new window


If you wanted to go that route I think I would use StreamReader.ReadLine (as opposed to looping through a string[]), and I think you would need to use Regex.Replace(dataRows[idx], @"[""']([^""']*),([^""']*)[""']", "$2 $1"); before using String.Split, so that "Doe,John",123-45-6789,1234 would become John Doe,123-45-6789,1234 allowing you to split on the comma.
0
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 35039637
@ tgerbert;

This is the code that tales care of the commas in "Doe, John" field.

data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" )

Open in new window


this code correctly splits the filed in each line. At this point "Doe, John" is "Doe John" without comma.

string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );

Open in new window


Fernando
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039804
Ah, I missed that it was in the other method.

I still would use Regex.Replace(data, @"[""]([^""]*),([^""]*)[""']($|,)", "$2 $1$3") to reverse the order of the names, ensure there's a space between them ("Doe,John" should become "John Doe", not "DoeJohn" nor "Doe John"), skip the enclosing quotes (that way you won't need .Trim() later).


0
 
LVL 2

Expert Comment

by:hariprasadrj
ID: 35040500
Following code should help -
 public System.Data.DataSet LoadData()
        {
            string path = @"C:\a.txt";

            string fileContents = System.IO.File.ReadAllText(path);
            fileContents = System.Text.RegularExpressions.Regex.Replace(fileContents, @"(\x22.*)(,)(.*\x22)", "$1$3");

            System.IO.File.WriteAllText(path,fileContents.ToString());

            System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + System.IO.Path.GetDirectoryName(path) + "; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"");
            conn.Open();
            System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(path), conn);
            System.Data.DataSet ds = new System.Data.DataSet("Temp");
            adapter.Fill(ds);

            return ds;
            
        }

Open in new window

0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35041145
That Regex statement above leaves the name as DoeJohn instead of John Doe.  That approach also requires a read from disk (line 5), write to disk (line 8) and another read from disk (line 14).

In my comment above http:#35039195 I'm only performing one disk operation, which is to read the file with the adapter.Fill(ds) statement, subsequent operations are done on in-memory data.

I suppose both approaches have their merits, depending on the desired end result - but it's not necessarily uncommon to see text data files up to the tens of thousands of records; and reading, writing then reading a file that big will be noticeably slower than just reading it.  With small files, to the hundreds of lines, I would imagine there wouldn't be a perceivable impact on performance one way or the other.
0
 
LVL 2

Expert Comment

by:hariprasadrj
ID: 35041824
Thats true, it is just sample code. if somebody is loading data from txt file and processing million records then ETL tool is best but not big fat .Net dataset. I assume here the data set is for showing in grid in this context expected to have hundreds of records.

the sample code illustrates solution for the given problem but not the best pattern, there could be tons of patterns to follow depends on the requirement.
0
 
LVL 33

Accepted Solution

by:
Todd Gerbert earned 500 total points
ID: 35041961
Like I said, both approaches have their pros and cons, but the regular expression is still wrong regardless.  The asker specified "LastName,FirstName" should become "FirstName LastName" (http:#a35039194).  Given an input string of "LastName,FirstName" the statement Regex.Replace(fileContents, @"(\x22.*)(,)(.*\x22)", "$1$3") results in "LastNameFirstName" (no space, and the name's are in the wrong order).  That regular expression also won't correctly handle instances with two consecutive text fields, removing the comma between the fields instead of the comma in the field.

string sample = @"""Name"",""Street Address""";
Regex.Replace(sample, @"(\x22.*)(,)(.*\x22)", "$1$3"); // "Name""Street Address"
Regex.Replace(sample, @"[""]([^""]*),([^""]*)[""']($|,)", "$2 $1$3"); // "Name","Street Address"

Open in new window

0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

In my previous two articles we discussed Binary Serialization (http://www.experts-exchange.com/A_4362.html) and XML Serialization (http://www.experts-exchange.com/A_4425.html). In this article we will try to know more about SOAP (Simple Object Acces…
Performance in games development is paramount: every microsecond counts to be able to do everything in less than 33ms (aiming at 16ms). C# foreach statement is one of the worst performance killers, and here I explain why.
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …
You have products, that come in variants and want to set different prices for them? Watch this micro tutorial that describes how to configure prices for Magento super attributes. Assigning simple products to configurable: We assigned simple products…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now