Avatar of finance_teacher
finance_teacher
 asked on

C# -- remove comma when reading TXT file ?

The below code reads in my below "John Doe" variance.txt correctly.

NAME,FILE#,SSN,CHECKVIEWPAYDATE,Total_Hrs,CHECKVIEWGROSSPAYA,HOMEDEPARTMENT
"John Doe",229999,9263,12/06/2010,56,278.72,104002
"Jane Doe",181,6331,12/06/2010,40,74,101301
"John Smith",1041,1207,12/06/2010,40,12,101001
----------------------------------------------------------------
How can I change the code to remove the , or replacing it with
a space or something so it reads in the below "John, Doe"
as one column ?

NAME,FILE#,SSN,CHECKVIEWPAYDATE,Total_Hrs,CHECKVIEWGROSSPAYA,HOMEDEPARTMENT
"John, Doe",229999,9263,12/06/2010,56,278.72,104002
"Jane, Doe",181,6331,12/06/2010,40,74,101301
"John, Smith",1041,1207,12/06/2010,40,12,101001

----------------------------------------------------------------
        private DataSet ReadFile(string filePath)
        {
            OleDbConnection conn = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + System.IO.Path.GetDirectoryName(filePath) + "; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"");
            conn.Open();
            OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(filePath), conn);
            DataSet ds = new DataSet("Temp");
            adapter.Fill(ds);

            return ds;
        }
----------------------------------------------------------------
        public bool ImportFileVariance()
        {
            string filePath = @"C:\import\variance.txt";

            DataSet ds = ReadFile(filePath);

            return dbAccess.ImportVarianceRecords(ds.Tables[0]);
        }
----------------------------------------------------------------
.NET ProgrammingC#

Avatar of undefined
Last Comment
Todd Gerbert

8/22/2022 - Mon
rawinnlnx9

Read the text into a string.


string txtFileContents = "";

// execute the code to open the file and serialize it to a string.

//Now use .Replace()

txtFileContents = txtFileContents.Replace(",", "_"); // the underscore can be a space or anything you want.

Open in new window

Fernando Soto

Hi finance_teacher;

This code snippet should give you the results you want.

// Reading data from a text file.
string data = System.IO.File.ReadAllText( "TestData.txt" );
// Remove all commas that are in between quotes
data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" );

Open in new window


Fernando
dimaj

You could also use a Split() function.

Basically, do something like this:
string[] parts = curLine.Split(',');

That should give you an array of every field. Then you can manipulate your fields the way you want to.
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
finance_teacher

ASKER
How can I get the below "data" string into my "ds" DataSet ?

        public bool ImportFileVariance()
        {
            string filePath = @"C:\import\variance.txt";
            string data = System.IO.File.ReadAllText(filePath);

           //below works, replacing , with space
            data = System.Text.RegularExpressions.Regex.Replace(data, @"(\x22.*)(,)(.*\x22)", "$1$3");
   
            //errors when running below step
            DataSet ds = ReadFile(data);
            return dbAccess.ImportVarianceRecords(ds.Tables[0]);
        }
Todd Gerbert

What does your original CSV text file look like (your question is a little unclear):
Doe,John,1234,123-45-6789, etc
John,Doe,1234,123-45-6789, etc
John Doe,1234,123-45-6789, etc
"Doe", "John",1234,123-45-6789, etc
"Doe, John",1234,123-45-6789, etc
"John", "Doe",1234,123-45-6789, etc
"John, Doe",1234,123-45-6789, etc

And to be clear you want to "transform" it like below?
"John Doe",1234,123-45-6789, etc

finance_teacher

ASKER
tgerbert, that is correct
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
Todd Gerbert

Ahh, sorry - didn't notice you were using the OleDb provider to read the source file.  Try this ReadFile() method:

static DataSet ReadFile(string filePath)
{
	string connectionString = String.Format(
		@"Provider=Microsoft.Jet.OleDb.4.0;Data Source={0};Extended Properties=""Text;HDR=Yes;FMT=Delimited""",
		System.IO.Path.GetDirectoryName(filePath));

	using (OleDbConnection conn = new OleDbConnection(connectionString))
	{
		string command = String.Format(
			@"SELECT * FROM {0}", System.IO.Path.GetFileName(filePath));
		using (OleDbDataAdapter adapter = new OleDbDataAdapter(command, conn))
		{
			DataSet ds = new DataSet("Temp");
			adapter.Fill(ds);

			// Loop through the data table and change names like "Doe,John" to "John Doe"
			foreach (DataRow row in ds.Tables[0].Rows)
				row[0] = Regex.Replace((string)row[0], @"([^,]*),([^,]*)", "$2 $1");

			return ds;
		}
	}

Open in new window

Todd Gerbert

In my quick test I was using it as a static method, you don't necessarily have to do that in your project - change "static" to "private" to keep it the way you had it (I also missed the closing curly-brace when I copy/pasted).
Todd Gerbert

And I didn't show it, but you'll need a using System.Text.RegularExpressions; at the top of your code file.
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
Fernando Soto

Hi finance_teacher;

To your last question. You have a couple of options, 1 save the modified data to the file system and then use your code. 2 create the data table in code as shown in the code snippet below.

private void button1_Click( object sender, EventArgs e )
{
    string data = System.IO.File.ReadAllText("TestData.txt" );
    data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" );
    DataTable dt = CtreateTable( data );
    dataGridView1.DataSource = dt;
}

private DataTable CtreateTable( string data )
{
    string[] dataRows = data.Split( "\r\n".ToCharArray( ), StringSplitOptions.RemoveEmptyEntries );
    DataTable dt = new DataTable( );

    // First line has the column headders
    foreach( string headder in dataRows[ 0 ].Split( ",".ToCharArray( ) ) )
    {
        dt.Columns.Add( new DataColumn( headder ) );
    }

    // Load the data
    for( int idx = 1; idx < dataRows.Length; idx++)
    {
        string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );
        int column = 0;
        DataRow dr = dt.NewRow( );
        foreach( string field in fields )
        {
            dr[ column ] = field.Trim("\"".ToCharArray());
            column++;
        }
        dt.Rows.Add( dr );
    }
    return dt;
}

Open in new window


Fernando
Todd Gerbert

I think the below line would incorrectly split "Doe,John" into two fields:
string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );

Open in new window


If you wanted to go that route I think I would use StreamReader.ReadLine (as opposed to looping through a string[]), and I think you would need to use Regex.Replace(dataRows[idx], @"[""']([^""']*),([^""']*)[""']", "$2 $1"); before using String.Split, so that "Doe,John",123-45-6789,1234 would become John Doe,123-45-6789,1234 allowing you to split on the comma.
Fernando Soto

@ tgerbert;

This is the code that tales care of the commas in "Doe, John" field.

data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" )

Open in new window


this code correctly splits the filed in each line. At this point "Doe, John" is "Doe John" without comma.

string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );

Open in new window


Fernando
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
Todd Gerbert

Ah, I missed that it was in the other method.

I still would use Regex.Replace(data, @"[""]([^""]*),([^""]*)[""']($|,)", "$2 $1$3") to reverse the order of the names, ensure there's a space between them ("Doe,John" should become "John Doe", not "DoeJohn" nor "Doe John"), skip the enclosing quotes (that way you won't need .Trim() later).


hariprasadrj

Following code should help -
 public System.Data.DataSet LoadData()
        {
            string path = @"C:\a.txt";

            string fileContents = System.IO.File.ReadAllText(path);
            fileContents = System.Text.RegularExpressions.Regex.Replace(fileContents, @"(\x22.*)(,)(.*\x22)", "$1$3");

            System.IO.File.WriteAllText(path,fileContents.ToString());

            System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + System.IO.Path.GetDirectoryName(path) + "; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"");
            conn.Open();
            System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(path), conn);
            System.Data.DataSet ds = new System.Data.DataSet("Temp");
            adapter.Fill(ds);

            return ds;
            
        }

Open in new window

Todd Gerbert

That Regex statement above leaves the name as DoeJohn instead of John Doe.  That approach also requires a read from disk (line 5), write to disk (line 8) and another read from disk (line 14).

In my comment above http:#35039195 I'm only performing one disk operation, which is to read the file with the adapter.Fill(ds) statement, subsequent operations are done on in-memory data.

I suppose both approaches have their merits, depending on the desired end result - but it's not necessarily uncommon to see text data files up to the tens of thousands of records; and reading, writing then reading a file that big will be noticeably slower than just reading it.  With small files, to the hundreds of lines, I would imagine there wouldn't be a perceivable impact on performance one way or the other.
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
hariprasadrj

Thats true, it is just sample code. if somebody is loading data from txt file and processing million records then ETL tool is best but not big fat .Net dataset. I assume here the data set is for showing in grid in this context expected to have hundreds of records.

the sample code illustrates solution for the given problem but not the best pattern, there could be tons of patterns to follow depends on the requirement.
ASKER CERTIFIED SOLUTION
Todd Gerbert

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.