[Webinar] Streamline your web hosting managementRegister Today

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 855
  • Last Modified:

C# -- remove comma when reading TXT file ?

The below code reads in my below "John Doe" variance.txt correctly.

NAME,FILE#,SSN,CHECKVIEWPAYDATE,Total_Hrs,CHECKVIEWGROSSPAYA,HOMEDEPARTMENT
"John Doe",229999,9263,12/06/2010,56,278.72,104002
"Jane Doe",181,6331,12/06/2010,40,74,101301
"John Smith",1041,1207,12/06/2010,40,12,101001
----------------------------------------------------------------
How can I change the code to remove the , or replacing it with
a space or something so it reads in the below "John, Doe"
as one column ?

NAME,FILE#,SSN,CHECKVIEWPAYDATE,Total_Hrs,CHECKVIEWGROSSPAYA,HOMEDEPARTMENT
"John, Doe",229999,9263,12/06/2010,56,278.72,104002
"Jane, Doe",181,6331,12/06/2010,40,74,101301
"John, Smith",1041,1207,12/06/2010,40,12,101001

----------------------------------------------------------------
        private DataSet ReadFile(string filePath)
        {
            OleDbConnection conn = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + System.IO.Path.GetDirectoryName(filePath) + "; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"");
            conn.Open();
            OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(filePath), conn);
            DataSet ds = new DataSet("Temp");
            adapter.Fill(ds);

            return ds;
        }
----------------------------------------------------------------
        public bool ImportFileVariance()
        {
            string filePath = @"C:\import\variance.txt";

            DataSet ds = ReadFile(filePath);

            return dbAccess.ImportVarianceRecords(ds.Tables[0]);
        }
----------------------------------------------------------------
0
finance_teacher
Asked:
finance_teacher
  • 8
  • 3
  • 2
  • +3
1 Solution
 
rawinnlnx9Commented:
Read the text into a string.


string txtFileContents = "";

// execute the code to open the file and serialize it to a string.

//Now use .Replace()

txtFileContents = txtFileContents.Replace(",", "_"); // the underscore can be a space or anything you want.

Open in new window

0
 
Fernando SotoRetiredCommented:
Hi finance_teacher;

This code snippet should give you the results you want.

// Reading data from a text file.
string data = System.IO.File.ReadAllText( "TestData.txt" );
// Remove all commas that are in between quotes
data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" );

Open in new window


Fernando
0
 
dimajCommented:
You could also use a Split() function.

Basically, do something like this:
string[] parts = curLine.Split(',');

That should give you an array of every field. Then you can manipulate your fields the way you want to.
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
finance_teacherAuthor Commented:
How can I get the below "data" string into my "ds" DataSet ?

        public bool ImportFileVariance()
        {
            string filePath = @"C:\import\variance.txt";
            string data = System.IO.File.ReadAllText(filePath);

           //below works, replacing , with space
            data = System.Text.RegularExpressions.Regex.Replace(data, @"(\x22.*)(,)(.*\x22)", "$1$3");
   
            //errors when running below step
            DataSet ds = ReadFile(data);
            return dbAccess.ImportVarianceRecords(ds.Tables[0]);
        }
0
 
Todd GerbertIT ConsultantCommented:
What does your original CSV text file look like (your question is a little unclear):
Doe,John,1234,123-45-6789, etc
John,Doe,1234,123-45-6789, etc
John Doe,1234,123-45-6789, etc
"Doe", "John",1234,123-45-6789, etc
"Doe, John",1234,123-45-6789, etc
"John", "Doe",1234,123-45-6789, etc
"John, Doe",1234,123-45-6789, etc

And to be clear you want to "transform" it like below?
"John Doe",1234,123-45-6789, etc

0
 
finance_teacherAuthor Commented:
tgerbert, that is correct
0
 
Todd GerbertIT ConsultantCommented:
Ahh, sorry - didn't notice you were using the OleDb provider to read the source file.  Try this ReadFile() method:

static DataSet ReadFile(string filePath)
{
	string connectionString = String.Format(
		@"Provider=Microsoft.Jet.OleDb.4.0;Data Source={0};Extended Properties=""Text;HDR=Yes;FMT=Delimited""",
		System.IO.Path.GetDirectoryName(filePath));

	using (OleDbConnection conn = new OleDbConnection(connectionString))
	{
		string command = String.Format(
			@"SELECT * FROM {0}", System.IO.Path.GetFileName(filePath));
		using (OleDbDataAdapter adapter = new OleDbDataAdapter(command, conn))
		{
			DataSet ds = new DataSet("Temp");
			adapter.Fill(ds);

			// Loop through the data table and change names like "Doe,John" to "John Doe"
			foreach (DataRow row in ds.Tables[0].Rows)
				row[0] = Regex.Replace((string)row[0], @"([^,]*),([^,]*)", "$2 $1");

			return ds;
		}
	}

Open in new window

0
 
Todd GerbertIT ConsultantCommented:
In my quick test I was using it as a static method, you don't necessarily have to do that in your project - change "static" to "private" to keep it the way you had it (I also missed the closing curly-brace when I copy/pasted).
0
 
Todd GerbertIT ConsultantCommented:
And I didn't show it, but you'll need a using System.Text.RegularExpressions; at the top of your code file.
0
 
Fernando SotoRetiredCommented:
Hi finance_teacher;

To your last question. You have a couple of options, 1 save the modified data to the file system and then use your code. 2 create the data table in code as shown in the code snippet below.

private void button1_Click( object sender, EventArgs e )
{
    string data = System.IO.File.ReadAllText("TestData.txt" );
    data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" );
    DataTable dt = CtreateTable( data );
    dataGridView1.DataSource = dt;
}

private DataTable CtreateTable( string data )
{
    string[] dataRows = data.Split( "\r\n".ToCharArray( ), StringSplitOptions.RemoveEmptyEntries );
    DataTable dt = new DataTable( );

    // First line has the column headders
    foreach( string headder in dataRows[ 0 ].Split( ",".ToCharArray( ) ) )
    {
        dt.Columns.Add( new DataColumn( headder ) );
    }

    // Load the data
    for( int idx = 1; idx < dataRows.Length; idx++)
    {
        string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );
        int column = 0;
        DataRow dr = dt.NewRow( );
        foreach( string field in fields )
        {
            dr[ column ] = field.Trim("\"".ToCharArray());
            column++;
        }
        dt.Rows.Add( dr );
    }
    return dt;
}

Open in new window


Fernando
0
 
Todd GerbertIT ConsultantCommented:
I think the below line would incorrectly split "Doe,John" into two fields:
string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );

Open in new window


If you wanted to go that route I think I would use StreamReader.ReadLine (as opposed to looping through a string[]), and I think you would need to use Regex.Replace(dataRows[idx], @"[""']([^""']*),([^""']*)[""']", "$2 $1"); before using String.Split, so that "Doe,John",123-45-6789,1234 would become John Doe,123-45-6789,1234 allowing you to split on the comma.
0
 
Fernando SotoRetiredCommented:
@ tgerbert;

This is the code that tales care of the commas in "Doe, John" field.

data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" )

Open in new window


this code correctly splits the filed in each line. At this point "Doe, John" is "Doe John" without comma.

string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );

Open in new window


Fernando
0
 
Todd GerbertIT ConsultantCommented:
Ah, I missed that it was in the other method.

I still would use Regex.Replace(data, @"[""]([^""]*),([^""]*)[""']($|,)", "$2 $1$3") to reverse the order of the names, ensure there's a space between them ("Doe,John" should become "John Doe", not "DoeJohn" nor "Doe John"), skip the enclosing quotes (that way you won't need .Trim() later).


0
 
hariprasadrjCommented:
Following code should help -
 public System.Data.DataSet LoadData()
        {
            string path = @"C:\a.txt";

            string fileContents = System.IO.File.ReadAllText(path);
            fileContents = System.Text.RegularExpressions.Regex.Replace(fileContents, @"(\x22.*)(,)(.*\x22)", "$1$3");

            System.IO.File.WriteAllText(path,fileContents.ToString());

            System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + System.IO.Path.GetDirectoryName(path) + "; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"");
            conn.Open();
            System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(path), conn);
            System.Data.DataSet ds = new System.Data.DataSet("Temp");
            adapter.Fill(ds);

            return ds;
            
        }

Open in new window

0
 
Todd GerbertIT ConsultantCommented:
That Regex statement above leaves the name as DoeJohn instead of John Doe.  That approach also requires a read from disk (line 5), write to disk (line 8) and another read from disk (line 14).

In my comment above http:#35039195 I'm only performing one disk operation, which is to read the file with the adapter.Fill(ds) statement, subsequent operations are done on in-memory data.

I suppose both approaches have their merits, depending on the desired end result - but it's not necessarily uncommon to see text data files up to the tens of thousands of records; and reading, writing then reading a file that big will be noticeably slower than just reading it.  With small files, to the hundreds of lines, I would imagine there wouldn't be a perceivable impact on performance one way or the other.
0
 
hariprasadrjCommented:
Thats true, it is just sample code. if somebody is loading data from txt file and processing million records then ETL tool is best but not big fat .Net dataset. I assume here the data set is for showing in grid in this context expected to have hundreds of records.

the sample code illustrates solution for the given problem but not the best pattern, there could be tons of patterns to follow depends on the requirement.
0
 
Todd GerbertIT ConsultantCommented:
Like I said, both approaches have their pros and cons, but the regular expression is still wrong regardless.  The asker specified "LastName,FirstName" should become "FirstName LastName" (http:#a35039194).  Given an input string of "LastName,FirstName" the statement Regex.Replace(fileContents, @"(\x22.*)(,)(.*\x22)", "$1$3") results in "LastNameFirstName" (no space, and the name's are in the wrong order).  That regular expression also won't correctly handle instances with two consecutive text fields, removing the comma between the fields instead of the comma in the field.

string sample = @"""Name"",""Street Address""";
Regex.Replace(sample, @"(\x22.*)(,)(.*\x22)", "$1$3"); // "Name""Street Address"
Regex.Replace(sample, @"[""]([^""]*),([^""]*)[""']($|,)", "$2 $1$3"); // "Name","Street Address"

Open in new window

0

Featured Post

Never miss a deadline with monday.com

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

  • 8
  • 3
  • 2
  • +3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now