Solved

C# -- remove comma when reading TXT file ?

Posted on 2011-03-04
17
794 Views
Last Modified: 2013-12-17
The below code reads in my below "John Doe" variance.txt correctly.

NAME,FILE#,SSN,CHECKVIEWPAYDATE,Total_Hrs,CHECKVIEWGROSSPAYA,HOMEDEPARTMENT
"John Doe",229999,9263,12/06/2010,56,278.72,104002
"Jane Doe",181,6331,12/06/2010,40,74,101301
"John Smith",1041,1207,12/06/2010,40,12,101001
----------------------------------------------------------------
How can I change the code to remove the , or replacing it with
a space or something so it reads in the below "John, Doe"
as one column ?

NAME,FILE#,SSN,CHECKVIEWPAYDATE,Total_Hrs,CHECKVIEWGROSSPAYA,HOMEDEPARTMENT
"John, Doe",229999,9263,12/06/2010,56,278.72,104002
"Jane, Doe",181,6331,12/06/2010,40,74,101301
"John, Smith",1041,1207,12/06/2010,40,12,101001

----------------------------------------------------------------
        private DataSet ReadFile(string filePath)
        {
            OleDbConnection conn = new OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + System.IO.Path.GetDirectoryName(filePath) + "; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"");
            conn.Open();
            OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(filePath), conn);
            DataSet ds = new DataSet("Temp");
            adapter.Fill(ds);

            return ds;
        }
----------------------------------------------------------------
        public bool ImportFileVariance()
        {
            string filePath = @"C:\import\variance.txt";

            DataSet ds = ReadFile(filePath);

            return dbAccess.ImportVarianceRecords(ds.Tables[0]);
        }
----------------------------------------------------------------
0
Comment
Question by:finance_teacher
  • 8
  • 3
  • 2
  • +3
17 Comments
 
LVL 9

Expert Comment

by:rawinnlnx9
ID: 35038313
Read the text into a string.


string txtFileContents = "";

// execute the code to open the file and serialize it to a string.

//Now use .Replace()

txtFileContents = txtFileContents.Replace(",", "_"); // the underscore can be a space or anything you want.

Open in new window

0
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 35038376
Hi finance_teacher;

This code snippet should give you the results you want.

// Reading data from a text file.
string data = System.IO.File.ReadAllText( "TestData.txt" );
// Remove all commas that are in between quotes
data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" );

Open in new window


Fernando
0
 
LVL 7

Expert Comment

by:dimaj
ID: 35038409
You could also use a Split() function.

Basically, do something like this:
string[] parts = curLine.Split(',');

That should give you an array of every field. Then you can manipulate your fields the way you want to.
0
Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

 

Author Comment

by:finance_teacher
ID: 35038598
How can I get the below "data" string into my "ds" DataSet ?

        public bool ImportFileVariance()
        {
            string filePath = @"C:\import\variance.txt";
            string data = System.IO.File.ReadAllText(filePath);

           //below works, replacing , with space
            data = System.Text.RegularExpressions.Regex.Replace(data, @"(\x22.*)(,)(.*\x22)", "$1$3");
   
            //errors when running below step
            DataSet ds = ReadFile(data);
            return dbAccess.ImportVarianceRecords(ds.Tables[0]);
        }
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35038937
What does your original CSV text file look like (your question is a little unclear):
Doe,John,1234,123-45-6789, etc
John,Doe,1234,123-45-6789, etc
John Doe,1234,123-45-6789, etc
"Doe", "John",1234,123-45-6789, etc
"Doe, John",1234,123-45-6789, etc
"John", "Doe",1234,123-45-6789, etc
"John, Doe",1234,123-45-6789, etc

And to be clear you want to "transform" it like below?
"John Doe",1234,123-45-6789, etc

0
 

Author Comment

by:finance_teacher
ID: 35039194
tgerbert, that is correct
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039195
Ahh, sorry - didn't notice you were using the OleDb provider to read the source file.  Try this ReadFile() method:

static DataSet ReadFile(string filePath)
{
	string connectionString = String.Format(
		@"Provider=Microsoft.Jet.OleDb.4.0;Data Source={0};Extended Properties=""Text;HDR=Yes;FMT=Delimited""",
		System.IO.Path.GetDirectoryName(filePath));

	using (OleDbConnection conn = new OleDbConnection(connectionString))
	{
		string command = String.Format(
			@"SELECT * FROM {0}", System.IO.Path.GetFileName(filePath));
		using (OleDbDataAdapter adapter = new OleDbDataAdapter(command, conn))
		{
			DataSet ds = new DataSet("Temp");
			adapter.Fill(ds);

			// Loop through the data table and change names like "Doe,John" to "John Doe"
			foreach (DataRow row in ds.Tables[0].Rows)
				row[0] = Regex.Replace((string)row[0], @"([^,]*),([^,]*)", "$2 $1");

			return ds;
		}
	}

Open in new window

0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039216
In my quick test I was using it as a static method, you don't necessarily have to do that in your project - change "static" to "private" to keep it the way you had it (I also missed the closing curly-brace when I copy/pasted).
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039244
And I didn't show it, but you'll need a using System.Text.RegularExpressions; at the top of your code file.
0
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 35039349
Hi finance_teacher;

To your last question. You have a couple of options, 1 save the modified data to the file system and then use your code. 2 create the data table in code as shown in the code snippet below.

private void button1_Click( object sender, EventArgs e )
{
    string data = System.IO.File.ReadAllText("TestData.txt" );
    data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" );
    DataTable dt = CtreateTable( data );
    dataGridView1.DataSource = dt;
}

private DataTable CtreateTable( string data )
{
    string[] dataRows = data.Split( "\r\n".ToCharArray( ), StringSplitOptions.RemoveEmptyEntries );
    DataTable dt = new DataTable( );

    // First line has the column headders
    foreach( string headder in dataRows[ 0 ].Split( ",".ToCharArray( ) ) )
    {
        dt.Columns.Add( new DataColumn( headder ) );
    }

    // Load the data
    for( int idx = 1; idx < dataRows.Length; idx++)
    {
        string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );
        int column = 0;
        DataRow dr = dt.NewRow( );
        foreach( string field in fields )
        {
            dr[ column ] = field.Trim("\"".ToCharArray());
            column++;
        }
        dt.Rows.Add( dr );
    }
    return dt;
}

Open in new window


Fernando
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039407
I think the below line would incorrectly split "Doe,John" into two fields:
string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );

Open in new window


If you wanted to go that route I think I would use StreamReader.ReadLine (as opposed to looping through a string[]), and I think you would need to use Regex.Replace(dataRows[idx], @"[""']([^""']*),([^""']*)[""']", "$2 $1"); before using String.Split, so that "Doe,John",123-45-6789,1234 would become John Doe,123-45-6789,1234 allowing you to split on the comma.
0
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 35039637
@ tgerbert;

This is the code that tales care of the commas in "Doe, John" field.

data = System.Text.RegularExpressions.Regex.Replace( data, @"(\x22.*)(,)(.*\x22)", "$1$3" )

Open in new window


this code correctly splits the filed in each line. At this point "Doe, John" is "Doe John" without comma.

string[] fields = dataRows[ idx ].Split( ",".ToCharArray( ) );

Open in new window


Fernando
0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35039804
Ah, I missed that it was in the other method.

I still would use Regex.Replace(data, @"[""]([^""]*),([^""]*)[""']($|,)", "$2 $1$3") to reverse the order of the names, ensure there's a space between them ("Doe,John" should become "John Doe", not "DoeJohn" nor "Doe John"), skip the enclosing quotes (that way you won't need .Trim() later).


0
 
LVL 2

Expert Comment

by:hariprasadrj
ID: 35040500
Following code should help -
 public System.Data.DataSet LoadData()
        {
            string path = @"C:\a.txt";

            string fileContents = System.IO.File.ReadAllText(path);
            fileContents = System.Text.RegularExpressions.Regex.Replace(fileContents, @"(\x22.*)(,)(.*\x22)", "$1$3");

            System.IO.File.WriteAllText(path,fileContents.ToString());

            System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection("Provider=Microsoft.Jet.OleDb.4.0; Data Source = " + System.IO.Path.GetDirectoryName(path) + "; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"");
            conn.Open();
            System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter("SELECT * FROM " + System.IO.Path.GetFileName(path), conn);
            System.Data.DataSet ds = new System.Data.DataSet("Temp");
            adapter.Fill(ds);

            return ds;
            
        }

Open in new window

0
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35041145
That Regex statement above leaves the name as DoeJohn instead of John Doe.  That approach also requires a read from disk (line 5), write to disk (line 8) and another read from disk (line 14).

In my comment above http:#35039195 I'm only performing one disk operation, which is to read the file with the adapter.Fill(ds) statement, subsequent operations are done on in-memory data.

I suppose both approaches have their merits, depending on the desired end result - but it's not necessarily uncommon to see text data files up to the tens of thousands of records; and reading, writing then reading a file that big will be noticeably slower than just reading it.  With small files, to the hundreds of lines, I would imagine there wouldn't be a perceivable impact on performance one way or the other.
0
 
LVL 2

Expert Comment

by:hariprasadrj
ID: 35041824
Thats true, it is just sample code. if somebody is loading data from txt file and processing million records then ETL tool is best but not big fat .Net dataset. I assume here the data set is for showing in grid in this context expected to have hundreds of records.

the sample code illustrates solution for the given problem but not the best pattern, there could be tons of patterns to follow depends on the requirement.
0
 
LVL 33

Accepted Solution

by:
Todd Gerbert earned 500 total points
ID: 35041961
Like I said, both approaches have their pros and cons, but the regular expression is still wrong regardless.  The asker specified "LastName,FirstName" should become "FirstName LastName" (http:#a35039194).  Given an input string of "LastName,FirstName" the statement Regex.Replace(fileContents, @"(\x22.*)(,)(.*\x22)", "$1$3") results in "LastNameFirstName" (no space, and the name's are in the wrong order).  That regular expression also won't correctly handle instances with two consecutive text fields, removing the comma between the fields instead of the comma in the field.

string sample = @"""Name"",""Street Address""";
Regex.Replace(sample, @"(\x22.*)(,)(.*\x22)", "$1$3"); // "Name""Street Address"
Regex.Replace(sample, @"[""]([^""]*),([^""]*)[""']($|,)", "$2 $1$3"); // "Name","Street Address"

Open in new window

0

Featured Post

DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

For those of you who don't follow the news, or just happen to live under rocks, Microsoft Research released a beta SDK (http://www.microsoft.com/en-us/download/details.aspx?id=27876) for the Xbox 360 Kinect. If you don't know what a Kinect is (http:…
A long time ago (May 2011), I have written an article showing you how to create a DLL using Visual Studio 2005 to be hosted in SQL Server 2005. That was valid at that time and it is still valid if you are still using these versions. You can still re…
Along with being a a promotional video for my three-day Annielytics Dashboard Seminor, this Micro Tutorial is an intro to Google Analytics API data.
A short tutorial showing how to set up an email signature in Outlook on the Web (previously known as OWA). For free email signatures designs, visit https://www.mail-signatures.com/articles/signature-templates/?sts=6651 If you want to manage em…

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question