Solved

Text file processing

Posted on 2011-09-20
Last Modified: 2012-05-12
I have a very large (1 GB+) text file containing 3 columns. The 2nd column holds a date in the format dd/mm/yy.

I need to split the file up based on the date field, i.e. create a text file for each group of dates.
I thought one way to do this would be to use a StreamReader to read each line and compare its date with the one on the line before to see if it is the same. Please have a look at my attempt below.
I'm not sure how to read the line before, and I'm not even sure this is the best way to do it.

My C# skills are very basic, as you can tell. I would appreciate any direction.



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {

            DateTime dat_first;
            DateTime dat_second;

            try
            {
                
                using (StreamReader sr = new StreamReader("C:\\TEMP\\split\\bigfile.txt"))
                {
                    string line;
                    // Read lines from the file until the end of
                    // the file is reached.
                    
                    while ((line = sr.ReadLine()) != null)
                    {
                        string[] words = line.Split('|');
                        dat_first = DateTime.Parse(words[1]);   

                        // Need dat_second from the next line, to check whether the date is the same as dat_first, then
                        // create a new text file containing this row and the subsequent rows while the dates are the same.
                        // Only when the dates differ should we create a new file. Files are titled with the dat_first date.

                      
                    }
                }
            }
            catch (Exception e)
            {
                // Let the user know what went wrong.
                Console.WriteLine("The file could not be read:");
                Console.WriteLine(e.Message);
            }
        
        }
    }
}

Question by:lee_jd

Expert Comment

by:dexterrajesh
ID: 36568421
Hi,

Instead, you can do

string text = sr.ReadToEnd();

and then extract the substrings based on the LastIndexOf() positions of the dates, instead of iterating line by line.

Accepted Solution

by:
jonnidip earned 500 total points
ID: 36571997
I would approach your problem this way:
- Read bigfile.txt line by line (StreamReader with ReadLine() looks correct to me).
- Read the date from the 2nd column of each line.
- Write (append) that line to an output file that has that date in its name.

I think there is no need to keep 2 dates to compare; you only need the current date from the line you are reading.
A sample of what I mean:
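using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Skip lines that do not contain at least one separator:
        if (line.Contains("|"))
        {
            string[] words = line.Split('|');
            // The dates are dd/mm/yy, so parse them explicitly instead of relying on the current culture:
            DateTime dateRead = DateTime.ParseExact(words[1].Trim(), "dd/MM/yy", System.Globalization.CultureInfo.InvariantCulture);

            // AppendAllText does not add a line break, so append one to keep one row per line:
            File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", dateRead.ToString("yyyyMMdd")), line + Environment.NewLine);
        }
    }
}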
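
A note on speed for a 1 GB+ file: File.AppendAllText opens and closes the output file on every call, which may become slow with millions of rows. A possible variation, only a sketch, is to keep one StreamWriter open per date and reuse it. The names DateSplitter, SplitByDate and outputDir are made up for this example, and it assumes the number of distinct dates is small enough to keep that many files open at once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DateSplitter   // hypothetical helper, not part of the original sample
{
    // Writes each line of inputPath to a file named after the text in its 2nd
    // '|'-separated column. Returns the number of lines written.
    public static int SplitByDate(string inputPath, string outputDir)
    {
        // One open writer per distinct date value (assumes few distinct dates).
        var writers = new Dictionary<string, StreamWriter>();
        int written = 0;
        try
        {
            using (var sr = new StreamReader(inputPath))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    string[] words = line.Split('|');
                    if (words.Length < 2) continue;   // no date column: skip the line

                    // '/' is not allowed in Windows file names, so replace it.
                    string key = words[1].Trim().Replace("/", "-");

                    StreamWriter w;
                    if (!writers.TryGetValue(key, out w))
                    {
                        // true = append, so an existing output file is extended
                        w = new StreamWriter(Path.Combine(outputDir, "output_" + key + ".txt"), true);
                        writers[key] = w;
                    }
                    w.WriteLine(line);
                    written++;
                }
            }
        }
        finally
        {
            // Flush and close every output file, even if reading failed part-way.
            foreach (var w in writers.Values) w.Dispose();
        }
        return written;
    }
}
```

You would call it as DateSplitter.SplitByDate(@"d:\temp\test1.txt", @"d:\temp"). If the file covers a very large number of different dates, the per-line append from the sample above is the safer choice, since it never holds more than one file open.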



Regards.

Expert Comment

by:jonnidip
ID: 36572032
Please note that you can avoid splitting the line and parsing the value to a DateTime altogether, since all you need is the text contained in the date column.
You can try this:
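using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Locate the 2nd column: the text between the first and second separators.
        int dateStart = line.IndexOf('|') + 1;
        int dateEnd = line.IndexOf('|', dateStart);

        // dateStart is 0 when there is no separator at all; dateEnd is -1 when the second one is missing.
        if (dateStart > 0 && dateEnd >= 0)
            File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", line.Substring(dateStart, dateEnd - dateStart).Replace("/", "-")), line + Environment.NewLine);
    }
}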



...but it really depends on how the date is written in the file and how you want to write it in the output file...
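
For completeness, the "compare with the line before" idea from the question can also be made to work: you do not read the previous line again, you just remember its date in a variable. The following is only a sketch under the assumption that rows with the same date sit next to each other in the file; GroupedSplitter, SplitGrouped and outputDir are names invented for this example.

```csharp
using System;
using System.IO;

static class GroupedSplitter   // hypothetical helper for illustration
{
    // Assumes rows with the same date are adjacent in the input.
    // Returns how many times an output file was opened.
    public static int SplitGrouped(string inputPath, string outputDir)
    {
        string previousDate = null;   // date text of the previous row
        StreamWriter writer = null;   // output file for the current date
        int filesOpened = 0;
        try
        {
            using (var sr = new StreamReader(inputPath))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    string[] words = line.Split('|');
                    if (words.Length < 2) continue;   // no date column: skip the line

                    string currentDate = words[1].Trim();
                    if (currentDate != previousDate)
                    {
                        // The date changed: close the old file and switch to the new one.
                        if (writer != null) writer.Dispose();
                        string name = "output_" + currentDate.Replace("/", "-") + ".txt";
                        // Append mode, so a date that shows up again later is not overwritten.
                        writer = new StreamWriter(Path.Combine(outputDir, name), true);
                        previousDate = currentDate;
                        filesOpened++;
                    }
                    writer.WriteLine(line);
                }
            }
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
        return filesOpened;
    }
}
```

Only one output file is open at any time, and a file is reopened only when the date changes, so this should stay fast as long as the input really is grouped by date.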

Regards.