• Status: Solved

Text file processing

I have a very large (1 GB+) text file containing 3 columns. The 2nd column holds a date in the format dd/mm/yy.

I need to split the file up based on the date field, i.e. create a .txt file for each group of dates.
I thought one way to do this would be to use a StreamReader to read each line and compare its date with that of the line before to see whether it is the same. Have a look at my attempt below.
I'm not sure how to read the line before, or even whether this is the best way to do it.

My C# skills are very basic, as you can tell. I would appreciate any direction.



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {

            DateTime dat_first;
            DateTime dat_second;

            try
            {
                
                using (StreamReader sr = new StreamReader("C:\\TEMP\\split\\bigfile.txt"))
                {
                    string line;
                    // Read and display lines from the file until the end of 
                    // the file is reached.
                    
                    while ((line = sr.ReadLine()) != null)
                    {
                        string[] words = line.Split('|');
                        dat_first = DateTime.Parse(words[1]);   

                        // Need dat_second from the next line, then check whether it is the same date as dat_first.
                        // Create a new text file containing this row and the subsequent rows while the dates are the same.
                        // Only when the dates differ should we create a new file. Files are titled with the dat_first date.

                      
                    }
                }
            }
            catch (Exception e)
            {
                // Let the user know what went wrong.
                Console.WriteLine("The file could not be read:");
                Console.WriteLine(e.Message);
            }
        
        }
    }
}

Asked by: lee_jd
 
dexterrajesh commented:
Hi,

Instead, you can do

string text = sr.ReadToEnd();

and then extract the substrings for each date using LastIndexOf(), instead of iterating line by line...
 
jonnidip commented:
I would approach your problem this way:
- Read bigfile.txt line by line (StreamReader with ReadLine() is the right choice, in my view).
- Read the date from the column.
- Write (append) that line to an output file that has that date in its name.

I think there is no need to keep 2 dates to compare; you only need the "current" date from the line you are reading.
A sample of what I mean:
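using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Check if the line contains at least one separator:
        if (line.Contains("|"))
        {
            string[] words = line.Split('|');
            // Parse with the explicit dd/MM/yy format from the question, so the
            // result does not depend on the machine's regional settings.
            DateTime dateRead = DateTime.ParseExact(words[1].Trim(), "dd/MM/yy", System.Globalization.CultureInfo.InvariantCulture);

            // ReadLine() strips the line terminator, so add it back when appending.
            System.IO.File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", dateRead.ToString("yyyyMMdd")), line + Environment.NewLine);
        }
    }
}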
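
Appending with File.AppendAllText reopens the output file for every line, which may get slow on a 1 GB+ input. Below is a minimal sketch of a variation that also answers the "how do I read the line before" part of the question: remember the previous line's date text and keep one StreamWriter open until it changes. It assumes the rows for each date are contiguous in the input. The names `DateSplitter` and `Split`, and the `output_<date>.txt` naming, are illustrative assumptions, not something from the original posts.

```csharp
using System;
using System.IO;

static class DateSplitter
{
    // Splits inputPath into one file per run of equal dates (2nd '|' column).
    // Returns the number of times an output file was opened.
    public static int Split(string inputPath, string outputDir)
    {
        int filesOpened = 0;
        string previousDate = null;   // date text seen on the previous line
        StreamWriter writer = null;

        try
        {
            using (StreamReader sr = new StreamReader(inputPath))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    string[] words = line.Split('|');
                    if (words.Length < 2)
                        continue;   // skip lines without a date column

                    string date = words[1].Trim();
                    if (date != previousDate)
                    {
                        // Date changed: close the old file and open the next one.
                        if (writer != null)
                            writer.Dispose();
                        string name = "output_" + date.Replace("/", "-") + ".txt";
                        // append = true, so a date that reappears later is not overwritten.
                        writer = new StreamWriter(Path.Combine(outputDir, name), true);
                        filesOpened++;
                        previousDate = date;
                    }
                    writer.WriteLine(line);
                }
            }
        }
        finally
        {
            if (writer != null)
                writer.Dispose();
        }
        return filesOpened;
    }
}
```

A call would look like `DateSplitter.Split(@"C:\TEMP\split\bigfile.txt", @"C:\TEMP\split")`. Because the writer is opened in append mode, running it twice against the same output folder adds the rows a second time, so clear the old output files first.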



Regards.
 
jonnidip commented:
Please note that you can avoid splitting the line and parsing the value to a DateTime altogether, since all you need is the text contained in the column.
You can try this:
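using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // IndexOf returns -1 when there is no separator, so test it before adding 1.
        Int32 firstSep = line.IndexOf('|');
        Int32 DateStart = firstSep + 1;
        Int32 DateEnd = firstSep >= 0 ? line.IndexOf('|', DateStart) : -1;

        // ReadLine() strips the line terminator, so add it back when appending.
        if (firstSep >= 0 && DateEnd >= 0)
            File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", line.Substring(DateStart, DateEnd - DateStart).Replace("/", "-")), line + Environment.NewLine);
    }
}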



...but it really depends on how the date is written in the file and how you want to write it in the output file...
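
As one illustration of that choice, here is a small sketch that turns the column text into a yyyyMMdd stamp, so the output files sort chronologically by name. It assumes the dates are exactly dd/MM/yy, as described in the question. The helper name `ToFileStamp` is hypothetical, and the century chosen for a two-digit year follows the calendar's TwoDigitYearMax setting.

```csharp
using System;
using System.Globalization;

static class DateNames
{
    // Converts "dd/MM/yy" text (the format described in the question)
    // into "yyyyMMdd"; returns null if the text does not match that format.
    public static string ToFileStamp(string dateText)
    {
        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            dateText.Trim(), "dd/MM/yy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
        return ok ? parsed.ToString("yyyyMMdd", CultureInfo.InvariantCulture) : null;
    }
}
```

The returned stamp can replace the Substring(...).Replace("/", "-") part of the file name above; a null result means the line has no usable date and can be skipped or logged.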

Regards.