Text file processing

Posted on 2011-09-20
Last Modified: 2012-05-12
I have a very large (1 GB+) text file containing 3 columns separated by '|'. The 2nd column holds a date in the format dd/mm/yy.

I need to split the file up based on the date field, i.e., create a txt file for each group of dates.
I thought one way to do this would be to use a StreamReader to read each line and compare its date with the one on the line before to see if it is the same. My attempt is below, but I'm not sure how to read the line before. I'm not even sure this is the best way to do it.

My C# skills are very basic, as you can tell, so I would appreciate any direction.



using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {

            DateTime dat_first;
            DateTime dat_second;

            try
            {
                
                using (StreamReader sr = new StreamReader("C:\\TEMP\\split\\bigfile.txt"))
                {
                    string line;
                    // Read lines from the file one at a time until the end
                    // of the file is reached.
                    
                    while ((line = sr.ReadLine()) != null)
                    {
                        string[] words = line.Split('|');
                        // The 2nd column is a dd/MM/yy date; parse it with an explicit format
                        // so the result does not depend on the PC's regional settings.
                        dat_first = DateTime.ParseExact(words[1], "dd/MM/yy", CultureInfo.InvariantCulture);

                        // Need dat_second from the next line and check if the date is the same as dat_first; then
                        // create a new text file containing this row and subsequent rows while the dates are the same.
                        // Only when the dates differ should we create a new file. Files are titled with the dat_first date.

                      
                    }
                }
            }
            catch (Exception e)
            {
                // Let the user know what went wrong.
                Console.WriteLine("The file could not be read:");
                Console.WriteLine(e.Message);
            }
        
        }
    }
}
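On the "how do I read the line before" part: you don't actually need to look back at the previous line, only remember its date. A minimal sketch, assuming the file is '|'-delimited with a dd/MM/yy date in the 2nd column (as in the code above) and that lines with the same date sit next to each other: keep the previous line's date key and start a new output file whenever it changes. The class and method names (`ConsecutiveDateSplitter`, `Split`) and the `output_yyyyMMdd.txt` file naming are hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper: writes each run of same-date lines to output_yyyyMMdd.txt.
// Assumes '|'-delimited lines with a dd/MM/yy date in the 2nd column.
public static class ConsecutiveDateSplitter
{
    public static void Split(TextReader input, string outputDir)
    {
        string previousKey = null;   // date key of the previous valid line
        StreamWriter writer = null;  // writer for the current run of dates
        try
        {
            string line;
            while ((line = input.ReadLine()) != null)
            {
                string[] words = line.Split('|');
                DateTime date;
                if (words.Length < 2 ||
                    !DateTime.TryParseExact(words[1], "dd/MM/yy",
                        CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
                {
                    continue; // skip blank or malformed lines
                }

                string key = date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
                if (key != previousKey)
                {
                    // The date changed: close the old file and open the next one.
                    // append: true means a date that reappears later is added to
                    // its existing file instead of overwriting it.
                    if (writer != null) writer.Dispose();
                    writer = new StreamWriter(
                        Path.Combine(outputDir, "output_" + key + ".txt"), true);
                    previousKey = key;
                }
                writer.WriteLine(line);
            }
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
    }
}
```

Usage with the paths from the question: `using (var sr = new StreamReader(@"C:\TEMP\split\bigfile.txt")) ConsecutiveDateSplitter.Split(sr, @"C:\TEMP\split");`. Because the files are opened in append mode, re-running the program adds to any output files left from a previous run, so clear the folder first.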

Question by:lee_jd

Expert Comment

by:dexterrajesh
ID: 36568421
Hi,

Instead, you can read the whole file in one go:

string text = sr.ReadToEnd();

and then extract the substring for each date (for example, using LastIndexOf() to find where each date's block ends) instead of iterating line by line...

Accepted Solution

by:jonnidip
ID: 36571997
I would approach your problem this way:
- Read your bigfile.txt line by line (StreamReader and ReadLine() are the right tools for this).
- For each line, read the date in the 2nd column.
- Append that line to an output file that has that date in its name.

I don't think you need to keep 2 dates to compare; you only need the date on the line you are currently reading.
A sample of what I mean:
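using System;
using System.Globalization;
using System.IO;

using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Check if the line contains at least one separator:
        if (line.Contains("|"))
        {
            string[] words = line.Split('|');
            // The 2nd column is dd/MM/yy; parse it explicitly instead of
            // relying on the current culture's date format.
            DateTime dateRead = DateTime.ParseExact(words[1], "dd/MM/yy", CultureInfo.InvariantCulture);

            // Append the line plus a line break (ReadLine() strips it)
            // to the output file for that date.
            File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", dateRead.ToString("yyyyMMdd")), line + Environment.NewLine);
        }
    }
}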
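One practical concern with the sample above is that File.AppendAllText opens and closes the output file for every single line, which is likely to be slow over a 1 GB+ file. A sketch of the same approach that opens each output file once and keeps its StreamWriter in a Dictionary keyed by date, assuming the number of distinct dates is small enough to keep that many files open at once; `KeyedDateSplitter` is a hypothetical name. Unlike the "compare with the previous line" idea, this also works when the input is not sorted by date.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper: routes each line to output_yyyyMMdd.txt, opening each
// file only once. Assumes a dd/MM/yy date in the 2nd '|'-delimited column.
public static class KeyedDateSplitter
{
    // Returns the number of distinct dates written.
    public static int Split(TextReader input, string outputDir)
    {
        var writers = new Dictionary<string, StreamWriter>();
        try
        {
            string line;
            while ((line = input.ReadLine()) != null)
            {
                string[] words = line.Split('|');
                DateTime date;
                if (words.Length < 2 ||
                    !DateTime.TryParseExact(words[1], "dd/MM/yy",
                        CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
                {
                    continue; // skip lines without a valid date column
                }

                string key = date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
                StreamWriter writer;
                if (!writers.TryGetValue(key, out writer))
                {
                    // First line for this date: open its file (in append mode).
                    writer = new StreamWriter(
                        Path.Combine(outputDir, "output_" + key + ".txt"), true);
                    writers.Add(key, writer);
                }
                writer.WriteLine(line);
            }
            return writers.Count;
        }
        finally
        {
            // Flush and close every output file, even if an exception occurred.
            foreach (StreamWriter w in writers.Values) w.Dispose();
        }
    }
}
```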


Regards.

Expert Comment

by:jonnidip
ID: 36572032
Please note that you can avoid splitting the line and parsing the value into a DateTime altogether, since all you need is the raw text contained in the date column.
You can try this:
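using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // The date is the text between the 1st and 2nd '|' separators.
        int firstSep = line.IndexOf('|');
        if (firstSep < 0)
            continue;                       // no separator on this line

        int dateStart = firstSep + 1;
        int dateEnd = line.IndexOf('|', dateStart);
        if (dateEnd < 0)
            continue;                       // no second separator

        // '/' is not allowed in file names, so replace it with '-'.
        string date = line.Substring(dateStart, dateEnd - dateStart).Replace("/", "-");
        File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", date), line + Environment.NewLine);
    }
}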


...but it really depends on how the date is written in the input file and how you want it to appear in the output file names...
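For instance, with the "dd-mm-yy" names produced above, the output files do not list in date order. A minimal sketch that rearranges the column text to yy-mm-dd without any DateTime parsing, assuming the 2nd column is always exactly dd/mm/yy; the `DateColumn.KeyFor` helper is hypothetical and returns null for lines without a complete 2nd column so the caller can skip them.

```csharp
using System;

// Hypothetical helper: extracts the 2nd '|'-delimited column with
// IndexOf/Substring and turns "dd/mm/yy" into a sortable "yy-mm-dd" key.
public static class DateColumn
{
    public static string KeyFor(string line)
    {
        int start = line.IndexOf('|');
        if (start < 0) return null;            // no separator at all
        start++;                               // first character of the 2nd column
        int end = line.IndexOf('|', start);
        if (end < 0) return null;              // 2nd column is not terminated

        string date = line.Substring(start, end - start);   // e.g. "20/09/11"
        string[] parts = date.Split('/');
        if (parts.Length != 3) return null;    // not in dd/mm/yy form
        return parts[2] + "-" + parts[1] + "-" + parts[0];  // e.g. "11-09-20"
    }
}
```

The caller would use it as `string key = DateColumn.KeyFor(line); if (key != null) File.AppendAllText(...)`. Two-digit years only sort correctly within a single century.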

Regards.