Solved

Text file processing

Posted on 2011-09-20
Last Modified: 2012-05-12
I have a very large (1 GB+) text file containing 3 columns. In the 2nd column I have a date in the format dd/mm/yy.

I need to split the file up based on the date field, i.e. create a .txt file for each group of dates.
I thought one way to do this would be to use a StreamReader to read each line and compare its date with the one before to see if it is the same. Have a look at my attempt below.
I'm not sure how to read the line before, and I'm not even sure this is the best way to do it.

My C# skills are very basic, as you can tell. I would appreciate any direction.



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {

            DateTime dat_first;
            DateTime dat_second;

            try
            {
                
                using (StreamReader sr = new StreamReader("C:\\TEMP\\split\\bigfile.txt"))
                {
                    string line;
                    // Read and display lines from the file until the end of 
                    // the file is reached.
                    
                    while ((line = sr.ReadLine()) != null)
                    {
                        string[] words = line.Split('|');
                        dat_first = DateTime.Parse(words[1]);   

                        // Need dat_second from the next line to check whether its date is the same as dat_first.
                        // Create a new text file containing this row and the subsequent rows while the dates are the same.
                        // Only when the dates differ should we create a new file. Files are titled with the dat_first date.

                      
                    }
                }
            }
            catch (Exception e)
            {
                // Let the user know what went wrong.
                Console.WriteLine("The file could not be read:");
                Console.WriteLine(e.Message);
            }
        
        }
    }
}

Question by:lee_jd
LVL 9

Expert Comment

by:dexterrajesh
ID: 36568421
Hi,

Instead you can do

string text = sr.ReadToEnd();

and then extract the substring for each date based on LastIndexOf(), instead of iterating line by line...
 
LVL 13

Accepted Solution

by:
jonnidip earned 500 total points
ID: 36571997
I would approach your problem this way:
- Read bigfile.txt line by line (StreamReader with ReadLine() looks correct to me).
- Read the date from the column of each line.
- Write (append) that line to an output file that has that date in its name.

I think there is no need to keep 2 dates to compare; you only need the current date, from the line you are reading.
A sample of what I mean:
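using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Skip lines that have no separator (and therefore no date column):
        if (line.Contains("|"))
        {
            string[] words = line.Split('|');
            // Parse explicitly as dd/MM/yy so the result does not depend on the
            // machine's culture (needs "using System.Globalization;").
            DateTime dateRead = DateTime.ParseExact(words[1], "dd/MM/yy", CultureInfo.InvariantCulture);

            // AppendAllText adds no line break of its own, so append one explicitly.
            System.IO.File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", dateRead.ToString("yyyyMMdd")), line + Environment.NewLine);
        }
    }
}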
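
A possible refinement, not part of the original answer: File.AppendAllText opens and closes the output file for every line, which may be slow on a 1 GB+ input. The sketch below assumes that rows with the same date are mostly adjacent, remembers the date of the previous line in a variable, and only switches output files when the date changes. The names DateSplitter, SplitByDateColumn and outputDir are hypothetical, and the raw column text is used as the file name key.

```csharp
using System;
using System.IO;

public static class DateSplitter
{
    // Splits inputPath into one file per value of the 2nd '|'-separated column.
    // Returns how many times an output file was opened.
    public static int SplitByDateColumn(string inputPath, string outputDir)
    {
        string currentKey = null;    // date column of the previous line
        StreamWriter writer = null;  // output file belonging to currentKey
        int opens = 0;

        try
        {
            using (StreamReader sr = new StreamReader(inputPath))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    string[] words = line.Split('|');
                    if (words.Length < 2)
                        continue; // no date column on this line

                    // "20/09/11" -> "20-09-11" so the value is legal in a file name.
                    string key = words[1].Trim().Replace("/", "-");

                    if (writer == null || key != currentKey)
                    {
                        // Date changed: close the old file and open the new one.
                        if (writer != null)
                            writer.Dispose();
                        string path = Path.Combine(outputDir, "output_" + key + ".txt");
                        writer = new StreamWriter(path, true); // true = append
                        currentKey = key;
                        opens++;
                    }

                    writer.WriteLine(line);
                }
            }
        }
        finally
        {
            if (writer != null)
                writer.Dispose();
        }

        return opens;
    }
}
```

A call such as DateSplitter.SplitByDateColumn(@"d:\temp\test1.txt", @"d:\temp") would write into the same folder as the samples in this thread. Because the writer is opened in append mode, a date that reappears later in the input still ends up in the right file; it only costs one extra open.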



Regards.
 
LVL 13

Expert Comment

by:jonnidip
ID: 36572032
Please note that you can avoid splitting the line and parsing the value to DateTime altogether, since all you need is the raw value contained in the column.
You can try this:
using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // The date sits between the first and the second separator.
        Int32 DateStart = line.IndexOf('|') + 1;
        Int32 DateEnd = line.IndexOf('|', DateStart);

        // DateStart is 0 when the line has no separator; DateEnd is -1 when there is no second one.
        if (DateStart > 0 && DateEnd >= 0)
            File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", line.Substring(DateStart, DateEnd - DateStart).Replace("/", "-")), line + Environment.NewLine);
    }
}



...but it really depends on how the date is written in the file and how you want to write it in the output file...
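
To illustrate that last point, here is a minimal sketch of one way to turn the raw column text into a file name that sorts chronologically. It assumes the column always holds dd/MM/yy, as described in the question; DateFileNames and ForDate are hypothetical names, and the output_ prefix simply follows the samples in this thread.

```csharp
using System;
using System.Globalization;

public static class DateFileNames
{
    // Hypothetical helper: maps a dd/MM/yy string to "output_yyyyMMdd.txt".
    // Returns null when the text is not a valid dd/MM/yy date.
    public static string ForDate(string rawDate)
    {
        if (rawDate == null)
            return null;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            rawDate.Trim(),
            "dd/MM/yy",                   // day first, two-digit year
            CultureInfo.InvariantCulture, // independent of the machine's culture
            DateTimeStyles.None,
            out parsed);

        if (!ok)
            return null;

        return "output_" + parsed.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + ".txt";
    }
}
```

Returning null for unparseable values lets the caller decide whether to skip such a line or route it to a separate file, rather than letting one malformed row abort the whole run.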

Regards.