Solved

Text file processing

Posted on 2011-09-20
Last Modified: 2012-05-12
I have a very large (1 GB+) text file containing 3 columns.  The 2nd column holds a date in the format dd/mm/yy.

I need to split the file up based on the date field, i.e. create a txt file for each group of dates.
I thought one way to do this would be to use the StreamReader object to read each line and compare it with the one before to see if the date is the same.  Have a look at my attempt below.
I'm not sure how to read the line before.  I'm not even sure this is the best way to do it.

My C# skills are very basic, as you can tell.  I would appreciate any direction.



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {

            DateTime dat_first;
            DateTime dat_second;

            try
            {
                
                using (StreamReader sr = new StreamReader("C:\\TEMP\\split\\bigfile.txt"))
                {
                    string line;
                    // Read lines from the file until the end of 
                    // the file is reached.
                    
                    while ((line = sr.ReadLine()) != null)
                    {
                        string[] words = line.Split('|');
                        dat_first = DateTime.Parse(words[1]);   

                        // Need dat_second from the next line, to check whether its date is the same as dat_first, then
                        // create a new text file containing this row and the subsequent rows while the dates are the same.
                        // Only when the dates differ should we create a new file.  Files are titled with the dat_first date.

                      
                    }
                }
            }
            catch (Exception e)
            {
                // Let the user know what went wrong.
                Console.WriteLine("The file could not be read:");
                Console.WriteLine(e.Message);
            }
        
        }
    }
}

Question by:lee_jd
3 Comments
 

Expert Comment

by:dexterrajesh
ID: 36568421
Hi,

Instead, you can read the whole file at once:

string text = sr.ReadToEnd();

and then extract the substrings for each date using LastIndexOf(), instead of iterating line by line.
 

Accepted Solution

by:jonnidip (earned 500 total points)
ID: 36571997
I would approach your problem this way:
- Read bigfile.txt line by line (StreamReader with ReadLine() looks right to me).
- Read the date from the 2nd column of each line.
- Write (append) that line to an output file that has the date in its name.

I think there is no need to keep 2 dates to compare; you only need the date from the line you are currently reading.
A sample of what I mean:
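using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Check that the line contains at least one separator:
        if (line.Contains("|"))
        {
            string[] words = line.Split('|');
            // Parse explicitly as dd/MM/yy so the result does not depend on the
            // machine's regional settings (needs "using System.Globalization;").
            DateTime dateRead = DateTime.ParseExact(words[1].Trim(), "dd/MM/yy", CultureInfo.InvariantCulture);

            // ReadLine() strips the line terminator, so add it back when appending.
            System.IO.File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", dateRead.ToString("yyyyMMdd")), line + Environment.NewLine);
        }
    }
}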
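
With a file over 1 GB, File.AppendAllText opens and closes the output file once per input line, which may turn out to be slow. Below is a minimal sketch of one possible alternative, not part of the original answer. It assumes the number of distinct dates is small enough to keep one StreamWriter open per date. The names DateSplitter and SplitByDate are hypothetical, and the output_yyyyMMdd.txt naming simply follows the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class DateSplitter
{
    // Hypothetical helper: copies each line of inputPath into
    // outputDir\output_yyyyMMdd.txt, chosen by the date in the 2nd column.
    public static void SplitByDate(string inputPath, string outputDir)
    {
        // One open writer per distinct date (assumes a modest number of dates).
        var writers = new Dictionary<string, StreamWriter>();
        try
        {
            using (var reader = new StreamReader(inputPath))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split('|');
                    DateTime date;
                    // Skip lines without a parsable dd/MM/yy date in the 2nd column.
                    if (fields.Length < 2 ||
                        !DateTime.TryParseExact(fields[1].Trim(), "dd/MM/yy",
                            CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
                        continue;

                    string key = date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
                    StreamWriter writer;
                    if (!writers.TryGetValue(key, out writer))
                    {
                        string path = Path.Combine(outputDir, "output_" + key + ".txt");
                        writer = new StreamWriter(path, true); // true = append
                        writers.Add(key, writer);
                    }
                    writer.WriteLine(line);
                }
            }
        }
        finally
        {
            // Flush and close every output file, even if reading failed.
            foreach (StreamWriter w in writers.Values)
                w.Dispose();
        }
    }
}
```

A call might look like DateSplitter.SplitByDate(@"d:\temp\test1.txt", @"d:\temp"). If the input is known to be grouped by date already, a single writer that is replaced whenever the date changes would probably be enough.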



Regards.
 

Expert Comment

by:jonnidip
ID: 36572032
Please note that you can avoid splitting the line and parsing the value to DateTime altogether, since all you need is the text contained in the column.
You can try this:
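using (StreamReader sr = new StreamReader(@"d:\temp\test1.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // IndexOf returns -1 when no '|' is found, which makes DateStart 0.
        Int32 DateStart = line.IndexOf('|') + 1;
        Int32 DateEnd = line.IndexOf('|', DateStart);

        // Require both separators and a non-empty date between them.
        // ReadLine() strips the line terminator, so add it back when appending.
        if (DateStart > 0 && DateEnd > DateStart)
            File.AppendAllText(String.Format(@"d:\temp\output_{0}.txt", line.Substring(DateStart, DateEnd - DateStart).Replace("/", "-")), line + Environment.NewLine);
    }
}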



...but it really depends on how the date is written in the file and how you want to write it in the output file...
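
As one illustration of that choice, the sketch below assumes the column holds dd/mm/yy text, as the question states, and turns it into a sortable yyyyMMdd file name. The names DateFileNames and ToFileName and the output_ prefix are hypothetical. DateTime.ParseExact is used rather than DateTime.Parse so that the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

static class DateFileNames
{
    // Hypothetical helper: "20/09/11" -> "output_20110920.txt".
    public static string ToFileName(string dateText)
    {
        // "dd/MM/yy": uppercase MM is the month; lowercase mm would mean minutes.
        DateTime date = DateTime.ParseExact(dateText.Trim(), "dd/MM/yy", CultureInfo.InvariantCulture);
        return "output_" + date.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + ".txt";
    }
}
```

ParseExact throws a FormatException when the text does not match the pattern, so a file with irregular dates would need TryParseExact or a try/catch around the call.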

Regards.