Handling/parsing LARGE text files in C#

Posted on 2006-03-26
Last Modified: 2010-05-19

What is the best way to handle/parse large text files in C# (e.g. 500+ MB)? Here is what the process looks like: large text files come in and go out 24 hours a day, 7 days a week. These files (EDI files in X12 format) have to be parsed properly and passed on to the next process. File sizes vary, but they are large. So I am wondering which way I should go...

I thought about using unsafe code in C#, about using asynchronous StreamReader/StreamWriter, and about combining the two, and I still cannot decide. Any advice is highly appreciated.

Question by:davidlars99
    LVL 13

    Author Comment

    Oh, I almost forgot: for parsing and enparsing, a custom "Regex" engine will be developed.

    LVL 13

    Author Comment

    An article on this website says that changing strings using unsafe code is a bad idea:

    In the .NET Framework, strings are immutable, meaning that once created, they cannot be changed (some counter this assertion stating that strings can be changed using unsafe code and direct memory access, but doing so is a very bad idea). As such, instead of a String, the first parameter in the P/Invoke declaration is a mutable StringBuilder, marshaled as an unmanaged LPWSTR. You also need to ensure that StringBuilder is large enough to hold the path of the folder, which on a Win32 system is at most 260 characters. The resulting C# method looks like the following:
    LVL 5

    Accepted Solution

    You can open the file with a FileStream and read it in binary mode. That gives you direct (random) access to the data, which works well for large files.

    You will have to write a number of functions yourself for reading and translating bytes, and you will have to handle the encoding type (System.Text.Encoding).
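
As a rough sketch of the FileStream suggestion above (not the poster's actual code): the file is read in fixed-size chunks, the bytes are decoded with a stateful Decoder from System.Text.Encoding, and the text is split into segments. The names SegmentReader and ReadSegments are hypothetical, and the '~' segment terminator is an assumption; a real X12 file declares its own delimiters in the ISA header.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class SegmentReader
{
    // Streams segments one at a time so the whole file is never held in memory.
    // ASSUMPTION: segments end with '~'; real files may use another terminator.
    public static IEnumerable<string> ReadSegments(
        string path, Encoding encoding, char terminator = '~', int bufferSize = 64 * 1024)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize,
                                           FileOptions.SequentialScan))
        {
            // A Decoder keeps state between calls, so a multi-byte character
            // split across two chunks is still decoded correctly.
            Decoder decoder = encoding.GetDecoder();
            var bytes = new byte[bufferSize];
            var chars = new char[encoding.GetMaxCharCount(bufferSize)];
            var current = new StringBuilder();

            int bytesRead;
            while ((bytesRead = stream.Read(bytes, 0, bytes.Length)) > 0)
            {
                int charCount = decoder.GetChars(bytes, 0, bytesRead, chars, 0);
                for (int i = 0; i < charCount; i++)
                {
                    if (chars[i] == terminator)
                    {
                        // Trim drops line breaks that often follow the terminator.
                        string segment = current.ToString().Trim();
                        current.Clear();
                        if (segment.Length > 0) yield return segment;
                    }
                    else
                    {
                        current.Append(chars[i]);
                    }
                }
            }

            // Return whatever is left if the file does not end with a terminator.
            string rest = current.ToString().Trim();
            if (rest.Length > 0) yield return rest;
        }
    }
}
```

A caller would loop with something like `foreach (string segment in SegmentReader.ReadSegments(path, Encoding.ASCII))` and hand each segment to the parser. Memory use then depends on the buffer size and the longest segment rather than on the file size.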

    LVL 4

    Assisted Solution

    by:Joni Kettunen
