Best Way to Parse a Text File...

Posted on 2004-11-01
Medium Priority
Last Modified: 2010-04-23
Hello everyone,
  I'm wondering what would be the BEST way to parse a file.  I've come up with 3 ideas.

1) Read the file line by line and parse each line specifically (InStr or regular expressions).
    This just seems like it should be slow.

2) Read in the whole file, and use InStr or regular expressions to parse for data.
   Faster than #1, but I would think #3 would be the fastest.

3) Read in the whole file, and use regular expressions to find, store, and then remove each match in the file,
    saving the temp file as the routine goes, so the next parse would be even faster since there's not
    as much log to search through.

I'm looking for suggestions, ideas, and/or code for the fastest and most reliable way to parse a file.

The project that I'm designing would require parsing MULTIPLE items in the file, saving the
parsed data into a database, and returning unused (i.e. unknown) line items.
Question by:pogowolf
LVL 27

Expert Comment

ID: 12465536
#3 is the best way.
Do you have a sample of the input data for the regular expressions code?

Author Comment

ID: 12465689
Sure do, here you are:

[10-26-04 18_16_02] Opening chat log...
[10/26/04 18:16:55] Message of the day: Istarian Auctions have begun!  Join the 'auction' chat channel if you wish to see reports of who has won each plot.  Good luck to all the bidders!
[10/26/04 18:18:21] Duxx casts Raise Strength II.
[10/26/04 18:18:22] Duxx casts Enhance Health II.
[10/26/04 18:18:24] Duxx casts Enhance Armor II.
[10/26/04 18:18:25] Duxx casts Gift of Speed II.
[10/26/04 18:18:27] Duxx casts Gift of Health II.
[10/26/04 18:18:28] Duxx casts Raise Strength II.
[10/26/04 18:20:49] You begin casting Gift of Speed II on yourself.
[10/26/04 18:20:52] You cast Gift of Speed II on yourself.
[10/26/04 18:20:53] You begin casting Swift Feet II on yourself.
[10/26/04 18:20:57] You cast Swift Feet II on yourself.
[10/26/04 18:21:32] Ice Golem hits you with Icy Spray I for 44 damage.
[10/26/04 18:21:34] Ice Golem hits you with Hurl Chunk III for 252 damage.
[10/26/04 18:21:49] [MarketPlace: Kimbala] so whos looking for this jacquess dude
[10/26/04 18:22:07] [MarketPlace: Sscortha] anyone check east blight?
[10/26/04 18:22:20] [MarketPlace: Jerret] where is he
[10/26/04 18:22:46] [MarketPlace: Kimbala] what?
[10/26/04 18:22:46] [MarketPlace: PersonalJustic] [<!--LI 9630196 884448>Grey Necrofly Wing<!--/LI>] sweet
[10/26/04 18:22:49] [MarketPlace: Avispa] I bet he has something to do with abandon island - that island is full of resoiurces
[10/26/04 18:23:06] [MarketPlace: Maguai] great, about the greys.. where are the browns?!
[10/26/04 18:23:32] [MarketPlace: Ivy] Ok huge exploit available now... how do they do it?
[10/26/04 18:23:33] Stinging Cold III has faded.
[10/26/04 18:23:39] [MarketPlace: Kimbala] what?
[10/26/04 18:23:41] [MarketPlace: Ivy] How does AE always mess up something simple
[10/26/04 18:23:46] [MarketPlace: Avispa] yup - heading over there now to scout it again
[10/26/04 18:23:49] [MarketPlace: T`rekannor] so ne thing new?
[10/26/04 18:23:57] [MarketPlace: Kimbala] ivy what are you talking about
[10/26/04 18:23:58] Swift Feet II has faded.
[10/26/04 18:24:00] You use Sprint.
--Log END

As you can see, the line items follow patterns, and those patterns are what I would like to turn into regular expressions to pull the data out.

For example in this line from the log:
[10/26/04 18:23:57] [MarketPlace: Kimbala] ivy what are you talking about

I would need to pull the date/time stamp, the channel name (MarketPlace),
the username (Kimbala), and the message (ivy what are you talking about).

But for this line from the log:
[10/26/04 18:21:32] Ice Golem hits you with Icy Spray I for 44 damage.

I would also need to pull the date/time stamp, the name of the monster (Ice Golem), the ability (Icy Spray I), and the damage (44).
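
As a rough sketch of the two extractions described above, the patterns below use named groups to pull out each field. The patterns are assumptions written against the two sample lines only, and the module name, the `Describe` function, and the group names are hypothetical; other line types in the log would need their own patterns.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LogLinePatterns
    ' Assumed timestamp layout, taken from the sample: [MM/dd/yy HH:mm:ss]
    Private Const Stamp As String = "^\[(?<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] "

    ' [stamp] [Channel: User] message
    Private ReadOnly ChatLine As New Regex( _
        Stamp & "\[(?<channel>[^:\]]+): (?<user>[^\]]+)\] (?<message>.*)$")

    ' [stamp] Monster hits you with Ability for N damage.
    Private ReadOnly HitLine As New Regex( _
        Stamp & "(?<monster>.+?) hits you with (?<ability>.+?) for (?<damage>\d+) damage\.$")

    ' Returns the extracted fields joined with "|", or "unknown|<line>" when
    ' neither pattern applies (the "unused" line items from the question).
    Function Describe(ByVal line As String) As String
        Dim m As Match = ChatLine.Match(line)
        If m.Success Then
            Return String.Format("chat|{0}|{1}|{2}|{3}", _
                m.Groups("stamp").Value, m.Groups("channel").Value, _
                m.Groups("user").Value, m.Groups("message").Value)
        End If

        m = HitLine.Match(line)
        If m.Success Then
            Return String.Format("hit|{0}|{1}|{2}|{3}", _
                m.Groups("stamp").Value, m.Groups("monster").Value, _
                m.Groups("ability").Value, m.Groups("damage").Value)
        End If

        Return "unknown|" & line
    End Function

    Sub Main()
        Console.WriteLine(Describe("[10/26/04 18:23:57] [MarketPlace: Kimbala] ivy what are you talking about"))
        Console.WriteLine(Describe("[10/26/04 18:21:32] Ice Golem hits you with Icy Spray I for 44 damage."))
        Console.WriteLine(Describe("[10/26/04 18:24:00] You use Sprint."))
    End Sub
End Module
```

Each line is assumed to arrive without its trailing line break, as it would from `StreamReader.ReadLine`.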
LVL 27

Expert Comment

ID: 12467117
Here is a small sample of how you can use regular expressions:

    ' At the top of the source file:
    Imports System.IO
    Imports System.Text.RegularExpressions

    Public Function StreamReaderReadCharFile(ByVal sFileName As String) As String
        Dim myStreamReader As StreamReader = Nothing
        Dim sData As String = ""

        Try
            ' Create a StreamReader using a Shared (static) File class.
            myStreamReader = File.OpenText(sFileName)

            ' Read the whole file into memory and parse it in one pass.
            sData = ParseBlocks(myStreamReader.ReadToEnd())
            TextBox1.Text = sData        '<----- I used TextBox1 as an output to view the result
        Catch exc As Exception
            MsgBox("File could not be opened or read." + vbCrLf + _
                "Please verify that the filename is correct, " + _
                "and that you have read permissions for the desired " + _
                "directory." + vbCrLf + vbCrLf + "Exception: " + exc.Message)
        Finally
            ' Close the object if it has been created.
            If Not myStreamReader Is Nothing Then
                myStreamReader.Close()
            End If
        End Try

        Return sData
    End Function

    Public Function ParseBlocks(ByVal input As String) As String

        'Regular Expression:  ([\[])([\w\d\s:/])+([\]])
        'Matches bracketed blocks such as [10/26/04 18:18:21] or [MarketPlace: Kimbala].
        '("/" is included in the character class so that the date stamps match.)
        Dim pattern As String = "([\[])([\w\d\s:/])+([\]])"
        Dim rx As Regex = New Regex(pattern, RegexOptions.Multiline)
        Dim sData As String = ""
        Dim m As Match
        Dim rowCount As Integer = 1

        ' Number each match and collect the results, one per line.
        For Each m In rx.Matches(input)
            sData += rowCount.ToString() + ": " + _
                         m.ToString + vbCrLf
            rowCount += 1
        Next m
        Return sData
    End Function

Author Comment

ID: 12467518
Well, that would be part of the issue.  I could modify the ParseBlocks function to accept the pattern as input. Hmm... how could I return the Matches object so that the main code could work with the data?

Also, this example doesn't take into account the need to delete a matched line, though I guess it might work to
stream out the non-matched lines and then reload the file for the next 'round' of searches.
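
One possible shape for both ideas, as a sketch only: a function that takes the pattern as a parameter and hands back the `MatchCollection`, plus a second function that returns the text with the matched lines removed so the next round searches a smaller string. `FindMatches`, `RemoveMatches`, and the "casts" pattern are hypothetical names and assumptions, not part of the code posted above. The pattern is assumed to end in `\r?$` because, with `RegexOptions.Multiline`, `$` matches just before the line feed and not before the carriage return.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PatternPass
    ' Hypothetical helper: every match of a caller-supplied pattern.
    Function FindMatches(ByVal input As String, ByVal pattern As String) As MatchCollection
        Return Regex.Matches(input, pattern, RegexOptions.Multiline)
    End Function

    ' Hypothetical helper: the input with each matched line (and its line
    ' break) removed. Assumes the pattern ends with \r?$ as described above.
    Function RemoveMatches(ByVal input As String, ByVal pattern As String) As String
        Return Regex.Replace(input, pattern & "\n?", "", RegexOptions.Multiline)
    End Function

    Sub Main()
        Dim logText As String = _
            "[10/26/04 18:18:21] Duxx casts Raise Strength II." & vbCrLf & _
            "[10/26/04 18:23:58] Swift Feet II has faded." & vbCrLf & _
            "[10/26/04 18:18:22] Duxx casts Enhance Health II." & vbCrLf

        Dim castPattern As String = _
            "^\[(?<stamp>[^\]]+)\] (?<who>.+?) casts (?<spell>.+?)\.\r?$"

        ' The caller works with the Match objects directly.
        For Each m As Match In FindMatches(logText, castPattern)
            Console.WriteLine("{0} cast {1} at {2}", _
                m.Groups("who").Value, m.Groups("spell").Value, m.Groups("stamp").Value)
        Next

        ' What is left can be fed to the next pattern, or reported as unknown.
        Console.WriteLine("Left over:")
        Console.Write(RemoveMatches(logText, castPattern))
    End Sub
End Module
```

Keeping the leftover text in a string avoids rewriting a temp file between rounds; whether that is acceptable depends on how large the logs get.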

LVL 12

Expert Comment

ID: 12500602
[Not an answer so much as a few extra tips.]

If you have huge files, it's quite possible that #1 (or something like it) would be faster.  Loading huge files into memory, especially if you have multiple copies of it (the Temp file) can take a lot of time depending on the amount of resources on your system.  If allocating the memory causes other memory to be swapped out to disk, you'll be hurtin'.

It wouldn't be too hard to write several implementations, and actually test to see which is faster for your specific cases.
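
A minimal timing harness along those lines might look like the sketch below. It assumes a single "casts" pattern and a generated in-memory log (`BuildSampleLog` is a hypothetical stand-in), and it reads from a `StringReader` so the example is self-contained; for a real comparison you would swap in `File.OpenText` and `ReadToEnd` on your own log files, since disk and memory behaviour is the part that matters.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ParseTiming
    ' Same assumed pattern in two forms: one for a single line, and one for
    ' a whole file (Multiline, tolerating the CR before each line feed).
    Private ReadOnly LinePattern As New Regex( _
        "^\[[^\]]+\] (?<who>.+?) casts (?<spell>.+?)\.$")
    Private ReadOnly FilePattern As New Regex( _
        "^\[[^\]]+\] (?<who>.+?) casts (?<spell>.+?)\.\r?$", RegexOptions.Multiline)

    ' Idea #1: test one line at a time.
    Function CountLineByLine(ByVal logText As String) As Integer
        Dim count As Integer = 0
        Using reader As New StringReader(logText)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                If LinePattern.IsMatch(line) Then count += 1
                line = reader.ReadLine()
            End While
        End Using
        Return count
    End Function

    ' Idea #2: run the pattern over the whole text at once.
    Function CountWholeText(ByVal logText As String) As Integer
        Return FilePattern.Matches(logText).Count
    End Function

    ' Hypothetical test data: alternating matching and non-matching lines.
    Function BuildSampleLog(ByVal lineCount As Integer) As String
        Dim sb As New StringBuilder()
        For i As Integer = 1 To lineCount
            If i Mod 2 = 0 Then
                sb.Append("[10/26/04 18:18:21] Duxx casts Raise Strength II.").Append(vbCrLf)
            Else
                sb.Append("[10/26/04 18:23:58] Swift Feet II has faded.").Append(vbCrLf)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim logText As String = BuildSampleLog(20000)

        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim lineCount As Integer = CountLineByLine(logText)
        sw.Stop()
        Console.WriteLine("Line by line: {0} matches in {1} ms", lineCount, sw.ElapsedMilliseconds)

        sw = Stopwatch.StartNew()
        Dim wholeCount As Integer = CountWholeText(logText)
        sw.Stop()
        Console.WriteLine("Whole text:   {0} matches in {1} ms", wholeCount, sw.ElapsedMilliseconds)
    End Sub
End Module
```

Running each variant several times and ignoring the first run gives steadier numbers, because the first call includes one-time start-up costs.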

P.S. For more speed, look into compiling your Regex patterns, but only if you can reuse a single pattern hundreds of times.  A _very_ brief summary is here: http://www.regular-expressions.info/dotnet.html (Search that page for "compile".)
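
To illustrate the compile tip, here is a small sketch in which the `Regex` object is built once with `RegexOptions.Compiled` and then reused for every line. The "has faded" pattern, the module name, and `FadedEffect` are assumptions for illustration only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CompiledPattern
    ' Built once and shared by every call. RegexOptions.Compiled costs more
    ' to construct, so it only pays off when the same object is reused a lot.
    Private ReadOnly FadedLine As New Regex( _
        "^\[(?<stamp>[^\]]+)\] (?<effect>.+) has faded\.$", RegexOptions.Compiled)

    ' Returns the effect name, or Nothing when the line is not a "faded" line.
    Function FadedEffect(ByVal line As String) As String
        Dim m As Match = FadedLine.Match(line)
        If m.Success Then Return m.Groups("effect").Value
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(FadedEffect("[10/26/04 18:23:33] Stinging Cold III has faded."))
    End Sub
End Module
```

The point is where the `New Regex(...)` call sits: constructing a compiled pattern inside the per-line loop would pay the compilation cost over and over.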
LVL 12

Accepted Solution

farsight earned 375 total points
ID: 12500625
What I meant by "#1 (or something like it)" is that it would be much better to read large blocks (256K, maybe even 1M) at a time, rather than line-at-a-time.    This increases the complexity though, because you need to handle the lines that go across the block boundaries.
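
A rough sketch of that block-at-a-time idea, with the boundary handling done by carrying the trailing partial line over to the next block. `ReadLinesInBlocks` is a hypothetical name, and the demo uses a `StringReader` with a deliberately tiny block size so that lines straddle the boundaries; with a real log you would pass a `StreamReader` and something like 256 * 1024.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module BlockReader
    ' Reads blockSize characters at a time and returns the complete lines.
    ' Text after the last line feed in a block is kept in "carry" and
    ' prepended to the next block, so no line is split.
    Function ReadLinesInBlocks(ByVal reader As TextReader, ByVal blockSize As Integer) As List(Of String)
        Dim lines As New List(Of String)()
        Dim buffer(blockSize - 1) As Char
        Dim carry As String = ""
        Dim charsRead As Integer = reader.Read(buffer, 0, blockSize)

        While charsRead > 0
            Dim chunk As String = carry & New String(buffer, 0, charsRead)
            Dim lastBreak As Integer = chunk.LastIndexOf(ControlChars.Lf)

            If lastBreak < 0 Then
                ' No complete line yet: keep accumulating.
                carry = chunk
            Else
                For Each line As String In chunk.Substring(0, lastBreak).Split(ControlChars.Lf)
                    lines.Add(line.TrimEnd(ControlChars.Cr))
                Next
                carry = chunk.Substring(lastBreak + 1)
            End If

            charsRead = reader.Read(buffer, 0, blockSize)
        End While

        ' A final line without a trailing line break.
        If carry.Length > 0 Then lines.Add(carry.TrimEnd(ControlChars.Cr))
        Return lines
    End Function

    Sub Main()
        Dim logText As String = "one" & vbCrLf & "two two" & vbCrLf & "three"
        Dim lines As List(Of String) = ReadLinesInBlocks(New StringReader(logText), 4)
        Console.WriteLine(String.Join("|", lines.ToArray()))
    End Sub
End Module
```

In practice each batch of complete lines would be parsed as soon as it is available rather than collected into one list, which is what keeps memory use bounded.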

Author Comment

ID: 12504151
Good point, farsight,
  I'll take a look into that.  I am worried about the amount of memory it's going to take, even if the average size of the log is only about 150k.

Thanks for the link, I'll take a look at the site!
