Best Way to Parse a Text File...

Hello everyone,  
  I'm wondering what would be the BEST way to parse a file.  I've come up with 3 ideas.

1) Read the file line by line and parse each line specifically (INSTR or regular expressions).
    This just seems like it should be slow.

2) Read in the whole file, and use INSTR or regular expressions to parse for data.
   Faster than #1, but I would think #3 would be the fastest.

3) Read in the whole file, and use regular expressions to find, store, and then remove each match in the file,
    saving the temp file as the routine goes, so the next parse would be even faster since there's not
    as much log to search through.
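
A rough sketch of what idea 3 could look like in VB.NET, assuming each pattern is written so that it consumes the whole line including its line break. The names ExtractAndRemoveSketch and ExtractAndRemove are made up for illustration, and this version works on the in-memory string rather than rewriting a temp file:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ExtractAndRemoveSketch
    ' Adds every match of rx to 'found' (without its line break) and
    ' returns the text with the matched spans removed.
    Public Function ExtractAndRemove(ByVal text As String, ByVal rx As Regex, ByVal found As List(Of String)) As String
        For Each m As Match In rx.Matches(text)
            found.Add(m.Value.TrimEnd(ControlChars.Cr, ControlChars.Lf))
        Next
        Return rx.Replace(text, "")
    End Function
End Module
```

With a pattern such as `^\[[^\]]+\] \w+ casts [^\r\n]*\r?\n?` and RegexOptions.Multiline, the "Duxx casts ..." lines would end up in found, and the returned string is the shrunken log for the next pattern. Each Replace builds a new string, so whether the shrinking pays for itself is something to measure.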

I'm looking for suggestions, ideas,  and/or code for the fastest and most reliable way to parse a file.

The project I'm designing requires parsing MULTIPLE items in the file, saving the
parsed data into a database, and returning the unused (i.e. unknown) line items.
pogowolf Asked:
 
farsight Commented:
What I meant by "#1 (or something like it)" is that it would be much better to read large blocks (256K, maybe even 1M) at a time, rather than a line at a time. This increases the complexity, though, because you need to handle the lines that span block boundaries.
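
One way the block idea could be sketched in VB.NET: read a fixed-size block, split off the complete lines, and carry any partial line over into the next block. The names BlockReaderSketch and ReadLinesInBlocks are hypothetical, and the block size is left to the caller:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module BlockReaderSketch
    ' Reads from 'reader' in blocks of 'blockSize' characters and returns the complete lines.
    ' A partial line at the end of a block is carried into the next block.
    Public Function ReadLinesInBlocks(ByVal reader As TextReader, ByVal blockSize As Integer) As List(Of String)
        Dim lines As New List(Of String)()
        Dim buffer(blockSize - 1) As Char
        Dim carry As String = ""
        Dim count As Integer = reader.Read(buffer, 0, blockSize)

        While count > 0
            Dim chunk As String = carry & New String(buffer, 0, count)
            Dim start As Integer = 0
            Dim pos As Integer = chunk.IndexOf(ControlChars.Lf, start)
            While pos >= 0
                ' TrimEnd drops the CR of a CRLF pair.
                lines.Add(chunk.Substring(start, pos - start).TrimEnd(ControlChars.Cr))
                start = pos + 1
                pos = chunk.IndexOf(ControlChars.Lf, start)
            End While
            ' Keep the unfinished tail for the next block.
            carry = chunk.Substring(start)
            count = reader.Read(buffer, 0, blockSize)
        End While

        ' Whatever is left is a final line without a trailing line break.
        If carry.Length > 0 Then lines.Add(carry.TrimEnd(ControlChars.Cr))
        Return lines
    End Function
End Module
```

Passing File.OpenText(path) with a block size such as 256 * 1024 would match the sizes mentioned above. For a real parser, each line could be handed to the regex code as soon as it is split off instead of being collected in a list.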
 
planocz Commented:
#3 is the best way...
Do you have a sample of the input data for the regular expressions code?
 
pogowolf (Author) Commented:
Sure do, here you are:

--Log START
[10-26-04 18_16_02] Opening chat log...
[10/26/04 18:16:55] Message of the day: Istarian Auctions have begun!  Join the 'auction' chat channel if you wish to see reports of who has won each plot.  Good luck to all the bidders!
[10/26/04 18:18:21] Duxx casts Raise Strength II.
[10/26/04 18:18:22] Duxx casts Enhance Health II.
[10/26/04 18:18:24] Duxx casts Enhance Armor II.
[10/26/04 18:18:25] Duxx casts Gift of Speed II.
[10/26/04 18:18:27] Duxx casts Gift of Health II.
[10/26/04 18:18:28] Duxx casts Raise Strength II.
[10/26/04 18:20:49] You begin casting Gift of Speed II on yourself.
[10/26/04 18:20:52] You cast Gift of Speed II on yourself.
[10/26/04 18:20:53] You begin casting Swift Feet II on yourself.
[10/26/04 18:20:57] You cast Swift Feet II on yourself.
[10/26/04 18:21:32] Ice Golem hits you with Icy Spray I for 44 damage.
[10/26/04 18:21:34] Ice Golem hits you with Hurl Chunk III for 252 damage.
[10/26/04 18:21:49] [MarketPlace: Kimbala] so whos looking for this jacquess dude
[10/26/04 18:22:07] [MarketPlace: Sscortha] anyone check east blight?
[10/26/04 18:22:20] [MarketPlace: Jerret] where is he
[10/26/04 18:22:46] [MarketPlace: Kimbala] what?
[10/26/04 18:22:46] [MarketPlace: PersonalJustic] [<!--LI 9630196 884448>Grey Necrofly Wing<!--/LI>] sweet
[10/26/04 18:22:49] [MarketPlace: Avispa] I bet he has something to do with abandon island - that island is full of resoiurces
[10/26/04 18:23:06] [MarketPlace: Maguai] great, about the greys.. where are the browns?!
[10/26/04 18:23:32] [MarketPlace: Ivy] Ok huge exploit available now... how do they do it?
[10/26/04 18:23:33] Stinging Cold III has faded.
[10/26/04 18:23:39] [MarketPlace: Kimbala] what?
[10/26/04 18:23:41] [MarketPlace: Ivy] How does AE always mess up something simple
[10/26/04 18:23:46] [MarketPlace: Avispa] yup - heading over there now to scout it again
[10/26/04 18:23:49] [MarketPlace: T`rekannor] so ne thing new?
[10/26/04 18:23:57] [MarketPlace: Kimbala] ivy what are you talking about
[10/26/04 18:23:58] Swift Feet II has faded.
[10/26/04 18:24:00] You use Sprint.
--Log END


As you can see, the line items follow patterns, and those patterns are what I would like to turn into regular expressions to pull the data out.

For example in this line from the log:
[10/26/04 18:23:57] [MarketPlace: Kimbala] ivy what are you talking about

I would need to pull the date/time stamp, the channel name (MarketPlace),
the username (Kimbala), and the message (ivy what are you talking about).

but for this line from the log:
[10/26/04 18:21:32] Ice Golem hits you with Icy Spray I for 44 damage.

I would also need to pull the date/time stamp, the name of the monster (Ice Golem), the ability (Icy Spray I), and the damage (44).
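
The two line types described above could be captured with named groups along these lines. This is only a sketch based on the sample log: the group names, LogPatternSketch, and ExtractFields are invented for illustration, and each line is assumed to be passed in without its trailing line break:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LogPatternSketch
    ' Timestamp prefix shared by both line types, e.g. [10/26/04 18:23:57]
    Private Const Stamp As String = "^\[(?<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] "

    ' Channel chat: [MarketPlace: Kimbala] ivy what are you talking about
    Public ReadOnly ChatLine As New Regex(Stamp & "\[(?<channel>[^:\]]+): (?<user>[^\]]+)\] (?<message>.*)$")

    ' Incoming damage: Ice Golem hits you with Icy Spray I for 44 damage.
    Public ReadOnly DamageLine As New Regex(Stamp & "(?<monster>.+?) hits you with (?<ability>.+?) for (?<damage>\d+) damage\.$")

    ' Returns the extracted fields joined with " | ", or Nothing for an unrecognised line.
    Public Function ExtractFields(ByVal line As String) As String
        Dim m As Match = ChatLine.Match(line)
        If m.Success Then
            Return String.Join(" | ", New String() {"chat", m.Groups("stamp").Value, _
                m.Groups("channel").Value, m.Groups("user").Value, m.Groups("message").Value})
        End If

        m = DamageLine.Match(line)
        If m.Success Then
            Return String.Join(" | ", New String() {"damage", m.Groups("stamp").Value, _
                m.Groups("monster").Value, m.Groups("ability").Value, m.Groups("damage").Value})
        End If

        Return Nothing
    End Function
End Module
```

For the chat line quoted above, ExtractFields returns "chat | 10/26/04 18:23:57 | MarketPlace | Kimbala | ivy what are you talking about". Lines that return Nothing are the unknown items that would need to be reported back.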

 
planocz Commented:
Here is a small sample of how you can use regular expressions:

    ' These Imports go at the top of the source file.
    Imports System.IO
    Imports System.Text.RegularExpressions

    ' Reads the whole file, shows the parsed blocks in TextBox1, and returns them.
    Public Function StreamReaderReadCharFile(ByVal sFileName As String) As String
        Dim myStreamReader As StreamReader = Nothing
        Dim myText As String
        Dim sData As String = ""

        Try
            ' Create a StreamReader using a Shared (static) File class.
            myStreamReader = File.OpenText(sFileName)

            myText = myStreamReader.ReadToEnd()
            sData = ParseBlocks(myText)
            TextBox1.Text = sData        '<----- I used TextBox1 as an output to view the result
        Catch exc As Exception
            MsgBox("File could not be opened or read." + vbCrLf + _
                "Please verify that the filename is correct, " + _
                "and that you have read permissions for the desired " + _
                "directory." + vbCrLf + vbCrLf + "Exception: " + exc.Message)
        Finally
            ' Close the object if it has been created.
            If Not myStreamReader Is Nothing Then
                myStreamReader.Close()
            End If
        End Try
        Return sData
    End Function

    ' Lists every bracketed block that the pattern matches, one numbered row per match.
    Public Function ParseBlocks(ByVal input As String) As String

        'Regular Expression:  ([\[])([\w\d\s:])+([\]])
        ' The character class has no "/" or "-", so this matches blocks such as
        ' [MarketPlace: Kimbala] but not the [10/26/04 18:16:55] timestamps.
        Dim pattern As String = "([\[])([\w\d\s:])+([\]])"
        Dim rx As Regex = New Regex(pattern, RegexOptions.Multiline)
        Dim sData As String = ""
        Dim m As Match
        Dim rowCount As Integer = 1

        For Each m In rx.Matches(input)
            sData += rowCount.ToString() + ": " + _
                         m.ToString() + vbCrLf
            rowCount += 1
        Next m
        Return sData
    End Function
 
pogowolf (Author) Commented:
Well, that would be part of the issue. I could modify the ParseBlocks function to accept the pattern as input... hmm. How could I return the Matches object so that the main code could work with the data?

Also, this example doesn't take into account the need to delete a matched line. Though I guess it might work to
stream out the lines that matched no pattern, and then reload the file for the next 'round' of searches...

Thoughts?
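
One possible shape for both questions, as a sketch rather than a drop-in answer: a function can simply be declared As MatchCollection and return what Regex.Matches gives back, and the unknown lines can be collected in a single pass instead of reloading the file per round. FindMatches and UnmatchedLines are hypothetical names:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchRoundsSketch
    ' Returns the MatchCollection itself so the caller can walk the matches and their groups.
    ' With CRLF text, a pattern ending in $ may need \r?$ because Multiline $ only stops before LF.
    Public Function FindMatches(ByVal input As String, ByVal pattern As String) As MatchCollection
        Return Regex.Matches(input, pattern, RegexOptions.Multiline)
    End Function

    ' Tests each line against every pattern; lines that match none are returned as unknown.
    Public Function UnmatchedLines(ByVal lines As IEnumerable(Of String), ByVal patterns As IEnumerable(Of Regex)) As List(Of String)
        Dim unknown As New List(Of String)()
        For Each line As String In lines
            Dim known As Boolean = False
            For Each rx As Regex In patterns
                If rx.IsMatch(line) Then
                    known = True
                    Exit For
                End If
            Next
            If Not known Then unknown.Add(line)
        Next
        Return unknown
    End Function
End Module
```

The calling code can then loop with For Each m As Match In FindMatches(text, pattern) and read m.Groups(...) as needed. Because UnmatchedLines checks all patterns per line, nothing has to be deleted or written back between rounds.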
 
farsight Commented:
[Not an answer so much as a few extra tips.]

If you have huge files, it's quite possible that #1 (or something like it) would be faster. Loading huge files into memory, especially if you have multiple copies of them (the temp file), can take a lot of time depending on the resources available on your system. If allocating the memory causes other memory to be swapped out to disk, you'll be hurtin'.

It wouldn't be too hard to write several implementations, and actually test to see which is faster for your specific cases.

P.S. For more speed, look into compiling your Regex patterns if and only if you can reuse a single pattern hundreds of times. A _very_ brief summary is here: http://www.regular-expressions.info/dotnet.html (Search that page for "compile".)
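
A small timing helper along these lines could be used to compare an interpreted Regex against one built with RegexOptions.Compiled on your own log lines. It is a sketch only: TimeMatches is a made-up name, and Stopwatch assumes .NET 2.0 or later:

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module RegexTimingSketch
    ' Runs rx over every line 'rounds' times, counts the matching lines in 'hits',
    ' and returns the elapsed time in milliseconds.
    Public Function TimeMatches(ByVal rx As Regex, ByVal lines As String(), ByVal rounds As Integer, ByRef hits As Integer) As Long
        Dim watch As Stopwatch = Stopwatch.StartNew()
        hits = 0
        For i As Integer = 1 To rounds
            For Each line As String In lines
                If rx.IsMatch(line) Then hits += 1
            Next
        Next
        watch.Stop()
        Return watch.ElapsedMilliseconds
    End Function
End Module
```

Create the two objects once, for example New Regex(pattern) and New Regex(pattern, RegexOptions.Compiled), and pass each to TimeMatches with the same lines and round count. Construction of the compiled one costs more up front, so it should be built a single time and reused, not recreated per line.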
 
pogowolf (Author) Commented:
Good point, FarSight.
  I'll look into that. I am worried about the amount of memory it's going to take, even if the average size of the log is only about 150k.

Thanks for the link, I'll take a look at the site!