parsing data not working

This code parses data like this perfectly fine:
start0end
start1end
start2end

Now if I change the sample data, I get 3 issues:

>app crashes
ex1: start hello missing

>app crashes - needs to extract "hello" & "world"
ex2: start hello end start world end

>app crashes - needs to extract "hello"
ex3: blah blah start hello end blah blah
    Debug.Print(ParseData("start", "end"))

    Function ParseData(sStart As String, sStop As String)
        Dim sData As String = RichTextBox1.Text.Replace(vbLf, Environment.NewLine)
        Dim nNDX As Integer = 0
        Dim iCount As Integer = 0
        Dim i As Integer = 0
        Dim j As Integer = 0

        Do While (nNDX < sData.Length) AndAlso (i >= 0)
            i = sData.IndexOf(sStart, nNDX)
            j = sData.IndexOf(sStop, i)

            'Debug.Print(sData.Substring(i + sStart.Length, j - (i + sStart.Length)))

            nNDX = j + sStart.Length
            iCount += 1
        Loop

        Return iCount
    End Function

XK8ER Asked:

p_davis Commented:
How are you modifying the input to the method?
What are the desired results?
What errors are you getting (either at runtime or from the compiler)?

XK8ER (Author) Commented:
I thought I was extremely clear in this question and already provided everything you are asking for... did you try the code?

Mike Tomlinson (Middle School Assistant Teacher) Commented:
Give this a whirl...
    Function ParseData(sStart As String, sStop As String)
        Dim data As String
        Dim i, j As Integer
        Dim iCount As Integer
        Dim lines As New List(Of String)
        lines.AddRange(RichTextBox1.Text.Replace(vbLf, Environment.NewLine).Split(Environment.NewLine.ToCharArray, StringSplitOptions.RemoveEmptyEntries))
        For index As Integer = 0 To lines.Count - 1
            i = lines(index).IndexOf(sStart)
            While i <> -1
                j = lines(index).IndexOf(sStop, i + sStart.Length)
                If j <> -1 Then
                    If j > i + sStart.Length Then
                        data = lines(index).Substring(i + sStart.Length, j - (i + sStart.Length)).Trim
                        Debug.Print(index & ": " & data)
                        iCount = iCount + 1

                        i = lines(index).IndexOf(sStart, j + sStop.Length)
                    Else
                        ' ... data not long enough ...
                        i = -1
                    End If
                Else
                    ' ... no "stop" ...
                    i = -1
                End If
            End While
        Next
        Return iCount
    End Function
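
For reference, the loop in the question fails when IndexOf returns -1: that -1 is fed back in as a start index, which throws ArgumentOutOfRangeException, and the loop also advances by sStart.Length rather than sStop.Length. Below is a minimal sketch of the same guard logic that collects the extracted values instead of only counting them. The names ParseDemo and ExtractBetween are illustrative and not part of the code above, and it assumes non-empty markers and scans the whole string rather than line by line.

```vbnet
Imports System
Imports System.Collections.Generic

Module ParseDemo
    ' Hypothetical helper: returns every non-empty value found between sStart and sStop.
    ' Assumes both markers are non-empty; scans the whole string, not line by line.
    Function ExtractBetween(text As String, sStart As String, sStop As String) As List(Of String)
        Dim results As New List(Of String)
        Dim i As Integer = text.IndexOf(sStart, StringComparison.Ordinal)
        While i <> -1
            Dim valueStart As Integer = i + sStart.Length
            Dim j As Integer = text.IndexOf(sStop, valueStart, StringComparison.Ordinal)
            If j = -1 Then Exit While ' no closing marker: stop instead of passing -1 on
            Dim value As String = text.Substring(valueStart, j - valueStart).Trim()
            If value.Length > 0 Then results.Add(value)
            ' Continue after the stop marker (advance by sStop.Length, not sStart.Length).
            i = text.IndexOf(sStart, j + sStop.Length, StringComparison.Ordinal)
        End While
        Return results
    End Function

    Sub Main()
        For Each sample In {"start hello missing",
                            "start hello end start world end",
                            "blah blah start hello end blah blah"}
            Console.WriteLine(sample & " => " & String.Join(", ", ExtractBetween(sample, "start", "end")))
        Next
    End Sub
End Module
```

Because this version works on the whole string, a "start" on one line can pair with an "end" on a later line; split into lines first (as in the code above) if that is not wanted.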


Ark Commented:
    Imports System.Text.RegularExpressions

    Private sData As String

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim s As String = String.Join(vbLf, {"start0end", "start1end", "start2end"})
        sData = s
        Debug.Print("1 => " & ParseData("start", "end"))
        s = "start hello missing"
        sData = s
        Debug.Print("2 => " & ParseData("start", "end"))
        s = "start hello end start world end"
        sData = s
        Debug.Print("3 => " & ParseData("start", "end"))
        s = "blah blah start hello end blah blah"
        sData = s
        Debug.Print("4 => " & ParseData("start", "end"))
    End Sub

    Function ParseData(ByVal sStart As String, ByVal sStop As String)
        'Dim sData As String = RichTextBox1.Text.Replace(vbLf, Environment.NewLine)
        Dim r As New Regex(sStart & "(.*?)(" & sStop & "|$)", RegexOptions.Multiline Or RegexOptions.IgnoreCase)
        Dim matches = r.Matches(sData)
        Dim i As Integer = 1
        For Each m As Match In matches
            Debug.Print("    match #" & i & ": " & m.Groups(1).Value)
            i += 1
        Next
        Return matches.Count
    End Function


Output:
    match #1: 0
    match #2: 1
    match #3: 2
1 => 3
    match #1:  hello missing
2 => 1
    match #1:  hello
    match #2:  world
3 => 2
    match #1:  hello
4 => 1

XK8ER (Author) Commented:
Idle_Mind, amazing code as usual.. thanks so much!!

Mike Tomlinson (Middle School Assistant Teacher) Commented:
I haven't tried Ark's code (I'm sure it works because he's awesome!)...but Regular Expressions are definitely meant for this kind of thing (I just suck at them).  They make code like this much more concise.

XK8ER (Author) Commented:
I tried both: your code took one second to process a 12 MB file, and Ark's code took 3x longer.

Ark Commented:
Hi
Actually, regex should be about as fast as plain code; it depends on a correct pattern and the string length. BTW, if speed is the goal, pass StringComparison.Ordinal[IgnoreCase] to the IndexOf calls (see http://ayende.com/blog/2930/regex-vs-string-indexof)

PS To speed up regex add RegexOptions.Compiled
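
A minimal sketch of Ark's two suggestions side by side, assuming the markers are plain literal text. SearchDemo, CountOrdinal, CountRegex and MarkerRegex are illustrative names, and the Regex.Escape call is an addition that was not in the posted code.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SearchDemo
    ' Built once and reused; RegexOptions.Compiled trades slower construction for faster matching.
    ' Regex.Escape keeps the markers literal (assumption: the markers are not meant to be patterns).
    Private ReadOnly MarkerRegex As New Regex(
        Regex.Escape("start") & "(.*?)" & Regex.Escape("end"),
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Counts occurrences of marker with an ordinal, case-insensitive comparison,
    ' avoiding the culture-sensitive comparison that IndexOf(String) uses by default.
    Function CountOrdinal(text As String, marker As String) As Integer
        Dim count As Integer = 0
        Dim i As Integer = text.IndexOf(marker, StringComparison.OrdinalIgnoreCase)
        While i <> -1
            count += 1
            i = text.IndexOf(marker, i + marker.Length, StringComparison.OrdinalIgnoreCase)
        End While
        Return count
    End Function

    ' Counts start...end pairs with the shared compiled regex.
    Function CountRegex(text As String) As Integer
        Return MarkerRegex.Matches(text).Count
    End Function

    Sub Main()
        Dim sample As String = "START hello end start world END"
        Console.WriteLine(CountOrdinal(sample, "start"))
        Console.WriteLine(CountRegex(sample))
    End Sub
End Module
```

Creating the Regex once matters here: with RegexOptions.Compiled the construction cost is paid on every New Regex(...), so building it inside a function that is called repeatedly can cancel out the gain.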

XK8ER (Author) Commented:
For Idle_Mind's code, I changed this:

i = lines(iCount).IndexOf(sStart)
into
i = lines(iCount).IndexOf(sStart, StringComparison.OrdinalIgnoreCase)


for your code I changed this
Dim r As New Regex(sStart & "(.*?)(" & sStop & "|$)", RegexOptions.Multiline Or RegexOptions.IgnoreCase)
into this
Dim r As New Regex(sStart & "(.*?)(" & sStop & "|$)", RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.Compiled)


I get about the same speed as before.

1091664
Mins: 0 - Secs: 1 - Milli:1623.7614
1091664
Mins: 0 - Secs: 4 - Milli:4449.9922

Ark Commented:
Again, efficiency depends on circumstances. For example, Idle_Mind's code skips lines with no "stop" marker, while mine retrieves the rest of the string from the "start" mark to the end. And my test case shows nearly equal results:
Imports System.Text.RegularExpressions

Public Class Form1
    Private sData As String

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Button1.Enabled = False
        PrepareTest()
        Button1.Enabled = True
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim dt As Date, count As Integer
        dt = DateTime.Now
        count = ParseDataIdle_Mind("start", "end")
        Dim s1, s2 As String
        s1 = "Idle_Mind: count = " & count & "; time = " & (DateTime.Now - dt).TotalSeconds & " sec."
        dt = DateTime.Now
        count = ParseDataArk("start", "end")
        s2 = "Ark: count = " & count & "; time = " & (DateTime.Now - dt).TotalSeconds & " sec."
        MsgBox(s1 & vbCrLf & s2)
    End Sub

    Private Sub PrepareTest()
        Dim sb As New System.Text.StringBuilder
        For i = 0 To 100000
            Dim line = String.Format("dummy{0}search text {1}dummy", "start", "end")
            sb.AppendLine(line)
        Next
        sData = sb.ToString
    End Sub

    Function ParseDataIdle_Mind(ByVal sStart As String, ByVal sStop As String)
        Dim data As String
        Dim i, j As Integer
        Dim iCount As Integer
        Dim lines As New List(Of String)
        lines.AddRange(sData.Replace(vbLf, Environment.NewLine).Split(Environment.NewLine.ToCharArray, StringSplitOptions.RemoveEmptyEntries))
        For index As Integer = 0 To lines.Count - 1
            i = lines(index).IndexOf(sStart, StringComparison.OrdinalIgnoreCase)
            While i <> -1
                j = lines(index).IndexOf(sStop, i + sStart.Length, StringComparison.OrdinalIgnoreCase)
                If j <> -1 Then
                    If j > i + sStart.Length Then
                        data = lines(index).Substring(i + sStart.Length, j - (i + sStart.Length)).Trim
                        'Debug.Print(index & ": " & data)
                        iCount = iCount + 1

                        i = lines(index).IndexOf(sStart, j + sStop.Length, StringComparison.OrdinalIgnoreCase)
                    Else
                        ' ... data not long enough ...
                        i = -1
                    End If
                Else
                    ' ... no "stop" ...
                    i = -1
                End If
            End While
        Next
        Return iCount
    End Function

    Function ParseDataArk(ByVal sStart As String, ByVal sStop As String)
        'Dim pattern As String = sStart & "(.*?)(" & sStop & "|$)"
        Dim pattern As String = sStart & "(.*?)" & sStop
        Dim r As New Regex(pattern,
                           RegexOptions.Multiline Or
                           RegexOptions.IgnoreCase Or
                           RegexOptions.Compiled)
        Dim matches = r.Matches(sData)
        Dim data As String, i As Integer = 1
        For Each m As Match In matches
            data = m.Groups(1).Value
            'Debug.Print("    match #" & i & ": " & data)
            i += 1
        Next
        Return matches.Count
    End Function

End Class
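
Ark's test above times each call by subtracting DateTime.Now values. For short runs a Stopwatch is usually the better tool, since it is designed for measuring elapsed time. A minimal sketch, where TimingDemo and TimeIt are hypothetical names and the sample text is made up:

```vbnet
Imports System
Imports System.Diagnostics

Module TimingDemo
    ' Hypothetical helper: runs work once and returns the elapsed time in milliseconds.
    Function TimeIt(work As Action) As Double
        Dim sw As Stopwatch = Stopwatch.StartNew()
        work()
        sw.Stop()
        Return sw.Elapsed.TotalMilliseconds
    End Function

    Sub Main()
        ' Made-up input: one million filler characters followed by a single marker pair.
        Dim text As String = New String("x"c, 1000000) & "start hello end"
        Dim found As Integer = -1
        Dim ms As Double = TimeIt(Sub() found = text.IndexOf("start", StringComparison.Ordinal))
        Console.WriteLine("found at " & found & " in " & ms & " ms")
    End Sub
End Module
```

When comparing the two parsers this way, it may be worth running each one once before timing it, so that one-time costs such as JIT compilation and building a compiled regex do not land in the measured run.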


BTW, an interesting thing: removing OrdinalIgnoreCase from the IndexOf code makes it 3 times slower, while removing RegexOptions.IgnoreCase makes the regex 20% faster!

Anyway, regex IS NOT intended for faster searching. The real advantage of regular expressions is the ability to express a complex search in a simple manner. You can easily modify the pattern for a much more complex search, and in that case it would definitely be faster than IndexOf.

Regards
Ark
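
As an illustration of Ark's last point, here is a hedged sketch where the pattern accepts either of two start markers and trims the captured value inside the pattern itself. The second marker "begin" and the names PatternDemo and ExtractValues are invented for the example; the equivalent IndexOf code would need a second search and extra trimming logic.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PatternDemo
    ' "begin" is a made-up second marker, used only to show how the pattern extends.
    ' \s* outside the group trims surrounding whitespace; (.+?) requires a non-empty value.
    Private ReadOnly Pattern As New Regex("(?:start|begin)\s*(.+?)\s*end",
                                          RegexOptions.IgnoreCase)

    ' Returns the captured value of every match, in order of appearance.
    Function ExtractValues(text As String) As List(Of String)
        Dim values As New List(Of String)
        For Each m As Match In Pattern.Matches(text)
            values.Add(m.Groups(1).Value)
        Next
        Return values
    End Function

    Sub Main()
        Dim sample As String = "start hello end blah BEGIN world end"
        Console.WriteLine(String.Join(", ", ExtractValues(sample)))
    End Sub
End Module
```

Whether this is faster than hand-written IndexOf code depends on the data, so it is worth timing on a real file before switching.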