Link to home
Start Free TrialLog in
Avatar of jgv
jgv

asked on

parsing strings

I have a variable that holds the contents of a text file. I am trying to parse the string and remove certain chr's. Using:
  Mid(mystring, 1, 5)=""
does not work. I have gotten around this by using the left and right functions to copy the string, minus the unwanted stuff, to a new variable. Some of the files are fairly large and this seems to be a poor way to do it. Have I overlooked something or is this the only method to remove portions of a string?
Avatar of Dr. Kamal Mehdi
Dr. Kamal Mehdi
Flag of Greece image

This is not the optimal way.
A very fast method is to use the InStr() function to find a search string in the main string. When found, cut the main string there, join the string again and restart the search. Its much faster.
If you have difficulties, leave me a comment.
Regards.
VB string functions are slow (except few new, e.g. Split, Join, Replace, added in VB6).
You can use byte array instead of string. This can accelerate your string operations 3-5 times.

Private text() As Byte

    '--- Read the file
    FLen = LOF(1)
    ReDim text(1 To FLen) As Byte
    Get #1, , text()
    Close #1

    Dim tempchar As Byte
    For i = 1 To FLen
        tempchar = text(i)
        If tempchar = 13 Then
etc.
Avatar of jgv
jgv

ASKER

kamall,

I already use the instr function to locate items in the string. My problem is 'cutting' the string properly. The method I am using is to copy the 'Left' and 'Right' portions of the unwanted chr's to a new variable. This works but I am looking for a more efficient way to do it. I am removing quite a bit of info from some large files. If you have an alternative way of doing this I would appreciate an example.

ameba,

I wish I had VB6 but I am still using 5. I haven't considered a byte array and will look into it. I did try parsing the entire string one chr at a time but it took waaaaay too long.

Thanks for the response :)
1 - Use Instr to find each subsequent occurrence of the offending character in the string
2 - Use Mid to extract the bit between the previous occurrence and the last occurrence
3 - Concatenate each extracted bit to a new string

Make sure it properly handles the beginning and end of a string.

Here's an example:
Private Sub Form_Load()

Dim strTest As String

strTest = "Freuden schöner Götterfunken, " & Chr(0) & _
          "Tochter aus Elysium, " & Chr(0) & _
          "wir betreten feuertrunken, " & Chr(0) & _
          "Himmlische, dein Heiligtum."

'The above string contains some chr(0) characters that _
 we don't like. Let's remove them.
Dim intPos1 As Integer
Dim intPos2 As Integer
Dim strWork As String
intPos1 = 0
intPos2 = InStr(intPos1 + 1, strTest, Chr(0))
Do While intPos2 > 0
    strWork = strWork & Mid(strTest, intPos1 + 1, intPos2 - intPos1 - 1)
    intPos1 = intPos2
    intPos2 = InStr(intPos1 + 1, strTest, Chr(0))
Loop

If intPos1 < Len(strTest) Then
    strWork = strWork & Right(strTest, Len(strTest) - intPos1)
End If

MsgBox strWork

End Sub

jgv,

If you have VB6, use REPLACE to search a string and replace a certain string with another.

This line will replace all occurences of "A" in the string with "", and show "BCDBCDBBCC".

  MsgBox Replace("ABCDABCDAABBCCAA", "A", "")

Ture Magnusson
Karlstad, Sweden
See jgv's previous comment:
"I wish I had VB6 but I am still using 5"
:)
caraf_g,

  I missed that. Thanks for pointing it out to me.


jgv,

  In VB5, if Excel is installed, you can add a reference to the MS Excel object library and use:

  MsgBox Excel.WorksheetFunction.Substitute("ABCDABCDAABBCCAA", "A", "")


/Ture
Ture,

A bit expensive if your application does not use excel, but a very good solution if it does! BTW, do you need to instantiate anything or can you use the function just like that?
Just like that.

/Ture
Avatar of jgv

ASKER

Thanks for the extra comments. I will have to get back to you in awhile. Off to work shortly.
Ture - nice one!
Well....., below is the code. See how fast it is!!!
Tested with 40Kb text file (FAQ.TXT file under the windows dir. - contains 40378 characters)

Test results:
Search string = Microsoft
# of occurences = 16
Time required = 0 sec.

Test results:
Search string = M
# of occurences = 191
Time required = 0.21875 sec.

Test results:
Search string = -
# of occurences = 375
Time required = 0.375 sec.

Test results:
Search string = n
# of occurences = 2033
Time required = 2.3125 sec.


***************************** CODE *****************************
'Requires one command button and one text box on your form

Private Sub Command1_Click()
   
    Dim MyStr As String
    Dim Counter As Long
   
    Open "c:\win95\faq.txt" For Input As #1
    MyStr = Input(LOF(1), #1)
    Close 1
    Text1 = MyStr
    DoEvents
    x = Timer
    Print "Start time=" & x
   
    SearchStr = "Microsoft"
    Do
    MyPos = InStr(MyStr, SearchStr)
    If MyPos <> 0 Then
    MyStr = Left$(MyStr, MyPos - 1) & Right$(MyStr, Len(MyStr) - MyPos - Len(SearchStr) + 1)
    Counter = Counter + 1
    End If
    Loop Until MyPos = 0
    Print "Time required=" & Timer - x
    Print "# of occurences=" & Counter
    Text1 = MyStr
    Print "Done."
   

End Sub

***************************************************************

Regards

kamall,
Try a real problem:
File: "c:\vb5\winapi\win32api.txt"    ' 700K
    SearchStr = "Public"      ' 6315 occurences

In VB6 it is 0.9 s
MyStr = Replace(MyStr, "Public", "")

VB5 code in few minutes ...
' Requires one command button and one text box on your form
' ??? 0.4 seconds, 2x faster than using vb6: MyStr=Replace(mystr,"Public","")
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long

Private Sub Command1_Click()
    Dim x As Long, SearchStr As String, MyPos As Long
    Dim MyStr As String
    Dim Counter As Long
' read file
    Open "c:\vb5\winapi\win32api.txt" For Input As #1
    MyStr = Input(LOF(1), #1)
    Close 1
   
    Text1.Text = Left$(MyStr, 20000)
    Text1.Refresh
   
    ' start timer
    x = GetTickCount
    Print "Start time=" & x
    SearchStr = "Public"

' first pass - count number of occurences
    Dim start As Long
    start = 1
    Do
        MyPos = InStr(start, MyStr, SearchStr)
        If MyPos <> 0 Then
            Counter = Counter + 1
            start = MyPos + Len(SearchStr)
        Else
            Exit Do
        End If
    Loop

' create new string
    Dim sResult As String, newlen As Long
    newlen = Len(MyStr) - Counter * Len(SearchStr)
    sResult = Space(newlen)

' second pass - fill new string
    Counter = 0 ' reset counter
    start = 1
    Dim newstart As Long, LenSearchStr As Long
    LenSearchStr = Len(SearchStr)
    newstart = 1
    Do
        MyPos = InStr(start, MyStr, SearchStr)
        If MyPos > 0 Then
            Mid(sResult, newstart, MyPos - start) = Mid(MyStr, start, MyPos - start)
            Counter = Counter + 1
            start = MyPos + LenSearchStr
            newstart = start - Counter * LenSearchStr
            'If Counter Mod 100 = 1 Then Caption = start
        Else
            If Len(MyStr) > start + 1 Then
                Mid(sResult, newstart, Len(MyStr) - start + 1) = Mid(MyStr, start, Len(MyStr) - start + 1)
            End If
            Exit Do
        End If
    Loop
' report time
    Print "Time required=" & (GetTickCount - x) / 1000 & " seconds"
    Print "# of occurences=" & Counter
    Caption = Len(MyStr)
    Text1 = Left$(sResult, 20000)
    Print "Done."
End Sub
Avatar of jgv

ASKER

Wow, it took awhile to wade through all this! Here's the findings:

ameba,

Unbelievably fast! I never bothered to let any of the other examples finsh running through the win32api.txt, including my own (I was falling asleep). The points are your's with a little extra :)

kamall and caraf_g,

Your example's were similar to my own code except that you reassign the value of the string back to itself minus the 'offending character(s)'. I was assigning to a second string and then back to the original. Since your comments were actually more efficient I want to give you points as well.

ture,

I used the excel function and had mixed results. I ran it on the winapi file and an error was returned. When I ran it using the first 500 characters of the winapi file it worked. There must be a limit on the size of the string being searched. However, it is useful information for the future and is worth some points.

Thanks to everyone for the input :)
Avatar of jgv

ASKER

Adjusted points to 100
ASKER CERTIFIED SOLUTION
Avatar of ameba
ameba
Flag of Croatia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
' ReplaceAll function sample
' can be 3x faster than using vb6: MyStr=Replace(mystr,"Public","")
' Created by: Bruno Paris, ameba @zg.tel.hr
' Date: 4 August 1999
' Free use permitted with attribution to source
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (dest As Any, source As Any, ByVal numBytes As Long)

Private Sub Form_Click()
    Dim x As Long, MyPos As Long, Counter As Long
    Dim MyStr As String
' read file
    Open "c:\vb5\winapi\win32api.txt" For Input Access Read Lock Read Write As #1
    MyStr = Input(LOF(1), #1)
    Close 1
    ' start timer
    x = GetTickCount
'   replace "Public" with ""
    MyStr = ReplaceAll(MyStr, "Public", "", Counter)
    ' MyStr = Replace(MyStr, "Public", "")   ' vb6 Replace
' report time
    Print "Time required=" & (GetTickCount - x) / 1000 & " seconds"
    Print "# of occurences=" & Counter
End Sub

' Find and replace a substring within a string, case sensitive
Private Function ReplaceAll(s As String, sFind As String, sReplace As String, Optional retCount As Long) As String
    Dim LenS As Long, LenSFind As Long, LenSReplace As Long
    retCount = 0
    LenS = Len(s)
    LenSFind = Len(sFind)
    If LenS = 0 Or LenSFind = 0 Then Exit Function
    LenSReplace = Len(sReplace)
   
    Dim start As Long, MyPos As Long
    start = 1
' first pass - count number of occurences
    MyPos = InStr(start, s, sFind)
    While MyPos
        retCount = retCount + 1
        start = MyPos + LenSFind
        MyPos = InStr(start, s, sFind)
    Wend
' create new string
    ReplaceAll = Space(LenS - retCount * (LenSFind - LenSReplace))
' second pass - fill new string
    retCount = 0 ' reset counter
    start = 1
    Dim newstart As Long
    newstart = 1
    Dim StrPtrReplaceAll As Long, StrPtrS As Long, StrPtrsReplace As Long, Len2SReplace As Long
    Len2SReplace = LenSReplace * 2 ' length in bytes
    ' pointers
    StrPtrReplaceAll = StrPtr(ReplaceAll)
    StrPtrS = StrPtr(s)
    StrPtrsReplace = StrPtr(sReplace)
   
    MyPos = InStr(start, s, sFind)
    While MyPos
        ' without pointer arithmetics
        'Mid(ReplaceAll, newstart, MyPos - start) = Mid$(s, start, MyPos - start)
        'Mid(ReplaceAll, newstart + MyPos - start) = sReplace
       
        ' with pointer arithmetics
        ' Credits: Francesco Balena, http://www.vb2themax.com/
        CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart - 1), ByVal StrPtrS + 2 * (start - 1), (MyPos - start) * 2
        CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart + MyPos - start - 1), ByVal StrPtr(sReplace), Len2SReplace
       
        retCount = retCount + 1
        start = MyPos + LenSFind
        newstart = start - retCount * (LenSFind - LenSReplace)
        MyPos = InStr(start, s, sFind)
    Wend
    If Len(s) > start + 1 Then
        ' without pointer arithmetics
        'Mid(ReplaceAll, newstart, LenS - start + 1) = Mid$(s, start, LenS - start + 1)
        '
        CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart - 1), ByVal StrPtrS + 2 * (start - 1), (LenS - start + 1) * 2
    End If
End Function

Avatar of jgv

ASKER

Thanks for V2.0 amemba :)
Avatar of jgv

ASKER

Excuse the typo, 'amemba', I'm not fully awake yet!
:)
Avatar of jgv

ASKER

ameba,

Just to let you know...I've incorporated your code with a few minor changes. 49 files were read (2,200k total), condensed by over 50% and written to an mdb in under 12 sec's. Although I didn't benchmark my original code, the same operation took well over two minutes.

My compliments on some well written code :o)
Thanks. I intend to use it in VB6 for big strings, case sensitive replace.

It is now a bit modified:
I use array to save instr positions found in first pass. The second pass does not have to call Instr again.

For vbTextCompare (case insensitive), it is much slower than VB6's Replace. I'm still studying it...
jgv,
Thanks for the points you offered in your new question.
However, note that in your first comment you wrote:
"I already use the instr function to locate items in the string. My problem is 'cutting' the string properly."
and:
"I wish I had VB6 but I am still using 5."
Thats why I proposed you this answer which *cuts the string properly* and *works in VB5*.
I see now that you have rejected my answer which satisfied your requirements. right?
Regards.
Avatar of jgv

ASKER

ameba,

The array is a good idea. I've modified V1.0 to find and remove information contained between specified character's/tokens. Works well.

kamall,

If you re-read the comment you will notice that I also mentioned *efficient* and *alternative solutions*. ameba's solution was, by far, the most efficient and expedient way to do what I wanted. To be fair, I also gave points to everyone else that contributed valid suggestions to this question. I'm sorry if you don't agree with my decision.
no problem man.
Thank you for the points anyway. You were really *fair*.
Good luck.