Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

parsing strings

Posted on 1999-08-02
26
Medium Priority
?
685 Views
Last Modified: 2010-04-30
I have a variable that holds the contents of a text file. I am trying to parse the string and remove certain chr's. Using:
  Mid(mystring, 1, 5)=""
does not work. I have gotten around this by using the left and right functions to copy the string, minus the unwanted stuff, to a new variable. Some of the files are fairly large and this seems to be a poor way to do it. Have I overlooked something or is this the only method to remove portions of a string?
0
Comment
Question by:jgv
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 7
  • 4
  • +2
26 Comments
 
LVL 7

Expert Comment

by:kamall
ID: 1529103
This is not the optimal way.
A very fast method is to use the InStr() function to find a search string in the main string. When found, cut the main string there, join the string again and restart the search. Its much faster.
If you have difficulties, leave me a comment.
Regards.
0
 
LVL 15

Expert Comment

by:ameba
ID: 1529104
VB string functions are slow (except few new, e.g. Split, Join, Replace, added in VB6).
You can use byte array instead of string. This can accelerate your string operations 3-5 times.

Private text() As Byte

    '--- Read the file
    FLen = LOF(1)
    ReDim text(1 To FLen) As Byte
    Get #1, , text()
    Close #1

    Dim tempchar As Byte
    For i = 1 To FLen
        tempchar = text(i)
        If tempchar = 13 Then
etc.
0
 
LVL 12

Author Comment

by:jgv
ID: 1529105
kamall,

I already use the instr function to locate items in the string. My problem is 'cutting' the string properly. The method I am using is to copy the 'Left' and 'Right' portions of the unwanted chr's to a new variable. This works but I am looking for a more efficient way to do it. I am removing quite a bit of info from some large files. If you have an alternative way of doing this I would appreciate an example.

ameba,

I wish I had VB6 but I am still using 5. I haven't considered a byte array and will look into it. I did try parsing the entire string one chr at a time but it took waaaaay too long.

Thanks for the response :)
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 10

Expert Comment

by:caraf_g
ID: 1529106
1 - Use Instr to find each subsequent occurrence of the offending character in the string
2 - Use Mid to extract the bit between the previous occurrence and the last occurrence
3 - Concatenate each extracted bit to a new string

Make sure it properly handles the beginning and end of a string.

Here's an example:
Private Sub Form_Load()

Dim strTest As String

strTest = "Freuden schöner Götterfunken, " & Chr(0) & _
          "Tochter aus Elysium, " & Chr(0) & _
          "wir betreten feuertrunken, " & Chr(0) & _
          "Himmlische, dein Heiligtum."

'The above string contains some chr(0) characters that _
 we don't like. Let's remove them.
Dim intPos1 As Integer
Dim intPos2 As Integer
Dim strWork As String
intPos1 = 0
intPos2 = InStr(intPos1 + 1, strTest, Chr(0))
Do While intPos2 > 0
    strWork = strWork & Mid(strTest, intPos1 + 1, intPos2 - intPos1 - 1)
    intPos1 = intPos2
    intPos2 = InStr(intPos1 + 1, strTest, Chr(0))
Loop

If intPos1 < Len(strTest) Then
    strWork = strWork & Right(strTest, Len(strTest) - intPos1)
End If

MsgBox strWork

End Sub

0
 
LVL 22

Expert Comment

by:ture
ID: 1529107
jgv,

If you have VB6, use REPLACE to search a string and replace a certain string with another.

This line will replace all occurences of "A" in the string with "", and show "BCDBCDBBCC".

  MsgBox Replace("ABCDABCDAABBCCAA", "A", "")

Ture Magnusson
Karlstad, Sweden
0
 
LVL 10

Expert Comment

by:caraf_g
ID: 1529108
See jgv's previous comment:
"I wish I had VB6 but I am still using 5"
:)
0
 
LVL 22

Expert Comment

by:ture
ID: 1529109
caraf_g,

  I missed that. Thanks for pointing it out to me.


jgv,

  In VB5, if Excel is installed, you can add a reference to the MS Excel object library and use:

  MsgBox Excel.WorksheetFunction.Substitute("ABCDABCDAABBCCAA", "A", "")


/Ture
0
 
LVL 10

Expert Comment

by:caraf_g
ID: 1529110
Ture,

A bit expensive if your application does not use excel, but a very good solution if it does! BTW, do you need to instantiate anything or can you use the function just like that?
0
 
LVL 22

Expert Comment

by:ture
ID: 1529111
Just like that.

/Ture
0
 
LVL 12

Author Comment

by:jgv
ID: 1529112
Thanks for the extra comments. I will have to get back to you in awhile. Off to work shortly.
0
 
LVL 10

Expert Comment

by:caraf_g
ID: 1529113
Ture - nice one!
0
 
LVL 7

Expert Comment

by:kamall
ID: 1529114
Well....., below is the code. See how fast it is!!!
Tested with 40Kb text file (FAQ.TXT file under the windows dir. - contains 40378 characters)

Test results:
Search string = Microsoft
# of occurences = 16
Time required = 0 sec.

Test results:
Search string = M
# of occurences = 191
Time required = 0.21875 sec.

Test results:
Search string = -
# of occurences = 375
Time required = 0.375 sec.

Test results:
Search string = n
# of occurences = 2033
Time required = 2.3125 sec.


***************************** CODE *****************************
'Requires one command button and one text box on your form

Private Sub Command1_Click()
   
    Dim MyStr As String
    Dim Counter As Long
   
    Open "c:\win95\faq.txt" For Input As #1
    MyStr = Input(LOF(1), #1)
    Close 1
    Text1 = MyStr
    DoEvents
    x = Timer
    Print "Start time=" & x
   
    SearchStr = "Microsoft"
    Do
    MyPos = InStr(MyStr, SearchStr)
    If MyPos <> 0 Then
    MyStr = Left$(MyStr, MyPos - 1) & Right$(MyStr, Len(MyStr) - MyPos - Len(SearchStr) + 1)
    Counter = Counter + 1
    End If
    Loop Until MyPos = 0
    Print "Time required=" & Timer - x
    Print "# of occurences=" & Counter
    Text1 = MyStr
    Print "Done."
   

End Sub

***************************************************************

Regards

0
 
LVL 15

Expert Comment

by:ameba
ID: 1529115
kamall,
Try a real problem:
File: "c:\vb5\winapi\win32api.txt"    ' 700K
    SearchStr = "Public"      ' 6315 occurences

In VB6 it is 0.9 s
MyStr = Replace(MyStr, "Public", "")

VB5 code in few minutes ...
0
 
LVL 15

Expert Comment

by:ameba
ID: 1529116
' Requires one command button and one text box on your form
' ??? 0.4 seconds, 2x faster than using vb6: MyStr=Replace(mystr,"Public","")
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long

Private Sub Command1_Click()
    Dim x As Long, SearchStr As String, MyPos As Long
    Dim MyStr As String
    Dim Counter As Long
' read file
    Open "c:\vb5\winapi\win32api.txt" For Input As #1
    MyStr = Input(LOF(1), #1)
    Close 1
   
    Text1.Text = Left$(MyStr, 20000)
    Text1.Refresh
   
    ' start timer
    x = GetTickCount
    Print "Start time=" & x
    SearchStr = "Public"

' first pass - count number of occurences
    Dim start As Long
    start = 1
    Do
        MyPos = InStr(start, MyStr, SearchStr)
        If MyPos <> 0 Then
            Counter = Counter + 1
            start = MyPos + Len(SearchStr)
        Else
            Exit Do
        End If
    Loop

' create new string
    Dim sResult As String, newlen As Long
    newlen = Len(MyStr) - Counter * Len(SearchStr)
    sResult = Space(newlen)

' second pass - fill new string
    Counter = 0 ' reset counter
    start = 1
    Dim newstart As Long, LenSearchStr As Long
    LenSearchStr = Len(SearchStr)
    newstart = 1
    Do
        MyPos = InStr(start, MyStr, SearchStr)
        If MyPos > 0 Then
            Mid(sResult, newstart, MyPos - start) = Mid(MyStr, start, MyPos - start)
            Counter = Counter + 1
            start = MyPos + LenSearchStr
            newstart = start - Counter * LenSearchStr
            'If Counter Mod 100 = 1 Then Caption = start
        Else
            If Len(MyStr) > start + 1 Then
                Mid(sResult, newstart, Len(MyStr) - start + 1) = Mid(MyStr, start, Len(MyStr) - start + 1)
            End If
            Exit Do
        End If
    Loop
' report time
    Print "Time required=" & (GetTickCount - x) / 1000 & " seconds"
    Print "# of occurences=" & Counter
    Caption = Len(MyStr)
    Text1 = Left$(sResult, 20000)
    Print "Done."
End Sub
0
 
LVL 12

Author Comment

by:jgv
ID: 1529117
Wow, it took awhile to wade through all this! Here's the findings:

ameba,

Unbelievably fast! I never bothered to let any of the other examples finsh running through the win32api.txt, including my own (I was falling asleep). The points are your's with a little extra :)

kamall and caraf_g,

Your example's were similar to my own code except that you reassign the value of the string back to itself minus the 'offending character(s)'. I was assigning to a second string and then back to the original. Since your comments were actually more efficient I want to give you points as well.

ture,

I used the excel function and had mixed results. I ran it on the winapi file and an error was returned. When I ran it using the first 500 characters of the winapi file it worked. There must be a limit on the size of the string being searched. However, it is useful information for the future and is worth some points.

Thanks to everyone for the input :)
0
 
LVL 12

Author Comment

by:jgv
ID: 1529118
Adjusted points to 100
0
 
LVL 15

Accepted Solution

by:
ameba earned 400 total points
ID: 1529119
Thanks, scroll down to see optimized version, for reuse
0
 
LVL 15

Expert Comment

by:ameba
ID: 1529120
' ReplaceAll function sample
' can be 3x faster than using vb6: MyStr=Replace(mystr,"Public","")
' Created by: Bruno Paris, ameba @zg.tel.hr
' Date: 4 August 1999
' Free use permitted with attribution to source
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (dest As Any, source As Any, ByVal numBytes As Long)

Private Sub Form_Click()
    Dim x As Long, MyPos As Long, Counter As Long
    Dim MyStr As String
' read file
    Open "c:\vb5\winapi\win32api.txt" For Input Access Read Lock Read Write As #1
    MyStr = Input(LOF(1), #1)
    Close 1
    ' start timer
    x = GetTickCount
'   replace "Public" with ""
    MyStr = ReplaceAll(MyStr, "Public", "", Counter)
    ' MyStr = Replace(MyStr, "Public", "")   ' vb6 Replace
' report time
    Print "Time required=" & (GetTickCount - x) / 1000 & " seconds"
    Print "# of occurences=" & Counter
End Sub

' Find and replace a substring within a string, case sensitive
Private Function ReplaceAll(s As String, sFind As String, sReplace As String, Optional retCount As Long) As String
    Dim LenS As Long, LenSFind As Long, LenSReplace As Long
    retCount = 0
    LenS = Len(s)
    LenSFind = Len(sFind)
    If LenS = 0 Or LenSFind = 0 Then Exit Function
    LenSReplace = Len(sReplace)
   
    Dim start As Long, MyPos As Long
    start = 1
' first pass - count number of occurences
    MyPos = InStr(start, s, sFind)
    While MyPos
        retCount = retCount + 1
        start = MyPos + LenSFind
        MyPos = InStr(start, s, sFind)
    Wend
' create new string
    ReplaceAll = Space(LenS - retCount * (LenSFind - LenSReplace))
' second pass - fill new string
    retCount = 0 ' reset counter
    start = 1
    Dim newstart As Long
    newstart = 1
    Dim StrPtrReplaceAll As Long, StrPtrS As Long, StrPtrsReplace As Long, Len2SReplace As Long
    Len2SReplace = LenSReplace * 2 ' length in bytes
    ' pointers
    StrPtrReplaceAll = StrPtr(ReplaceAll)
    StrPtrS = StrPtr(s)
    StrPtrsReplace = StrPtr(sReplace)
   
    MyPos = InStr(start, s, sFind)
    While MyPos
        ' without pointer arithmetics
        'Mid(ReplaceAll, newstart, MyPos - start) = Mid$(s, start, MyPos - start)
        'Mid(ReplaceAll, newstart + MyPos - start) = sReplace
       
        ' with pointer arithmetics
        ' Credits: Francesco Balena, http://www.vb2themax.com/
        CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart - 1), ByVal StrPtrS + 2 * (start - 1), (MyPos - start) * 2
        CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart + MyPos - start - 1), ByVal StrPtr(sReplace), Len2SReplace
       
        retCount = retCount + 1
        start = MyPos + LenSFind
        newstart = start - retCount * (LenSFind - LenSReplace)
        MyPos = InStr(start, s, sFind)
    Wend
    If Len(s) > start + 1 Then
        ' without pointer arithmetics
        'Mid(ReplaceAll, newstart, LenS - start + 1) = Mid$(s, start, LenS - start + 1)
        '
        CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart - 1), ByVal StrPtrS + 2 * (start - 1), (LenS - start + 1) * 2
    End If
End Function

0
 
LVL 12

Author Comment

by:jgv
ID: 1529121
Thanks for V2.0 amemba :)
0
 
LVL 12

Author Comment

by:jgv
ID: 1529122
Excuse the typo, 'amemba', I'm not fully awake yet!
0
 
LVL 15

Expert Comment

by:ameba
ID: 1529123
:)
0
 
LVL 12

Author Comment

by:jgv
ID: 1529124
ameba,

Just to let you know...I've incorporated your code with a few minor changes. 49 files were read (2,200k total), condensed by over 50% and written to an mdb in under 12 sec's. Although I didn't benchmark my original code, the same operation took well over two minutes.

My compliments on some well written code :o)
0
 
LVL 15

Expert Comment

by:ameba
ID: 1529125
Thanks. I intend to use it in VB6 for big strings, case sensitive replace.

It is now a bit modified:
I use array to save instr positions found in first pass. The second pass does not have to call Instr again.

For vbTextCompare (case insensitive), it is much slower than VB6's Replace. I'm still studying it...
0
 
LVL 7

Expert Comment

by:kamall
ID: 1529126
jgv,
Thanks for the points you offered in your new question.
However, note that in your first comment you wrote:
"I already use the instr function to locate items in the string. My problem is 'cutting' the string properly."
and:
"I wish I had VB6 but I am still using 5."
Thats why I proposed you this answer which *cuts the string properly* and *works in VB5*.
I see now that you have rejected my answer which satisfied your requirements. right?
Regards.
0
 
LVL 12

Author Comment

by:jgv
ID: 1529127
ameba,

The array is a good idea. I've modified V1.0 to find and remove information contained between specified character's/tokens. Works well.

kamall,

If you re-read the comment you will notice that I also mentioned *efficient* and *alternative solutions*. ameba's solution was, by far, the most efficient and expedient way to do what I wanted. To be fair, I also gave points to everyone else that contributed valid suggestions to this question. I'm sorry if you don't agree with my decision.
0
 
LVL 7

Expert Comment

by:kamall
ID: 1529128
no problem man.
Thank you for the points anyway. You were really *fair*.
Good luck.
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Have you ever wanted to restrict the users input in a textbox to numbers, and while doing that make sure that they can't 'cheat' by pasting in non-numeric text? Of course you can do that with code you write yourself but it's tedious and error-prone …
If you need to start windows update installation remotely or as a scheduled task you will find this very helpful.
As developers, we are not limited to the functions provided by the VBA language. In addition, we can call the functions that are part of the Windows operating system. These functions are part of the Windows API (Application Programming Interface). U…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…
Suggested Courses

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question