jgv
asked on
parsing strings
I have a variable that holds the contents of a text file. I am trying to parse the string and remove certain chr's. Using:
Mid(mystring, 1, 5)=""
does not work. I have gotten around this by using the left and right functions to copy the string, minus the unwanted stuff, to a new variable. Some of the files are fairly large and this seems to be a poor way to do it. Have I overlooked something or is this the only method to remove portions of a string?
Mid(mystring, 1, 5)=""
does not work. I have gotten around this by using the left and right functions to copy the string, minus the unwanted stuff, to a new variable. Some of the files are fairly large and this seems to be a poor way to do it. Have I overlooked something or is this the only method to remove portions of a string?
VB string functions are slow (except few new, e.g. Split, Join, Replace, added in VB6).
You can use byte array instead of string. This can accelerate your string operations 3-5 times.
Private text() As Byte
'--- Read the file
FLen = LOF(1)
ReDim text(1 To FLen) As Byte
Get #1, , text()
Close #1
Dim tempchar As Byte
For i = 1 To FLen
tempchar = text(i)
If tempchar = 13 Then
etc.
You can use byte array instead of string. This can accelerate your string operations 3-5 times.
Private text() As Byte
'--- Read the file
FLen = LOF(1)
ReDim text(1 To FLen) As Byte
Get #1, , text()
Close #1
Dim tempchar As Byte
For i = 1 To FLen
tempchar = text(i)
If tempchar = 13 Then
etc.
ASKER
kamall,
I already use the instr function to locate items in the string. My problem is 'cutting' the string properly. The method I am using is to copy the 'Left' and 'Right' portions of the unwanted chr's to a new variable. This works but I am looking for a more efficient way to do it. I am removing quite a bit of info from some large files. If you have an alternative way of doing this I would appreciate an example.
ameba,
I wish I had VB6 but I am still using 5. I haven't considered a byte array and will look into it. I did try parsing the entire string one chr at a time but it took waaaaay too long.
Thanks for the response :)
I already use the instr function to locate items in the string. My problem is 'cutting' the string properly. The method I am using is to copy the 'Left' and 'Right' portions of the unwanted chr's to a new variable. This works but I am looking for a more efficient way to do it. I am removing quite a bit of info from some large files. If you have an alternative way of doing this I would appreciate an example.
ameba,
I wish I had VB6 but I am still using 5. I haven't considered a byte array and will look into it. I did try parsing the entire string one chr at a time but it took waaaaay too long.
Thanks for the response :)
1 - Use Instr to find each subsequent occurrence of the offending character in the string
2 - Use Mid to extract the bit between the previous occurrence and the last occurrence
3 - Concatenate each extracted bit to a new string
Make sure it properly handles the beginning and end of a string.
Here's an example:
Private Sub Form_Load()
Dim strTest As String
strTest = "Freuden schöner Götterfunken, " & Chr(0) & _
"Tochter aus Elysium, " & Chr(0) & _
"wir betreten feuertrunken, " & Chr(0) & _
"Himmlische, dein Heiligtum."
'The above string contains some chr(0) characters that _
we don't like. Let's remove them.
Dim intPos1 As Integer
Dim intPos2 As Integer
Dim strWork As String
intPos1 = 0
intPos2 = InStr(intPos1 + 1, strTest, Chr(0))
Do While intPos2 > 0
strWork = strWork & Mid(strTest, intPos1 + 1, intPos2 - intPos1 - 1)
intPos1 = intPos2
intPos2 = InStr(intPos1 + 1, strTest, Chr(0))
Loop
If intPos1 < Len(strTest) Then
strWork = strWork & Right(strTest, Len(strTest) - intPos1)
End If
MsgBox strWork
End Sub
2 - Use Mid to extract the bit between the previous occurrence and the last occurrence
3 - Concatenate each extracted bit to a new string
Make sure it properly handles the beginning and end of a string.
Here's an example:
Private Sub Form_Load()
Dim strTest As String
strTest = "Freuden schöner Götterfunken, " & Chr(0) & _
"Tochter aus Elysium, " & Chr(0) & _
"wir betreten feuertrunken, " & Chr(0) & _
"Himmlische, dein Heiligtum."
'The above string contains some chr(0) characters that _
we don't like. Let's remove them.
Dim intPos1 As Integer
Dim intPos2 As Integer
Dim strWork As String
intPos1 = 0
intPos2 = InStr(intPos1 + 1, strTest, Chr(0))
Do While intPos2 > 0
strWork = strWork & Mid(strTest, intPos1 + 1, intPos2 - intPos1 - 1)
intPos1 = intPos2
intPos2 = InStr(intPos1 + 1, strTest, Chr(0))
Loop
If intPos1 < Len(strTest) Then
strWork = strWork & Right(strTest, Len(strTest) - intPos1)
End If
MsgBox strWork
End Sub
jgv,
If you have VB6, use REPLACE to search a string and replace a certain string with another.
This line will replace all occurences of "A" in the string with "", and show "BCDBCDBBCC".
MsgBox Replace("ABCDABCDAABBCCAA" , "A", "")
Ture Magnusson
Karlstad, Sweden
If you have VB6, use REPLACE to search a string and replace a certain string with another.
This line will replace all occurences of "A" in the string with "", and show "BCDBCDBBCC".
MsgBox Replace("ABCDABCDAABBCCAA"
Ture Magnusson
Karlstad, Sweden
See jgv's previous comment:
"I wish I had VB6 but I am still using 5"
:)
"I wish I had VB6 but I am still using 5"
:)
caraf_g,
I missed that. Thanks for pointing it out to me.
jgv,
In VB5, if Excel is installed, you can add a reference to the MS Excel object library and use:
MsgBox Excel.WorksheetFunction.Su bstitute(" ABCDABCDAA BBCCAA", "A", "")
/Ture
I missed that. Thanks for pointing it out to me.
jgv,
In VB5, if Excel is installed, you can add a reference to the MS Excel object library and use:
MsgBox Excel.WorksheetFunction.Su
/Ture
Ture,
A bit expensive if your application does not use excel, but a very good solution if it does! BTW, do you need to instantiate anything or can you use the function just like that?
A bit expensive if your application does not use excel, but a very good solution if it does! BTW, do you need to instantiate anything or can you use the function just like that?
Just like that.
/Ture
/Ture
ASKER
Thanks for the extra comments. I will have to get back to you in awhile. Off to work shortly.
Ture - nice one!
Well....., below is the code. See how fast it is!!!
Tested with 40Kb text file (FAQ.TXT file under the windows dir. - contains 40378 characters)
Test results:
Search string = Microsoft
# of occurences = 16
Time required = 0 sec.
Test results:
Search string = M
# of occurences = 191
Time required = 0.21875 sec.
Test results:
Search string = -
# of occurences = 375
Time required = 0.375 sec.
Test results:
Search string = n
# of occurences = 2033
Time required = 2.3125 sec.
************************** *** CODE ************************** ***
'Requires one command button and one text box on your form
Private Sub Command1_Click()
Dim MyStr As String
Dim Counter As Long
Open "c:\win95\faq.txt" For Input As #1
MyStr = Input(LOF(1), #1)
Close 1
Text1 = MyStr
DoEvents
x = Timer
Print "Start time=" & x
SearchStr = "Microsoft"
Do
MyPos = InStr(MyStr, SearchStr)
If MyPos <> 0 Then
MyStr = Left$(MyStr, MyPos - 1) & Right$(MyStr, Len(MyStr) - MyPos - Len(SearchStr) + 1)
Counter = Counter + 1
End If
Loop Until MyPos = 0
Print "Time required=" & Timer - x
Print "# of occurences=" & Counter
Text1 = MyStr
Print "Done."
End Sub
************************** ********** ********** ********** *******
Regards
Tested with 40Kb text file (FAQ.TXT file under the windows dir. - contains 40378 characters)
Test results:
Search string = Microsoft
# of occurences = 16
Time required = 0 sec.
Test results:
Search string = M
# of occurences = 191
Time required = 0.21875 sec.
Test results:
Search string = -
# of occurences = 375
Time required = 0.375 sec.
Test results:
Search string = n
# of occurences = 2033
Time required = 2.3125 sec.
**************************
'Requires one command button and one text box on your form
Private Sub Command1_Click()
Dim MyStr As String
Dim Counter As Long
Open "c:\win95\faq.txt" For Input As #1
MyStr = Input(LOF(1), #1)
Close 1
Text1 = MyStr
DoEvents
x = Timer
Print "Start time=" & x
SearchStr = "Microsoft"
Do
MyPos = InStr(MyStr, SearchStr)
If MyPos <> 0 Then
MyStr = Left$(MyStr, MyPos - 1) & Right$(MyStr, Len(MyStr) - MyPos - Len(SearchStr) + 1)
Counter = Counter + 1
End If
Loop Until MyPos = 0
Print "Time required=" & Timer - x
Print "# of occurences=" & Counter
Text1 = MyStr
Print "Done."
End Sub
**************************
Regards
kamall,
Try a real problem:
File: "c:\vb5\winapi\win32api.tx t" ' 700K
SearchStr = "Public" ' 6315 occurences
In VB6 it is 0.9 s
MyStr = Replace(MyStr, "Public", "")
VB5 code in few minutes ...
Try a real problem:
File: "c:\vb5\winapi\win32api.tx
SearchStr = "Public" ' 6315 occurences
In VB6 it is 0.9 s
MyStr = Replace(MyStr, "Public", "")
VB5 code in few minutes ...
' Requires one command button and one text box on your form
' ??? 0.4 seconds, 2x faster than using vb6: MyStr=Replace(mystr,"Publi c","")
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Sub Command1_Click()
Dim x As Long, SearchStr As String, MyPos As Long
Dim MyStr As String
Dim Counter As Long
' read file
Open "c:\vb5\winapi\win32api.tx t" For Input As #1
MyStr = Input(LOF(1), #1)
Close 1
Text1.Text = Left$(MyStr, 20000)
Text1.Refresh
' start timer
x = GetTickCount
Print "Start time=" & x
SearchStr = "Public"
' first pass - count number of occurences
Dim start As Long
start = 1
Do
MyPos = InStr(start, MyStr, SearchStr)
If MyPos <> 0 Then
Counter = Counter + 1
start = MyPos + Len(SearchStr)
Else
Exit Do
End If
Loop
' create new string
Dim sResult As String, newlen As Long
newlen = Len(MyStr) - Counter * Len(SearchStr)
sResult = Space(newlen)
' second pass - fill new string
Counter = 0 ' reset counter
start = 1
Dim newstart As Long, LenSearchStr As Long
LenSearchStr = Len(SearchStr)
newstart = 1
Do
MyPos = InStr(start, MyStr, SearchStr)
If MyPos > 0 Then
Mid(sResult, newstart, MyPos - start) = Mid(MyStr, start, MyPos - start)
Counter = Counter + 1
start = MyPos + LenSearchStr
newstart = start - Counter * LenSearchStr
'If Counter Mod 100 = 1 Then Caption = start
Else
If Len(MyStr) > start + 1 Then
Mid(sResult, newstart, Len(MyStr) - start + 1) = Mid(MyStr, start, Len(MyStr) - start + 1)
End If
Exit Do
End If
Loop
' report time
Print "Time required=" & (GetTickCount - x) / 1000 & " seconds"
Print "# of occurences=" & Counter
Caption = Len(MyStr)
Text1 = Left$(sResult, 20000)
Print "Done."
End Sub
' ??? 0.4 seconds, 2x faster than using vb6: MyStr=Replace(mystr,"Publi
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Sub Command1_Click()
Dim x As Long, SearchStr As String, MyPos As Long
Dim MyStr As String
Dim Counter As Long
' read file
Open "c:\vb5\winapi\win32api.tx
MyStr = Input(LOF(1), #1)
Close 1
Text1.Text = Left$(MyStr, 20000)
Text1.Refresh
' start timer
x = GetTickCount
Print "Start time=" & x
SearchStr = "Public"
' first pass - count number of occurences
Dim start As Long
start = 1
Do
MyPos = InStr(start, MyStr, SearchStr)
If MyPos <> 0 Then
Counter = Counter + 1
start = MyPos + Len(SearchStr)
Else
Exit Do
End If
Loop
' create new string
Dim sResult As String, newlen As Long
newlen = Len(MyStr) - Counter * Len(SearchStr)
sResult = Space(newlen)
' second pass - fill new string
Counter = 0 ' reset counter
start = 1
Dim newstart As Long, LenSearchStr As Long
LenSearchStr = Len(SearchStr)
newstart = 1
Do
MyPos = InStr(start, MyStr, SearchStr)
If MyPos > 0 Then
Mid(sResult, newstart, MyPos - start) = Mid(MyStr, start, MyPos - start)
Counter = Counter + 1
start = MyPos + LenSearchStr
newstart = start - Counter * LenSearchStr
'If Counter Mod 100 = 1 Then Caption = start
Else
If Len(MyStr) > start + 1 Then
Mid(sResult, newstart, Len(MyStr) - start + 1) = Mid(MyStr, start, Len(MyStr) - start + 1)
End If
Exit Do
End If
Loop
' report time
Print "Time required=" & (GetTickCount - x) / 1000 & " seconds"
Print "# of occurences=" & Counter
Caption = Len(MyStr)
Text1 = Left$(sResult, 20000)
Print "Done."
End Sub
ASKER
Wow, it took awhile to wade through all this! Here's the findings:
ameba,
Unbelievably fast! I never bothered to let any of the other examples finsh running through the win32api.txt, including my own (I was falling asleep). The points are your's with a little extra :)
kamall and caraf_g,
Your example's were similar to my own code except that you reassign the value of the string back to itself minus the 'offending character(s)'. I was assigning to a second string and then back to the original. Since your comments were actually more efficient I want to give you points as well.
ture,
I used the excel function and had mixed results. I ran it on the winapi file and an error was returned. When I ran it using the first 500 characters of the winapi file it worked. There must be a limit on the size of the string being searched. However, it is useful information for the future and is worth some points.
Thanks to everyone for the input :)
ameba,
Unbelievably fast! I never bothered to let any of the other examples finsh running through the win32api.txt, including my own (I was falling asleep). The points are your's with a little extra :)
kamall and caraf_g,
Your example's were similar to my own code except that you reassign the value of the string back to itself minus the 'offending character(s)'. I was assigning to a second string and then back to the original. Since your comments were actually more efficient I want to give you points as well.
ture,
I used the excel function and had mixed results. I ran it on the winapi file and an error was returned. When I ran it using the first 500 characters of the winapi file it worked. There must be a limit on the size of the string being searched. However, it is useful information for the future and is worth some points.
Thanks to everyone for the input :)
ASKER
Adjusted points to 100
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
' ReplaceAll function sample
' can be 3x faster than using vb6: MyStr=Replace(mystr,"Publi c","")
' Created by: Bruno Paris, ameba @zg.tel.hr
' Date: 4 August 1999
' Free use permitted with attribution to source
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (dest As Any, source As Any, ByVal numBytes As Long)
Private Sub Form_Click()
Dim x As Long, MyPos As Long, Counter As Long
Dim MyStr As String
' read file
Open "c:\vb5\winapi\win32api.tx t" For Input Access Read Lock Read Write As #1
MyStr = Input(LOF(1), #1)
Close 1
' start timer
x = GetTickCount
' replace "Public" with ""
MyStr = ReplaceAll(MyStr, "Public", "", Counter)
' MyStr = Replace(MyStr, "Public", "") ' vb6 Replace
' report time
Print "Time required=" & (GetTickCount - x) / 1000 & " seconds"
Print "# of occurences=" & Counter
End Sub
' Find and replace a substring within a string, case sensitive
Private Function ReplaceAll(s As String, sFind As String, sReplace As String, Optional retCount As Long) As String
Dim LenS As Long, LenSFind As Long, LenSReplace As Long
retCount = 0
LenS = Len(s)
LenSFind = Len(sFind)
If LenS = 0 Or LenSFind = 0 Then Exit Function
LenSReplace = Len(sReplace)
Dim start As Long, MyPos As Long
start = 1
' first pass - count number of occurences
MyPos = InStr(start, s, sFind)
While MyPos
retCount = retCount + 1
start = MyPos + LenSFind
MyPos = InStr(start, s, sFind)
Wend
' create new string
ReplaceAll = Space(LenS - retCount * (LenSFind - LenSReplace))
' second pass - fill new string
retCount = 0 ' reset counter
start = 1
Dim newstart As Long
newstart = 1
Dim StrPtrReplaceAll As Long, StrPtrS As Long, StrPtrsReplace As Long, Len2SReplace As Long
Len2SReplace = LenSReplace * 2 ' length in bytes
' pointers
StrPtrReplaceAll = StrPtr(ReplaceAll)
StrPtrS = StrPtr(s)
StrPtrsReplace = StrPtr(sReplace)
MyPos = InStr(start, s, sFind)
While MyPos
' without pointer arithmetics
'Mid(ReplaceAll, newstart, MyPos - start) = Mid$(s, start, MyPos - start)
'Mid(ReplaceAll, newstart + MyPos - start) = sReplace
' with pointer arithmetics
' Credits: Francesco Balena, http://www.vb2themax.com/
CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart - 1), ByVal StrPtrS + 2 * (start - 1), (MyPos - start) * 2
CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart + MyPos - start - 1), ByVal StrPtr(sReplace), Len2SReplace
retCount = retCount + 1
start = MyPos + LenSFind
newstart = start - retCount * (LenSFind - LenSReplace)
MyPos = InStr(start, s, sFind)
Wend
If Len(s) > start + 1 Then
' without pointer arithmetics
'Mid(ReplaceAll, newstart, LenS - start + 1) = Mid$(s, start, LenS - start + 1)
'
CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart - 1), ByVal StrPtrS + 2 * (start - 1), (LenS - start + 1) * 2
End If
End Function
' can be 3x faster than using vb6: MyStr=Replace(mystr,"Publi
' Created by: Bruno Paris, ameba @zg.tel.hr
' Date: 4 August 1999
' Free use permitted with attribution to source
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (dest As Any, source As Any, ByVal numBytes As Long)
Private Sub Form_Click()
Dim x As Long, MyPos As Long, Counter As Long
Dim MyStr As String
' read file
Open "c:\vb5\winapi\win32api.tx
MyStr = Input(LOF(1), #1)
Close 1
' start timer
x = GetTickCount
' replace "Public" with ""
MyStr = ReplaceAll(MyStr, "Public", "", Counter)
' MyStr = Replace(MyStr, "Public", "") ' vb6 Replace
' report time
Print "Time required=" & (GetTickCount - x) / 1000 & " seconds"
Print "# of occurences=" & Counter
End Sub
' Find and replace a substring within a string, case sensitive
Private Function ReplaceAll(s As String, sFind As String, sReplace As String, Optional retCount As Long) As String
Dim LenS As Long, LenSFind As Long, LenSReplace As Long
retCount = 0
LenS = Len(s)
LenSFind = Len(sFind)
If LenS = 0 Or LenSFind = 0 Then Exit Function
LenSReplace = Len(sReplace)
Dim start As Long, MyPos As Long
start = 1
' first pass - count number of occurences
MyPos = InStr(start, s, sFind)
While MyPos
retCount = retCount + 1
start = MyPos + LenSFind
MyPos = InStr(start, s, sFind)
Wend
' create new string
ReplaceAll = Space(LenS - retCount * (LenSFind - LenSReplace))
' second pass - fill new string
retCount = 0 ' reset counter
start = 1
Dim newstart As Long
newstart = 1
Dim StrPtrReplaceAll As Long, StrPtrS As Long, StrPtrsReplace As Long, Len2SReplace As Long
Len2SReplace = LenSReplace * 2 ' length in bytes
' pointers
StrPtrReplaceAll = StrPtr(ReplaceAll)
StrPtrS = StrPtr(s)
StrPtrsReplace = StrPtr(sReplace)
MyPos = InStr(start, s, sFind)
While MyPos
' without pointer arithmetics
'Mid(ReplaceAll, newstart, MyPos - start) = Mid$(s, start, MyPos - start)
'Mid(ReplaceAll, newstart + MyPos - start) = sReplace
' with pointer arithmetics
' Credits: Francesco Balena, http://www.vb2themax.com/
CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart - 1), ByVal StrPtrS + 2 * (start - 1), (MyPos - start) * 2
CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart + MyPos - start - 1), ByVal StrPtr(sReplace), Len2SReplace
retCount = retCount + 1
start = MyPos + LenSFind
newstart = start - retCount * (LenSFind - LenSReplace)
MyPos = InStr(start, s, sFind)
Wend
If Len(s) > start + 1 Then
' without pointer arithmetics
'Mid(ReplaceAll, newstart, LenS - start + 1) = Mid$(s, start, LenS - start + 1)
'
CopyMemory ByVal StrPtrReplaceAll + 2 * (newstart - 1), ByVal StrPtrS + 2 * (start - 1), (LenS - start + 1) * 2
End If
End Function
ASKER
Thanks for V2.0 amemba :)
ASKER
Excuse the typo, 'amemba', I'm not fully awake yet!
:)
ASKER
ameba,
Just to let you know...I've incorporated your code with a few minor changes. 49 files were read (2,200k total), condensed by over 50% and written to an mdb in under 12 sec's. Although I didn't benchmark my original code, the same operation took well over two minutes.
My compliments on some well written code :o)
Just to let you know...I've incorporated your code with a few minor changes. 49 files were read (2,200k total), condensed by over 50% and written to an mdb in under 12 sec's. Although I didn't benchmark my original code, the same operation took well over two minutes.
My compliments on some well written code :o)
Thanks. I intend to use it in VB6 for big strings, case sensitive replace.
It is now a bit modified:
I use array to save instr positions found in first pass. The second pass does not have to call Instr again.
For vbTextCompare (case insensitive), it is much slower than VB6's Replace. I'm still studying it...
It is now a bit modified:
I use array to save instr positions found in first pass. The second pass does not have to call Instr again.
For vbTextCompare (case insensitive), it is much slower than VB6's Replace. I'm still studying it...
jgv,
Thanks for the points you offered in your new question.
However, note that in your first comment you wrote:
"I already use the instr function to locate items in the string. My problem is 'cutting' the string properly."
and:
"I wish I had VB6 but I am still using 5."
Thats why I proposed you this answer which *cuts the string properly* and *works in VB5*.
I see now that you have rejected my answer which satisfied your requirements. right?
Regards.
Thanks for the points you offered in your new question.
However, note that in your first comment you wrote:
"I already use the instr function to locate items in the string. My problem is 'cutting' the string properly."
and:
"I wish I had VB6 but I am still using 5."
Thats why I proposed you this answer which *cuts the string properly* and *works in VB5*.
I see now that you have rejected my answer which satisfied your requirements. right?
Regards.
ASKER
ameba,
The array is a good idea. I've modified V1.0 to find and remove information contained between specified character's/tokens. Works well.
kamall,
If you re-read the comment you will notice that I also mentioned *efficient* and *alternative solutions*. ameba's solution was, by far, the most efficient and expedient way to do what I wanted. To be fair, I also gave points to everyone else that contributed valid suggestions to this question. I'm sorry if you don't agree with my decision.
The array is a good idea. I've modified V1.0 to find and remove information contained between specified character's/tokens. Works well.
kamall,
If you re-read the comment you will notice that I also mentioned *efficient* and *alternative solutions*. ameba's solution was, by far, the most efficient and expedient way to do what I wanted. To be fair, I also gave points to everyone else that contributed valid suggestions to this question. I'm sorry if you don't agree with my decision.
no problem man.
Thank you for the points anyway. You were really *fair*.
Good luck.
Thank you for the points anyway. You were really *fair*.
Good luck.
A very fast method is to use the InStr() function to find a search string in the main string. When found, cut the main string there, join the string again and restart the search. Its much faster.
If you have difficulties, leave me a comment.
Regards.