Mid$ anomaly in VB5

While parsing a long string of HTML code, the Mid$ function does something odd when it encounters a comma.

x$= ">SMITH       , JOHN            <"

Mid$(x$,2,30) returns "SMITHJOHN           "

If I replace the comma with a period, it correctly returns:

 "SMITH       . JOHN            "

The initial position in the document is about 40,000 characters from the beginning. When I test on just the small snippet above, it works properly with the comma in the field.

The actual code is:
        pos1 = InStr(pos, HTMLtext, "<table ")
        pos2 = InStr(pos1, HTMLtext, "</table>")
        pos2 = pos2 + 8
        CrewInfoHTML = Mid(HTMLtext, pos1, pos2 - pos1)
 
chaau Commented:
I think you need to use this code:
        pos1 = InStr(pos, HTMLtext, "<table ")
        pos1 = pos1 + 8
        pos2 = InStr(pos1, HTMLtext, "</table>")
        CrewInfoHTML = Mid(HTMLtext, pos1, pos2 - pos1) 

 
aikimark Commented:
Please post a sample of the HTML you are parsing, along with the VB statements you are using to get the HTML.
 
Martin Liss (Older than dirt) Commented:
There's no problem with your code in VB6, but if the comma is a problem in VB5, then you can do something like this:

x$ = ">SMITH       , JOHN            <"
x$ = Replace(x$, ",", "|")
x$ = Replace(x$, "|", ",")


I just remembered that VB5 doesn't have VB6's Replace function, so you can add this clone that I found on DevX.

' A clone of the VB6's Replace function for use under VB5

Function Replace(Source As String, Find As String, ReplaceStr As String, _
    Optional ByVal Start As Long = 1, Optional Count As Long = -1, _
    Optional Compare As VbCompareMethod = vbBinaryCompare) As String

    Dim findLen As Long
    Dim replaceLen As Long
    Dim index As Long
    Dim counter As Long
    
    findLen = Len(Find)
    replaceLen = Len(ReplaceStr)
    ' this prevents an endless loop
    If findLen = 0 Then Err.Raise 5
    
    If Start < 1 Then Start = 1
    index = Start
    
    ' let's start by assigning the source to the result
    Replace = Source
    
    ' if Find and ReplaceStr strings have same length, it is possible to
    ' use an optimized algorithm, based on the Mid$ command
    Do
        index = InStr(index, Replace, Find, Compare)
        If index = 0 Then Exit Do
        If findLen = replaceLen Then
            ' if the find and replace strings have same length
            ' we can use the faster Mid$ command
            Mid$(Replace, index, findLen) = ReplaceStr
        Else
            ' else we must use concatenation
            Replace = Left$(Replace, index - 1) & ReplaceStr & Mid$(Replace, _
                index + findLen)
        End If
        ' skip over the string just added
        index = index + replaceLen
        ' increment the replacement counter
        counter = counter + 1
        ' Note that the Loop Until test will always fail if Count = -1
    Loop Until counter = Count
    
    ' The next operation serves to keep complete compatibility with
    ' VB6's Replace function. You can delete it if you prefer.
    If Start > 1 Then Replace = Mid$(Replace, Start)

End Function

 
Guy Hengel [angelIII / a3] (Billing Engineer) Commented:
I fear that the issue is with the length of 40K.
In VB5, the Integer data type is 16-bit and only holds values from -32,768 to 32,767, so you may want to try a bigger data type, such as Long, for the start/length.
 
drleewood (Author) Commented:
1. pos, pos1 and pos2 are all declared as Long.
2. I cut the document to a much smaller size with same result.
3. Would prefer not to find and replace " , " if it can be avoided.
4. HTML document code is attached.  The position of "<Table " is 46,407.
Pairing-Information.htm
 
aikimark Commented:
I think the problem is in the file format, rather than the Mid() or Instr() functions.
 
aikimark Commented:
Since this file is Unicode, you need to use InStrB() for your string searching. Likewise, you will probably need to use MidB().
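
For readers unfamiliar with the byte-oriented functions, here is a minimal sketch of how they differ from their character-oriented counterparts. The procedure name is hypothetical and the string is a made-up example, not taken from the posted file. VB stores strings internally at two bytes per character, so InStrB and LenB report byte offsets and byte counts rather than character positions.

```vb
' Minimal sketch (hypothetical procedure name): character positions
' (Len, InStr) versus byte positions (LenB, InStrB) on the same string.
Sub DemoByteVersusCharPositions()
    Dim s As String
    s = "abc"                       ' held internally as 2 bytes per character
    Debug.Print Len(s)              ' 3 characters
    Debug.Print LenB(s)             ' 6 bytes
    Debug.Print InStr(1, s, "c")    ' 3: character position
    Debug.Print InStrB(1, s, "c")   ' 5: byte position
End Sub
```

Positions returned by InStrB should only be passed to MidB, and positions from InStr only to Mid; mixing the two families gives wrong offsets.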
 
drleewood (Author) Commented:
I believe the problem is getting all the characters from the document into the variable.  I am using this code:

Open "c:\Pairing Information.htm" For Input As #f
Do While Not EOF(f)
    Input #f, A$
    HTMLdoc = HTMLdoc & A$
Loop
Close #f

I have found that the resulting HTMLdoc does not contain those commas. How do I read the data from the document file so that it is accurate?
 
drleewood (Author) Commented:
I have found using:

Line Input #f, A$

instead of:

Input #f, A$

preserves the commas. Are there any better ideas, since this may have other side effects?
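
One possible alternative, offered only as a sketch, is to read the whole file in a single binary read so that no delimiter parsing takes place at all. The function name ReadWholeFile is hypothetical and was not proposed in this thread, and the sketch assumes the file is single-byte (ANSI) text. A file saved as Unicode would need different handling.

```vb
' Hypothetical helper (not from the thread): reads an entire file into a
' string in one operation. Assumes single-byte (ANSI) text.
Function ReadWholeFile(ByVal path As String) As String
    Dim f As Integer
    Dim buffer As String

    f = FreeFile                        ' get an unused file number
    Open path For Binary Access Read As #f
    buffer = Space$(LOF(f))             ' size the buffer to the file length
    Get #f, , buffer                    ' fill the buffer with the file contents
    Close #f

    ReadWholeFile = buffer
End Function
```

Under those assumptions it could be called as HTMLdoc = ReadWholeFile("c:\Pairing Information.htm"). Commas, quotes and line breaks are left exactly as they appear in the file.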
 
aikimark Commented:
Here is an example of code that reads the Unicode HTML file you posted and parses it using the InStrB() and MidB() functions.
Option Explicit

Public Sub Q_28386246()
    Dim oFS As Object
    Dim oTS As Object
    Dim HTMLtext As String, CrewInfoHTML As String
    Dim pos1 As Long, pos2 As Long
    Set oFS = CreateObject("scripting.filesystemobject")
    Set oTS = oFS.opentextfile("C:\Users\Mark\Downloads\Pairing-Information.htm", 1, 1)
    HTMLtext = oTS.readall
    oTS.Close
    HTMLtext = StrConv(HTMLtext, vbFromUnicode)
    pos1 = 1
    pos1 = InStrB(pos1, HTMLtext, "<table ")
    pos2 = InStrB(pos1, HTMLtext, "</table>")
    pos2 = pos2 + 8
    CrewInfoHTML = MidB(HTMLtext, pos1, pos2 - pos1)
    Debug.Print Len(HTMLtext)
    Debug.Print CrewInfoHTML
End Sub

 
Guy Hengel [angelIII / a3] (Billing Engineer) Commented:
Line Input:
http://msdn.microsoft.com/en-us/library/aa243392%28v=vs.60%29.aspx
vs Input:
http://msdn.microsoft.com/en-us/library/aa243386%28v=vs.60%29.aspx

Evidently, the Input statement "parses" the data in ways I didn't know (or remember), based on the way the data was written, and it may indeed "skip" the comma.
So, please switch to Line Input. The only important point to note is that the carriage return / line feed characters are NOT stored in the variable. If you need them, you need to do:
 HTMLdoc = HTMLdoc & A$ & vbCrLf
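
Putting the pieces from this thread together, the reading loop might look roughly like the sketch below. The procedure name is hypothetical, and the FreeFile call is an assumption, since the original post did not show how f was assigned. The path is the test path from the earlier post.

```vb
' Sketch of the Line Input approach discussed above (hypothetical procedure name).
Sub ReadHtmlWithLineInput()
    Dim f As Integer
    Dim A As String
    Dim HTMLdoc As String

    f = FreeFile                          ' assumption: obtain an unused file number
    Open "c:\Pairing Information.htm" For Input As #f
    Do While Not EOF(f)
        Line Input #f, A                  ' reads one full line, commas included
        HTMLdoc = HTMLdoc & A & vbCrLf    ' restore the line break that Line Input drops
    Loop
    Close #f

    Debug.Print Len(HTMLdoc)
End Sub
```

Unlike Input #, which treats commas as field delimiters, Line Input # returns everything up to the end of the line unchanged.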
 
drleewood (Author) Commented:
Thank you all for your input. I am afraid the problem was not in the Mid$ function but rather in the reading of the HTML code after it was saved to a file. The project actually reads the HTML source code directly from the browser control, which may eliminate the comma anomaly. The reading of the file was for testing purposes only. Line Input seems to have solved that problem. I am going to close this issue for now. Thank you.
 
aikimark Commented:
What about my code?
Question has a verified solution.
