Hard question about string search

Posted on 2000-04-06
Last Modified: 2010-05-02
I was just wondering if anyone knows how browsers strip out the values of tags and such...
I want to know how to take a tag like this:
<img src="hello.jpg" border=0 width="2" height=34 align="texttop" alt=hey>

Given a tag like this, how do I get its values? It seems very difficult, because an attribute can be written either as border=0 or as border="0".
How the heck does this work?
Is it possible to write a function that works like this: call StripTags(richtextbox.text, "<img src,border,width,align,height,alt", output), where output receives the values separated by ","? Or is this too hard to do in VB?
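For illustration, here is a minimal Python sketch of the StripTags idea from the question (Python rather than VB, and `strip_tags` and `ATTR_RE` are hypothetical names, not an existing API). It assumes a value is either double-quoted, single-quoted, or bare with no spaces, which is enough to make border=0 and border="0" give the same result:

```python
import re

# Hypothetical pattern: name = "double" | 'single' | bare (no spaces, quotes or '>')
ATTR_RE = re.compile(r"""([A-Za-z][\w-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def strip_tags(html, tag, names):
    """Return the values of `names` on the first <tag ...> in `html`, joined by ','."""
    start = html.lower().find("<" + tag.lower())
    if start < 0:
        return ""
    end = html.find(">", start)
    if end < 0:
        end = len(html)
    values = {}
    for name, raw in ATTR_RE.findall(html[start:end]):
        # border=0 and border="0" both end up as "0"
        if raw[0] in "\"'":
            raw = raw[1:-1]
        values[name.lower()] = raw
    # attributes that are not present come back as empty fields
    return ",".join(values.get(n.lower(), "") for n in names)


print(strip_tags('<img src="hello.jpg" border=0 width="2" height=34 '
                 'align="texttop" alt=hey>',
                 "img", ["src", "border", "width", "align", "height", "alt"]))
# hello.jpg,0,2,texttop,34,hey
```

The values come back in the order the names were requested, not the order they appear in the tag.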
Question by: Geo24
Expert Comment

ID: 2690839
You may get ideas from these two samples: one strips all the hyperlinks out of a document and the other strips all the tags out of it.
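The samples themselves are not included in this thread. As a rough idea of the second one, here is a minimal Python sketch (a hypothetical `strip_all_tags` helper, not the sample's code) that drops everything between "<" and ">". It is a naive character scanner, not a real HTML parser:

```python
def strip_all_tags(html):
    """Drop everything between '<' and '>' and keep the rest of the text."""
    out = []
    inside = False
    for ch in html:
        if ch == "<":
            inside = True            # a tag starts: stop copying
        elif ch == ">" and inside:
            inside = False           # the tag ends: resume copying
        elif not inside:
            out.append(ch)           # ordinary text outside any tag
    return "".join(out)


print(strip_all_tags("<p>Hello <b>world</b></p>"))  # Hello world
```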


Expert Comment

ID: 2690972
You will need to think methodically: work out how you would do it by hand, then automate that. Use functions such as Right, Left, Mid and InStr. You might want to keep a progress bar in mind, because VB's string manipulation is sssslllloooowww!
If you could write a function that returns a specified tag or word from the string, that would be a great start.

I'll have a look and see if I can work out some functions.
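A function that returns a specified tag could look roughly like this Python sketch (the name `get_tag` is hypothetical). It uses `find` the way InStr would be used in VB and slicing the way Mid would, and it returns the position after the tag so the caller can loop to the next one:

```python
def get_tag(html, name, start=0):
    """Return (tag_text, next_pos) for the next <name ...> at or after `start`,
    or (None, -1) if there is none."""
    lower = html.lower()
    needle = "<" + name.lower()
    pos = lower.find(needle, start)
    while pos >= 0:
        after = pos + len(needle)
        # accept <img ...> but not <imgx ...>
        if after >= len(html) or html[after] in " \t\r\n>/":
            end = html.find(">", after)
            if end < 0:
                return None, -1
            return html[pos:end + 1], end + 1
        pos = lower.find(needle, after)
    return None, -1


print(get_tag("x <imgx> <img src=a.jpg> y", "img"))  # ('<img src=a.jpg>', 24)
```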

Expert Comment

ID: 2691026
The way I would approach it is to create an array of strings holding all the tags you care about, load the HTML source into a RichTextBox, and use its Find method in a loop to locate each tag. From there it is ordinary string manipulation.
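A Python sketch of the same idea, with a hypothetical `KNOWN_TAGS` list standing in for the array of strings. Instead of calling a Find method once per tag name, it visits every "<" and checks the tag name against the list, which is a small variation on the approach described above:

```python
KNOWN_TAGS = ["img", "a"]  # hypothetical list of tags we care about


def find_known_tags(html, known=KNOWN_TAGS):
    """Return (position, tag_text) for every <tag ...> whose name is in `known`."""
    found = []
    pos = html.find("<")
    while pos >= 0:
        end = html.find(">", pos)
        if end < 0:
            break
        inner = html[pos + 1:end].strip()
        # the tag name is the first word inside the brackets
        name = inner.split(None, 1)[0].lower() if inner else ""
        if name in known:
            found.append((pos, html[pos:end + 1]))
        pos = html.find("<", end + 1)
    return found


print(find_known_tags('<p>hi <A href="x">link</a><img src=a.jpg></p>'))
```

Closing tags such as </a> are skipped because their first word is "/a".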

Expert Comment

ID: 2691072
It's very simple if you use the HTML object model instead of string parsing. Here is a small example:

Dim ob As Variant
' walk every element of the loaded page and report the IMG tags
For Each ob In WebBrowser1.Document.All
    If ob.tagName = "IMG" Then
        Debug.Print "Src=" & ob.src & " Border=" & ob.border & " Width=" & ob.Width
    End If
Next ob

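WebBrowser1 is a WebBrowser control on the form :-) or it can be created at run time:

Dim WebBrowser1 As Object
Set WebBrowser1 = CreateObject("InternetExplorer.Application")
WebBrowser1.Navigate "www....."
WebBrowser1.Visible = True
' wait until the page has finished loading (4 = READYSTATE_COMPLETE) before reading Document.All
Do While WebBrowser1.Busy Or WebBrowser1.ReadyState <> 4
    DoEvents
Loop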
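The object-model answer works because the browser's own HTML tokenizer has already removed the quotes and matched up names and values. Outside VB, the same effect can be had with a real parser; this Python sketch swaps in the standard library's `html.parser.HTMLParser` (not the IE object model), and `ImgCollector` is a hypothetical class name:

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; quotes are already removed
        if tag == "img":
            self.images.append(dict(attrs))


parser = ImgCollector()
parser.feed('<img src="hello.jpg" border=0 width="2" height=34 align="texttop" alt=hey>')
parser.close()
print(parser.images[0])
```

Quoted and unquoted values (border=0 versus width="2") come out the same way, which is exactly what the question asks for.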


Expert Comment

ID: 2691382
Try something like this:

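Dim sStr As String, sValue As String, sResults As String
Dim vParts As Variant
Dim p As Long

sStr = "<img src=""hello.jpg"" border=0 width=""2"" height=34 align=""texttop"" alt=hey>"
' drop the "<" and ">" so the last value does not end in ">"
sStr = Mid(sStr, 2, Len(sStr) - 2)

vParts = Split(sStr, " ")

For p = LBound(vParts) To UBound(vParts)
   If InStr(vParts(p), "=") > 0 Then
      ' everything after the first "=" is the value
      sValue = Mid(vParts(p), InStr(vParts(p), "=") + 1)
      If Len(sValue) >= 2 And Left(sValue, 1) = """" And Right(sValue, 1) = """" Then
         ' quoted value: remove the surrounding quotes
         sValue = Mid(sValue, 2, Len(sValue) - 2)
      End If
      sResults = sResults & sValue & ","
   End If
Next p
Debug.Print sResults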

That should give you what you want without having to create a WebBrowser object, as long as none of the values contains a space.
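To see both the idea and its limit, here is a Python port of the split-on-spaces approach (the name `split_values` is hypothetical, and it joins with commas rather than leaving a trailing one). It handles the example tag, but a quoted value containing a space is cut in half:

```python
def split_values(tag):
    """Split the tag on spaces and keep whatever follows '=' in each piece."""
    inner = tag.strip()[1:-1]  # drop '<' and '>'
    values = []
    for part in inner.split(" "):
        if "=" in part:
            value = part.split("=", 1)[1]
            # remove surrounding double quotes, if any
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            values.append(value)
    return ",".join(values)


print(split_values('<img src="hello.jpg" border=0 width="2" height=34 align="texttop" alt=hey>'))
print(split_values('<img alt="two words">'))  # the value is broken apart
```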

Accepted Solution

ATM earned 100 total points
ID: 2691788
Hey, here is some code that parses your string not only when you use double quotes but also when you use single quotes. And what will you do if your ALT parameter looks like this:
ALT=mama miya bambarabiya kerkudu
This code handles that too. Create a new form, add Text1, Command1, List1 and List2, then copy and paste in the code. Clicking Command1 puts the parameter names in List1 and their values in List2.

' Table of known tags: column 0 holds the tag name, the other columns
' hold the parameter names recognised for it; Chr(0) marks the end.
Dim TagParamIndex(2, 9) As String

Private Sub Command1_Click()
    Dim StartPos As Long
    Dim StopPos As Long
    Dim DefText As String
    Dim TagArrayPos As Long
    Dim ParamArrayPos As Long
    Dim defTag As String
    Dim defParam As String
    Dim defValue As String
    Dim bKnownParam As Boolean

    List1.Clear
    List2.Clear

    ' take the text between the first "<" and the following ">"
    DefText = ""
    StartPos = InStr(1, Text1.Text, "<")
    If StartPos > 0 Then
        StopPos = InStr(StartPos, Text1.Text, ">")
        If StopPos > StartPos Then
            DefText = Trim(Mid(Text1.Text, StartPos + 1, StopPos - StartPos - 1))
        End If
    End If

    If DefText = "" Then
        MsgBox "Can't find <>"
        Exit Sub
    End If

    ' the tag name ends at the first space
    StopPos = InStr(1, DefText, " ")
    If StopPos <= 1 Then Exit Sub
    defTag = UCase(Left(DefText, StopPos - 1))
    DefText = Trim(Mid(DefText, StopPos + 1))

    ' look the tag up in the table
    TagArrayPos = 0
    Do While TagParamIndex(TagArrayPos, 0) <> Chr(0)
        If TagParamIndex(TagArrayPos, 0) = defTag Then Exit Do
        TagArrayPos = TagArrayPos + 1
    Loop
    If TagParamIndex(TagArrayPos, 0) = Chr(0) Then
        MsgBox "Unknown tag: " & defTag
        Exit Sub
    End If

    Do While DefText <> ""
        ' parameter name = everything before the next "="
        StartPos = InStr(1, DefText, "=")
        If StartPos <= 1 Then Exit Do
        defParam = Trim(Left(DefText, StartPos - 1))
        DefText = Trim(Mid(DefText, StartPos + 1))

        ' the value runs up to the space before the NEXT parameter name,
        ' so a value like  alt=mama miya bambarabiya  stays in one piece
        StartPos = InStr(1, DefText, "=")
        If StartPos > 0 Then
            StopPos = StartPos - 1
            ' skip any spaces just before "="
            Do While StopPos > 0
                If Mid(DefText, StopPos, 1) <> " " Then Exit Do
                StopPos = StopPos - 1
            Loop
            ' skip one word back: the next parameter's name
            Do While StopPos > 0
                If Mid(DefText, StopPos, 1) = " " Then Exit Do
                StopPos = StopPos - 1
            Loop
            defValue = Left(DefText, StopPos)
            DefText = Trim(Mid(DefText, StopPos + 1))
        Else
            ' last parameter: the rest of the text is its value
            defValue = DefText
            DefText = ""
        End If

        ' remove double and single quotes around the value
        defValue = Trim(Replace(Replace(defValue, Chr(34), " "), Chr(39), " "))

        ' only list parameters that the table knows for this tag
        bKnownParam = False
        ParamArrayPos = 1
        Do While TagParamIndex(TagArrayPos, ParamArrayPos) <> Chr(0)
            If TagParamIndex(TagArrayPos, ParamArrayPos) = UCase(defParam) Then
                bKnownParam = True
                Exit Do
            End If
            ParamArrayPos = ParamArrayPos + 1
        Loop
        If bKnownParam Then
            List1.AddItem defParam
            List2.AddItem defValue
        End If
    Loop
End Sub

Private Sub Form_Load()

Text1.Text = "<img src=" & Chr(34) & "hello.jpg" & Chr(34) & " border=0 width=" & Chr(34) & "2" & Chr(34) & " height=34 align=" & Chr(34) & "texttop" & Chr(34) & " alt=hey>"

TagParamIndex(0, 0) = "IMG"
TagParamIndex(0, 1) = "SRC"
TagParamIndex(0, 2) = "BORDER"
TagParamIndex(0, 3) = "WIDTH"
TagParamIndex(0, 4) = "HEIGHT"
TagParamIndex(0, 5) = "ALIGN"
TagParamIndex(0, 6) = "ALT"
TagParamIndex(0, 7) = "NAME"
TagParamIndex(0, 8) = "ID"
TagParamIndex(0, 9) = Chr(0)

TagParamIndex(1, 0) = "A"
TagParamIndex(1, 1) = "HREF"
TagParamIndex(1, 2) = "TARGET"
TagParamIndex(1, 3) = "NAME"
TagParamIndex(1, 4) = "ID"
TagParamIndex(1, 5) = Chr(0)

TagParamIndex(2, 0) = Chr(0)

End Sub
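The key trick in the accepted answer is that a value is taken to run up to the space in front of the next name=, walking back from the next "=". This Python sketch ports that heuristic (the name `parse_tag` is hypothetical, and it skips the tag/parameter lookup table). Like the VB code, it will go wrong if a value itself contains "=":

```python
def parse_tag(tag):
    """Return (TAGNAME, [(param, value), ...]) using the 'walk back from the next =' rule."""
    text = tag.strip()
    if text.startswith("<"):
        text = text[1:]
    if text.endswith(">"):
        text = text[:-1]
    name, _, rest = text.strip().partition(" ")
    rest = rest.strip()
    pairs = []
    while rest:
        eq = rest.find("=")
        if eq <= 0:
            break
        param = rest[:eq].strip()
        rest = rest[eq + 1:].strip()
        nxt = rest.find("=")
        if nxt >= 0:
            stop = nxt
            # skip spaces just before the next '='
            while stop > 0 and rest[stop - 1] == " ":
                stop -= 1
            # skip back over the next parameter's name
            while stop > 0 and rest[stop - 1] != " ":
                stop -= 1
            value, rest = rest[:stop], rest[stop:].strip()
        else:
            value, rest = rest, ""  # last parameter takes the remainder
        # remove double and single quotes, as the VB code does with Replace
        value = value.replace('"', " ").replace("'", " ").strip()
        pairs.append((param.lower(), value))
    return name.upper(), pairs


print(parse_tag('<img src="hello.jpg" border=0 alt=mama miya bambarabiya kerkudu>'))
```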

Author Comment

ID: 2692563
This code is more than EXCELLENT!!

Thank u man!
