Cropping Text

Posted on 2006-04-01
Last Modified: 2008-01-09
Hi experts...

I would like to ask about how to do this:

For example:

<FORM name=myLoginForm onsubmit="return checkrequired(this)" method=post><INPUT type=hidden name=forgotPasswordClicked> <INPUT type=hidden value=false name=cs> <INPUT type=hidden value=Billing.jsp?siteId=1&amp;jid=FFADF9475X6858E5D3X1CAXDEC29ED21&amp;platformId=2&amp;eGift=&amp;NL=true name=caller>
<TD colSpan=3><B><FONT color=#ff6600>Sign in for rewards and faster shopping </FONT></B></TD></TR>
<TD colSpan=3>We’ll fill in your preferences, reward points and saved billing information.<BR></TD></TR>
<TD colSpan=3><B>Handango Member ID (email address)</B><BR><INPUT class=Input name=requiredEmail> <BR></TD></TR>
<TD colSpan=3><B>Password</B><BR><INPUT class=Input type=password maxLength=20 value="" name=requiredPassword></TD></TR>
<TD vAlign=top colSpan=3><!--<input type="image" name="getPassword" src="images/english/buttons/forgot_pass.gif" height="15" border="0" onClick="TurnOffVerify()">--><A href="javascript:forgotPasswordSubmit()"><B>Forgot your password?</B></A> <BR>(We'll send an email with your password.) </TD></TR>
<TD vAlign=top width=20><INPUT type=checkbox value=true name=RememerLoginEmail></TD>
<TD vAlign=top>Remember my email when I return.&nbsp;</TD></TR>

I have this HTML source (it may not always look like this, but it is always a FORM).

The question is:

How can I crop that source down to this:
for every line that contains <input ...
I just want the value of name="..." (not including the name="" part, just what is between the quotes)

If that is not clear, here is another sample:
<INPUT type=checkbox value=true name=RememerLoginEmail> becomes ---> "RememerLoginEmail"
All the tags begin with input or INPUT.

Question by:abangbatax

    Expert Comment

    Either you could use a regular expression to match the <INPUT> tag and then pull out the name,

    or

    use the MSHTML object to parse the HTML. That object has a forms collection, and each form has an elements collection; iterate through those pulling out the names. The source URL (which can be a file) must be valid HTML.
    ''You have to add a reference to the Microsoft HTML Object Library

    ''this should work
        Dim objMSHTML As New MSHTML.HTMLDocument
        Dim objDocument As MSHTML.HTMLDocument
        Set objDocument = objMSHTML.createDocumentFromUrl(Url, vbNullString)

        ''wait until the d/l is complete
        While objDocument.readyState <> "complete"
            ''<TODO> add timer/counter here to eventually timeout
            DoEvents ''let the download make progress while we wait
        Wend

        Dim objForm As HTMLFormElement
        Dim objInput As HTMLInputElement
        Dim obj As Object
        For Each objForm In objDocument.Forms
            For Each obj In objForm.elements
                If UCase(obj.nodeName) = "INPUT" Then
                    Set objInput = obj
                    ''the name is available as objInput.Name
                    Debug.Print objInput.Name
                End If
            Next obj
        Next objForm
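
    The wait loop above leaves the timeout as a TODO. Here is a minimal sketch of one way to fill it in, assuming VB6's Timer function is accurate enough for the job. WaitForDocument and timeoutSeconds are made-up names for illustration, not part of MSHTML.

    ```vb
    ' Hypothetical helper: waits until the document reports "complete",
    ' giving up after timeoutSeconds. Returns True only if loading finished.
    Private Function WaitForDocument(ByVal doc As MSHTML.HTMLDocument, _
                                     ByVal timeoutSeconds As Single) As Boolean
        Dim startTime As Single
        Dim elapsed As Single

        startTime = Timer   ' seconds since midnight
        Do While doc.readyState <> "complete"
            DoEvents        ' let the download make progress
            elapsed = Timer - startTime
            ' Timer resets at midnight, so a negative value means it wrapped
            If elapsed < 0 Then elapsed = elapsed + 86400
            If elapsed > timeoutSeconds Then Exit Function   ' returns False
        Loop
        WaitForDocument = True
    End Function
    ```

    You could then write something like If WaitForDocument(objDocument, 30) Then ... before walking the forms collection.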

    Expert Comment

    you can also use objMSHTML.createDocumentFragment
    so long as the tags match up...they don't in your example

    Author Comment

    Hi SweetsGreen,

    If you don't mind, what would the code look like using webbrowser_downloadcomplete instead of MSHTML.HTMLDocument?

    Expert Comment

    simply replace

        Set objDocument = objMSHTML.createDocumentFromUrl(Url, vbNullString)

    with

        Set objDocument = wb.document ''where wb is your webbrowser control


    I don't believe there is a way to parse out elements using the webbrowser control alone.
    You might or might not need the reference to the "Microsoft HTML Object Library " since the WebBrowser control is based off of MSHTML...but I'm not sure off the top of my head.
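
    For the WebBrowser route, here is a rough sketch of how the same loop might sit inside an event handler. It assumes a WebBrowser control named wb and a ListBox named List1 on the form (both placeholder names) and a reference to the Microsoft HTML Object Library. It uses the DocumentComplete event rather than DownloadComplete, because DocumentComplete passes in the object that finished loading. That is an assumption about what fits here, so adjust it if DownloadComplete suits your page better.

    ```vb
    ' Assumed names: wb (WebBrowser control), List1 (ListBox)
    Private Sub wb_DocumentComplete(ByVal pDisp As Object, URL As Variant)
        ' The event also fires once per frame; only handle the top-level document
        If Not (pDisp Is wb.Object) Then Exit Sub

        Dim objDocument As MSHTML.HTMLDocument
        Dim objForm As MSHTML.HTMLFormElement
        Dim obj As Object

        Set objDocument = wb.Document
        For Each objForm In objDocument.Forms
            For Each obj In objForm.elements
                If UCase$(obj.nodeName) = "INPUT" Then
                    List1.AddItem obj.Name   ' collect the name attribute
                End If
            Next obj
        Next objForm
    End Sub
    ```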

    Author Comment

    Can anyone convert this VB.NET source to VB?

        Private Function FilterInput(ByVal htmlText As String, ByVal listB As ListBox) As String()

            Dim inputPattern As String = "<INPUT(?<input>[\w\s" & Chr(34) & "=']+)"
            Dim namePattern As String = "NAME=" & Chr(34) & "?(?<name>[A-Za-z0-9_]+)"

            Dim regexInput As New Regex(inputPattern, RegexOptions.IgnoreCase)
            Dim regexName As New Regex(namePattern, RegexOptions.IgnoreCase)

            For Each inputElements As Match In regexInput.Matches(htmlText)
                Dim input As String = inputElements.Groups("input").Value

                Dim name As Match = regexName.Match(input)

                If name.Success Then
                    listB.Items.Add(name.Groups("name").Value)
                End If
            Next inputElements

        End Function

    Accepted Solution

    Well, I guess you are not going the MSHTML route.

    As for your request to convert the code, here you go...
    but a few things you should know.
    1. VB6 does not have the regular expression support that .NET has. You have to use the VBScript regex engine, so you have to add a reference to "Microsoft VBScript Regular Expressions".
    2. The regexes you supplied would not work...I changed them and they should be fine now.
    3. If you are not familiar with regular expressions, I would have gone with the MSHTML solution, since it does all the parsing work for you and you don't have to worry about using an incorrect regular expression.

        Dim inputPattern As String
        inputPattern = "<(INPUT)[^<>]+>"
        Dim namePattern As String
        ''Chr(34) is the double quote character, so it has to be concatenated in
        namePattern = "name\s*=('|" & Chr(34) & "|\s)*\w*('|" & Chr(34) & "|\s)*\s*"
        Dim regexInput As New RegExp
        regexInput.IgnoreCase = True
        regexInput.Global = True ''since we want all matches
        regexInput.Pattern = inputPattern
        Dim regexName As New RegExp
        regexName.IgnoreCase = True
        regexName.Pattern = namePattern
        Dim regexCleanup As New RegExp
        regexCleanup.Global = True
        regexCleanup.IgnoreCase = True
        regexCleanup.Pattern = "(name\s*=\s*|'|" & Chr(34) & ")"

        Dim name As Match
        Dim inputString As String
        Dim inputElements As Match
        For Each inputElements In regexInput.Execute(HTMLText)
            inputString = inputElements.Value

            For Each name In regexName.Execute(inputString)
                ''remove any name=, ' and ", then trim leftover spaces
                ''(a VB6 ListBox uses AddItem, not Items.Add)
                listB.AddItem Trim$(regexCleanup.Replace(name.Value, ""))
                ''Debug.Print regexCleanup.Replace(name.Value, "")
            Next name
        Next inputElements
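
    The two-step match and cleanup above can also be collapsed into one pattern with a capture group, which the VBScript RegExp object exposes through SubMatches. This is only a sketch, under the same reference to "Microsoft VBScript Regular Expressions 5.5". GetInputNames is a hypothetical helper name, and the pattern only handles names made of letters, digits and underscores.

    ```vb
    ' Hypothetical helper: returns the name of every <input> tag in htmlText
    Private Function GetInputNames(ByVal htmlText As String) As Collection
        Dim re As New RegExp
        Dim m As Match
        Dim result As New Collection

        re.IgnoreCase = True   ' match input or INPUT
        re.Global = True       ' find every tag, not just the first
        ' <input ... name = value, with optional single or double quotes;
        ' the parentheses capture only the name itself
        re.Pattern = "<input\b[^<>]*?\sname\s*=\s*['" & Chr(34) & "]?(\w+)"

        For Each m In re.Execute(htmlText)
            result.Add m.SubMatches(0)
        Next m

        Set GetInputNames = result
    End Function
    ```

    Each item in the returned Collection can then be passed to listB.AddItem. Like the two-step version, this also picks up <input> tags that sit inside HTML comments.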

    Expert Comment

    make sure the reference you use is "Microsoft VBScript Regular Expressions 5.5"
