• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 3739
  • Last Modified:

Vbscript finds text within web page

Expert Exchange Expert "Shift-3" helped me with the code below. It's basically searching for a word within a web page. http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_23871363.html

Everything works great, but I need help with editing the code to find an "exact match" within the webpage.

e.g.

 If I'm searching for "fox"

 "Firefox" or "foxy" would be considered unsuccessful.



Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
 
On Error Resume Next
 
strWordList = "C:\Words2Search.txt"
strErrorLog = "C:\Failed to Find.txt"
 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objWordList = objFSO.OpenTextFile(strWordList, ForReading, False, TriStateUseDefault)
arrWords = Split(objWordList.ReadAll, vbCrLf)
objWordList.Close
 
Set objErrorLog = objFSO.OpenTextFile(strErrorLog, ForWriting, True)
 
Set objShell = CreateObject("Shell.Application")
Set objShellWindows = objShell.Windows
 
For i = 0 to objShellWindows.Count - 1
    Set objIE = objShellWindows.Item(i)
    strURL = objIE.LocationURL
    If Left(strURL, 4) = "http" Then
        strText = objIE.document.body.innerText
        
        For Each strWord in arrWords
            If InStr(1, strText, strWord, vbTextCompare) = 0 Then
                objErrorLog.WriteLine strWord
            End If
        Next
    End If
Next
 
objErrorLog.Close

Open in new window

0
abgtemp
Asked:
abgtemp
  • 2
1 Solution
 
Shift-3Commented:
That does make it a little trickier.

This revision should only find words which are not preceded or followed by letters or numbers.  It will still find ones adjacent to symbols.


Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
 
On Error Resume Next
 
strWordList = "C:\Words2Search.txt"
strErrorLog = "C:\Failed to Find.txt"
 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objWordList = objFSO.OpenTextFile(strWordList, ForReading, False, TriStateUseDefault)
arrWords = Split(objWordList.ReadAll, vbCrLf)
objWordList.Close
 
Set objErrorLog = objFSO.OpenTextFile(strErrorLog, ForWriting, True)
 
Set objShell = CreateObject("Shell.Application")
Set objShellWindows = objShell.Windows
 
For i = 0 to objShellWindows.Count - 1
    Set objIE = objShellWindows.Item(i)
    strURL = objIE.LocationURL
    If Left(strURL, 4) = "http" Then
        strText = objIE.document.body.innerText
        
        For Each strWord in arrWords
            If Not ExactMatch(1, strText, strWord) Then
                objErrorLog.WriteLine strWord
            End If
        Next
    End If
Next
 
objErrorLog.Close
 
Function ExactMatch(intStart, strText, strToFind)
    intPos = InStr(intStart, strText, strToFind, vbTextCompare)
    If intPos <> 0 Then
        If intPos > 1 Then
            strBefore = Mid(strText, intPos - 1, 1)                
            If IsNumeric(strBefore) Or UCase(strBefore) <> LCase(strBefore) Then
                blnBad = True
            End If
        End If
        
        If (intPos + Len(strToFind)) < Len(strText) Then
            strAfter = Mid(strText, intPos + Len(strToFind), 1)                
            If IsNumeric(strAfter) Or UCase(strAfter) <> LCase(strAfter) Then
                blnBad = True
            End If
        End If
        
        If Not blnBad Then
            ExactMatch = True
        Else
            Call ExactMatch(intPos + 1, strText, strToFind)
        End If
    End If
End Function

Open in new window

0
 
abgtempAuthor Commented:
Hey Shift-3,

                    Thanks for the updated code.  I think I may have came up with a way to cover all bases.  I'm just adding a "space" behind the word. I also added a 'comma' just in case the word is listed in a series. I'm still giving you extra points because of your time spent on this and still I'm happy with the result of the original code.


For i = 0 to objShellWindows.Count - 1
    Set objIE = objShellWindows.Item(i)
    strURL = objIE.LocationURL
    If Left(strURL, 4) = "http" Then
        strText = objIE.document.body.innerText
        
        For Each strWord in arrWords
         strWord1 = strWord & " "              'Adds space behind word
         strWord2 = strWord & ","              'Adds comma behind word
 
 
         Match1 = InStr(1, strText, strWord1, vbTextCompare)
         Match2 = InStr(1, strText, strWord2, vbTextCompare)	
 
 
	If Match1 = 0 And Match2 = 0 Then
          objErrorLog.WriteLine strWord
            End If
        Next
    End If
Next
 
objErrorLog.Close

Open in new window

0
 
abgtempAuthor Commented:
Could you please help me with this open question that I have. Thanks
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_23874454.html
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now