  • Status: Solved
  • Priority: Medium
  • Security: Public

LOOKING FOR REGEX TO SPOT MISSING ALT TAGS

Hi there

I'm looking for a VBScript regex that looks through a web page's source code and returns IMG tags, but only those with the ALT tag missing. Ideally the regex would return the name of the image, but simply returning the whole string or nothing would be fine.

Peter.
Asked by: fsbsupport
1 Solution
 
yotamsher commented:
Hi Peter
Can you give some more info?

What are you trying to achieve?
Is this script supposed to be embedded in a web page?
Why VBScript?

Yotam
 
fsbsupport (author) commented:
It's part of a content management system. When someone edits a page, I look through the source and set a flag for pasted Word content (which they have to remove) and for tables that are too large. I also want to flag any page that contains one or more images where the ALT tag is missing. I don't really need to know how many times this occurs, just that it occurs.

VBScript (server-side ASP), because that's what the system is written in, though I would assume regular expressions are similar in most languages. I know how to write the code itself; it's just the expression I'm unclear on.

Regards

Peter.
 
yotamsher commented:
Just to be sure, can you post a minimal example of a page containing one bad IMG tag and one good IMG tag?
 
fsbsupport (author) commented:

Good example (bear in mind there is no guarantee that double quotes are used; they might be single):

<img src="fred.gif" alt="this is a picture of fred">

Bad example:

<img src="fred.gif">

There may be other attributes, such as size.
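
Since only a yes/no flag is needed, one possible approach is a single pattern with a negative lookahead. The following is a minimal sketch, not a definitive answer: it assumes the server runs VBScript 5.5 or later (where RegExp supports lookahead), that the tags are well formed, and that no attribute value contains a literal ">". The function name HasImgWithoutAlt is made up for illustration.

```vbscript
' Returns True if strHtml contains at least one <img> tag with no alt attribute.
' HasImgWithoutAlt is a hypothetical helper name, not part of any library.
Function HasImgWithoutAlt(strHtml)
    Dim re
    Set re = New RegExp
    ' "<img", then a lookahead that rejects the tag if " alt=" appears
    ' before the closing ">", then the rest of the tag.
    re.Pattern = "<img\b(?![^>]*\salt\s*=)[^>]*>"
    re.IgnoreCase = True
    re.Global = False   ' one hit is enough to set the flag
    HasImgWithoutAlt = re.Test(strHtml)
    Set re = Nothing
End Function

' Usage with a good tag and a bad tag (single or double quotes both work)
Dim blnGood, blnBad
blnGood = HasImgWithoutAlt("<img src=""fred.gif"" alt=""this is a picture of fred"">")
blnBad = HasImgWithoutAlt("<img src='fred.gif' width='10'>")
```

With these inputs blnGood should be False and blnBad should be True. Note that the pattern only checks whether an alt attribute is present, so an empty alt="" still counts as present.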
 
yotamsher commented:
Hi Peter

I had trouble writing a single pattern that matches "starts with IMG and has no ALT", but what about using two regular expressions?
The following code (which works as a stand-alone .vbs file) assumes each tag is on a separate line.
I guess you are already breaking the HTML into tags (if not, there are examples on the Internet).

Hope this helps

Yotam

' *****************
' filter-images.vbs
' *****************
' Open the HTML file
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile _
    (".\images.html", ForReading)

' Set up the RegExp objects

Set ImgRegularExpressionObject = New RegExp
Set GoodImgRegularExpressionObject = New RegExp

' Matches the start of an IMG tag: "<img" followed by a word boundary,
' so plain text that merely contains "img" is not treated as a tag
With ImgRegularExpressionObject
    .Pattern = "<img\b"
    .IgnoreCase = True
    .Global = True
End With

' Matches an ALT attribute: whitespace, "alt", optional whitespace, "="
' (the earlier pattern "alt=*" matched any text containing "alt")
With GoodImgRegularExpressionObject
    .Pattern = "\salt\s*="
    .IgnoreCase = True
    .Global = True
End With

' Read the file line by line (assumes one tag per line)

Do Until objTextFile.AtEndOfStream
    strNextLine = objTextFile.ReadLine

    ' Check for an IMG tag
    Set image_match = ImgRegularExpressionObject.Execute(strNextLine)
    If image_match.Count > 0 Then
        ' It is an IMG tag, so check whether it carries an ALT attribute
        Set good_image_match = GoodImgRegularExpressionObject.Execute(strNextLine)
        If good_image_match.Count > 0 Then
            WScript.Echo "[" & strNextLine & "] Is a good Img tag."
        Else
            WScript.Echo "[" & strNextLine & "] Is an Img tag without ALT."
        End If
    Else
        WScript.Echo "[" & strNextLine & "] Is not an Img tag."
    End If
Loop

' Close the file and release the objects
objTextFile.Close
Set ImgRegularExpressionObject = Nothing
Set GoodImgRegularExpressionObject = Nothing
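
If the page source is not already split into one tag per line, a possible variation on the same two-expression idea is to let one pattern pull every IMG tag out of the whole string, test each tag for ALT, and collect the src value so the image name can be reported. This is only a sketch under a few assumptions: the function name ListImagesWithoutAlt is invented here, src values are assumed to be quoted (single or double quotes), and no attribute value is assumed to contain ">".

```vbscript
' Returns a comma-separated list of src values for <img> tags that lack alt.
' ListImagesWithoutAlt is a hypothetical name used for illustration only.
Function ListImagesWithoutAlt(strHtml)
    Dim reTag, reAlt, reSrc, tagMatch, srcMatches, strName, strResult

    Set reTag = New RegExp
    reTag.Pattern = "<img\b[^>]*>"      ' a whole IMG tag, wherever it appears
    reTag.IgnoreCase = True
    reTag.Global = True                 ' find every tag, not just the first

    Set reAlt = New RegExp
    reAlt.Pattern = "\salt\s*="         ' an alt attribute inside the tag
    reAlt.IgnoreCase = True

    Set reSrc = New RegExp
    reSrc.Pattern = "\ssrc\s*=\s*[""']([^""']*)[""']"   ' capture the quoted src value
    reSrc.IgnoreCase = True

    strResult = ""
    For Each tagMatch In reTag.Execute(strHtml)
        If Not reAlt.Test(tagMatch.Value) Then
            Set srcMatches = reSrc.Execute(tagMatch.Value)
            If srcMatches.Count > 0 Then
                strName = srcMatches(0).SubMatches(0)
            Else
                strName = tagMatch.Value    ' no quoted src: fall back to the whole tag
            End If
            If strResult <> "" Then strResult = strResult & ","
            strResult = strResult & strName
        End If
    Next
    ListImagesWithoutAlt = strResult
End Function

' Usage: one good tag and one bad tag on the same line
Dim strMissing
strMissing = ListImagesWithoutAlt("<p><img src=""a.gif"" alt=""A""><img src='b.gif'></p>")
```

Here strMissing should end up as "b.gif"; an empty string means no IMG tag without ALT was found, so the flag can be set with a simple check for a non-empty result.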
Question has a verified solution.
