Solved

LOOKING FOR REGEX TO SPOT MISSING ALT TAGS

Posted on 2006-07-05
Last Modified: 2011-10-03
Hi there

I'm looking for a VBScript regex that scans a web page's source code and returns IMG tags, but only those with the ALT attribute missing. Ideally the regex would return the name of the image, but simply returning the whole string or nothing would be fine.

Peter.
Question by:fsbsupport

Expert Comment

by:yotamsher
ID: 17048643
Hi Peter
Can you give some more info?

What are you trying to achieve?
Is this script supposed to be embedded in a web page?
Why VBSCRIPT?

Yotam
 

Author Comment

by:fsbsupport
ID: 17048775
It's part of a content management system. When someone edits a page, I look through the source and set a flag for pasted Word content (which they have to remove) and for tables that are too large. I also want to flag a page that contains one or more images where the ALT attribute is missing. I don't need to know how many times this occurs, just that it occurs.

VBScript (server-side ASP), because that's what the system is written in, though I would assume regular expressions are similar in most languages. I know how to write the surrounding code; it's just the expression I'm unclear on.

Regards

Peter.

Expert Comment

by:yotamsher
ID: 17049045
Just to be sure, can you post a minimal example of a page containing one bad IMG tag and one good IMG tag?
 

Author Comment

by:fsbsupport
ID: 17049341

Good example - bearing in mind that there is no guarantee that double quotes are used - might be single....

<img src="fred.gif" alt="this is a picture of fred">

Bad example

<img src="fred.gif">

There may be other attributes such as size etc....
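
For the two examples above, one way to raise the flag with a single expression is a negative lookahead, which the VBScript 5.5 RegExp object supports. The following is only a sketch under that assumption: HasImgWithoutAlt is a hypothetical helper name, and the pattern ignores quoting entirely, so single or double quotes make no difference. It will misjudge tags whose attribute values contain a literal ">" and it treats any attribute name ending in "-alt" as an ALT.

```vbscript
Option Explicit

' Returns True when html contains at least one IMG tag with no ALT attribute.
' HasImgWithoutAlt is a hypothetical helper name, not part of the CMS.
Function HasImgWithoutAlt(html)
    Dim re
    Set re = New RegExp
    ' <img\b              start of an IMG tag
    ' (?![^>]*\balt\s*=)  reject the tag if "alt=" appears before its closing ">"
    ' [^>]*>              the rest of the tag
    re.Pattern = "<img\b(?![^>]*\balt\s*=)[^>]*>"
    re.IgnoreCase = True
    re.Global = False   ' one hit is enough to raise the flag
    HasImgWithoutAlt = re.Test(html)
End Function

' Good example: ALT present, so the function returns False
WScript.Echo HasImgWithoutAlt("<img src=""fred.gif"" alt=""this is a picture of fred"">")
' Bad example with single quotes and an extra attribute: returns True
WScript.Echo HasImgWithoutAlt("<img src='fred.gif' width='10'>")
```

Inside an ASP page the function body stays the same; only the WScript.Echo calls would change to whatever the page uses for output.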

Accepted Solution

by:
yotamsher earned 500 total points
ID: 17050597
Hi Peter

I had trouble writing a single pattern that matches "starts with IMG and has no ALT",
but what about using two regular expressions?
The following code (it works as a stand-alone .vbs file) assumes each tag is on its own line.
I guess you are already breaking the HTML into tags (if not, there are examples on the internet).

hope this helps

Yotam

' *****************
' filter-images.vbs
' *****************
'open the HTML file
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile _
    (".\images.html", ForReading)

'Set up RegExp objects

Set ImgRegularExpressionObject = New RegExp
Set GoodImgRegularExpressionObject = New RegExp

' Match the start of an IMG tag; \b keeps "<img" from matching longer tag names
With ImgRegularExpressionObject
    .Pattern = "<img\b"
    .IgnoreCase = True
    .Global = True
End With

' Match an ALT attribute: the word "alt", optional whitespace, then "="
With GoodImgRegularExpressionObject
    .Pattern = "\balt\s*="
    .IgnoreCase = True
    .Global = True
End With

'Read the file line by line

Do Until objTextFile.AtEndOfStream
    strNextLine = objTextFile.Readline

'check for an IMG tag
   Set image_match = ImgRegularExpressionObject.Execute(strNextLine)
   If image_match.Count > 0 Then
      Set good_image_match = GoodImgRegularExpressionObject.Execute(strNextLine)
      If good_image_match.Count > 0 Then
         WScript.Echo "[" & strNextLine & "] Is a good Img tag."
      Else
         WScript.Echo "[" & strNextLine & "] Is an Img tag without ALT."
      End If
   Else
      WScript.Echo "[" & strNextLine & "] Is not an Img tag."
   End If
Loop

' Clean up
objTextFile.Close
Set ImgRegularExpressionObject = Nothing
Set GoodImgRegularExpressionObject = Nothing
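
If the page source is not already split so that each tag is on its own line, the same two-expression idea can run over the whole string: pull out every IMG tag first, then test each one for ALT. This is a rough sketch, not a tested part of the solution above. The sample html string and the variable names are made up for illustration, and the third pattern (for reading the SRC value, to report the image name) is an addition that assumes the value contains no spaces or quote characters.

```vbscript
Option Explicit

Dim html, tagRe, altRe, srcRe, tag, srcMatches

' Sample source; in the CMS this would be the edited page's HTML (assumption).
html = "<p><img src=""fred.gif"" alt=""fred""><img src='bob.gif' width=""10""></p>"

Set tagRe = New RegExp
tagRe.Pattern = "<img\b[^>]*>"     ' every IMG tag, wherever it sits in the source
tagRe.IgnoreCase = True
tagRe.Global = True

Set altRe = New RegExp
altRe.Pattern = "\balt\s*="        ' an ALT attribute inside a tag
altRe.IgnoreCase = True

Set srcRe = New RegExp
srcRe.Pattern = "\bsrc\s*=\s*[""']?([^""'\s>]+)"   ' capture the SRC value
srcRe.IgnoreCase = True

For Each tag In tagRe.Execute(html)
    If Not altRe.Test(tag.Value) Then
        Set srcMatches = srcRe.Execute(tag.Value)
        If srcMatches.Count > 0 Then
            ' Report the image name when a SRC value can be read
            WScript.Echo "Missing ALT: " & srcMatches(0).SubMatches(0)
        Else
            ' Otherwise fall back to the whole tag
            WScript.Echo "Missing ALT: " & tag.Value
        End If
    End If
Next
```

With the sample string above, only the second tag lacks ALT, so the script reports bob.gif. Because [^>] also matches line breaks, a tag that wraps across lines is still picked up as one match.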