Solved

LOOKING FOR REGEX TO SPOT MISSING ALT TAGS

Posted on 2006-07-05
Last Modified: 2011-10-03
Hi there

I'm looking for a VBScript regex that scans a web page's source code and returns IMG tags, but only those that have the ALT attribute missing. Ideally the regex would return the name of the image, but simply returning the whole string or nothing would be fine.

Peter.
Question by:fsbsupport

Expert Comment

by:yotamsher
ID: 17048643
Hi Peter
Can you give some more info?

What are you trying to achieve?
Is this script supposed to be embedded in a web page?
Why VBScript?

Yotam

Author Comment

by:fsbsupport
ID: 17048775
It's part of a content management system. When someone edits a page, I look through the source and set a flag for pasted Word content (which they have to remove) and for tables that are too large. I also want to be able to flag a page that contains one or more images where the ALT attribute is missing. I don't need to know how many times this occurs, just that it occurs.

VBScript (server-side ASP), because that's what the system is written in, though I would assume regular expressions are similar in most languages. I know how to write the code; it's just the expression I'm unclear on.

Regards

Peter.

Expert Comment

by:yotamsher
ID: 17049045
Just to be sure, can you post a minimal example of a page containing one bad IMG tag and one good IMG tag?

Author Comment

by:fsbsupport
ID: 17049341

Good example - bearing in mind that there is no guarantee that double quotes are used - might be single....

<img src="fred.gif" alt="this is a picture of fred">

bad example

<img src="fred.gif">

There may be other attributes such as size etc....

Accepted Solution

by:
yotamsher earned 500 total points
ID: 17050597
Hi Peter

I had trouble writing a single expression that matches "starts with IMG and has no ALT",
but what about using two regular expressions?
The following code (which works as a stand-alone .vbs) assumes each tag is on its own line.
I guess you are already breaking the HTML into tags (if not, there are examples on the internet).

hope this helps

Yotam

' *****************
' filter-images.vbs
' *****************
'open the HTML file
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile _
    (".\images.html", ForReading)

'Set up RegExp objects

Set ImgRegularExpressionObject = New RegExp
Set GoodImgRegularExpressionObject = New RegExp

With ImgRegularExpressionObject
' "<img\b" matches only the start of an IMG tag, not any text that happens to contain "img"
.Pattern = "<img\b"
.IgnoreCase = True
.Global = True
End With

With GoodImgRegularExpressionObject
' "\salt\s*=" matches an alt attribute: whitespace, "alt", optional spaces, then "="
.Pattern = "\salt\s*="
.IgnoreCase = True
.Global = True
End With

'Read the file line by line

Do Until objTextFile.AtEndOfStream
    strNextLine = objTextFile.Readline

'check for an IMG tag
   Set image_match = ImgRegularExpressionObject.Execute(strNextLine)
   If image_match.Count > 0 Then
      Set good_image_match = GoodImgRegularExpressionObject.Execute(strNextLine)
      If good_image_match.Count > 0 Then
         WScript.Echo "[" & strNextLine & "] Is a good Img tag."
      Else
         WScript.Echo "[" & strNextLine & "] Is an Img tag without ALT."
      End If
   Else
      WScript.Echo "[" & strNextLine & "] Is not an Img tag."
   End If
Loop

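'Clean up: close the file and release the RegExp objects
objTextFile.Close
Set ImgRegularExpressionObject = Nothing
Set GoodImgRegularExpressionObject = Nothing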
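
A single-expression alternative is also possible, because the VBScript RegExp object (version 5.5 and later) supports negative lookahead. The sketch below is only a heuristic, not a full HTML parser: the function name HasImgWithoutAlt is made up for illustration, and the pattern assumes that attribute values never contain ">" or the text " alt=". Since [^>] also matches line breaks, it does not need each tag to be on its own line.

```vbscript
' Hypothetical helper (name chosen for illustration only).
' Returns True when html contains at least one <img ...> tag
' that has no alt attribute before its closing ">".
Function HasImgWithoutAlt(html)
    Dim re
    Set re = New RegExp
    ' <img\b                start of an IMG tag
    ' (?![^>]*\salt\s*=)    negative lookahead: no " alt=" before the next ">"
    ' [^>]*>                the rest of the tag
    re.Pattern = "<img\b(?![^>]*\salt\s*=)[^>]*>"
    re.IgnoreCase = True
    re.Global = False       ' one hit is enough to set the flag
    HasImgWithoutAlt = re.Test(html)
    Set re = Nothing
End Function
```

With the two examples from this thread, the bad tag (<img src="fred.gif">) should be flagged and the good one (with alt="this is a picture of fred") should not. If the matching tag itself is wanted rather than a flag, re.Execute(html) with Global = True returns each match, and the Value property of a match holds the whole tag.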