Stripping a block of HTML into JUST the images used in the IMG tags

I am looking to strip a large block of HTML into just the <IMG> tags. For example, I'd like to turn the following code:

<p>
<img src="images/top.jpg" width="100" alt="hey there!">
<br>
Hey check this out!<br>
<img src="gallery/checkthisout.jpg" width="500" border="0">
</p>

into a list of just the image files referenced, i.e.:

"images/top.jpg", "gallery/checkthisout.jpg"

If anyone could help me on the road to doing this, I'd be really appreciative.

Asked by: bombrider
 
aseusainc commented:
You can replace the cfhttp with a cffile if that is the method you are using. The code below returns a comma-delimited list of all images used on a page. I did not code any dupe checking, but otherwise it does what you asked for.

Try this:

<CFHTTP Method="GET"
 URL="http://www.experts-exchange.com"
 UserAgent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
 Redirect="No">
</cfhttp>
 
<cfset start = 1>
<cfset loopstop = 0>
<cfset imagelist = "">

<CFLOOP condition="loopstop EQ 0">
  <cfset start = findnocase('<img src="',CFHTTP.FileContent,start)>
  <cfif start EQ 0>
    <cfset loopstop = 1>
  <cfelse>
    <cfset start = start + 10>
    <cfset end = findnocase('"',CFHTTP.FileContent,start)>
    <cfset count = end - start>
    <cfset image = mid(CFHTTP.FileContent,start,count)>
      <cfset imagelist = listappend(imagelist,image,',')>
  </cfif>
</cfloop>
<cfoutput>#imagelist#</cfoutput>
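The dupe checking mentioned above could be bolted on afterwards. A minimal sketch, assuming imagelist has already been filled by the loop above; uniquelist and img are names made up for this example:

```cfml
<!--- Sketch only: copy each entry of imagelist into uniquelist once. --->
<cfset uniquelist = "">
<cfloop list="#imagelist#" index="img">
  <!--- ListFind is case-sensitive and returns 0 when the item is not in the list yet --->
  <cfif ListFind(uniquelist, img) EQ 0>
    <cfset uniquelist = ListAppend(uniquelist, img)>
  </cfif>
</cfloop>
<cfoutput>#uniquelist#</cfoutput>
```

ListFind is used rather than ListFindNoCase because image paths can be case-sensitive on some servers; swap it if your paths are not.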
 
dgrafx commented:
First, read your file:<br>
<CFFILE ACTION="READ" file="d:\_web\path to a file\test.html" variable="str">

<cfset startstring="<img">
<cfset endstring=">">
<cfset parsed="">
<cfset images="">
<cfloop list="#str#" index="ii" delimiters="#chr(10)##chr(13)#">
<cfif listvaluecountnocase(ii,startstring,"#chr(32)##chr(9)#") gt 1>
      <cfloop list="#ii#" index="jj" delimiters="<">
      <CFSET start = findnocase("img",jj)>
      <cfif start>
      <cfset end = findnocase(endstring,jj,start)+len(endstring)>
      <cfif end gt start>
      <cfset parsed = ListAppend(parsed,"<" & trim(MID(jj,start,end-start)))>
      </cfif>
      </cfif>
      </cfloop>
<cfelse>
      <CFSET start = findnocase(startstring,ii)>
      <cfif start>
      <cfset end = findnocase(endstring,ii,start)+len(endstring)>
      <cfif end gt start>
      <cfset parsed = ListAppend(parsed,trim(MID(ii,start,end-start)))>
      </cfif>
      </cfif>
      </cfif>            
</cfloop>
<cfoutput>
Here are your img tags:<br>
<br>#replace(htmlcodeformat(parsed),",","<br>","all")#<br>
</cfoutput>

<cfset startstring="src=#chr(34)#">
<cfset endstring="#chr(34)#">
<cfloop list="#parsed#" index="kk">
<!--- test for src=" before adding len(); otherwise the cfif below is always true --->
<CFSET start = findnocase(startstring,kk)>
<cfif start>
<cfset start = start+len(startstring)>
<cfset end = findnocase(endstring,kk,start)>
<cfif end gt start>
<cfset images = ListAppend(images,trim(MID(kk,start,end-start)))>
</cfif>
</cfif>
</cfloop>
<cfoutput>
And here is your image list:<br>
#listqualify(images,chr(34))#
</cfoutput>

By appreciative, do you mean increasing points?
:)
 
bombrider (author) commented:
That almost works. It only displays one image from my HTML code which contains 3 images.

Almost there I guess!
 
dgrafx commented:
Are the 3 images right next to each other with no spaces like:
<img src="xyz.jpg"><img src="wer.jpg"><img src="abc.jpg">
 
bombrider (author) commented:
For my purposes, the images may or may not be next to each other in that format, so the script needs to accommodate both. I am parsing HTML, and the script is intended to weed out just the value of the image source (IMG SRC="") from large blocks of HTML that will include tables, font tags, CSS, etc.

There may be scenarios where images are directly next to each other, as in your example. Does that make sense? :D

Thanks!
 
dgrafx commented:
The reason I asked is that I believe it won't parse correctly if the tags are stacked together like above, without something in between them (a space, a tab, etc.).

try this:

right after the <CFFILE ACTION="READ" file="d:\_web\path to a file\test.html" variable="str">
Put this:
<cfset str=replacenocase(str,"<img"," <img","all")>
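To see why the extra space matters: the parsing loop counts how many space/tab-separated items on a line are exactly "<img". A small sketch, using a made-up sample line and made-up variable names (line, delims, before, after):

```cfml
<!--- Sketch only: count "<img" tokens on a stacked line, before and after padding. --->
<cfset line = '<img src="xyz.jpg"><img src="wer.jpg">'>
<cfset delims = chr(32) & chr(9)>
<cfset before = ListValueCountNoCase(line, "<img", delims)>
<cfset after = ListValueCountNoCase(ReplaceNoCase(line, "<img", " <img", "all"), "<img", delims)>
<!--- before is 1 (the second tag is glued to the first tag's closing ">"), after is 2 --->
<cfoutput>before: #before#, after: #after#</cfoutput>
```

With a count of 1 the line goes down the single-tag branch and only the first image is picked up; once each tag is padded with a space, the count is 2 and the multi-tag branch handles the line.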
 
dgrafx commented:
Yes, that's a good way of going about it, except that it fails whenever the image tag does not begin with <img src=.
It can just as easily be <img alt=, <img height=, <img id=, etc.,
and that is the main reason I went about it the way I did.
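If regular expressions are an option, the attribute-order problem can also be handled in two steps: grab every img tag, then look for src inside each one. A rough sketch, not what either posted solution does; it assumes ColdFusion 8 or later (for REMatchNoCase and cfloop array), that the HTML is already in str, and that src values are double-quoted. tags, tag and hit are names made up here:

```cfml
<!--- Sketch only: regex-based alternative to the Find/Mid loops. --->
<cfset images = "">
<!--- REMatchNoCase returns an array of every substring that matches the pattern --->
<cfset tags = REMatchNoCase('<img[^>]*>', str)>
<cfloop array="#tags#" index="tag">
  <!--- With returnsubexpressions=true, pos[2]/len[2] describe the first (...) group --->
  <cfset hit = REFindNoCase('src\s*=\s*"([^"]+)"', tag, 1, true)>
  <cfif hit.pos[1] GT 0>
    <cfset images = ListAppend(images, Mid(tag, hit.pos[2], hit.len[2]))>
  </cfif>
</cfloop>
<cfoutput>#images#</cfoutput>
```

This does not care whether src comes first, last, or in the middle of the tag. It will also match attributes that merely end in src (data-src, for example), so treat it as a starting point.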
 
aseusainc commented:
So would changing

<cfset start = findnocase('<img src="',CFHTTP.FileContent,start)>

to

<cfset start = findnocase('src="',CFHTTP.FileContent,start)>

fix it? Are there any other tags that use "src="?
 
dgrafx commented:
No. What I did (if you look at my code) is find "<img" first, since all img tags start with "<img",
and then from that point find 'src="'.
Works every time!

I like your condition loop!
I crawl directories using that method - wish I would have thought of it this time :)
 
aseusainc commented:
Fixed! I changed it to find "<IMG" first, then "src=" from there. Give it a whirl :)



<CFHTTP Method="GET"
 URL="http://www.experts-exchange.com"
 UserAgent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
 Redirect="No">
</cfhttp>
 
<cfset start = 1>
<cfset loopstop = 0>
<cfset imagelist = "">

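<CFLOOP condition="loopstop EQ 0">
  <cfset start = findnocase('<img',CFHTTP.FileContent,start)>
  <cfif start EQ 0>
    <cfset loopstop = 1>
  <cfelse>
    <cfset start = findnocase('src="',CFHTTP.FileContent,start)>
    <cfif start EQ 0>
      <!--- no src=" after this <img: stop, otherwise the scan restarts near the top and loops forever --->
      <cfset loopstop = 1>
    <cfelse>
      <cfset start = start + 5>
      <cfset end = findnocase('"',CFHTTP.FileContent,start)>
      <!--- skip the entry if the closing quote is missing or the value is empty --->
      <cfif end GT start>
        <cfset count = end - start>
        <cfset image = mid(CFHTTP.FileContent,start,count)>
        <cfset imagelist = listappend(imagelist,image,',')>
      </cfif>
    </cfif>
  </cfif>
</cfloop>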
<cfoutput>#imagelist#</cfoutput>
 
aseusainc commented:
Suggest assist between aseusainc and dgrafx as a correct answer was provided.
Question has a verified solution.
