Solved

Stripping a block of HTML into JUST the images used in the IMG tags

Posted on 2006-06-10
Medium Priority
284 Views
Last Modified: 2013-12-24
I am looking to strip a large block of HTML down to just the images referenced in its <IMG> tags. For example, I'd like to turn the following code:

<p>
<img src="images/top.jpg" width="100" alt="hey there!">
<br>
Hey check this out!<br>
<img src="gallery/checkthisout.jpg" width="500" border="0">
</p>

Into a list of just the image files referenced, i.e.:

"images/top.jpg", "gallery/checkthisout.jpg"

If anyone could help me on the road to doing this, I'd be really appreciative.

Question by:bombrider
12 Comments
 
LVL 25

Expert Comment

by:dgrafx
ID: 16877153
First, Read your file:<br>
<CFFILE ACTION="READ" file="d:\_web\path to a file\test.html" variable="str">

<cfset startstring="<img">
<cfset endstring=">">
<cfset parsed="">
<cfset images="">
<cfloop list="#str#" index="ii" delimiters="#chr(10)##chr(13)#">
<cfif listvaluecountnocase(ii,startstring,"#chr(32)##chr(9)#") gt 1>
      <cfloop list="#ii#" index="jj" delimiters="<">
      <CFSET start = findnocase("img",jj)>
      <cfif start>
      <cfset end = findnocase(endstring,jj,start)+len(endstring)>
      <cfif end gt start>
      <cfset parsed = ListAppend(parsed,"<" & trim(MID(jj,start,end-start)))>
      </cfif>
      </cfif>
      </cfloop>
<cfelse>
      <CFSET start = findnocase(startstring,ii)>
      <cfif start>
      <cfset end = findnocase(endstring,ii,start)+len(endstring)>
      <cfif end gt start>
      <cfset parsed = ListAppend(parsed,trim(MID(ii,start,end-start)))>
      </cfif>
      </cfif>
      </cfif>            
</cfloop>
Here are your img tags:<br>
<br>#replace(htmlcodeformat(parsed),",","<br>","all")#<br>

<cfset startstring="src=#chr(34)#">
<cfset endstring="#chr(34)#">
<cfloop list="#parsed#" index="kk">
<!--- look for src=" first; add its length only once we know it was found --->
<CFSET start = findnocase(startstring,kk)>
<cfif start>
<cfset start = start + len(startstring)>
<cfset end = findnocase(endstring,kk,start)>
<cfif end gt start>
<cfset images = ListAppend(images,trim(MID(kk,start,end-start)))>
</cfif>
</cfif>
</cfloop>
And Here is your image list:<br>
#listqualify(images,chr(34))#

by appreciative, do you mean increasing points?
:)
 

Author Comment

by:bombrider
ID: 16877217
That almost works. It only displays one image from my HTML code which contains 3 images.

Almost there I guess!
 
LVL 25

Expert Comment

by:dgrafx
ID: 16877268
Are the 3 images right next to each other with no spaces like:
<img src="xyz.jpg"><img src="wer.jpg"><img src="abc.jpg">

Author Comment

by:bombrider
ID: 16879414
For my purposes, the images may or may not be next to each other in that format, so the script needs to accommodate both. I am parsing HTML; this script is intended to weed out just the value of the image source (IMG SRC="") from large blocks of HTML that will include tables, font tags, CSS, etc.

There may be scenarios where images are directly next to each other, as in your example. Does that make sense? :D

Thanks!
 
LVL 25

Assisted Solution

by:dgrafx
dgrafx earned 500 total points
ID: 16879457
The reason I asked is that I believe it won't parse correctly if the tags are stacked together like above, without something in between (a space, a tab, etc.).

try this:

right after the <CFFILE ACTION="READ" file="d:\_web\path to a file\test.html" variable="str">
Put this:
<cfset str=replacenocase(str,"<img"," <img","all")>
 
LVL 7

Accepted Solution

by:aseusainc
aseusainc earned 500 total points
ID: 16882809
You can replace the cfhttp with a cffile if that is the method you are using. It will return a comma-delimited list of all images used on a page. I did not code any dupe checking, but otherwise it works exactly as you want.

Try this:

<CFHTTP Method="GET"
 URL="http://www.experts-exchange.com"
 UserAgent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
 Redirect="No">
</cfhttp>
 
<cfset start = 1>
<cfset loopstop = 0>
<cfset imagelist = "">

<CFLOOP condition="loopstop EQ 0">
  <cfset start = findnocase('<img src="',CFHTTP.FileContent,start)>
  <cfif start EQ 0>
    <cfset loopstop = 1>
  <cfelse>
    <cfset start = start + 10>
    <cfset end = findnocase('"',CFHTTP.FileContent,start)>
    <cfset count = end - start>
    <cfset image = mid(CFHTTP.FileContent,start,count)>
      <cfset imagelist = listappend(imagelist,image,',')>
  </cfif>
</cfloop>
<cfoutput>#imagelist#</cfoutput>
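
The dupe checking mentioned above could be added as a second pass over the finished list. This is a minimal sketch, not part of the posted answer: the sample value of imagelist and the name uniquelist are made up for illustration, and the comparison ignores case, which may not suit servers with case-sensitive paths.

```cfml
<!--- Sketch only: imagelist stands in for the comma-delimited list built by the loop above;
      the sample value and the name uniquelist are assumptions for illustration. --->
<cfset imagelist = "images/top.jpg,gallery/checkthisout.jpg,images/top.jpg">
<cfset uniquelist = "">
<cfloop list="#imagelist#" index="img">
  <!--- ListFindNoCase returns 0 when the item is not in the list yet --->
  <cfif ListFindNoCase(uniquelist, img) EQ 0>
    <cfset uniquelist = ListAppend(uniquelist, img)>
  </cfif>
</cfloop>
<!--- prints: images/top.jpg,gallery/checkthisout.jpg --->
<cfoutput>#uniquelist#</cfoutput>
```

The first occurrence of each image is kept, so the order of the original list is preserved.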
 
LVL 25

Expert Comment

by:dgrafx
ID: 16884748
yes, that's a good way of going about it - except that where it fails is that one cannot count on the image tag being <img src=.
It can easily be <img alt= or <img height= or <img id= etcetera...
and that is the main reason for going about it the way I did.
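
To make the point concrete, here is a small sketch with a made-up tag whose first attribute is alt rather than src. The variable names are assumptions for illustration only.

```cfml
<!--- Sketch only: the sample tag and the variable names are made up for illustration. --->
<cfset sample = '<img alt="hey there!" src="images/top.jpg">'>
<!--- Searching for the literal <img src=" misses this tag because alt comes first --->
<cfset direct = FindNoCase('<img src="', sample)>
<!--- Two steps: locate <img, then look for src=" from that position onward --->
<cfset tagstart = FindNoCase('<img', sample)>
<cfset srcstart = FindNoCase('src="', sample, tagstart)>
<cfoutput>direct=#direct# tagstart=#tagstart# srcstart=#srcstart#</cfoutput>
```

Here direct comes back as 0 (no match), while the two-step search returns nonzero positions for both the tag and its src attribute.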
 
LVL 7

Expert Comment

by:aseusainc
ID: 16884815
So would changing

<cfset start = findnocase('<img src="',CFHTTP.FileContent,start)>

to

<cfset start = findnocase('src="',CFHTTP.FileContent,start)>

fix it?  Are there any other tags that use "src="?
 
LVL 25

Expert Comment

by:dgrafx
ID: 16885196
no, what I used (if you look at my code) is to find "<img" (all img tags start with "<img").
Then from that point find 'src="'
works every time!

I like your condition loop!
I crawl directories using that method - wish I would have thought of it this time :)
 
LVL 7

Expert Comment

by:aseusainc
ID: 16885345
Fixed!  I changed it to find "<IMG" 1st, then "src=" from there.  Give it a whirl :)



<CFHTTP Method="GET"
 URL="http://www.experts-exchange.com"
 UserAgent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
 Redirect="No">
</cfhttp>
 
<cfset start = 1>
<cfset loopstop = 0>
<cfset imagelist = "">

<CFLOOP condition="loopstop EQ 0">
  <cfset start = findnocase('<img',CFHTTP.FileContent,start)>
  <cfif start EQ 0>
    <cfset loopstop = 1>
  <cfelse>
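    <cfset start = findnocase('src="',CFHTTP.FileContent,start)>
    <cfif start EQ 0>
      <!--- no src=" after this <img: stop rather than restart near the top --->
      <cfset loopstop = 1>
    <cfelse>
      <cfset start = start + 5>
      <cfset end = findnocase('"',CFHTTP.FileContent,start)>
      <cfif end GT start>
        <cfset image = mid(CFHTTP.FileContent,start,end - start)>
        <cfset imagelist = listappend(imagelist,image,',')>
      </cfif>
    </cfif>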
  </cfif>
</cfloop>
<cfoutput>#imagelist#</cfoutput>
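
The loop above looks only for double-quoted src values and does not stop the src search at the end of the current tag, so an <img> without a src could pick up the src of a later tag. One possible alternative, swapped in here and not used by either poster, is a regular expression with REFindNoCase and its returnsubexpressions argument. This is a rough sketch under stated assumptions: the sample HTML, the pattern, and the names html, pattern, cursor and m are all made up for illustration, and a src value that itself contains a quote character would not be handled.

```cfml
<!--- Sketch only: sample HTML, pattern and variable names are assumptions for illustration.
      Inside a single-quoted CFML string, '' stands for one literal single quote. --->
<cfset html = '<p><img alt="hey there!" src="images/top.jpg"><img src=''gallery/checkthisout.jpg'' border="0"></p>'>
<!--- [^>]* keeps the match inside one tag; the parentheses capture the src value --->
<cfset pattern = '<img[^>]*[[:space:]]src=["'']([^"'']*)["'']'>
<cfset imagelist = "">
<cfset cursor = 1>
<cfloop condition="cursor GT 0">
  <!--- With the fourth argument true, REFindNoCase returns a struct with pos and len arrays --->
  <cfset m = REFindNoCase(pattern, html, cursor, true)>
  <cfif m.pos[1] EQ 0>
    <!--- no further match: end the loop --->
    <cfset cursor = 0>
  <cfelse>
    <!--- element 2 of pos/len describes the captured src value; skip empty ones --->
    <cfif m.len[2] GT 0>
      <cfset imagelist = ListAppend(imagelist, Mid(html, m.pos[2], m.len[2]))>
    </cfif>
    <!--- continue searching just past the end of this match --->
    <cfset cursor = m.pos[1] + m.len[1]>
  </cfif>
</cfloop>
<cfoutput>#imagelist#</cfoutput>
```

Because the pattern requires whitespace before src, attributes such as data-src are not mistaken for the image source.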
 
LVL 7

Expert Comment

by:aseusainc
ID: 17051557
Suggest assist between aseusainc and dgrafx as a correct answer was provided.