Solved

Stripping a block of HTML into JUST the images used in the IMG tags

Posted on 2006-06-10
12
269 Views
Last Modified: 2013-12-24
I am looking to strip a large block of HTML into just the <IMG> tags. For example, I'd like to turn the following code:

<p>
<img src="images/top.jpg" width="100" alt="hey there!">
<br>
Hey check this out!<br>
<img src="gallery/checkthisout.jpg" width="500" border="0">
</p>

Into a list of just the image files referenced, ie.

"images/top.jpg", "gallery/checkthisout.jpg"

If anyone could help me on the road to doing this, I'd be really appreciative.

0
Comment
Question by:bombrider
  • 5
  • 4
  • 2
12 Comments
 
LVL 25

Expert Comment

by:dgrafx
ID: 16877153
First, Read your file:<br>
<CFFILE ACTION="READ" file="d:\_web\path to a file\test.html" variable="str">

<cfset startstring="<img">
<cfset endstring=">">
<cfset parsed="">
<cfset images="">
<cfloop list="#str#" index="ii" delimiters="#chr(10)##chr(13)#">
<cfif listvaluecountnocase(ii,startstring,"#chr(32)##chr(9)#") gt 1>
      <cfloop list="#ii#" index="jj" delimiters="<">
      <CFSET start = findnocase("img",jj)>
      <cfif start>
      <cfset end = findnocase(endstring,jj,start)+len(endstring)>
      <cfif end gt start>
      <cfset parsed = ListAppend(parsed,"<" & trim(MID(jj,start,end-start)))>
      </cfif>
      </cfif>
      </cfloop>
<cfelse>
      <CFSET start = findnocase(startstring,ii)>
      <cfif start>
      <cfset end = findnocase(endstring,ii,start)+len(endstring)>
      <cfif end gt start>
      <cfset parsed = ListAppend(parsed,trim(MID(ii,start,end-start)))>
      </cfif>
      </cfif>
      </cfif>            
</cfloop>
Here are your img tags:<br>
<br>#replace(htmlcodeformat(parsed),",","<br>","all")#<br>

<cfset startstring="src=#chr(34)#">
<cfset endstring="#chr(34)#">
<cfloop list="#parsed#" index="kk">
<CFSET start = findnocase(startstring,kk)+len(startstring)>
<cfif start>
<cfset end = findnocase(endstring,kk,start)>
<cfif end gt start>
<cfset images = ListAppend(images,trim(MID(kk,start,end-start)))>
</cfif>
</cfif>
</cfloop>
And Here is your image list:<br>
#listqualify(images,chr(34))#

by appreciative, do you mean increasing points?
:)
0
 

Author Comment

by:bombrider
ID: 16877217
That almost works. It only displays one image from my HTML code which contains 3 images.

Almost there I guess!
0
 
LVL 25

Expert Comment

by:dgrafx
ID: 16877268
Are the 3 images right next to each other with no spaces like:
<img src="xyz.jpg"><img src="wer.jpg"><img src="abc.jpg">
0
Connect further...control easier

With the ATEN CE624, you can now enjoy a high-quality visual experience powered by HDBaseT technology and the convenience of a single Cat6 cable to transmit uncompressed video with zero latency and multi-streaming for dual-view applications where remote access is required.

 

Author Comment

by:bombrider
ID: 16879414
For the purposes I am needing the script for, they may or may not be next to eachother in that format, so the script needs to accommodate both.. I am parsing HTML, this script is intended to weed out just the value of the image source (IMG SRC="") from large blocks of HTML that will include tables, font tags, css, etc.

There may be scenarios where images are directly next to eachother as you have put in your example. Does that make sense? :D

Thanks!
0
 
LVL 25

Assisted Solution

by:dgrafx
dgrafx earned 125 total points
ID: 16879457
the reason i asked is because i believe it won't parse correctly if stacked together like above - without something in between - space, tab etc

try this:

right after the <CFFILE ACTION="READ" file="d:\_web\path to a file\test.html" variable="str">
Put this:
<cfset str=replacenocase(str,"<img"," <img","all")>
0
 
LVL 7

Accepted Solution

by:
aseusainc earned 125 total points
ID: 16882809
You can replace the cfhttp with a cffile is that is the method you are using.  It will return a comma delimited list of all images used on a page.  I did not code any dupe checking, but it works exactly as you want.

Try this:

<CFHTTP Method="GET"
 URL="http://www.experts-exchange.com"
 UserAgent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
 Redirect="No">
</cfhttp>
 
<cfset start = 1>
<cfset loopstop = 0>
<cfset imagelist = "">

<CFLOOP condition="loopstop EQ 0">
  <cfset start = findnocase('<img src="',CFHTTP.FileContent,start)>
  <cfif start EQ 0>
    <cfset loopstop = 1>
  <cfelse>
    <cfset start = start + 10>
    <cfset end = findnocase('"',CFHTTP.FileContent,start)>
    <cfset count = end - start>
    <cfset image = mid(CFHTTP.FileContent,start,count)>
      <cfset imagelist = listappend(imagelist,image,',')>
  </cfif>
</cfloop>
<cfoutput>#imagelist#</cfoutput>
0
 
LVL 25

Expert Comment

by:dgrafx
ID: 16884748
yes, that's a good way of going about it - except that where it fails is that one cannot count on the image tag being <img src=.
It can easily be <img alt= or <img height= or <img id= etcetera...
and that is the main reason for going about it the way I did.
0
 
LVL 7

Expert Comment

by:aseusainc
ID: 16884815
So would changing

<cfset start = findnocase('<img src="',CFHTTP.FileContent,start)>

to

<cfset start = findnocase('src="',CFHTTP.FileContent,start)>

fix it?  There any other tags that use "src="?
0
 
LVL 25

Expert Comment

by:dgrafx
ID: 16885196
no, what I used (if you look at my code) is to find "<img" (all img tags start with "<img".
Then from that point find 'src="'
works everytime!

I like your condition loop!
I crawl directories using that method - wish I would have thought of it this time :)
0
 
LVL 7

Expert Comment

by:aseusainc
ID: 16885345
Fixed!  I changed it to find "<IMG" 1st, then "src=" from there.  Give it a whirl :)



<CFHTTP Method="GET"
 URL="http://www.experts-exchange.com"
 UserAgent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
 Redirect="No">
</cfhttp>
 
<cfset start = 1>
<cfset loopstop = 0>
<cfset imagelist = "">

<CFLOOP condition="loopstop EQ 0">
  <cfset start = findnocase('<img',CFHTTP.FileContent,start)>
  <cfif start EQ 0>
    <cfset loopstop = 1>
  <cfelse>
    <cfset start = findnocase('src="',CFHTTP.FileContent,start)>
    <cfset start = start + 5>
    <cfset end = findnocase('"',CFHTTP.FileContent,start)>
    <cfset count = end - start>
    <cfset image = mid(CFHTTP.FileContent,start,count)>
      <cfset imagelist = listappend(imagelist,image,',')>
  </cfif>
</cfloop>
<cfoutput>#imagelist#</cfoutput>
0
 
LVL 7

Expert Comment

by:aseusainc
ID: 17051557
Suggest assist between aseusainc and dgrafx as a correct answer was provided.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

This is a guide to setting up a new WHM/cPanel Server to be used for web hosting accounts. It is intended for web hosting company administrators and dedicated server owners. For under $99 per month (considering normal rate of Big Data Cetnters like …
One of the typical problems I have experienced is when you have to move a web server from one hosting site to another. You normally prepare all on the new host, transfer the site, change DNS and cross your fingers hoping all will be ok on new server…
This video shows how to quickly and easily add an email signature for all users on Exchange 2016. The resulting signature is applied on a server level by Exchange Online. The email signature template has been downloaded from: www.mail-signatures…
A short tutorial showing how to set up an email signature in Outlook on the Web (previously known as OWA). For free email signatures designs, visit https://www.mail-signatures.com/articles/signature-templates/?sts=6651 If you want to manage em…

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question