Replace image tags with alt descriptions

I'm trying to create a dynamically generated text-only version of my site, and what I want to do is replace all the image tags (or designated images) with their alt description, and then spit out the rest of the HTML code.

I found a script that strips the images using eregi_replace, but I can't figure out how to make it do what I want.

Any ideas, or suggestions on a better way to do this?

Thanks in advance
carlene
carlenevsAsked:
 
eeBlueShadowCommented:
OK, my original code was failing when the alt text had a space in it.

This should do the job:

//----START CODE----
$pattern = "/<img.*alt=[\"']([\w ]*)[\"'][^>]*>/i";
$replace = "[IMG: $1]";
$newhtml = preg_replace($pattern,$replace,$oldhtml);
//----END CODE----

or

$newhtml = preg_replace("/<img.*alt=[\"']([\w ]*)[\"'][^>]*>/i","[IMG: $1]",$oldhtml);

_Blue
 
eeBlueShadowCommented:
Hi, you could try

//----START CODE----
$pattern = "/<img.*alt=[\"'](\w*)[\"'][^>]*>/i";
$replace = "[IMG: $1]";
$newhtml = preg_replace($pattern,$replace,$oldhtml);
//----END CODE----

you can collapse this into one line if you like:

$newhtml = preg_replace("/<img.*alt=[\"'](\w*)[\"'][^>]*>/i","[IMG: $1]",$oldhtml);

but the first block is easier for you to read

_Blue
 
eeBlueShadowCommented:
No, just realised that for some reason this only replaces the first image :-/

I'll keep trying

_Blue
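
A likely cause, for anyone following along: `.*` is greedy, and `.` matches any character except a newline, so when several images share a line one match can run from the first `<img` to the last `alt=` and swallow the tags in between. Below is a minimal sketch of a tighter pattern, assuming no attribute value contains a `>`. The contents of `$oldhtml` are made-up sample markup, not taken from the original site.

```php
<?php
// Hypothetical sample input: two images on one line.
$oldhtml = '<p><img src="a.gif" alt="First"> and <img src="b.gif" alt="Second"></p>';

// Greedy ".*" can run past the end of the first tag.
$greedy = preg_replace("/<img.*alt=[\"'](\w*)[\"'][^>]*>/i", '[IMG: $1]', $oldhtml);

// "[^>]*" cannot cross a ">", so each match stays inside one tag.
$perTag = preg_replace("/<img[^>]*alt=[\"'](\w*)[\"'][^>]*>/i", '[IMG: $1]', $oldhtml);

echo $greedy, "\n"; // <p>[IMG: Second]</p>
echo $perTag, "\n"; // <p>[IMG: First] and [IMG: Second]</p>
```

With the greedy version the first image disappears entirely, which would explain images going missing rather than being replaced.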
 
aolXFTCommented:
The BBC have a script that does this. It is, however, a Perl script, and you use it as a proxy to the website you want to visit.

http://www.bbc.co.uk/education/betsie/download.html
 
carlenevsAuthor Commented:
eeBlueShadow -

That works, sorta.

When I use the script as written, it only finds & replaces the first 4 of the 7 images...

I tried changing the 'alt' to 'name' (as the two images I want to change are the only ones in this section with a name) and it finds the two I want (which it didn't change previously)...

How can I modify what's above to work if there's an alt AND a name attribute, in any order?

Sorry, I know naught about Perl, and am just getting my feet wet with PHP.

thanks,
carlenevs
 
carlenevsAuthor Commented:
Never mind, I found out why it wasn't showing the images I wanted... but it does fail if there's a . (period) in the alt text.

Still, how would I set this up to only replace if there's a name attribute as well?

carlene
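
The period fails because `\w` only covers letters, digits and underscores. Rather than listing every allowed character, one option is to accept anything up to the matching closing quote. This is only a sketch: the sample markup is invented, and it assumes the alt value is quoted and sits on one line.

```php
<?php
// Hypothetical sample input with punctuation inside the alt text.
$oldhtml = "<img src=\"logo.gif\" alt=\"Bob's Cafe, est. 1999\">";

// (["']) captures the opening quote and \1 requires the same quote to close.
// (.*?) lazily takes everything in between, whatever characters it holds.
$pattern = '/<img[^>]*\balt=(["\'])(.*?)\1[^>]*>/i';
$newhtml = preg_replace($pattern, '[IMG: $2]', $oldhtml);

echo $newhtml, "\n"; // [IMG: Bob's Cafe, est. 1999]
```

Note that the alt text is now in `$2`, because the quote itself occupies the first set of brackets.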
 
eeBlueShadowCommented:
I assume that you would only want to use the name attribute if the alt attribute isn't there...

In that case, since the replace as it is only leaves img tags that don't have an 'alt' attribute, just do another replace to pick up the name attributes:

$replace1 = preg_replace("/<img.*alt=[\"']([\w._ ]*)[\"'][^>]*>/i","[IMG: $1]",$oldhtml);
$replace2 = preg_replace("/<img.*name=[\"']([\w._ ]*)[\"'][^>]*>/i","[IMG: $1]",$replace1);

If you want to see exactly how this pattern works, copy the following into Notepad or another monospaced editor (it might look OK below, it might not, but it'll be perfectly understandable in the text editor):

##START

/<img.*alt=[\"']([\w_. ]*)[\"'][^>]*>/i
/____________________________________/_  - mark the start and end of the pattern
_<img__________________________________  - find this text
_____.*________________________________  - the dot means 'any character except a newline' and the * means 'any number of them in a row, including none'
_______alt=____________________________  - followed by this text
___________[___]_______________________  - followed by one of these characters
____________\"'________________________  - these characters = " or ' - the backslash is to tell PHP this isn't the " closing the string
________________(________)_____________  - brackets mean 'mark whatever is inside as a backreference' - more on that later
_________________[_____]*______________  - any one of these characters, as many as you can find in a row
__________________\w_. ________________  - \w = any letter, digit or underscore; the set also allows a full stop or space (the only characters this pattern accepts in the attribute value)
__________________________[\"']________  - followed by another quote (single or double)
_______________________________[^>]*___  - the ^ means "a character that isn't in the following set" if the ^ is the first character inside the [], so as many non '>' as possible
____________________________________>__  - followed by the '>' that closes the tag
______________________________________i  - 'i' after the closing pattern tag means 'match any of those upper and lower case'

In the replacement string, the $1 is a special identifier that is replaced with the contents of the first set of brackets in the pattern. A $2 would be replaced by the contents of the second set of brackets, if there were another set.

##END
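
To see `$1` and `$2` side by side, here is a small sketch with two sets of brackets. The markup is made up, and this particular pattern assumes src comes before alt in the tag.

```php
<?php
// Hypothetical markup; the pattern below has two sets of brackets.
$html = '<img src="cat.gif" alt="Cat">';

// First brackets capture the src value ($1), second capture the alt value ($2).
$pattern = "/<img[^>]*src=[\"']([\w._ ]*)[\"'][^>]*alt=[\"']([\w._ ]*)[\"'][^>]*>/i";
$result = preg_replace($pattern, '[IMG: $2 ($1)]', $html);

echo $result, "\n"; // [IMG: Cat (cat.gif)]
```

The backreferences can be used in any order in the replacement, and as often as needed.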
 
eeBlueShadowCommented:
By the way, the above two lines also fix the problem with periods in the tags not working.

_Blue
 
eeBlueShadowCommented:
Ah, I didn't read your comment properly (need to stop doing that ;))

If you only wanted to test for tags with both a name and an alt, it becomes more tricky, because you can't guarantee the order the attributes will appear in the tag. It's most likely possible, but I can't think of an easy way to do it at the moment.

_Blue
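
One way to handle both orders in a single pattern is a lookahead, which PCRE (the engine behind preg_replace) supports. The following is a sketch under the assumptions that attribute values are quoted and contain no `>`. The sample markup is invented.

```php
<?php
// Hypothetical sample input: name before alt, alt before name, and no name.
$oldhtml = '<img src="a.gif" name="pic" alt="One">'
         . '<img alt="Two" src="b.gif" name="pic2">'
         . '<img src="c.gif" alt="Three">';

// (?=[^>]*\bname=) peeks ahead for a name attribute before the closing ">"
// without consuming anything, so alt can then be matched wherever it sits.
$pattern = '/<img(?=[^>]*\bname=)[^>]*\balt=(["\'])(.*?)\1[^>]*>/i';
$newhtml = preg_replace($pattern, '[IMG: $2]', $oldhtml);

echo $newhtml, "\n"; // [IMG: One][IMG: Two]<img src="c.gif" alt="Three">
```

One caveat: `\bname=` would also be satisfied by an attribute such as `data-name=`, so this is looser than a real HTML parser.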
 
carlenevsAuthor Commented:
Hmm.

Thanks for the tutorial! I'll have to study it, and maybe someday I'll figure it out...

What I meant with the alt and the name attributes is this:

If the img has a name attribute, replace the tag with the alt text; if no name attribute is present, leave the image.

so,

<img src="blah" name="pic" alt="REPLACED"> would change to REPLACED

but

<img src="blah2" alt="SAME"> would stay as an image tag

Does that make more sense?
 
carlenevsAuthor Commented:
Yeah, we posted at the same time.

Thanks for your help!

Carlene
 
eeBlueShadowCommented:
OK, that looks possible, and again could be solved in 2 replaces, simply by considering both possibilities: name then alt, or alt then name. This is a shoddy method though. You couldn't easily extend it to depend on 3 attributes, because that would need 6 replaces (one per ordering), and 4 attributes would need 24! It should be OK for what you need it for though:

$replace1 = preg_replace("/<img[^>]+name=[\"'][\w._ ]*[\"'][^>]+alt=[\"']([\w._ ]*)[\"'][^>]*>/i","[IMG: $1]",$oldhtml);
$newhtml = preg_replace("/<img[^>]+alt=[\"']([\w._ ]*)[\"'][^>]+name=[\"'][\w._ ]*[\"'][^>]*>/i","[IMG: $1]",$replace1);

should work. If you want only a particular name to trigger the replacement, switch both
name=[\"'][\w._ ]*[\"']
with
name=[\"']triggerName[\"']

The patterns have been modified a bit from the original. Try to work out why if you like; figuring some regular expressions out is the best way to learn about them.

_Blue
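
If the number of required attributes ever grows, one alternative to writing a replace per ordering is to match each tag once and inspect its attributes in a callback. The sketch below assumes PHP 7 or later and quoted attribute values. `replaceImagesHaving` is a hypothetical helper name, not something from this thread or from PHP itself.

```php
<?php
// Hypothetical helper: swap an <img> tag for its alt text only when every
// attribute named in $required is present, in any order.
function replaceImagesHaving(string $html, array $required): string
{
    $callback = function (array $m) use ($required): string {
        $tag = $m[0];
        // Collect attribute name => value pairs from this one tag.
        preg_match_all('/([\w-]+)\s*=\s*(["\'])(.*?)\2/s', $tag, $pairs, PREG_SET_ORDER);
        $attrs = [];
        foreach ($pairs as $p) {
            $attrs[strtolower($p[1])] = $p[3];
        }
        // Leave the tag untouched if alt or any required attribute is missing.
        if (!isset($attrs['alt'])) {
            return $tag;
        }
        foreach ($required as $name) {
            if (!array_key_exists(strtolower($name), $attrs)) {
                return $tag;
            }
        }
        return '[IMG: ' . $attrs['alt'] . ']';
    };
    // Fall back to the input if the regex engine reports an error (null).
    return preg_replace_callback('/<img\b[^>]*>/i', $callback, $html) ?? $html;
}

$out = replaceImagesHaving(
    '<img src="blah" name="pic" alt="REPLACED"><img src="blah2" alt="SAME">',
    ['name']
);
echo $out, "\n"; // [IMG: REPLACED]<img src="blah2" alt="SAME">
```

Adding a third or fourth required attribute is then just another entry in the array, with no extra patterns.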
 
eeBlueShadowCommented:
I'll even give you a hint: a + sign means 'the previous item (a single character or a whole [] set) one or more times'

_Blue
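
A quick way to see the difference between `*` and `+` is to test both against a string with nothing between `<img` and `alt=`. The test strings here are made up for illustration.

```php
<?php
// "*" accepts zero or more of the preceding item; "+" needs at least one.
$starNoGap  = preg_match('/<img[^>]*alt=/i', '<imgalt="x">');  // zero characters is fine for "*"
$plusNoGap  = preg_match('/<img[^>]+alt=/i', '<imgalt="x">');  // "+" finds nothing to consume
$plusSpaced = preg_match('/<img[^>]+alt=/i', '<img alt="x">'); // the space satisfies "+"

var_dump($starNoGap, $plusNoGap, $plusSpaced); // int(1) int(0) int(1)
```

In real markup there is always at least a space after `<img`, so `+` also stops the pattern matching longer tag names that merely start with "img".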
 
carlenevsAuthor Commented:
YES!

That works perfectly! Thank you!!

I thought I'd tried that before with one of the earlier iterations, and couldn't get it to work... oh well, it's working now!
Question has a verified solution.
