stakor

asked on

images from rss

OK, slight change of pace. I was trying to automate scraping links from an HTML file, but I think it might be easier to do it from an RSS feed. It is a full RSS feed. There will be lots of data in XML format, but I am just looking for the image links. They would be in a structure similar to the following:

<img src="http://blah.com/blah.jpg" alt="Blah Blah" title="Blah Blah" /></a>

I would like to output each link, one per line, so the result for the above example would be:

http://blah.com/blah.jpg
ozo

/src="(.*?)"/
stakor (ASKER)

perl -lne 'print for /src="(.*?)"/g' < text.txt doesn't output anything.
stakor (ASKER)

Hmm, when I run it on just the test text rather than the whole file, I get the response I am looking for. Let me see if there is something obvious about the large RSS file that would be throwing it off.
stakor (ASKER)

It appears that, for no obvious reason, the file contains a lot of entries like this:

 img src=&#34;http://blah.com/test.jpg";

I was not expecting that formatting at all.
ASKER CERTIFIED SOLUTION
ozo
