stakor
asked on
images from rss
OK, slight change of pace. I was trying to automate scraping links from an HTML file, but I think it might be easier to do it from an RSS feed. It is a full RSS feed, so there will be a lot of data in XML format, but I am only looking for the image links. They would be in a structure similar to the following:
<img src="http://blah.com/blah.jpg" alt="Blah Blah" title="Blah Blah" /></a>
I would like to extract each link and print them one per line, so the result for the above example would be:
http://blah.com/blah.jpg
/src="(.*?)"/
ASKER
perl -lne 'print for /src="(.*?)"/g' < text.txt doesn't output anything.
ASKER
Hmm, when I run it on just the test text, and not the whole file, I get the output I am looking for. Let me see if there is something obvious about the large RSS file that would be throwing it off.
ASKER
It would appear that, for no real reason, I am getting a lot of
img src="http://blah.com/test.jpg";
I was not expecting that formatting, at all.