• Status: Solved

Images from RSS

OK, slight change of pace. I was trying to automate scraping links from an HTML file, but I think it might be easier to do it from an RSS feed. It is a full RSS feed, so there will be a lot of data in XML format, but I am only looking for the image links. They would be in a structure similar to the following:

<img src="http://blah.com/blah.jpg" alt="Blah Blah" title="Blah Blah" /></a>

I would like to scrape each link one per line, so the result for the above example would be:

http://blah.com/blah.jpg
Asked by: stakor
1 Solution
 
ozo Commented:
/src="(.*?)"/
 
stakor (Author) Commented:
Running the following doesn't output anything:

perl -lne 'print for /src="(.*?)"/g' < text.txt
 
stakor (Author) Commented:
Hmm, when I run it against just the test text rather than the whole file, I get the output I am looking for. Let me see if there is something obvious about the large RSS file that would be throwing it off.
 
stakor (Author) Commented:
It would appear that, for no obvious reason, a lot of the image tags in the feed look like this:

 img src=&#34;http://blah.com/test.jpg";

I was not expecting that formatting at all.
 
ozo Commented:
perl -lne 'print $2 while /src=("|&#34;)(.*?)\1/g'
