Solved

images from rss

Posted on 2014-01-06
5
218 Views
Last Modified: 2014-01-07
Ok, slight change of pace. I was trying to automate scraping links from html file, but I think it might be easier to do it from a rss feed. It is a full rss feed. There will be lots of data in a xml format, but I am just looking for the image links. They would be in a structure similar to the following:

<img src="http://blah.com/blah.jpg" alt="Blah Blah" title="Blah Blah" /></a>

I would like to scrape each link one per line, so the result for the above example would be:

http://blah.com/blah.jpg
0
Comment
Question by:stakor
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
5 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 39761337
/src="(.*?)"/
0
 

Author Comment

by:stakor
ID: 39761343
perl -lne 'print for /src="(.*?)"/g' < text.txt doesn't output anything.
0
 

Author Comment

by:stakor
ID: 39761421
hmm, when I run just the test text, and not the whole file, I get the response I am looking for. Let me see if there is something obvious about the large rss file that would be throwing it off.
0
 

Author Comment

by:stakor
ID: 39761424
It would appear that for no real reason, I am getting a lot of

 img src=&#34;http://blah.com/test.jpg";

I was not expecting that formatting, at all.
0
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 39761441
perl -lne 'print $2 while /src=("|&#34;)(.*?)\1/g'
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question