Solved

images from rss

Posted on 2014-01-06
Last Modified: 2014-01-07
OK, slight change of pace. I was trying to automate scraping links from an HTML file, but I think it might be easier to do it from an RSS feed. It is a full RSS feed, so there will be lots of data in XML format, but I am only looking for the image links. They would be in a structure similar to the following:

<img src="http://blah.com/blah.jpg" alt="Blah Blah" title="Blah Blah" /></a>

I would like to print each link on its own line, so the result for the above example would be:

http://blah.com/blah.jpg
Question by:stakor
5 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 39761337
/src="(.*?)"/
 

Author Comment

by:stakor
ID: 39761343
perl -lne 'print for /src="(.*?)"/g' < text.txt doesn't output anything.
 

Author Comment

by:stakor
ID: 39761421
Hmm, when I run just the test text, and not the whole file, I get the result I am looking for. Let me see if there is something obvious about the large RSS file that would be throwing it off.
 

Author Comment

by:stakor
ID: 39761424
It would appear that, for no obvious reason, I am getting a lot of

 img src=&#34;http://blah.com/test.jpg";

I was not expecting that formatting at all.
 
LVL 84

Accepted Solution

by:ozo (earned 500 total points)
ID: 39761441
perl -lne 'print $2 while /src=("|&#34;)(.*?)\1/g'


Question has a verified solution.
