Solved

images from rss

Posted on 2014-01-06
Last Modified: 2014-01-07
OK, slight change of pace. I was trying to automate scraping links from an HTML file, but I think it might be easier to do it from an RSS feed. It is a full RSS feed, so there will be a lot of data in XML format, but I am only looking for the image links. They would be in a structure similar to the following:

<img src="http://blah.com/blah.jpg" alt="Blah Blah" title="Blah Blah" /></a>

I would like to extract each link, one per line, so the result for the above example would be:

http://blah.com/blah.jpg
Question by:stakor
5 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 39761337
/src="(.*?)"/
 

Author Comment

by:stakor
ID: 39761343
perl -lne 'print for /src="(.*?)"/g' < text.txt doesn't output anything.
 

Author Comment

by:stakor
ID: 39761421
Hmm, when I run it on just the test text rather than the whole file, I get the output I am looking for. Let me see if there is something obvious about the large RSS file that would be throwing it off.
 

Author Comment

by:stakor
ID: 39761424
It would appear that, for no obvious reason, I am getting a lot of lines like this:

 img src=&#34;http://blah.com/test.jpg&#34;

I was not expecting that formatting at all.
 
LVL 84

Accepted Solution

by:ozo (earned 500 total points)
ID: 39761441
perl -lne 'print $2 while /src=("|&#34;)(.*?)\1/g'