Solved

images from rss

Posted on 2014-01-06
Last Modified: 2014-01-07
OK, slight change of pace. I was trying to automate scraping links from an HTML file, but I think it might be easier to do it from an RSS feed. It is a full RSS feed, so there will be a lot of data in XML format, but I am only looking for the image links. They would be in a structure similar to the following:

<img src="http://blah.com/blah.jpg" alt="Blah Blah" title="Blah Blah" /></a>

I would like to extract each link, one per line, so the result for the above example would be:

http://blah.com/blah.jpg
Question by:stakor
 
LVL 84

Expert Comment

by:ozo
ID: 39761337
/src="(.*?)"/
 

Author Comment

by:stakor
ID: 39761343
perl -lne 'print for /src="(.*?)"/g' < text.txt doesn't output anything.
 

Author Comment

by:stakor
ID: 39761421
Hmm, when I run it on just the test text rather than the whole file, I get the output I am looking for. Let me see if there is something obvious about the large RSS file that would be throwing it off.
 

Author Comment

by:stakor
ID: 39761424
It would appear that, for no apparent reason, the file contains a lot of:

 img src=&#34;http://blah.com/test.jpg";

I was not expecting that formatting at all.
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 39761441
perl -lne 'print $2 while /src=("|&#34;)(.*?)\1/g'