Solved

images from rss

Posted on 2014-01-06
Last Modified: 2014-01-07
OK, slight change of pace. I was trying to automate scraping links from an HTML file, but I think it might be easier to do it from an RSS feed. It is a full RSS feed, so there will be lots of data in XML format, but I am only looking for the image links. They would be in a structure similar to the following:

<img src="http://blah.com/blah.jpg" alt="Blah Blah" title="Blah Blah" /></a>

I would like to extract each link and print one per line, so the result for the above example would be:

http://blah.com/blah.jpg
Question by:stakor

Expert Comment

by:ozo
ID: 39761337
/src="(.*?)"/

Author Comment

by:stakor
ID: 39761343
perl -lne 'print for /src="(.*?)"/g' < text.txt doesn't output anything.

Author Comment

by:stakor
ID: 39761421
Hmm, when I run it on just the test text rather than the whole file, I get the output I am looking for. Let me see if there is something obvious about the large RSS file that would be throwing it off.

Author Comment

by:stakor
ID: 39761424
It would appear that, for no apparent reason, the feed contains a lot of lines like this:

 img src=&#34;http://blah.com/test.jpg";

I was not expecting that formatting at all.

Accepted Solution

by:ozo (earned 500 total points)
ID: 39761441
perl -lne 'print $2 while /src=("|&#34;)(.*?)\1/g'