Solved

Problem with RSS File

Posted on 2014-02-18
Last Modified: 2014-02-19
When I use the following command to get the title elements out of an RSS file, it works great.
perl -lne 'print $2 while /title=("|&#34;)(.*?)\1/g' < $time.txt > $time.title.txt

When I attempt to use:
perl -lne 'print $2 while /link=("|&#34;)(.*?)\1/g' < $time.txt > $time.title.txt

It produces a blank file. I have checked, and there are <link> and </link> tags in the document. Would the text that makes up a URL be hampering the search? If so, how do you insulate the search query?
Question by:stakor

Expert Comment

by:ozo
ID: 39869602
Do your links look like
link="http://www.experts-exchange.com/Q_28368410.html"
?
Shouldn't they be more like
<link>http://www.experts-exchange.com/Q_28368410.html</link>

Author Comment

by:stakor
ID: 39869613
They look like:

<link>http://www.blah.com/</link>


Expert Comment

by:ozo
ID: 39869624
perl -lne 'print $1 while /<link>(.*?)<\/link>/g' < $time.txt > $time.title.txt

Expert Comment

by:ozo
ID: 39869635
I'm also wondering what your /title=/ is matching?

Author Comment

by:stakor
ID: 39869666
Here is one entry:

Will you give me a hug?

Mostly just plain text.


Expert Comment

by:ozo
ID: 39869701
So you want the <img title="..."> attributes, not the <title>...</title> elements?
Since there are no <img link="..."> attributes, I would not expect /link=("|&#34;)(.*?)\1/ to find much

Author Comment

by:stakor
ID: 39869718
Honestly, I am looking for links that lead to comments pages, which should be under item. I was just hoping for a list of URLs that I would then grep through to find just the pages with comments...

Expert Comment

by:ozo
ID: 39869725
Do you mean the /<link>(.*?)<\/link>/ elements?
Or the /href=("|&#34;)(.*?)\1/ attributes?

Author Comment

by:stakor
ID: 39869737
I believe they would be the /<link>(.*?)<\/link>/ elements. For instance,

http://www.reddit.com/r/funny/comments/1ya2kp/we_got_a_snowstorm_last_night_my_female_bosss/

should be pulled.

Expert Comment

by:ozo
ID: 39869751
I don't see http://www.reddit.com/r/funny/comments/1ya2kp/we_got_a_snowstorm_last_night_my_female_bosss/ in http://www.reddit.com/.rss
What I get from  /<link>(.*?)<\/link>/g is
http://www.reddit.com/
http://www.reddit.com/
http://www.reddit.com/r/movies/comments/1ybd32/guardians_of_the_galaxy_full_trailer/
http://www.reddit.com/r/AdviceAnimals/comments/1yb550/when_netflix_stops_to_ask_me_if_im_still_watching/
http://www.reddit.com/r/aww/comments/1yb1iy/will_you_give_me_a_hug/
http://www.reddit.com/r/funny/comments/1yb7fk/you_just_cant_argue_with_numbers/
http://www.reddit.com/r/pics/comments/1yay42/heres_that_selfie_from_colbert/
http://www.reddit.com/r/todayilearned/comments/1yanso/til_that_north_korea_is_following_3_accounts_on/
http://www.reddit.com/r/worldnews/comments/1yb7um/leftwing_mexican_senators_on_tuesday_presented_an/
http://www.reddit.com/r/IAmA/comments/1yau5e/we_are_rocketjump_creators_of_video_game_high/
http://www.reddit.com/r/news/comments/1yae6h/hot_pockets_pulled_from_shelves_for_containing/
http://www.reddit.com/r/gaming/comments/1ya5ls/gamespot_149_nvidia_gtx_750_ti_plays_titanfall/
http://www.reddit.com/r/videos/comments/1y9rzs/i_always_thought_that_ski_jumping_and_ski_flying/
http://www.reddit.com/r/gifs/comments/1yagwh/raccoon_eating_a_grape/
http://www.reddit.com/r/science/comments/1y9m6w/a_neuroscientist_has_just_developed_an_app_that/
http://www.reddit.com/r/television/comments/1ya4zq/aziz_ansari_hannibal_buress_chelsea_peretti_and/
http://www.reddit.com/r/AskReddit/comments/1y9mjf/to_those_who_completed_a_rosetta_stone_language/
http://www.reddit.com/r/books/comments/1y9roj/kurt_vonnegut_diagrams_the_shape_of_all_stories/
http://www.reddit.com/r/EarthPorn/comments/1y9ra1/absolute_beauty_kazakhstan_1300x865/
http://www.reddit.com/r/explainlikeimfive/comments/1yb9xc/eli5_how_do_people_get_into_some_of_these_olympic/
http://www.reddit.com/r/Music/comments/1y95sk/rip_devo_member_robert_bob_2_casale/
http://www.reddit.com/r/bestof/comments/1y9alk/theshadowcat_gives_thorough_advice_on_starting_a/
http://www.reddit.com/r/technology/comments/1ybki9/security_researchers_have_discovered_a_flaw_in/
http://www.reddit.com/r/sports/comments/1y9uft/ibrahimovic_laser_goal_vs_leverkusen_xpost_rsoccer/
http://www.reddit.com/r/askscience/comments/1y8pvv/why_do_so_many_medications_have_hcl_in_them/
http://www.reddit.com/r/gifs/comments/1y9vyn/made_this_downvote_gif_the_other_day_i_am/
http://www.reddit.com/r/gifs/comments/1ya1a3/how_to_make_batman_cry/

 /href=("|&#34;)(.*?)\1/ gets several others, but not http://www.reddit.com/r/funny/comments/1ya2kp/we_got_a_snowstorm_last_night_my_female_bosss/
Can you show the context in which you found it?

Author Comment

by:stakor
ID: 39869762
It probably rotated off of the page. No big deal.

The

perl -lne 'print $2 while  /<link>(.*?)<\/link>/g' < $time.txt > $time.title.txt

line produces a text file that is 27 blank lines. I am using wget to retrieve the file. I wonder if there might be some sort of format change that is affecting it. The text is all there in the file that is retrieved.

Accepted Solution

by:
ozo earned 500 total points
ID: 39869856
perl -lne 'print $1 while /<link>(.*?)<\/link>/g' < $time.txt > $time.title.txt