Solved

Parse RSS cffeed output to capture the byline

Posted on 2011-03-11
Last Modified: 2012-06-27
RSS feeds from Yahoo don't expose the author's name in a form that can be easily captured, and there is no specific variable for it within the feed.

I want only the author's name so we can insert it into our news database and display it separately.

Of course, the page from which this code originated had much more to it, but I don't see any reason to display all of it for this purpose.

For the page, we can assume the structure is always the same. I want to get only the text between the byline classes, and specifically the text after the "By " and before the next <span> tag.
<!--- I'm using ColdFusion 8's CFFEED to get the full text. I use <CF_REextract to strip out what I don't want and end up with the following: --->

<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->

Question by:Qsorb
8 Comments
 
LVL 14

Expert Comment

by:RickEpnet
ID: 35114610
What is the URL of the RSS feed? That might help us help you.
 

Author Comment

by:Qsorb
ID: 35114688
The URLs vary, but here is an example:

http://news.yahoo.com/s/ap/20110312/ap_on_re_us/us_southwest_wildfires

Again, most of the parsing is done with <CF_REextract, but I need help capturing just the author name(s), and I'd like the approach explained.
 
LVL 11

Expert Comment

by:Brijesh Chauhan
ID: 35115673
You can use the findNoCase function to get it. An example is below.

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfdump var="#reqString#">


The above code will give you output as

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens

Explanation -> First look for <div class="byline"> and get its location in the HTML. Then, starting from where you found that string, look for </span>.

Once you have those two positions, use the MID function to get the string between them.
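
The find-then-MID steps above can be wrapped in a small reusable function. This is only a sketch: the name textBetween and its argument names are hypothetical (they do not appear in this thread), and it assumes the markers occur in the HTML exactly as written.

```cfml
<!--- Hypothetical helper: returns the text between two markers,
      or an empty string if either marker is missing. --->
<cffunction name="textBetween" returntype="string" output="false">
    <cfargument name="haystack" type="string" required="true">
    <cfargument name="startMarker" type="string" required="true">
    <cfargument name="endMarker" type="string" required="true">

    <!--- Position of the start marker, then the first character after it. --->
    <cfset var startPos = findNoCase(arguments.startMarker, arguments.haystack)>
    <cfset var contentStart = startPos + len(arguments.startMarker)>
    <cfset var endPos = 0>

    <cfif startPos EQ 0 OR contentStart GT len(arguments.haystack)>
        <cfreturn "">
    </cfif>

    <!--- Look for the end marker only after the start marker. --->
    <cfset endPos = findNoCase(arguments.endMarker, arguments.haystack, contentStart)>
    <cfif endPos LTE contentStart>
        <cfreturn "">
    </cfif>

    <cfreturn mid(arguments.haystack, contentStart, endPos - contentStart)>
</cffunction>

<!--- Shortened sample input based on the byline markup in the question. --->
<cfset sampleHtml = '<div class="byline"><cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens</span></cite></div>'>

<cfset citeText = textBetween(sampleHtml, '<cite class="vcard">', '<span class="fn org">')>
<cfoutput>#trim(citeText)#</cfoutput>
```

Returning an empty string when a marker is missing avoids passing a zero or negative count to MID, which is what would happen if the raw arithmetic were run against a page without a byline.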
 

Author Comment

by:Qsorb
ID: 35118088
Yes, I got that far by myself. I need to be shown how to do the rest.

I would need just the text after ="vcard"> and before the next <span class="fn org">
 
LVL 11

Expert Comment

by:Brijesh Chauhan
ID: 35119322
Well, use the same logic again. Here is the complete thing:

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfdump var="#reqString#">

<cfset FOccurance = findNoCase('<cite class="vcard">','#reqString#') />

<cfset EOccurance = findNoCase('<span class="fn org">','#reqString#',FOccurance) />

<cfset startFrom =  FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = MID('#reqString#',startFrom,count) />

<cfdump var="#author#">

<cfset count = len('#reqString#') - (EOccurance + len('<span class="fn org">') - 1) />

<cfset author1 = right('#reqString#',count) />

<cfdump var="#author1#">


This will return author and author1 both as -> By Peter Yentel and Victory Merkens
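
As an alternative to the two find/MID passes, the same span can probably be captured with one regular expression. The following is a sketch, not a tested drop-in: the variable names pattern and match are made up for illustration, and it assumes the byline markup always has the shape shown in the question.

```cfml
<cfset htmlToParse = '<div class="byline"><cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite></div>'>

<!--- Subexpression 1 captures the text between the cite tag and the span tag,
      leaving out the whitespace on either side. --->
<cfset pattern = '<cite class="vcard">\s*(.*?)\s*<span class="fn org">'>
<cfset match = reFindNoCase(pattern, htmlToParse, 1, true)>

<!--- pos[1]/len[1] describe the whole match; pos[2]/len[2] describe subexpression 1. --->
<cfif arrayLen(match.pos) GTE 2 AND match.pos[2] GT 0 AND match.len[2] GT 0>
    <cfset author = mid(htmlToParse, match.pos[2], match.len[2])>
<cfelse>
    <cfset author = "">
</cfif>

<cfoutput>#author#</cfoutput>
```

Passing true as the fourth argument makes reFindNoCase return a structure of positions and lengths instead of a single position, which is what allows the captured group to be pulled out with MID.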
 

Author Comment

by:Qsorb
ID: 35124259
Okay thanks. However I'm still not able to get just the name. Here's what I get with your latest suggestion:

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens
 
LVL 11

Accepted Solution

by:
Brijesh Chauhan earned 500 total points
ID: 35125088
Yes, that's because there is a cfdump for the first string:

<cfdump var="#reqString#">

Remove this (line 21 of the code) and do a bit of formatting. Try the code below. It is the same as above, except that I have removed the unneeded cfdump, cleaned up the code, and added some formatting.

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<!--- Pass 1: isolate everything from the byline div up to the first closing span. --->
<cfset strToCheck = '<div class="byline">' />
<cfset FOccurance = findNoCase(strToCheck, htmlToParse) />
<cfset EOccurance = findNoCase('</span>', htmlToParse, FOccurance) />
<cfset startFrom = FOccurance + len(strToCheck) />
<cfset count = EOccurance - startFrom />
<cfset reqString = mid(htmlToParse, startFrom, count) />

<!--- Pass 2: within that fragment, take the text between the cite tag and the span tag. --->
<cfset FOccurance = findNoCase('<cite class="vcard">', reqString) />
<cfset EOccurance = findNoCase('<span class="fn org">', reqString, FOccurance) />
<cfset startFrom = FOccurance + len('<cite class="vcard">') />
<cfset count = EOccurance - startFrom />
<cfset author = mid(reqString, startFrom, count) />

<!--- trim() drops the line break and padding that surround the byline text. --->
<cfoutput>Author for the article is -> <b>#trim(author)#</b></cfoutput>
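
The value in author still begins with "By " and carries the whitespace from the source markup. To store only the name(s), one possible cleanup step is sketched below. The variable authorName is hypothetical, and the sample value stands in for what the code above produces.

```cfml
<!--- Stand-in for the author value extracted above, including its surrounding whitespace. --->
<cfset author = "#chr(10)#  By Peter Yentel and Victory Merkens        ">

<!--- Trim the padding, then remove a leading "By" and the whitespace after it. --->
<cfset authorName = reReplaceNoCase(trim(author), "^By\s+", "")>

<cfoutput>#authorName#</cfoutput>
```

Because the pattern is anchored with ^, a byline that does not start with "By" passes through unchanged apart from the trimming.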

 

Author Closing Comment

by:Qsorb
ID: 35144078
That does it. Thanks for the great help!
