
Parse an RSS feed (CFFEED) to capture the byline

Posted on 2011-03-11
Last Modified: 2012-06-27
RSS feeds from Yahoo don't expose the author's name in a form that can be easily captured, and there is no specific variable for it within the feed.

I want only the author's name so we can insert it into our news database and display it separately.

Of course, the page this code came from had much more on it, but I don't see any reason to show all of it for this purpose.

For the page, we can assume the structure is always the same. I want only the text between the byline classes, specifically the text after "By " and before the next <span> tag.
<!--- I'm using ColdFusion 8's CFFEED to get the full text. I use <CF_REextract to strip out what I don't want and end up with the following: --->

<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->
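As a point of comparison, the extraction asked for here (the text after "By " and before the next <span> tag) can also be written as a single regular expression. This is a minimal sketch in Python rather than ColdFusion so it runs standalone; it assumes the byline markup always has the shape shown above, and the names BYLINE_HTML, BYLINE_RE and extract_author are illustrative, not part of CFFEED or CF_REextract. ColdFusion's REFindNoCase, with its returnsubexpressions argument, offers a comparable route.

```python
import re
from typing import Optional

# Byline fragment as produced after the CF_REextract step in the question.
BYLINE_HTML = """<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->"""

# Group 1 captures the text after "By " inside <cite class="vcard">, stopping
# before the first <span. IGNORECASE mirrors ColdFusion's *NoCase functions;
# DOTALL lets ".*?" match across the line breaks in the markup.
BYLINE_RE = re.compile(
    r'<cite\s+class="vcard">\s*By\s+(.*?)\s*<span',
    re.IGNORECASE | re.DOTALL,
)


def extract_author(html: str) -> Optional[str]:
    """Return the author name(s) from the byline, or None if the pattern is absent."""
    match = BYLINE_RE.search(html)
    return match.group(1) if match else None


print(extract_author(BYLINE_HTML))  # Peter Yentel and Victory Merkens
```

Because the capture group is non-greedy and the trailing \s* sits outside it, the run of spaces before <span> is not included in the result.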


Question by:Qsorb
8 Comments
 
LVL 14

Expert Comment

by:RickEpnet
ID: 35114610
What is the URL of the RSS feed? That might help us help you.
 

Author Comment

by:Qsorb
ID: 35114688
The URLs vary, but here is an example:

http://news.yahoo.com/s/ap/20110312/ap_on_re_us/us_southwest_wildfires

Again, most of the parsing is done with <CF_REextract, but to capture just the author name(s) I need help, along with an explanation.
 
LVL 11

Expert Comment

by:Brijesh Chauhan
ID: 35115673
You can use the FINDNOCASE function to get it. An example is below:

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfdump var="#reqString#">



The code above gives this output:

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens

Explanation: first look for <div class="byline"> and get its location in the HTML; then, starting from where you found that string, look for </span>.

Once you have those two positions, use the MID function to get the string between them.
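To make the index arithmetic explicit, here is a rough Python transliteration of the FINDNOCASE/MID steps. The helpers find_no_case and mid are hypothetical stand-ins that mimic ColdFusion's 1-based conventions (a position of 0 means "not found"); Python strings are 0-based, which is where the "- 1" adjustments come from.

```python
def find_no_case(needle: str, haystack: str, start: int = 1) -> int:
    """Hypothetical stand-in for FindNoCase: 1-based position, 0 if not found."""
    idx = haystack.lower().find(needle.lower(), start - 1)
    return idx + 1 if idx != -1 else 0


def mid(s: str, start: int, count: int) -> str:
    """Hypothetical stand-in for Mid: `count` characters from 1-based `start`."""
    return s[start - 1:start - 1 + count]


html = ('<div class="byline">\n <cite class="vcard">\n'
        '  By Peter Yentel and Victory Merkens        '
        '<span class="fn org">By Peter Yentel and Victory Merkens</span>\n'
        ' </cite>\n</div>')

marker = '<div class="byline">'
f_occ = find_no_case(marker, html)            # 1-based start of the byline div
e_occ = find_no_case('</span>', html, f_occ)  # first </span> at or after it
start_from = f_occ + len(marker)              # first character after the div tag
req_string = mid(html, start_from, e_occ - start_from)  # stops just before </span>
print(req_string)
```

Since count is e_occ - start_from, the slice ends on the character immediately before </span>, which is why the closing tag does not appear in the output.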

 

Author Comment

by:Qsorb
ID: 35118088
Yes, I got that far by myself. I need to be shown how to do the rest.

I need just the text after ="vcard"> and before the next <span class="fn org">.
 
LVL 11

Expert Comment

by:Brijesh Chauhan
ID: 35119322
Use the same logic again. Here is the complete code:

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfdump var="#reqString#">

<cfset FOccurance = findNoCase('<cite class="vcard">','#reqString#') />

<cfset EOccurance = findNoCase('<span class="fn org">','#reqString#',FOccurance) />

<cfset startFrom =  FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = MID('#reqString#',startFrom,count) />

<cfdump var="#author#">

<cfset count = len('#reqString#') - (EOccurance + len('<span class="fn org">') - 1) />

<cfset author1 = right('#reqString#',count) />

<cfdump var="#author1#">



Both author and author1 will display as: By Peter Yentel and Victory Merkens
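The Right() variant above depends on one piece of arithmetic: the span tag's last character sits at EOccurance + len(tag) - 1 (1-based), so everything after it is len(reqString) minus that position. A minimal Python sketch of both second-pass extractions, using hypothetical 1-based helpers that mimic FindNoCase, Mid and Right, shows that the two results carry the same name but author keeps the surrounding whitespace from the markup, which a browser collapses when the value is displayed.

```python
def find_no_case(needle: str, haystack: str, start: int = 1) -> int:
    """Hypothetical stand-in for FindNoCase: 1-based position, 0 if not found."""
    idx = haystack.lower().find(needle.lower(), start - 1)
    return idx + 1 if idx != -1 else 0


def mid(s: str, start: int, count: int) -> str:
    """Hypothetical stand-in for Mid: `count` characters from 1-based `start`."""
    return s[start - 1:start - 1 + count]


def right(s: str, count: int) -> str:
    """Hypothetical stand-in for Right: the last `count` characters."""
    return s[len(s) - count:] if count > 0 else ""


# reqString as produced by the first pass.
req_string = ('\n <cite class="vcard">\n'
              '  By Peter Yentel and Victory Merkens        '
              '<span class="fn org">By Peter Yentel and Victory Merkens')
cite_tag = '<cite class="vcard">'
span_tag = '<span class="fn org">'

f_occ = find_no_case(cite_tag, req_string)
e_occ = find_no_case(span_tag, req_string, f_occ)

# Mid variant: text between the end of <cite ...> and the start of <span ...>.
start_from = f_occ + len(cite_tag)
author = mid(req_string, start_from, e_occ - start_from)

# Right variant: everything after the span tag's last character.
count = len(req_string) - (e_occ + len(span_tag) - 1)
author1 = right(req_string, count)

print(repr(author))   # still wrapped in the newline and spaces from the markup
print(repr(author1))  # 'By Peter Yentel and Victory Merkens'
```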
 

Author Comment

by:Qsorb
ID: 35124259
Okay, thanks. However, I'm still not able to get just the name. Here's what I get with your latest suggestion:

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens
 
LVL 11

Accepted Solution

by:Brijesh Chauhan (earned 500 total points)
ID: 35125088
Yes, that's because there is a cfdump for the first string:

<cfdump var="#reqString#">

Remove it (line 21 of the code). Try the code below: it is the same as above, except that I have removed the unneeded cfdump, cleaned up the code, and added some output formatting.

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfset FOccurance = findNoCase('<cite class="vcard">','#reqString#') />

<cfset EOccurance = findNoCase('<span class="fn org">','#reqString#',FOccurance) />

<cfset startFrom =  FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = MID('#reqString#',startFrom,count) />

<cfoutput> Author for the article is -> <b>#author# </b></cfoutput>
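The #author# value produced above still begins with "By " and carries the newline and padding spaces from the markup, while the question asked for the text after "By ". A small cleanup step would likely cover both; in ColdFusion that would probably mean Trim() plus REReplaceNoCase(). The Python sketch below shows the idea, with clean_byline as a hypothetical helper name. The prefix pattern requires whitespace after "by", so a name such as "Byron" is left untouched.

```python
import re


def clean_byline(raw: str) -> str:
    """Collapse whitespace and drop a leading 'By ' (any case) from a raw byline."""
    text = re.sub(r"\s+", " ", raw).strip()
    return re.sub(r"^by\s+", "", text, flags=re.IGNORECASE)


# Raw value of `author` as extracted by the accepted code.
raw_author = "\n  By Peter Yentel and Victory Merkens        "
print(clean_byline(raw_author))  # Peter Yentel and Victory Merkens
```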


 

Author Closing Comment

by:Qsorb
ID: 35144078
That does it. Thanks for the great help!
