Solved

Parse rss cffeed to capture byline

Posted on 2011-03-11
8
489 Views
Last Modified: 2012-06-27
RSS feeds from Yahoo don't display the author's name so it can be easily captured and there is no specific variable for it within the feed.

I want only the author's name so we can insert it into our news database and display it separately.

Of course the page from which this code originated had much more to it. But I don't see any reason to display all of it for this purpose.

For the page, we can assume the structure is always the same. I want to get only the text between the byline classes, and specifically the text after the the "By " and before the next <span> tag.
<!--- I'm using ColdFusion 8's CFFEED to get the full text. I use <CF_REextract to strip out what I don't want and end up with the following: --->

<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->

Open in new window

0
Comment
Question by:Qsorb
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
8 Comments
 
LVL 14

Expert Comment

by:RickEpnet
ID: 35114610
What is the URL of the RSS feed that might help us to help you.
0
 

Author Comment

by:Qsorb
ID: 35114688
The URL's vary but an example is here:

http://news.yahoo.com/s/ap/20110312/ap_on_re_us/us_southwest_wildfires

Again, most of the parsing is done with <CF_REextract but to get more specific with capturing just the author name(s), I need help, and have it explained.
0
 
LVL 11

Expert Comment

by:Brijesh Chauhan
ID: 35115673
You can use FINDNOCASE function to get it.. an example is below..

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfdump var="#reqString#">

Open in new window


The above code will give you output as

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens

Explanation -> Fist look for <div class="byline"> and get it's location in the HTML, then strating from where you find this string, look for </span>.

Once you have the above numbers, use MID function to get the string between those.
0
Webinar: Aligning, Automating, Winning

Join Dan Russo, Senior Manager of Operations Intelligence, for an in-depth discussion on how Dealertrack, leading provider of integrated digital solutions for the automotive industry, transformed their DevOps processes to increase collaboration and move with greater velocity.

 

Author Comment

by:Qsorb
ID: 35118088
Yes, I got that far by myself. I need to be shown how to do the rest.

I would need just the text after ="vcard"> and before the next <span class="fn org">
0
 
LVL 11

Expert Comment

by:Brijesh Chauhan
ID: 35119322
Well again use the same logic, here is the complete thing..

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfdump var="#reqString#">

<cfset FOccurance = findNoCase('<cite class="vcard">','#reqString#') />

<cfset EOccurance = findNoCase('<span class="fn org">','#reqString#',FOccurance) />

<cfset startFrom =  FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = MID('#reqString#',startFrom,count) />

<cfdump var="#author#">

<cfset count = len('#reqString#') - (EOccurance + len('<span class="fn org">') - 1) />

<cfset author1 = right('#reqString#',count) />

<cfdump var="#author1#">

Open in new window


This will return author and author1 both as -> By Peter Yentel and Victory Merkens
0
 

Author Comment

by:Qsorb
ID: 35124259
Okay thanks. However I'm still not able to get just the name. Here's what I get with your latest suggestion:

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens
0
 
LVL 11

Accepted Solution

by:
Brijesh Chauhan earned 500 total points
ID: 35125088
Yes, because there is a cfdump for the first string

<cfdump var="#reqString#">

remove this, line 21 of the code and just do a bit of formatting, try the below code, same as above, I have removed the not required cfdump and cleaned up the code and also added some formatting..

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfset FOccurance = findNoCase('<cite class="vcard">','#reqString#') />

<cfset EOccurance = findNoCase('<span class="fn org">','#reqString#',FOccurance) />

<cfset startFrom =  FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = MID('#reqString#',startFrom,count) />

<cfoutput> Author for the article is -> <b>#author# </b></cfoutput>

Open in new window

0
 

Author Closing Comment

by:Qsorb
ID: 35144078
That does it. Thanks for the great help!
0

Featured Post

Stressed Out?

Watch some penguins on the livecam!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The technique is by far very Simple! How we can export the ColdFusion query results to DOC file?  Well before writing this I researched a lot in Internet but did not found a good Answer anyways!  So i thought now i should share my small snippet w…
PROBLEM:  How to open a cfwindow or run a function on double click of a cfgrid row. One of my clients wanted to be able to double click on a row item to get more detailed information about a transaction and to be able to modify the line items i…
In this video we outline the Physical Segments view of NetCrunch network monitor. By following this brief how-to video, you will be able to learn how NetCrunch visualizes your network, how granular is the information collected, as well as where to f…
Do you want to know how to make a graph with Microsoft Access? First, create a query with the data for the chart. Then make a blank form and add a chart control. This video also shows how to change what data is displayed on the graph as well as form…

688 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question