Solved

Parse rss cffeed to capture byline

Posted on 2011-03-11
Last Modified: 2012-06-27
RSS feeds from Yahoo don't expose the author's name in a way that is easy to capture, and there is no specific variable for it within the feed.

I want only the author's name so we can insert it into our news database and display it separately.

Of course the page from which this code originated had much more to it. But I don't see any reason to display all of it for this purpose.

For the page, we can assume the structure is always the same. I want to get only the text between the byline classes, specifically the text after the "By " and before the next <span> tag.
<!--- I'm using ColdFusion 8's CFFEED to get the full text. I use <CF_REextract to strip out what I don't want and end up with the following: --->

<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->

Question by:Qsorb

Expert Comment

by:RickEpnet
What is the URL of the RSS feed? That might help us to help you.

Author Comment

by:Qsorb
The URLs vary, but here is an example:

http://news.yahoo.com/s/ap/20110312/ap_on_re_us/us_southwest_wildfires

Again, most of the parsing is done with <CF_REextract, but I need help capturing just the author name(s), and I would like the approach explained.

Expert Comment

by:Brijesh Chauhan
You can use the findNoCase function to get it. An example is below.

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfdump var="#reqString#">

The above code gives this output:

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens

Explanation -> First look for <div class="byline"> and get its location in the HTML. Then, starting from where you found that string, look for </span>.

Once you have those two positions, use the MID function to get the string between them.

Author Comment

by:Qsorb
Yes, I got that far by myself. I need to be shown how to do the rest.

I would need just the text after ="vcard"> and before the next <span class="fn org">

Expert Comment

by:Brijesh Chauhan
Again, use the same logic. Here is the complete code:

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfdump var="#reqString#">

<cfset FOccurance = findNoCase('<cite class="vcard">','#reqString#') />

<cfset EOccurance = findNoCase('<span class="fn org">','#reqString#',FOccurance) />

<cfset startFrom =  FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = MID('#reqString#',startFrom,count) />

<cfdump var="#author#">

<cfset count = len('#reqString#') - (EOccurance + len('<span class="fn org">') - 1) />

<cfset author1 = right('#reqString#',count) />

<cfdump var="#author1#">

This returns both author and author1 as -> By Peter Yentel and Victory Merkens

Author Comment

by:Qsorb
Okay thanks. However I'm still not able to get just the name. Here's what I get with your latest suggestion:

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens

Accepted Solution

by:Brijesh Chauhan
Yes, because there is a cfdump of the first string:

<cfdump var="#reqString#">

Remove it (line 21 of the code). Try the code below. It is the same as above, except that I have removed the unneeded cfdump, cleaned up the code, and added some output formatting.

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase('#strToCheck#','#htmlToParse#') />

<cfset EOccurance = findNoCase('</span>','#htmlToParse#',FOccurance) />

<cfset startFrom = FOccurance + len('#strToCheck#') />

<cfset count = EOccurance - startFrom /> 

<cfset reqString = MID('#htmlToParse#',startFrom,count) />

<cfset FOccurance = findNoCase('<cite class="vcard">','#reqString#') />

<cfset EOccurance = findNoCase('<span class="fn org">','#reqString#',FOccurance) />

<cfset startFrom =  FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = MID('#reqString#',startFrom,count) />

<cfoutput> Author for the article is -> <b>#author# </b></cfoutput>
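
Note that the string captured this way still carries the leading "By " and the whitespace around the names from the markup. If only the name(s) are wanted, one option is a single reFindNoCase call with a capture group instead of chained findNoCase/MID steps. This is a minimal sketch, assuming the fragment always has the structure shown in the question. The variable names pattern and match are illustrative, and the shortened sample fragment is only there to make the snippet self-contained.

```cfml
<!--- Sample fragment, shortened from the one in the question. --->
<cfsavecontent variable="htmlToParse"><div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
</div></cfsavecontent>

<!---
  Skip whitespace and an optional "By " after the vcard cite tag, then
  capture everything up to (but not including) the whitespace that
  precedes the next <span tag. The capture is subexpression 1.
--->
<cfset pattern = '<cite class="vcard">\s*(?:By\s+)?([^<]*?)\s*<span' />

<!--- The fourth argument (true) asks for the pos/len arrays of the match. --->
<cfset match = reFindNoCase(pattern, htmlToParse, 1, true) />

<!--- pos[1]/len[1] describe the whole match; pos[2]/len[2] the capture. --->
<cfif arrayLen(match.pos) GTE 2 AND match.len[2] GT 0>
    <cfset author = mid(htmlToParse, match.pos[2], match.len[2]) />
<cfelse>
    <!--- No byline found, or the byline was empty. --->
    <cfset author = "" />
</cfif>

<cfoutput>Author for the article is -> <b>#htmlEditFormat(author)#</b></cfoutput>
```

Because the pattern requires whitespace after "By", a name that merely begins with those letters (for example "Byron") is not truncated. If the markup differs from the sample, the capture comes back empty rather than throwing an error, so the result should be checked before it is inserted into the database.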


Author Closing Comment

by:Qsorb
That does it. Thanks for the great help!