Solved

Parse an RSS feed (CFFEED) to capture the byline

Posted on 2011-03-11
Last Modified: 2012-06-27
RSS feeds from Yahoo don't present the author's name in a form that can be easily captured, and there is no specific variable for it within the feed.

I want only the author's name so we can insert it into our news database and display it separately.

Of course the page from which this code originated had much more to it. But I don't see any reason to display all of it for this purpose.

For the page, we can assume the structure is always the same. I want only the text inside the byline block, specifically the text after the "By " and before the next <span> tag.
<!--- I'm using ColdFusion 8's CFFEED to get the full text. I use the CF_REextract custom tag to strip out what I don't want, and end up with the following: --->

<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->
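As an alternative to the character-position approach discussed in this thread, the extraction described above (the text after "By " and before the next <span> tag) can be sketched as a single regular expression with a capture group. This is a minimal sketch, assuming the byline markup always has the shape shown above; it relies on the returnsubexpressions argument of REFindNoCase and on minimal (lazy) matching. XmlParse is not used because the fragment contains HTML entities such as &ndash; that an XML parser would likely reject.

```cfml
<!--- Sample byline markup from the question (assumed to be what CF_REextract leaves behind) --->
<cfsavecontent variable="htmlToParse"><div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline --></cfsavecontent>

<!--- Group 1 lazily captures the text after "By ", stopping before the
      whitespace and <span> tag that follow the names --->
<cfset match = REFindNoCase('<cite class="vcard">\s*By\s+([^<]*?)\s*<span', htmlToParse, 1, true) />

<cfset author = "" />
<cfif match.pos[1] GT 0 AND match.len[2] GT 0>
    <!--- pos[2] and len[2] describe the first capture group --->
    <cfset author = mid(htmlToParse, match.pos[2], match.len[2]) />
</cfif>

<cfoutput>Author: #author#</cfoutput>
```

If the markup ever changes shape, match.pos[1] is 0 and author stays an empty string instead of raising an error from Mid.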

Question by:Qsorb

Expert Comment

by:RickEpnet
ID: 35114610
What is the URL of the RSS feed? That might help us help you.

Author Comment

by:Qsorb
ID: 35114688
The URLs vary, but here is an example:

http://news.yahoo.com/s/ap/20110312/ap_on_re_us/us_southwest_wildfires

Again, most of the parsing is done with CF_REextract, but to narrow it down to just the author name(s) I need help, along with an explanation.

Expert Comment

by:Brijesh Chauhan
ID: 35115673
You can use the FindNoCase function to get it. An example is below:

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
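<cfset strToCheck = '<div class="byline">' />

<!--- Position where <div class="byline"> starts --->
<cfset FOccurance = findNoCase(strToCheck, htmlToParse) />

<!--- Position of the first </span> at or after that point --->
<cfset EOccurance = findNoCase('</span>', htmlToParse, FOccurance) />

<!--- Begin just past the end of the opening div tag --->
<cfset startFrom = FOccurance + len(strToCheck) />

<!--- Number of characters between the div tag and </span> --->
<cfset count = EOccurance - startFrom />

<cfset reqString = mid(htmlToParse, startFrom, count) />

<cfdump var="#reqString#">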



The code above gives you this output:

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens

Explanation: first look for <div class="byline"> and get its position in the HTML; then, starting from that position, look for </span>.

Once you have both positions, use the Mid function to get the string between them.
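The FindNoCase-then-Mid steps described above can be wrapped in a small reusable function. This is a sketch: extractBetween is a hypothetical helper name, not a built-in ColdFusion function, and it returns an empty string when either marker is missing rather than letting Mid fail on a negative count.

```cfml
<!--- Hypothetical helper (not part of ColdFusion): returns the text between
      the first startMarker and the first endMarker after it, or "" if either
      marker is missing. The search is case-insensitive, like FindNoCase. --->
<cffunction name="extractBetween" returntype="string" output="false">
    <cfargument name="text" type="string" required="true" />
    <cfargument name="startMarker" type="string" required="true" />
    <cfargument name="endMarker" type="string" required="true" />

    <!--- var-scoped locals must come first in a CF8 function --->
    <cfset var startPos = findNoCase(arguments.startMarker, arguments.text) />
    <cfset var fromPos = 0 />
    <cfset var endPos = 0 />

    <!--- Start marker not found --->
    <cfif startPos EQ 0>
        <cfreturn "" />
    </cfif>

    <!--- Search for the end marker just past the start marker --->
    <cfset fromPos = startPos + len(arguments.startMarker) />
    <cfset endPos = findNoCase(arguments.endMarker, arguments.text, fromPos) />

    <!--- End marker missing, or nothing between the two markers --->
    <cfif endPos LTE fromPos>
        <cfreturn "" />
    </cfif>

    <cfreturn mid(arguments.text, fromPos, endPos - fromPos) />
</cffunction>

<!--- Sample taken from the byline markup in the question --->
<cfset sample = '<cite class="vcard">  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span></cite>' />

<cfset author = trim(extractBetween(sample, '<cite class="vcard">', '<span class="fn org">')) />
<cfoutput>#author#</cfoutput>
```

Because the helper searches the whole string, it can go straight from the `<cite class="vcard">` marker to the `<span class="fn org">` marker without first narrowing the text to the byline div.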

Author Comment

by:Qsorb
ID: 35118088
Yes, I got that far by myself. I need to be shown how to do the rest.

I need just the text after ="vcard"> and before the next <span class="fn org">.

Expert Comment

by:Brijesh Chauhan
ID: 35119322
Again, use the same logic. Here is the complete code:

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase(strToCheck, htmlToParse) />

<cfset EOccurance = findNoCase('</span>', htmlToParse, FOccurance) />

<cfset startFrom = FOccurance + len(strToCheck) />

<cfset count = EOccurance - startFrom />

<cfset reqString = mid(htmlToParse, startFrom, count) />

<cfdump var="#reqString#">

<cfset FOccurance = findNoCase('<cite class="vcard">', reqString) />

<cfset EOccurance = findNoCase('<span class="fn org">', reqString, FOccurance) />

<cfset startFrom = FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = mid(reqString, startFrom, count) />

<cfdump var="#author#">

<cfset count = len(reqString) - (EOccurance + len('<span class="fn org">') - 1) />

<cfset author1 = right(reqString, count) />

<cfdump var="#author1#">



This sets both author and author1 to "By Peter Yentel and Victory Merkens" (author also keeps the whitespace around the name, which Trim() removes).

Author Comment

by:Qsorb
ID: 35124259
Okay, thanks. However, I'm still not able to get just the name. Here's what I get with your latest suggestion:

<cite class="vcard"> By Peter Yentel and Victory Merkens <span class="fn org">By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens By Peter Yentel and Victory Merkens

Accepted Solution

by:Brijesh Chauhan
ID: 35125088
Yes, that's because there is a cfdump for the first string:

<cfdump var="#reqString#">

Remove that line (line 21 of the previous code) and add a bit of formatting. Try the code below: it is the same as above, but I have removed the cfdump calls that aren't needed, cleaned up the code, and added some output formatting.

<cfset htmlToParse = '<div class="byline">
 <cite class="vcard">
  By Peter Yentel and Victory Merkens        <span class="fn org">By Peter Yentel and Victory Merkens</span>
 </cite>
 &ndash;
 <abbr title="2011-03-11T15:11:41-0800" class="timedate">Fri&nbsp;Mar&nbsp;11, 6:11&nbsp;pm&nbsp;ET</abbr></div>
 <!-- end .byline -->' />
 
<cfset strToCheck = '<div class="byline">' />
 
<cfset FOccurance = findNoCase(strToCheck, htmlToParse) />

<cfset EOccurance = findNoCase('</span>', htmlToParse, FOccurance) />

<cfset startFrom = FOccurance + len(strToCheck) />

<cfset count = EOccurance - startFrom />

<cfset reqString = mid(htmlToParse, startFrom, count) />

<cfset FOccurance = findNoCase('<cite class="vcard">', reqString) />

<cfset EOccurance = findNoCase('<span class="fn org">', reqString, FOccurance) />

<cfset startFrom = FOccurance + len('<cite class="vcard">') />

<cfset count = EOccurance - startFrom />

<cfset author = mid(reqString, startFrom, count) />

<cfoutput> Author for the article is -> <b>#author# </b></cfoutput>
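The author value from the code above still carries the leading "By " and the whitespace around the names, while the original question asked for only the text after "By ". A minimal post-processing sketch, assuming author holds the value produced by the code above; a literal stands in for it here so the snippet runs on its own, and authorName is an illustrative variable name.

```cfml
<!--- Stand-in for the author value built by the accepted code: a newline,
      indentation, "By ", the names, and trailing spaces --->
<cfset author = chr(10) & "  By Peter Yentel and Victory Merkens        " />

<!--- Trim surrounding whitespace, then drop a leading "By " (case-insensitive) --->
<cfset authorName = REReplaceNoCase(trim(author), "^By[[:space:]]+", "") />

<!--- Collapse any internal runs of whitespace to a single space --->
<cfset authorName = REReplace(authorName, "[[:space:]]+", " ", "all") />

<cfoutput>Author for the article is -> <b>#authorName#</b></cfoutput>
```

The cleaned authorName is what would go into the news database column; splitting multiple authors on " and " is left out because bylines may also use commas.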


Author Closing Comment

by:Qsorb
ID: 35144078
That does it. Thanks for the great help!