Solved

How to clean this feed?

Posted on 2010-09-16
Last Modified: 2013-11-18
Hi,

At www.groenerekenkamer.nl/milieublogs I run a page with the feeds of several blogs. As you can see, both the 'GreenieWatch' feed and the 'FoodHealthSkeptic' feed contain leftover tags. Since other feeds do not have this, I assume it is the feed's fault. I would like to contact the producer, but I do not think he would know a solution. Do you?
Question by:TheoRichel
14 Comments
 

Expert Comment

by:iliyas_patel
ID: 33689616
Use a proper standard instruction and make sure at the time of running your previous data/record should be clear
 

Expert Comment

by:iliyas_patel
ID: 33689623
great
 

Expert Comment

by:iliyas86
ID: 33689666
thanks
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 33690032
Please see the code snippet.  You will see things with strings like & l t ;  b r  & g t ;

These are "entitized" HTML tags, and that is usually the right way to put tags into an XML string like an RSS feed.  If you want to scrub this out of the feed, I will be glad to show you how.  Please post a link to the source of the data and I can show you a simple PHP script that will clean it up.  Hopefully you can integrate that into Drupal.
<div class="block block-aggregator" id="block-aggregator-feed-6">
          <h2 class="title">Food and Health Skeptic</h2>
        <div class="content"><div class="item-list"><ul><li class="first"><a href="http://john-ray.blogspot.com/2010/09/autism-drug-has-some-promise-this-is.html">&lt;br&gt;&lt;br /&gt;&lt;b&gt;Autism  drug has some</a>
</li>

<li><a href="http://john-ray.blogspot.com/2010/09/anti-mcdonalds-ad-angers-fast-food.html">&lt;br&gt;&lt;br /&gt;&lt;b&gt;Anti-McDonald&#039;s ad angers</a>
</li>
<li><a href="http://john-ray.blogspot.com/2010/09/study-finds-people-with-lots-of-friends.html">&lt;br&gt;&lt;br /&gt;&lt;b&gt;Study finds people with</a>
</li>
<li><a href="http://john-ray.blogspot.com/2010/09/wcrf-is-at-it-again-extra-inch-on-waist.html">&lt;br&gt;&lt;br /&gt;&lt;b&gt;The WCRF is at it again: </a>

</li>
<li class="last"><a href="http://john-ray.blogspot.com/2010/09/cancer-patients-from-wealthy-areas-of.html">&lt;br&gt;&lt;br /&gt;&lt;b&gt;Cancer patients from</a>
</li>
</ul></div><div class="more-link"><a href="/aggregator/sources/6" title="Het meest recente nieuws van deze feed bekijken.">more</a></div></div>
 </div>
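
For illustration, here is a minimal sketch of how one of the entitized titles in the snippet above could be turned back into plain text. The function name `clean_feed_text` is hypothetical; it is not part of Drupal or of any script posted in this thread. The sketch assumes the input is UTF-8 and that dropping every tag is acceptable.

```php
<?php
// Hypothetical helper for illustration only (not a Drupal function).
function clean_feed_text($text)
{
    // Turn entitized markup such as &lt;br&gt; back into real tags.
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Replace line-break tags with a space so adjacent words stay separated.
    $spaced = preg_replace('/<br\s*\/?>/i', ' ', $decoded);

    // Drop every remaining tag, then collapse runs of whitespace.
    $plain = strip_tags($spaced);
    return trim(preg_replace('/\s+/', ' ', $plain));
}

// One of the titles from the snippet above.
$title = '&lt;br&gt;&lt;br /&gt;&lt;b&gt;Anti-McDonald&#039;s ad angers';
echo clean_feed_text($title), "\n"; // Anti-McDonald's ad angers
```

Decoding has to come first: `strip_tags()` only removes real tags, and the entitized text contains no literal `<` for it to find.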

 

Author Comment

by:TheoRichel
ID: 33690572
Sounds promising Ray, thanks,
The datasource is: http://john-ray.blogspot.com/feeds/posts/default?alt=rss

Though I have no idea where to integrate that in Drupal.
 

Author Comment

by:TheoRichel
ID: 33690662
FYI: the page on my site that I pointed you to only shows blocks of the feed. If I go to the page of that feed (on my site), I see that the text is interspersed with 'No title provided'.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 33691989
Interesting.  You may want to try processing the feed through this before using it in your site.  Give it a try and let's see if we have made any progress.
<?php // RAY_temp_theorichel.php
error_reporting(E_ALL);

// TEST DATA - READ THE FEED AND MAKE AN OBJECT
$url = 'http://john-ray.blogspot.com/feeds/posts/default?alt=rss';
$xml = file_get_contents($url);
if ($xml === false) die("Could not read $url");
$obj = simplexml_load_string($xml);
if ($obj === false) die("Could not parse the feed as XML");

// ITERATE OVER THE OBJECT TO CLEAN UP THE EMBEDDED HTML IN THE DESCRIPTION FIELDS
foreach ($obj->channel->item as $item)
{
    // CAST TO STRING - SIMPLEXML DECODES THE ENTITIES, SO THE TAGS ARE REAL TAGS HERE
    $desc = (string) $item->description;
    $desc = str_replace('<br>',   ' ', $desc);
    $desc = str_replace('<br />', ' ', $desc);
    // KEEP ONLY THE BOLD TAGS
    $desc = strip_tags($desc, '<b>');
    $desc = str_replace('</b>', '</b> ', $desc);
    $item->description = $desc;
}

// ACTIVATE THIS TO SEE THE NEW OBJECT
// var_dump($obj);

// PRODUCE CLEANED UP XML
echo $obj->asXML();

 
LVL 17

Expert Comment

by:Thomas4019
ID: 33692215
You should be able to fix this by configuring Drupal's Input Format settings for the input format used by that node type. There is an option to either remove the tags or entitize them. You simply need to switch it.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 33692259
It looks like the current setting is "entitize"

;-)
 
LVL 17

Accepted Solution

by:
Thomas4019 earned 500 total points
ID: 33693104
Yeah! Here are some guides to familiarize yourself with Input Formats, if you don't know how to fix it.

http://www.lullabot.com/articles/drupal-input-formats-and-filters
http://drupal.org/node/213156
 

Author Comment

by:TheoRichel
ID: 33693179
Gentlemen, thank you both, but the feed of the Drupal core aggregator does not produce 'nodes', but something called a 'source'.
And as to the script: thanks very much, but I wouldn't know where to paste that. Does anyone else?
 
LVL 17

Expert Comment

by:Thomas4019
ID: 33693213
So are you building that page with just the core aggregator module? I have not used that module much but thought it was just for producing RSS feeds, etc. Are you using any contributed modules to make that page like Views, Services, etc?
 

Author Comment

by:TheoRichel
ID: 33699549
@Thomas: Yes indeed, the core module. It works all right, though it has problems with Atom feeds. I added a patch, but that didn't improve anything. The original URL was: http://john-ray.blogspot.com/feeds/posts/default but it only works now that I have added '?alt=rss', and then it shows the tags.

BTW: I just discovered that the aggregator does have a setting to strip tags (yes I should have seen that before,  my bad), but it has no effect.
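
The thread does not establish why the strip-tags setting has no effect. One possible explanation, offered only as an assumption about what the aggregator is handed, is that the text is still entitized when the stripping runs. The sketch below shows only general PHP behavior, not Drupal's internals: stripping tags from entitized text changes nothing, because there are no literal tags to remove.

```php
<?php
// A title as it appears in the page source quoted earlier in this thread.
$entitized = '&lt;br&gt;&lt;br /&gt;&lt;b&gt;Cancer patients from';

// strip_tags() finds no literal "<", so the string comes back unchanged.
var_dump(strip_tags($entitized) === $entitized); // bool(true)

// After decoding the entities, the same call removes the tags.
$decoded = html_entity_decode($entitized, ENT_QUOTES, 'UTF-8');
echo strip_tags($decoded), "\n"; // Cancer patients from
```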
 

Author Closing Comment

by:TheoRichel
ID: 37924494
This made clear that with the present software I would never be able to solve my problem. Switching to Feeds did.