Solved

How do I retrieve an element from Atom Feed?

Posted on 2009-05-07
Last Modified: 2013-11-18
Hi,

I have an atom feed containing the following:
<feed idx:index="no">
...
<entry>
...
<source gr:stream-id="feed/http://www.plantphysiol.org/rss/current.xml">

<id>
tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml
</id>
<title type="html">Plant Physiology current issue</title>
<link rel="alternate" href="http://www.plantphysiol.org" type="text/html"/>
</source>
</entry>
</feed>

... represents other fields ignored.

I am using Perl with XML::Feed, which automatically decides whether to use XML::Atom::Feed or XML::RSS depending on the type of feed. In the above case, it is XML::Atom::Feed.

How do I get the 'source' element and its sub-elements from the above XML? From the documentation, it seems that $entry->get($ns, $element) is the method to use.
However, I have been running into a lot of minor problems.

Any suggestions on how to start from a given feed, parse an entry, and retrieve all the elements and sub-elements of 'source' would be appreciated.

Thanks.
Question by:Purdue_Pete
LVL 39

Expert Comment

by:Adam314
ID: 24338883
What are the errors you have with get?  Have you tried $entry->getlist?

Author Comment

by:Purdue_Pete
ID: 24339002
I would like to get the first entry and retrieve the 'source' element for it.
my @entries = $feed->entries;   # all the entries in the feed

I tried the following:
CODE:
my $entry = $entries[0];
  my $dc = XML::Atom::Namespace->new(dc => 'http://purl.org/dc/elements/1.1/');
  my $source = $entry->get($dc, 'source');
  print $source;
ERROR:
Can't locate object method "get" via package "XML::Feed::Entry::Format::Atom" at rss.pm line 225.

CODE:
my $entry = XML::Atom::Entry->new($entries[0]);
ERROR: read on filehandle failed: Can't locate object method "read" via package "XML::Feed::Entry::Format::Atom" at /usr/lib/perl5/XML/LibXML.pm line 607. at /usr/share/perl5/XML/Atom/Thing.pm line 27

Trying getlist gives the same error as the first attempt (with the result assigned to an array, i.e. @source = $entry->getlist...)
LVL 39

Expert Comment

by:Adam314
ID: 24339031
What about this:
use Data::Dumper;   #put at top
 
my @entries = $feed->entries;   # all the entries in the feed
print Dumper($entries[0]);


Author Comment

by:Purdue_Pete
ID: 24340554
Adam314,
I am not sure how this is useful. I got something like the following. If you can explain in code how I can manipulate this to retrieve 'source', that would be great.

$VAR1 = bless( {
                 'entry' => bless( {
                                     'ns' => 'http://www.w3.org/2005/Atom',
                                     'elem' => bless( do{\(my $o = 173718608)}, 'XML::LibXML::Element' ),
                                     'version' => '0.3'
                                   }, 'XML::Atom::Entry' )
               }, 'XML::Feed::Entry::Format::Atom' );
LVL 39

Expert Comment

by:Adam314
ID: 24369212
In this case, what is 'source'?

Author Comment

by:Purdue_Pete
ID: 24431816
Adam314,

source is an xml element in Google Reader's shared items (Atom) - which I am trying to parse for each entry.
eg: look at this feed - you will see 'source' for an entry
http://www.google.com/reader/public/atom/user%2F03409051605136090036%2Fstate%2Fcom.google%2Fbroadcast

Hope the above answers the questions
LVL 39

Expert Comment

by:Adam314
ID: 24433577
Can you provide the URIs you are trying?  I will play around with it.
LVL 39

Accepted Solution

by:
Adam314 earned 1000 total points
ID: 24497478
I'm not familiar with feeds and how namespaces work, so I couldn't figure out how to get it to work with the object directly.  If you want to call the get method with a namespace, you can do so like below.  If that doesn't work, you can parse the XML and get the source data from that, also below.


########## Modules used below
use strict;
use warnings;
use URI;
use XML::Feed;
use XML::Atom::Namespace;
use XML::Simple qw(XMLin);

########## Get feed
my $url = 'http://www.google.com/reader/public/atom/user%2F03409051605136090036%2Fstate%2Fcom.google%2Fbroadcast';
my $feed = XML::Feed->parse(URI->new($url)) or die XML::Feed->errstr;
 
########## Method 1: Using the get method on the entry with a namespace
my $entry = ($feed->entries)[0];
my $dc = XML::Atom::Namespace->new(dc => 'http://purl.org/dc/elements/1.1/');
my $source = $entry->{entry}->get($dc, 'source');
#But here, $source is undef.  Not sure if the namespace isn't correct for this, or something else.
 
 
########## Method 2: Parsing the XML
my $xml = XMLin($feed->as_xml, forcearray => ['entry']);
 
foreach my $entry (@{$xml->{entry}}) {
	my $source = $entry->{source};
	
	##### You can access all of the entry properties through $entry
	print "Entry ID: $entry->{id}->{content}\n";
	
	##### You can now access all of the source properties through $source
	print "    Source Title: $source->{title}->{content}\n";
	print "    Source ID   : $source->{id}\n";
	#Access whatever properties from source you want
}
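
A third route is to read <source> straight from the XML with XML::LibXML and XPath. In the sample from the question, <source> has no prefix, so it sits in the feed's default Atom namespace rather than Dublin Core, which may be why the dc lookup in Method 1 comes back undef. What follows is only a sketch under assumptions: the inline feed is a trimmed stand-in for the real one, and the gr namespace URI in it is an assumption that should be checked against the xmlns:gr declaration in the actual feed. With a live feed, the string from $feed->as_xml could presumably be passed in place of $atom_xml.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed stand-in for the feed in the question. The gr namespace URI
# is an assumption; copy the real one from the feed's root element.
my $atom_xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <entry>
    <id>tag:example.org,2009:entry-1</id>
    <source gr:stream-id="feed/http://www.plantphysiol.org/rss/current.xml">
      <id>tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml</id>
      <title type="html">Plant Physiology current issue</title>
      <link rel="alternate" href="http://www.plantphysiol.org" type="text/html"/>
    </source>
  </entry>
</feed>
XML

my $doc = XML::LibXML->load_xml(string => $atom_xml);

# XPath has no notion of a default namespace, so the Atom namespace
# needs an explicit prefix here.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(atom => 'http://www.w3.org/2005/Atom');
$xpc->registerNs(gr   => 'http://www.google.com/schemas/reader/atom/');

my @sources;
for my $entry ($xpc->findnodes('/atom:feed/atom:entry')) {
    # Take the first <source> child of this entry; skip entries without one.
    my ($source) = $xpc->findnodes('atom:source', $entry) or next;
    push @sources, {
        stream_id => $xpc->findvalue('@gr:stream-id', $source),   # attribute
        id        => $xpc->findvalue('atom:id', $source),         # child text
        title     => $xpc->findvalue('atom:title', $source),
        link      => $xpc->findvalue('atom:link/@href', $source),
    };
}

for my $s (@sources) {
    print "Source title: $s->{title}\n";
    print "Source link : $s->{link}\n";
    print "Stream id   : $s->{stream_id}\n";
}
```

Attributes are addressed with @name in the XPath expression, so the same approach covers both sub-elements and attributes of <source>.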

LVL 39

Expert Comment

by:Adam314
ID: 24795953
I assumed my latest post solved the problem, as there was no response after that.

Author Comment

by:Purdue_Pete
ID: 24805430
Adam314,
Sorry - forgot about this question. I will try your last solution.
BTW, how can I get the attribute of an element?
Can you modify your code (if not available) to reflect that?

Thanks,
K
LVL 39

Expert Comment

by:Adam314
ID: 24807419
Some things appear to be available through the XML::Atom objects.  If what you want is not, you can access everything through the $xml variable.  You might want to
    print Dumper($xml);
to see how the $xml is structured.  If you need help with a particular attribute, post the output from the above command.

Author Comment

by:Purdue_Pete
ID: 24808297
Adam314,
I figured out how to get the attributes with some modifications to the code. But, I have a naive question on arrayref and hashref.

I had to set forcearray option to 1 so as not to mix up attributes and elements for the entire xml.
Line becomes:
my $xml = XMLin($feed->as_xml, forcearray => 1);

How can I now print the title of the entry (since it is now an arrayref)?
I tried the following, but none of them worked
print @$entry->{title};
print @($entry->{title});

Thanks,
K

Author Comment

by:Purdue_Pete
ID: 24808455
Adam314,
When I set forcearray => 1, all the elements in the XML become arrayrefs, right?

When I set forcearray => ['entry'], only the entry element becomes an arrayref and all the others are hashrefs?

Related to the above problem of getting title value, I tried something like this which worked (w/ forcearray set to 1):
$entry->{title}->[0]->{content};

But, I am not able to understand its logic. Can you explain?
 
Also,
foreach my $entry (@{$xml->{entry}}) -
what does @{$xml->{entry}} mean?
what ref is  $xml->{entry} - arrayref or hashref?
what ref is @{$xml->{entry}} - arrayref or hashref?

Thanks,
K
   
LVL 39

Assisted Solution

by:Adam314
Adam314 earned 1000 total points
ID: 24808536
If you set forcearray=>1, then all elements will be an arrayref.  If you set forcearray=>[name1,name2,...], then only those named (eg: name1, name2) will be forced to an arrayref.  If there is more than 1 element with a given name, it will also be an arrayref.  Otherwise, it'll be a hashref.
eg1:
    <xml>
        <element attr1="val1" />
    </xml>
eg2:
    <xml>
        <element attr1="val1" />
        <element attr1="val2" />
    </xml>

Here, in example 1, element would be a hashref, with attr1 a key.  In example2, element will be an arrayref with 2 items.
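
The two examples can be checked with a short script. This is a minimal sketch, assuming XML::Simple is installed; the variable names are made up for illustration. KeyAttr => [] is passed so that XML::Simple's array folding stays out of the way and only the number of elements and ForceArray decide the shape.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $one = '<xml><element attr1="val1" /></xml>';
my $two = '<xml><element attr1="val1" /><element attr1="val2" /></xml>';

# KeyAttr => [] switches off array folding, so the result shape depends
# only on how many <element> children exist and on ForceArray.
my $single = XMLin($one, KeyAttr => []);
my $pair   = XMLin($two, KeyAttr => []);
my $forced = XMLin($one, KeyAttr => [], ForceArray => 1);

print "one element          : ", ref($single->{element}), "\n";
print "two elements         : ", ref($pair->{element}),   "\n";
print "one, ForceArray => 1 : ", ref($forced->{element}), "\n";

# With ForceArray the single element has to be reached through index 0.
print "attr1 (plain)  : $single->{element}{attr1}\n";
print "attr1 (forced) : $forced->{element}[0]{attr1}\n";
```

Note that ForceArray only wraps nested elements; attribute values such as attr1 stay plain scalars either way.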


In this: $entry->{title}->[0]->{content}
The $entry is a hashref. The ->{title} part gets the value associated with the "title" key in that hashref.
That value is an arrayref.  The ->[0] part gets the value of the first element of that arrayref.
That value is a hashref.  The ->{content} part gets the value associated with the "content" key in that hashref.

In this: foreach my $entry (@{$xml->{entry}}) -
The $xml is a hashref.  The ->{entry} part gets the value associated with the "entry" key in that hashref.
This value is an arrayref.  The "@{...}" turns this arrayref into a list.
The foreach part then loops over this list (i.e. the values in the arrayref).
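
The same chain can be tried without any feed at all. Below is a small sketch using a hand-built structure that mimics what XMLin returns with forcearray => 1; the titles and the intermediate variable names are invented for illustration.

```perl
use strict;
use warnings;

# Hand-built stand-in for the XMLin result with forcearray => 1.
my $xml = {
    entry => [
        { title => [ { type => 'html', content => 'First entry'  } ] },
        { title => [ { type => 'html', content => 'Second entry' } ] },
    ],
};

print ref($xml), "\n";            # $xml is a hashref
print ref($xml->{entry}), "\n";   # $xml->{entry} is an arrayref

my @titles;
foreach my $entry (@{ $xml->{entry} }) {   # @{...} yields a list, not a ref
    my $title_list  = $entry->{title};     # arrayref
    my $first_title = $title_list->[0];    # hashref
    push @titles, $first_title->{content}; # plain string

    # Same lookup in one expression; the arrows between subscripts are optional.
    print "$entry->{title}[0]{content}\n";
}

my $count = scalar @{ $xml->{entry} };     # number of entries
print "entries: $count\n";
```

So $xml->{entry} is an arrayref, while @{$xml->{entry}} is no reference at all: it is the list of items that the arrayref points to.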
LVL 39

Expert Comment

by:Adam314
ID: 24808545
There are a few (many at other sites also I'm sure) tutorials on references and complex data structures.  You might find them helpful.
    http://perldoc.perl.org/perlreftut.html
    http://perldoc.perl.org/perldsc.html

Author Comment

by:Purdue_Pete
ID: 24815501
Adam314,
Thanks - that clarified it! Will check out the links also.

Author Comment

by:Purdue_Pete
ID: 24840283
Adam314,
When I tried integrating the code, I got the following error:
Undefined subroutine &B::C::RSS::Feed::XMLin around that XMLin line
The reason could be that I have
package B::C::RSS::Feed before the sub that contains all this code.

If I call XML::Simple->XMLin instead, I get the following error:
Can't use string ("XML::Simple") as a HASH ref while "strict refs" in use at /usr/local/share/perl/5.10.0/XML/Simple.pm line 697

How can this be fixed?

Thanks,
K
LVL 39

Expert Comment

by:Adam314
ID: 24843175
Add:
    use XML::Simple 'XMLin';
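
The use line imports XMLin into whichever package is current at that point, so it probably needs to come after the package statement. A minimal sketch of that layout, assuming XML::Simple is installed; the package name, sub names, and sample XML are hypothetical. The second sub shows the object form, which appears to be how XMLin expects to be called as a method (on an XML::Simple object rather than on the class name).

```perl
use strict;
use warnings;

package My::Feed::Demo;    # hypothetical package name

# Importing after the package statement makes XMLin visible to the subs below.
use XML::Simple qw(XMLin);

sub titles_from {
    my ($xml_text) = @_;
    my $data = XMLin($xml_text, ForceArray => ['entry'], KeyAttr => []);
    return map { $_->{title} } @{ $data->{entry} };
}

# Object-style alternative: the module must be loaded, but nothing
# has to be imported into the package.
sub titles_via_object {
    my ($xml_text) = @_;
    my $xs   = XML::Simple->new(ForceArray => ['entry'], KeyAttr => []);
    my $data = $xs->XMLin($xml_text);
    return map { $_->{title} } @{ $data->{entry} };
}

package main;

my $sample = '<feed><entry><title>One</title></entry>'
           . '<entry><title>Two</title></entry></feed>';

my @by_function = My::Feed::Demo::titles_from($sample);
my @by_object   = My::Feed::Demo::titles_via_object($sample);

print join(', ', @by_function), "\n";
print join(', ', @by_object), "\n";
```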