
How do I retrieve an element from an Atom feed?

Posted on 2009-05-07
Last Modified: 2013-11-18
Hi,

I have an Atom feed containing the following:
<feed idx:index="no">
...
<entry>
...
<source gr:stream-id="feed/http://www.plantphysiol.org/rss/current.xml">

<id>
tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml
</id>
<title type="html">Plant Physiology current issue</title>
<link rel="alternate" href="http://www.plantphysiol.org" type="text/html"/>
</source>
</entry>
</feed>

(... marks other fields I have left out.)

I am using Perl and XML::Feed, which automatically decides whether to use XML::Atom::Feed or XML::RSS depending on the type of feed. In the above case, it is XML::Atom::Feed.

How do I get the 'source' element and its sub-elements in the above XML? From the documentation, it seemed that $entry->get($ns, $element) is the method to use for this.
However, I have been running into a lot of minor problems.

Any suggestions on how to start from a given feed, parse an entry, and retrieve all the elements and sub-elements of 'source' would be appreciated.

Thanks.

Question by:Purdue_Pete

Expert Comment

by:Adam314
ID: 24338883
What errors do you get with get? Have you tried $entry->getlist?

Author Comment

by:Purdue_Pete
ID: 24339002
I would like to get the first entry and retrieve its 'source' element.
my @entries = $feed->entries;   # all the entries in the feed

I tried the following:
CODE:
my $entry = $entries[0];
my $dc = XML::Atom::Namespace->new(dc => 'http://purl.org/dc/elements/1.1/');
my $source = $entry->get($dc, 'source');
print $source;
ERROR:
Can't locate object method "get" via package "XML::Feed::Entry::Format::Atom" at rss.pm line 225.

CODE:
my $entry = XML::Atom::Entry->new($entries[0]);
ERROR:
read on filehandle failed: Can't locate object method "read" via package "XML::Feed::Entry::Format::Atom" at /usr/lib/perl5/XML/LibXML.pm line 607. at /usr/share/perl5/XML/Atom/Thing.pm line 27

Trying getlist gives the same error as the first attempt (with source as an array, i.e. my @source = $entry->getlist(...)).

Expert Comment

by:Adam314
ID: 24339031
What about this:
use Data::Dumper;   # put at top

my @entries = $feed->entries;   # all the entries in the feed
print Dumper($entries[0]);

Author Comment

by:Purdue_Pete
ID: 24340554
Adam314,
I am not sure how this helps. I got something like the following. If you can show in code how I can use this to retrieve 'source', that would be great.

$VAR1 = bless( {
                 'entry' => bless( {
                                     'ns' => 'http://www.w3.org/2005/Atom',
                                     'elem' => bless( do{\(my $o = 173718608)}, 'XML::LibXML::Element' ),
                                     'version' => '0.3'
                                   }, 'XML::Atom::Entry' )
               }, 'XML::Feed::Entry::Format::Atom' );

Expert Comment

by:Adam314
ID: 24369212
In this case, what is 'source'?

Author Comment

by:Purdue_Pete
ID: 24431816
Adam314,

source is an XML element in Google Reader's shared items (Atom) feed, which I am trying to parse for each entry.
For example, look at this feed; you will see 'source' for an entry:
http://www.google.com/reader/public/atom/user%2F03409051605136090036%2Fstate%2Fcom.google%2Fbroadcast

Hope that answers the question.

Expert Comment

by:Adam314
ID: 24433577
Can you provide the URIs you are trying? I will play around with them.

Author Comment

by:Purdue_Pete
ID: 24483978

Accepted Solution

by:Adam314
ID: 24497478
I'm not familiar with feeds and how their namespaces work, so I couldn't figure out how to get this working with the object directly. If you want to call the get method with a namespace, you can do so as in Method 1 below. If that doesn't work, you can parse the XML yourself and get the source data from that, as in Method 2.


use strict;
use warnings;
use URI;
use XML::Feed;
use XML::Simple qw(XMLin);

########## Get feed
my $url = 'http://www.google.com/reader/public/atom/user%2F03409051605136090036%2Fstate%2Fcom.google%2Fbroadcast';
my $feed = XML::Feed->parse(URI->new($url)) or die XML::Feed->errstr;

########## Method 1: Using the get method on the entry with a namespace
my $entry = ($feed->entries)[0];
my $dc = XML::Atom::Namespace->new(dc => 'http://purl.org/dc/elements/1.1/');
my $source = $entry->unwrap->get($dc, 'source');   # unwrap returns the inner XML::Atom::Entry
# But here, $source is undef. Likely cause: <source> is in the Atom
# namespace (http://www.w3.org/2005/Atom), not the Dublin Core one.

########## Method 2: Parsing the XML
# keyattr => [] stops XML::Simple from folding the entry array into a hash keyed on id
my $xml = XMLin($feed->as_xml, forcearray => ['entry'], keyattr => []);

foreach my $entry (@{$xml->{entry}}) {
    my $source = $entry->{source};

    ##### You can access all of the entry properties through $entry
    print "Entry ID: $entry->{id}->{content}\n";

    ##### You can now access all of the source properties through $source
    print "    Source Title: $source->{title}->{content}\n";
    print "    Source ID   : $source->{id}\n";
    # Access whatever other properties of source you want
}
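A namespace-aware alternative to Method 2 is an XPath query with XML::LibXML, which XML::Atom already uses under the hood (it shows up in the error trace earlier in the thread). This is a minimal sketch run on a trimmed copy of the feed from the question. Assumptions: the <feed> declares Atom as its default namespace and binds gr: to http://www.google.com/schemas/reader/atom/ (the question elides those declarations); with a live feed you would pass $feed->as_xml instead of the inline string.

```perl
use strict;
use warnings;
use XML::LibXML;

# Atom's namespace URI is standard; the gr: URI is an assumption.
my %ns = (
    atom => 'http://www.w3.org/2005/Atom',
    gr   => 'http://www.google.com/schemas/reader/atom/',
);

# Trimmed copy of the question's feed (other fields omitted).
my $atom = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <entry>
    <source gr:stream-id="feed/http://www.plantphysiol.org/rss/current.xml">
      <id>tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml</id>
      <title type="html">Plant Physiology current issue</title>
      <link rel="alternate" href="http://www.plantphysiol.org" type="text/html"/>
    </source>
  </entry>
</feed>
XML

my $doc = XML::LibXML->load_xml(string => $atom);

# XPath needs explicit prefixes, even for the document's default namespace.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs($_ => $ns{$_}) for keys %ns;

my @sources;
for my $entry ($xpc->findnodes('/atom:feed/atom:entry')) {
    # Search relative to this entry; skip entries without a <source>.
    my ($source) = $xpc->findnodes('atom:source', $entry) or next;
    push @sources, {
        stream_id => $xpc->findvalue('@gr:stream-id', $source),   # attribute
        id        => $xpc->findvalue('atom:id', $source),          # child text
        title     => $xpc->findvalue('atom:title', $source),
        type      => $xpc->findvalue('atom:title/@type', $source),
        href      => $xpc->findvalue('atom:link[@rel="alternate"]/@href', $source),
    };
}

for my $s (@sources) {
    print "$s->{title} ($s->{type})\n  $s->{href}\n  $s->{stream_id}\n";
}
```

Because each lookup is an XPath expression relative to the <source> node, the same pattern reaches any other sub-element or attribute without depending on how a parser chooses to shape its data structure.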

Expert Comment

by:Adam314
ID: 24795953
I assumed my latest post solved the problem, as there was no response after that.

Author Comment

by:Purdue_Pete
ID: 24805430
Adam314,
Sorry, I forgot about this question. I will try your last solution.
By the way, how can I get an attribute of an element?
If that isn't already covered, could you modify your code to show it?

Thanks,
K

Expert Comment

by:Adam314
ID: 24807419
Some things appear to be available through the XML::Atom objects. If what you want isn't, you can access everything through the $xml variable. You might want to run
    print Dumper($xml);
to see how $xml is structured. If you need help with a particular attribute, post the output from the above command.
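With XML::Simple, attributes end up as ordinary hash keys next to the child elements, and a namespaced attribute keeps its prefix in the key (unless the NSExpand option is turned on). A minimal sketch, assuming XML::Simple and one of its parser backends are installed, run on the <source> snippet from the question; the gr: namespace URI and the surrounding <feed> wrapper are assumptions.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# The question's <source> snippet in a minimal feed; the gr: URI is assumed.
my $atom = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <entry>
    <source gr:stream-id="feed/http://www.plantphysiol.org/rss/current.xml">
      <id>tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml</id>
      <title type="html">Plant Physiology current issue</title>
      <link rel="alternate" href="http://www.plantphysiol.org" type="text/html"/>
    </source>
  </entry>
</feed>
XML

# ForceArray keeps entries in an array even when there is only one;
# KeyAttr => [] turns off folding of arrays into hashes keyed on id/name/key.
my $xml = XMLin($atom, ForceArray => ['entry'], KeyAttr => []);

my @found;
foreach my $entry (@{ $xml->{entry} }) {
    my $source = $entry->{source};
    push @found, {
        # An attribute is just a key; the namespace prefix stays in the name.
        stream_id => $source->{'gr:stream-id'},
        # Attributes plus text: the text lands under the 'content' key.
        title     => $source->{title}{content},
        type      => $source->{title}{type},
        # Attributes only (empty element): only the attribute keys.
        href      => $source->{link}{href},
        # Text only, no attributes: a plain string.
        id        => $source->{id},
    };
}

print "$_->{title}: $_->{stream_id}\n" for @found;
```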

Author Comment

by:Purdue_Pete
ID: 24808297
Adam314,
I figured out how to get the attributes with some modifications to the code. But I have a naive question about arrayrefs and hashrefs.

I had to set the forcearray option to 1 so that attributes and elements don't get mixed up anywhere in the XML.
The line becomes:
my $xml = XMLin($feed->as_xml, forcearray => 1);

How can I now print the title of the entry (since it is now an arrayref)?
I tried the following, but neither worked:
print @$entry->{title};
print @($entry->{title});

Thanks,
K

Author Comment

by:Purdue_Pete
ID: 24808455
Adam314,
When I set forcearray => 1, all the elements in the XML become arrayrefs, right?

When I set forcearray => ['entry'], only the entry element becomes an arrayref and all the others are hashrefs?

Related to the above problem of getting the title value, I tried this, which worked (with forcearray set to 1):
$entry->{title}->[0]->{content};

But I don't understand its logic. Can you explain?

Also, in
foreach my $entry (@{$xml->{entry}})
what does @{$xml->{entry}} mean?
What kind of ref is $xml->{entry}: arrayref or hashref?
What kind of ref is @{$xml->{entry}}: arrayref or hashref?

Thanks,
K

Assisted Solution

by:Adam314
ID: 24808536
If you set forcearray=>1, then all elements will be arrayrefs. If you set forcearray=>[name1,name2,...], then only the named elements (e.g. name1, name2) will be forced to arrayrefs. An element that appears more than once under the same parent will also be an arrayref. Otherwise, it'll be a hashref (or a plain string, if the element has only text and no attributes).
eg1:
    <xml>
        <element attr1="val1" />
    </xml>
eg2:
    <xml>
        <element attr1="val1" />
        <element attr1="val2" />
    </xml>

Here, in example 1, element would be a hashref, with attr1 as a key. In example 2, element will be an arrayref with 2 items.
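Running the two examples through XMLin makes the difference visible. A minimal sketch, assuming XML::Simple and one of its parser backends are installed; ref() reports what kind of reference each value is, and the variable names are arbitrary.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $eg1 = '<xml><element attr1="val1" /></xml>';
my $eg2 = '<xml><element attr1="val1" /><element attr1="val2" /></xml>';

# Default settings: a single <element> becomes a hashref ...
my $r1 = XMLin($eg1);
# ... while repeated <element>s become an arrayref of hashrefs.
my $r2 = XMLin($eg2);
# Forcing by name (or with ForceArray => 1) makes even a single one an arrayref.
my $r3 = XMLin($eg1, ForceArray => ['element']);
my $r4 = XMLin($eg1, ForceArray => 1);

print ref($r1->{element}), "\n";        # HASH
print ref($r2->{element}), "\n";        # ARRAY
print ref($r3->{element}), "\n";        # ARRAY
print ref($r4->{element}), "\n";        # ARRAY
print $r2->{element}[1]{attr1}, "\n";   # val2
```

Forcing arrays keeps the shape of the data the same whether a feed happens to contain one entry or many, which is why the loop over @{$xml->{entry}} is safe with forcearray set.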


In this: $entry->{title}->[0]->{content}
The $entry is a hashref. The ->{title} part gets the value associated with the "title" key in that hashref.
That value is an arrayref.  The ->[0] part gets the value of the first element of that arrayref.
That value is a hashref.  The ->{content} part gets the value associated with the "content" key in that hashref.
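The same chain, one step at a time. This is a minimal core-Perl sketch using a hand-built hashref shaped the way the explanation describes (title holding an arrayref of hashrefs, as with forcearray => 1); it stands in for real XMLin output and the values are illustrative.

```perl
use strict;
use warnings;

# Hand-built stand-in for one entry parsed with forcearray => 1.
my $entry = {
    title => [ { type => 'html', content => 'Plant Physiology current issue' } ],
};

my $titles = $entry->{title};     # value of the "title" key: an arrayref
my $first  = $titles->[0];        # first element of that array: a hashref
my $text   = $first->{content};   # value of its "content" key: a string

# The same lookup in one expression; the arrow between subscripts is optional.
my $same = $entry->{title}[0]{content};

printf "%s / %s / %s\n", ref($titles), ref($first), $text;
# prints: ARRAY / HASH / Plant Physiology current issue
```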

In this: foreach my $entry (@{$xml->{entry}}) -
The $xml is a hashref.  The ->{entry} part gets the value associated with the "entry" key in that hashref.
This value is an arrayref. The "@{...}" dereferences this arrayref, giving the list of its elements.
The foreach part then loops over this list (i.e. the values in the arrayref).
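The same idea in runnable form, again with a hand-built hashref standing in for $xml (the entry titles are made up). It also answers the last question: $xml->{entry} is an arrayref, while @{$xml->{entry}} is not a reference at all but the array it points to.

```perl
use strict;
use warnings;

# Hand-built stand-in for $xml; the titles are illustrative.
my $xml = {
    entry => [
        { title => 'first entry' },
        { title => 'second entry' },
    ],
};

my $ref   = $xml->{entry};        # a single scalar holding an array reference
my @list  = @{ $xml->{entry} };   # dereferenced: the array's elements
my $count = @{ $xml->{entry} };   # an array in scalar context gives its length

print ref($ref), " with $count elements\n";   # prints: ARRAY with 2 elements

foreach my $entry (@{ $xml->{entry} }) {
    # each element is a hashref, so ->{title} reaches into it
    print "$entry->{title}\n";
}
```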

Expert Comment

by:Adam314
ID: 24808545
There are tutorials on references and complex data structures in the Perl documentation (and surely many more on other sites). You might find them helpful:
    http://perldoc.perl.org/perlreftut.html
    http://perldoc.perl.org/perldsc.html

Author Comment

by:Purdue_Pete
ID: 24815501
Adam314,
Thanks - that clarified it! Will check out the links also.

Author Comment

by:Purdue_Pete
ID: 24840283
Adam314,
When I integrated the code, I got the following error:
Undefined subroutine &B::C::RSS::Feed::XMLin (at the XMLin line)
The reason could be that I have package B::C::RSS::Feed declared before the sub that contains all this code.

If I call XML::Simple->XMLin instead, I get the following error:
Can't use string ("XML::Simple") as a HASH ref while "strict refs" in use at /usr/local/share/perl/5.10.0/XML/Simple.pm line 697

How can this be fixed?

Thanks,
K

Expert Comment

by:Adam314
ID: 24843175
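Add this inside the B::C::RSS::Feed package (use imports XMLin into whichever package is current at that point):
    use XML::Simple 'XMLin';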
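The first error most likely comes from where use runs: it imports XMLin into the package that is current at that point, so a copy imported at the top of the script (in main) is not visible inside package B::C::RSS::Feed. Below is a core-Perl sketch of the same effect, using List::Util's sum in a hypothetical My::Feed package in place of XMLin. As for the second error, calling XML::Simple->XMLin(...) as a class method appears to pass the string "XML::Simple" in as if it were an object, hence the strict-refs complaint. If you'd rather not import, the documented object interface, XML::Simple->new->XMLin($string, ...), or a fully qualified XML::Simple::XMLin($string, ...) call should also work.

```perl
use strict;
use warnings;

package My::Feed;            # hypothetical stand-in for B::C::RSS::Feed
use List::Util qw(sum);      # imports sum() into My::Feed, not into main

sub total { return sum(@_) } # fine: My::Feed::sum now exists

package main;

print My::Feed::total(1, 2, 3), "\n";    # prints 6

# main never imported sum(), so this call fails at run time with
# "Undefined subroutine &main::sum called ...", like &B::C::RSS::Feed::XMLin.
my $ok = eval { sum(1, 2); 1 };
print $ok ? "sum is visible in main\n" : "main: $@";
```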

Suggested Solutions

Title # Comments Views Activity
splitOdd10 challenge 5 108
XSLT: Is it possible to assign number format from a variable? 5 44
AL3 Files 4 29
what are list of ebay api errors 1 19
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
The goal of the video will be to teach the user the concept of local variables and scope. An example of a locally defined variable will be given as well as an explanation of what scope is in C++. The local variable and concept of scope will be relat…
Viewers will learn how to properly install Eclipse with the necessary JDK, and will take a look at an introductory Java program. Download Eclipse installation zip file: Extract files from zip file: Download and install JDK 8: Open Eclipse and …
