Solved

How do I retrieve an element from an Atom feed?

Posted on 2009-05-07
Last Modified: 2013-11-18
Hi,

I have an atom feed containing the following:
<feed idx:index="no">
...
<entry>
...
<source gr:stream-id="feed/http://www.plantphysiol.org/rss/current.xml">

<id>
tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml
</id>
<title type="html">Plant Physiology current issue</title>
<link rel="alternate" href="http://www.plantphysiol.org" type="text/html"/>
</source>
</entry>
</feed>

Here "..." stands for other elements that I have left out.

I am using Perl with XML::Feed, which automatically decides whether to use XML::Atom::Feed or XML::RSS depending on the type of feed. In the above case, it uses XML::Atom::Feed.

How do I get the 'source' element and its sub-elements from the above XML? I looked at the documentation, and it seems that $entry->get($ns, $element) is the method to use.
However, I have been running into a lot of minor problems.

Any suggestions on how to do this, starting from a given feed, parsing an entry, and retrieving all the elements and sub-elements of 'source', will be appreciated.

Thanks.
Question by:Purdue_Pete

Expert Comment

by:Adam314
ID: 24338883
What errors do you get when you call get?  Have you tried $entry->getlist?

Author Comment

by:Purdue_Pete
ID: 24339002
I would like to get the first entry and retrieve the 'source' element for it.
my @entries = $feed->entries;   # all the entries in the feed

I tried the following:
CODE:
my $entry = @entries[0];
  my $dc = XML::Atom::Namespace->new(dc => 'http://purl.org/dc/elements/1.1/');
  my $source = $entry->get($dc, 'source');
  print $source;
ERROR:
Can't locate object method "get" via package "XML::Feed::Entry::Format::Atom" at rss.pm line 225.

CODE:
my $entry = XML::Atom::Entry->new(@entries[0]);
ERROR: read on filehandle failed: Can't locate object method "read" via package "XML::Feed::Entry::Format::Atom" at /usr/lib/perl5/XML/LibXML.pm line 607. at /usr/share/perl5/XML/Atom/Thing.pm line 27

Trying getlist gives the same error as the first one (with source as an array, i.e. @source = $entry->getlist...)

Expert Comment

by:Adam314
ID: 24339031
What about this:
use Data::Dumper;   #put at top

my @entries = $feed->entries;   # all the entries in the feed
print Dumper($entries[0]);


Author Comment

by:Purdue_Pete
ID: 24340554
Adam314,
I am not sure how this is useful. I got something like the following. If you can explain in code how I can use this to retrieve 'source', that will be great.

$VAR1 = bless( {
                 'entry' => bless( {
                                     'ns' => 'http://www.w3.org/2005/Atom',
                                     'elem' => bless( do{\(my $o = 173718608)}, 'XML::LibXML::Element' ),
                                     'version' => '0.3'
                                   }, 'XML::Atom::Entry' )
               }, 'XML::Feed::Entry::Format::Atom' );

Expert Comment

by:Adam314
ID: 24369212
In this case, what is 'source'?

Author Comment

by:Purdue_Pete
ID: 24431816
Adam314,

source is an XML element in Google Reader's shared items (Atom) feed, which I am trying to parse for each entry.
E.g., look at this feed and you will see a 'source' element for each entry:
http://www.google.com/reader/public/atom/user%2F03409051605136090036%2Fstate%2Fcom.google%2Fbroadcast

I hope the above answers the question.

Expert Comment

by:Adam314
ID: 24433577
Can you provide the URIs you are trying?  I will play around with them.

Author Comment

by:Purdue_Pete
ID: 24483978

Accepted Solution

by:
Adam314 earned 250 total points
ID: 24497478
I'm not familiar with feeds and how namespaces work, so I couldn't figure out how to get it to work with the object directly.  If you want to call the get method with a namespace, you can do so like below.  If that doesn't work, you can parse the XML and get the source data from that, also below.


use strict;
use warnings;
use URI;
use XML::Feed;
use XML::Simple qw(XMLin);

########## Get feed
my $url = 'http://www.google.com/reader/public/atom/user%2F03409051605136090036%2Fstate%2Fcom.google%2Fbroadcast';
my $feed = XML::Feed->parse(URI->new($url)) or die XML::Feed->errstr;

########## Method 1: Using the get method on the entry with a namespace
# $entry is an XML::Feed wrapper; the XML::Atom::Entry that has get() is in $entry->{entry}
my $entry = ($feed->entries)[0];
my $dc = XML::Atom::Namespace->new(dc => 'http://purl.org/dc/elements/1.1/');
my $source = $entry->{entry}->get($dc, 'source');
#But here, $source is undef.  Not sure if the namespace isn't correct for this, or something else.

########## Method 2: Parsing the XML
my $xml = XMLin($feed->as_xml, forcearray => ['entry']);

foreach my $entry (@{$xml->{entry}}) {
	my $source = $entry->{source};

	##### You can access all of the entry properties through $entry
	print "Entry ID: $entry->{id}->{content}\n";

	##### You can now access all of the source properties through $source
	print "    Source Title: $source->{title}->{content}\n";
	print "    Source ID   : $source->{id}\n";
	#Access whatever properties from source you want
}
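
A note on Method 1: Atom defines source in the Atom namespace (http://www.w3.org/2005/Atom), not in Dublin Core, which may be why the dc lookup came back undef. Below is a minimal sketch of a third route that queries the raw XML with XML::LibXML and XPath instead of going through XML::Feed. The inline sample feed and the gr namespace URI are assumptions made for illustration; with a real feed, copy the xmlns:gr value from the feed's root element.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Assumed sample data, modelled on the snippet in the question.
my $atom = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <entry>
    <id>entry-1</id>
    <source gr:stream-id="feed/http://www.plantphysiol.org/rss/current.xml">
      <id>tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml</id>
      <title type="html">Plant Physiology current issue</title>
      <link rel="alternate" href="http://www.plantphysiol.org" type="text/html"/>
    </source>
  </entry>
</feed>
XML

my $doc = XML::LibXML->load_xml(string => $atom);

# XPath needs a prefix for every namespace, including the default one.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(atom => 'http://www.w3.org/2005/Atom');
$xpc->registerNs(gr   => 'http://www.google.com/schemas/reader/atom/');  # assumed URI

my @sources;
for my $source ($xpc->findnodes('//atom:entry/atom:source')) {
    push @sources, {
        title     => $xpc->findvalue('atom:title', $source),       # element text
        id        => $xpc->findvalue('atom:id', $source),
        stream_id => $xpc->findvalue('@gr:stream-id', $source),     # attribute of <source>
        link      => $xpc->findvalue('atom:link/@href', $source),   # attribute of a child
    };
}

for my $s (@sources) {
    print "Source title : $s->{title}\n";
    print "Source link  : $s->{link}\n";
    print "Stream id    : $s->{stream_id}\n";
}
```

To run the same queries against a live feed, the string passed to load_xml could be the output of $feed->as_xml from the code above.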


Expert Comment

by:Adam314
ID: 24795953
I assumed my latest post solved the problem, as there was no response after that.

Author Comment

by:Purdue_Pete
ID: 24805430
Adam314,
Sorry, I forgot about this question. I will try your last solution.
BTW, how can I get an attribute of an element?
Could you extend your code to show that, if it does not already?

Thanks,
K

Expert Comment

by:Adam314
ID: 24807419
Some things appear to be available through the XML::Atom objects.  If what you want is not, you can access everything through the $xml variable.  You might want to
    print Dumper($xml);
to see how the $xml is structured.  If you need help with a particular attribute, post the output from the above command.

Author Comment

by:Purdue_Pete
ID: 24808297
Adam314,
I figured out how to get the attributes with some modifications to the code. But I have a naive question about arrayrefs and hashrefs.

I had to set the forcearray option to 1 so that attributes and elements are not mixed up anywhere in the XML.
The line becomes:
my $xml = XMLin($feed->as_xml, forcearray => 1);

How can I now print the title of the entry (since it is now an arrayref)?
I tried the following, but neither of them worked:
print @$entry->{title};
print @($entry->{title});

Thanks,
K

Author Comment

by:Purdue_Pete
ID: 24808455
Adam314,
When I set forcearray => 1, all the elements in the XML are now arrayrefs, right?

When I set forcearray => ['entry'], only the entry element becomes an arrayref and all the others are hashrefs?

Related to the above problem of getting the title value, I tried something like this, which worked (with forcearray set to 1):
$entry->{title}->[0]->{content};

But I do not understand its logic. Can you explain?

Also, in
foreach my $entry (@{$xml->{entry}})
what does @{$xml->{entry}} mean?
Which kind of ref is $xml->{entry}: arrayref or hashref?
Which kind of ref is @{$xml->{entry}}: arrayref or hashref?

Thanks,
K
   

Assisted Solution

by:Adam314
Adam314 earned 250 total points
ID: 24808536
If you set forcearray => 1, then every element will be an arrayref.  If you set forcearray => ['name1', 'name2', ...], then only the named elements (eg: name1, name2) will be forced to an arrayref.  Regardless of forcearray, if there is more than 1 element with a given name, it will also be an arrayref.  Otherwise, it'll be a hashref.
eg1:
    <xml>
        <element attr1="val1" />
    </xml>
eg2:
    <xml>
        <element attr1="val1" />
        <element attr1="val2" />
    </xml>

Here, in example 1, element would be a hashref, with attr1 as a key.  In example 2, element would be an arrayref with 2 items, each of them a hashref.
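
The two examples can be checked with a short script. This is a sketch, assuming XML::Simple is installed; the KeyAttr => [] option is an addition not discussed in this thread, used here to switch off XML::Simple's default folding of arrays on name/key/id so that the structure stays easy to predict.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $eg1 = '<xml><element attr1="val1" /></xml>';
my $eg2 = '<xml><element attr1="val1" /><element attr1="val2" /></xml>';

# One <element>: a plain hashref.
my $one = XMLin($eg1, KeyAttr => []);

# Two <element>s: an arrayref of hashrefs, even without forcearray.
my $two = XMLin($eg2, KeyAttr => []);

# One <element>, but forced: an arrayref holding a single hashref.
my $forced = XMLin($eg1, ForceArray => ['element'], KeyAttr => []);

print ref($one->{element}),    "\n";   # HASH
print ref($two->{element}),    "\n";   # ARRAY
print ref($forced->{element}), "\n";   # ARRAY

print $one->{element}{attr1},       "\n";   # val1
print $two->{element}[1]{attr1},    "\n";   # val2
print $forced->{element}[0]{attr1}, "\n";   # val1
```

Forcing an element to an arrayref means the calling code can always loop over it, whether the feed happens to contain one item or many.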


In this: $entry->{title}->[0]->{content}
The $entry is a hashref. The ->{title} part gets the value associated with the "title" key in that hashref.
That value is an arrayref.  The ->[0] part gets the value of the first element of that arrayref.
That value is a hashref.  The ->{content} part gets the value associated with the "content" key in that hashref.

In this: foreach my $entry (@{$xml->{entry}}) -
The $xml is a hashref.  The ->{entry} part gets the value associated with the "entry" key in that hashref.
This value is an arrayref.  The "@{...}" turns this arrayref into a list.
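The foreach part then loops over this list (i.e., the values in the arrayref).
So $xml->{entry} is an arrayref, while @{$xml->{entry}} is not a ref at all: it is the list of items the arrayref points to.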
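
The same chain can be tried without any XML at all. The sketch below uses a hand-built structure as a stand-in for what XMLin might return with forcearray => 1; the titles are made up for illustration.

```perl
use strict;
use warnings;

# Hand-built stand-in for an XMLin(..., forcearray => 1) result (assumed data).
my $xml = {
    entry => [
        { title => [ { type => 'html', content => 'First post'  } ] },
        { title => [ { type => 'html', content => 'Second post' } ] },
    ],
};

my @titles;
foreach my $entry (@{ $xml->{entry} }) {    # @{...} turns the arrayref into a list
    my $title_list  = $entry->{title};      # arrayref (forced by forcearray => 1)
    my $first_title = $title_list->[0];     # hashref: first <title> element
    push @titles, $first_title->{content};  # text content of that element
}

print "$_\n" for @titles;

# The one-line form does the same three steps in a row.
my $same = $xml->{entry}->[0]->{title}->[0]->{content};
print "$same\n";

# ref() reports what kind of reference a value is.
print ref($xml), "\n";            # HASH
print ref($xml->{entry}), "\n";   # ARRAY
```

This also shows why print @$entry->{title} did not work: @$entry tries to treat $entry itself as an arrayref, whereas the arrayref is the value stored under the title key, so the braces have to go around the whole expression, as in @{ $entry->{title} }.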

Expert Comment

by:Adam314
ID: 24808545
There are a few tutorials on references and complex data structures (and, I'm sure, many more at other sites).  You might find them helpful.
    http://perldoc.perl.org/perlreftut.html
    http://perldoc.perl.org/perldsc.html

Author Comment

by:Purdue_Pete
ID: 24815501
Adam314,
Thanks - that clarified it! Will check out the links also.

Author Comment

by:Purdue_Pete
ID: 24840283
Adam314,
When I tried integrating the code, I got the following error at the XMLin line:
Undefined subroutine &B::C::RSS::Feed::XMLin
The reason could be that I have
package B::C::RSS::Feed
before the sub that contains all this code.

When I changed the call to XML::Simple->XMLin, I got the following error instead:
Can't use string ("XML::Simple") as a HASH ref while "strict refs" in use at /usr/local/share/perl/5.10.0/XML/Simple.pm line 697

How can this be fixed?

Thanks,
K

Expert Comment

by:Adam314
ID: 24843175
Add this after your package statement, so that XMLin is imported into that package:
    use XML::Simple 'XMLin';
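
A sketch of why this helps, assuming XML::Simple is installed: use imports XMLin into whichever package is current at that point, so the statement has to come after the package line. The package name My::Feed::Reader, the sub entry_count and the sample XML are hypothetical names made up for this example.

```perl
use strict;
use warnings;

package My::Feed::Reader;      # hypothetical package name
use XML::Simple qw(XMLin);     # imported into My::Feed::Reader, not into main

sub entry_count {
    my ($xml_text) = @_;
    # KeyAttr => [] disables array folding so entry stays an arrayref.
    my $data = XMLin($xml_text, ForceArray => ['entry'], KeyAttr => []);
    return scalar @{ $data->{entry} || [] };
}

package main;

my $sample = '<feed><entry><title>a</title></entry><entry><title>b</title></entry></feed>';

# Works: XMLin was imported into the package that calls it.
my $count = My::Feed::Reader::entry_count($sample);

# Also works from any package: call the function by its full name ...
my $data = XML::Simple::XMLin($sample, ForceArray => ['entry'], KeyAttr => []);

# ... or go through an object instead of the class name.
my $via_object = XML::Simple->new->XMLin($sample, ForceArray => ['entry'], KeyAttr => []);

print "entries: $count\n";
print "second title: $data->{entry}[1]{title}\n";
```

Calling XML::Simple->XMLin(...) appears to fail because the class name string is passed in place of an object, which would match the "Can't use string" error quoted above; the fully qualified function call and the object call both avoid that.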