Solved

How do I retrieve an element from an Atom feed?

Posted on 2009-05-07
Medium Priority
631 Views
Last Modified: 2013-11-18
Hi,

I have an Atom feed containing the following:
<feed idx:index="no">
...
<entry>
...
<source gr:stream-id="feed/http://www.plantphysiol.org/rss/current.xml">

<id>
tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml
</id>
<title type="html">Plant Physiology current issue</title>
<link rel="alternate" href="http://www.plantphysiol.org" type="text/html"/>
</source>
</entry>
</feed>

(... represents other fields that I have omitted.)

I am using Perl with XML::Feed, which automatically decides whether to use XML::Atom::Feed or XML::RSS depending on the type of feed. In the above case, it is XML::Atom::Feed.

How do I get the 'source' element and its sub-elements in the XML above? I looked at the documentation, and it seemed like $entry->get($ns, $element) is what I need to use.
However, I have been running into a lot of minor problems.

Any suggestions on how, starting from a given feed, I can parse an entry and retrieve all the elements and sub-elements of 'source' would be appreciated.

Thanks.
Question by:Purdue_Pete
21 Comments
 
LVL 39

Expert Comment

by:Adam314
ID: 24338883
What are the errors you have with get?  Have you tried $entry->getlist?
0
 

Author Comment

by:Purdue_Pete
ID: 24339002
I would like to get the first entry and retrieve the 'source' element for it.
my @entries = $feed->entries;   # all the entries in the feed

I tried the following:
CODE:
my $entry = $entries[0];
  my $dc = XML::Atom::Namespace->new(dc => 'http://purl.org/dc/elements/1.1/');
  my $source = $entry->get($dc, 'source');
  print $source;
ERROR:
Can't locate object method "get" via package "XML::Feed::Entry::Format::Atom" at rss.pm line 225.

CODE:
my $entry = XML::Atom::Entry->new($entries[0]);
ERROR: read on filehandle failed: Can't locate object method "read" via package "XML::Feed::Entry::Format::Atom" at /usr/lib/perl5/XML/LibXML.pm line 607. at /usr/share/perl5/XML/Atom/Thing.pm line 27

Trying with getlist gives the same error as the first attempt (with source as an array, i.e. my @source = $entry->getlist(...)).
0
 
LVL 39

Expert Comment

by:Adam314
ID: 24339031
What about this:
use Data::Dumper;   #put at top
 
my @entries = $feed->entries;   # all the entries in the feed
print Dumper($entries[0]);


Author Comment

by:Purdue_Pete
ID: 24340554
Adam314,
I am not sure how this is useful. I got something like the following. If you can explain in code how I can manipulate this to retrieve 'source', that would be great.

$VAR1 = bless( {
                 'entry' => bless( {
                                     'ns' => 'http://www.w3.org/2005/Atom',
                                     'elem' => bless( do{\(my $o = 173718608)}, 'XML::LibXML::Element' ),
                                     'version' => '0.3'
                                   }, 'XML::Atom::Entry' )
               }, 'XML::Feed::Entry::Format::Atom' );
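The Dumper output shows that the object XML::Feed hands back is a wrapper whose real XML::Atom::Entry sits under the 'entry' key, which would explain why $entry->get(...) failed with "Can't locate object method". Below is a small pure-Perl sketch of that situation. My::FeedEntry and My::AtomEntry are hypothetical stand-ins modelled only on the Dumper output; they are not the real modules, and the stand-in get() takes just a name rather than a namespace and a name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for XML::Atom::Entry: the class that actually has get().
# (The real get() takes a namespace and an element name; this one takes only a name.)
package My::AtomEntry;
sub new {
    my ($class, %fields) = @_;
    my $self = {%fields};
    return bless $self, $class;
}
sub get {
    my ($self, $name) = @_;
    return $self->{$name};
}

# Hypothetical stand-in for XML::Feed::Entry::Format::Atom: a wrapper that keeps
# the real entry under the 'entry' key (as in the Dumper output) and has no get().
package My::FeedEntry;
sub new {
    my ($class, $inner) = @_;
    my $self = { entry => $inner };
    return bless $self, $class;
}

package main;

my @entries = ( My::FeedEntry->new( My::AtomEntry->new(title => 'Plant Physiology current issue') ) );
my $entry   = $entries[0];    # a single element: $entries[0], not the slice @entries[0]

# Calling get() on the wrapper dies, like the first error in this thread.
eval { $entry->get('title') };
my $direct_error = $@;

# Reaching through the 'entry' key gets to the object that does have get().
my $title = $entry->{entry}->get('title');

print 'wrapper can get(): ', ($entry->can('get') ? 'yes' : 'no'), "\n";
print "error: $direct_error";
print "title: $title\n";
```

This is the same idea the accepted answer relies on when it calls $entry->{entry}->get(...).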
 
LVL 39

Expert Comment

by:Adam314
ID: 24369212
In this case, what is 'source'?
 

Author Comment

by:Purdue_Pete
ID: 24431816
Adam314,

source is an XML element in Google Reader's shared items (Atom) feed, which I am trying to parse for each entry.
For example, look at this feed; you will see 'source' in its entries:
http://www.google.com/reader/public/atom/user%2F03409051605136090036%2Fstate%2Fcom.google%2Fbroadcast

Hope the above answers the question.
 
LVL 39

Expert Comment

by:Adam314
ID: 24433577
Can you provide the URIs you are trying? I will play around with it.
 
LVL 39

Accepted Solution

by:
Adam314 earned 1000 total points
ID: 24497478
I'm not familiar with feeds and how namespaces work, so I couldn't figure out how to get this working with the object directly. If you want to call the get method with a namespace, you can do so as in Method 1 below. If that doesn't work, you can parse the XML and get the source data from that, as in Method 2 below.


use strict;
use warnings;
use URI;
use XML::Feed;
use XML::Atom;
use XML::Simple qw(XMLin);

########## Get feed
my $url = 'http://www.google.com/reader/public/atom/user%2F03409051605136090036%2Fstate%2Fcom.google%2Fbroadcast';
my $feed = XML::Feed->parse(URI->new($url)) or die XML::Feed->errstr;

########## Method 1: Using the get method on the entry with a namespace
my $entry = ($feed->entries)[0];
# <source> is in the Atom namespace, not Dublin Core; asking for it under the dc
# namespace (http://purl.org/dc/elements/1.1/) is why $source came back undef.
my $atom = XML::Atom::Namespace->new(atom => 'http://www.w3.org/2005/Atom');
my $source = $entry->{entry}->get($atom, 'source');
# Note: get() returns the element's text value, not its sub-elements or attributes,
# so Method 2 is more convenient for reading those.

########## Method 2: Parsing the XML
my $xml = XMLin($feed->as_xml, forcearray => ['entry']);

foreach my $entry (@{$xml->{entry}}) {
	my $source = $entry->{source};

	##### You can access all of the entry properties through $entry
	print "Entry ID: $entry->{id}->{content}\n";

	##### You can now access all of the source properties through $source
	print "    Source Title: $source->{title}->{content}\n";
	print "    Source ID   : $source->{id}\n";
	# Access whatever other source properties you want
}

 
LVL 39

Expert Comment

by:Adam314
ID: 24795953
I assumed my latest post solved the problem, as there was no response after that.
 

Author Comment

by:Purdue_Pete
ID: 24805430
Adam314,
Sorry, I forgot about this question. I will try your last solution.
BTW, how can I get the attribute of an element?
Can you modify your code (if it doesn't already do that) to show it?

Thanks,
K
 
LVL 39

Expert Comment

by:Adam314
ID: 24807419
Some things appear to be available through the XML::Atom objects. If what you want is not, you can access everything through the $xml variable. You might want to
    print Dumper($xml);
to see how $xml is structured. If you need help with a particular attribute, post the output from the above command.
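On the attribute question: in the hash that XMLin returns with default options, attributes and child elements both end up as hash keys, with an element's text under 'content' when it also has attributes. The sketch below uses a hand-built hash that is only assumed to resemble XMLin's output for the <source> element in the question (whitespace around the <id> text omitted); it does not call XML::Simple itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built hash, ASSUMED to resemble what XMLin returns for the <source>
# element in the question with default options (no forcearray on these names).
my $source = {
    'gr:stream-id' => 'feed/http://www.plantphysiol.org/rss/current.xml',
    id    => 'tag:google.com,2005:reader/feed/http://www.plantphysiol.org/rss/current.xml',
    title => { type => 'html', content => 'Plant Physiology current issue' },
    link  => { rel => 'alternate', href => 'http://www.plantphysiol.org', type => 'text/html' },
};

# An attribute of <source> itself: the key contains ':' and '-', so it must be quoted.
my $stream_id = $source->{'gr:stream-id'};

# <title type="html">...</title>: the attribute and the text ('content') sit side by side.
my $title_type = $source->{title}{type};
my $title      = $source->{title}{content};

# <link .../> has only attributes, so it becomes a hash of them.
my $href = $source->{link}{href};

print "stream-id: $stream_id\n";
print "title ($title_type): $title\n";
print "link: $href\n";
```

Comparing this against the real print Dumper($xml) output is the quickest way to confirm which keys your feed actually produces.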
 

Author Comment

by:Purdue_Pete
ID: 24808297
Adam314,
I figured out how to get the attributes with some modifications to the code. But I have a naive question about arrayrefs and hashrefs.

I had to set the forcearray option to 1 so that attributes and elements don't get mixed up anywhere in the XML.
The line becomes:
my $xml = XMLin($feed->as_xml, forcearray => 1);

How can I now print the title of the entry (since it is now an arrayref)?
I tried the following, but neither of them worked:
print @$entry->{title};
print @($entry->{title});

Thanks,
K
 

Author Comment

by:Purdue_Pete
ID: 24808455
Adam314,
When I set forcearray => 1, all the elements in the XML become arrayrefs, right?

When I set forcearray => ['entry'], only the entry element becomes an arrayref and all the others are hashrefs?

Related to the above problem of getting the title value, I tried something like this, which worked (with forcearray set to 1):
$entry->{title}->[0]->{content};

But I don't understand the logic behind it. Can you explain?

Also, in
foreach my $entry (@{$xml->{entry}})
what does @{$xml->{entry}} mean?
What kind of ref is $xml->{entry}: arrayref or hashref?
What kind of ref is @{$xml->{entry}}: arrayref or hashref?

Thanks,
K
 
LVL 39

Assisted Solution

by:Adam314
Adam314 earned 1000 total points
ID: 24808536
If you set forcearray => 1, then all elements will be arrayrefs. If you set forcearray => [name1, name2, ...], then only the named elements (e.g. name1, name2) will be forced to arrayrefs. If there is more than one element with a given name, it will also be an arrayref. Otherwise, it will be a hashref (or a plain string, if the element has only text content and no attributes).
eg1:
    <xml>
        <element attr1="val1" />
    </xml>
eg2:
    <xml>
        <element attr1="val1" />
        <element attr1="val2" />
    </xml>

Here, in example 1, element would be a hashref, with attr1 as a key. In example 2, element will be an arrayref with 2 items.
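Because the same element can come back as either a hashref or an arrayref depending on how often it occurs, code that reads it may need to cope with both. Here is a small sketch; as_list is a hypothetical helper, and the two structures are written by hand to match the shapes described above rather than produced by XMLin.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built structures ASSUMED to match XMLin output (default options) for
# eg1 (one <element>, a hashref) and eg2 (two <element>s, an arrayref).
my $eg1 = { element => { attr1 => 'val1' } };
my $eg2 = { element => [ { attr1 => 'val1' }, { attr1 => 'val2' } ] };

# Hypothetical helper: return a list of elements whether we got one hashref
# or an arrayref of them.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

my @attrs1 = map { $_->{attr1} } as_list($eg1->{element});
my @attrs2 = map { $_->{attr1} } as_list($eg2->{element});

print "eg1: @attrs1\n";   # eg1: val1
print "eg2: @attrs2\n";   # eg2: val1 val2
```

Forcing that element to an array with forcearray avoids the check entirely, at the cost of always writing ->[0] when there is only one.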


In this: $entry->{title}->[0]->{content}
The $entry is a hashref. The ->{title} part gets the value associated with the "title" key in that hashref.
That value is an arrayref.  The ->[0] part gets the value of the first element of that arrayref.
That value is a hashref.  The ->{content} part gets the value associated with the "content" key in that hashref.

In this: foreach my $entry (@{$xml->{entry}})
The $xml is a hashref. The ->{entry} part gets the value associated with the "entry" key in that hashref.
This value is an arrayref. The "@{...}" turns this arrayref into a list.
The foreach part then loops over this list (i.e. the values in the arrayref).
So $xml->{entry} is an arrayref, while @{$xml->{entry}} is not a reference at all: it is the list of entries the arrayref points to.
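The same walk can be tried without fetching any feed. The structure below is hand-built and only assumed to mirror XMLin(..., forcearray => 1) output for entries that have a title; the title values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built structure, ASSUMED to resemble XMLin($xml_string, forcearray => 1)
# output: every element is wrapped in an arrayref, even when it occurs only once.
my $xml = {
    entry => [
        { title => [ { type => 'html', content => 'First title' } ] },
        { title => [ { type => 'html', content => 'Second title' } ] },
    ],
};

my @titles;

# $xml->{entry} is an arrayref; @{ ... } dereferences it into a list for foreach.
foreach my $entry ( @{ $xml->{entry} } ) {
    # ->{title} gives an arrayref, ->[0] its first hashref,
    # and ->{content} the text stored in that hashref.
    push @titles, $entry->{title}->[0]->{content};
}

# To get all titles of one entry, put the arrayref inside @{ ... }.
# (@$entry->{title} instead tries to dereference $entry itself as an array.)
my @first_entry_titles = map { $_->{content} } @{ $xml->{entry}->[0]->{title} };

print "$_\n" for @titles;
print scalar(@first_entry_titles), " title(s) in the first entry\n";
```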
 
LVL 39

Expert Comment

by:Adam314
ID: 24808545
There are a few tutorials on references and complex data structures (and surely many more on other sites). You might find them helpful.
    http://perldoc.perl.org/perlreftut.html
    http://perldoc.perl.org/perldsc.html
 

Author Comment

by:Purdue_Pete
ID: 24815501
Adam314,
Thanks - that clarified it! Will check out the links also.
 

Author Comment

by:Purdue_Pete
ID: 24840283
Adam314,
When I tried integrating the code, I got the following problem:
Undefined subroutine &B::C::RSS::Feed::XMLin around that XMLin line
The reason could be that I have
package B::C::RSS::Feed before the sub which has all this code.

If I use XML::Simple->XMLin instead, I get the following error:
Can't use string ("XML::Simple") as a HASH ref while "strict refs" in use at /usr/local/share/perl/5.10.0/XML/Simple.pm line 697

How can this be fixed?

Thanks,
K
 
LVL 39

Expert Comment

by:Adam314
ID: 24843175
Add:
    use XML::Simple 'XMLin';
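Both errors come from how Perl calls subroutines. use Module 'func' imports func into whichever package is current at that point, so code in package B::C::RSS::Feed needs its own import. Module->func(...) is a method call that passes the string 'Module' as the first argument, which would explain XMLin complaining about the string "XML::Simple". The sketch below reproduces both effects with a hypothetical module My::Simple defined in the same file, so it runs without XML::Simple installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a module like XML::Simple that exports a function.
package My::Simple;
use Exporter 'import';
BEGIN { our @EXPORT_OK = ('parse_it') }   # set at compile time so the import below sees it

# Returns whatever arrived as its first argument.
sub parse_it {
    my ($first) = @_;
    return $first;
}

# Stand-in for the questioner's package (B::C::RSS::Feed in the thread).
package My::Caller;

# Same effect as "use My::Simple 'parse_it';" for a module defined in this file:
# the function is imported into the *current* package, My::Caller.
BEGIN { My::Simple->import('parse_it') }

my $imported  = parse_it('<feed/>');              # plain function call
my $qualified = My::Simple::parse_it('<feed/>');  # fully qualified function call
my $method    = My::Simple->parse_it('<feed/>');  # method call: class name goes first

print "imported:  $imported\n";    # imported:  <feed/>
print "qualified: $qualified\n";   # qualified: <feed/>
print "method:    $method\n";      # method:    My::Simple
```

Besides importing, calling XML::Simple::XMLin($xml, ...) fully qualified, or using the object interface (my $xs = XML::Simple->new(ForceArray => 1); my $ref = $xs->XMLin($xml);), avoids the problem.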
Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When we want to run, execute or repeat a statement multiple times, a loop is necessary. This article covers the two types of loops in Python: the while loop and the for loop.
Article by: evilrix
Looking for a way to avoid searching through large data sets for data that doesn't exist? A Bloom Filter might be what you need. This data structure is a probabilistic filter that allows you to avoid unnecessary searches when you know the data defin…
The goal of the video will be to teach the user the concept of local variables and scope. An example of a locally defined variable will be given as well as an explanation of what scope is in C++. The local variable and concept of scope will be relat…
Six Sigma Control Plans
Suggested Courses

597 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question