trevor1940
 asked on

XPath syntax help in conjunction with Perl XML::LibXML

I'm trying to extract some data from the georss.xml file below using the Perl script that follows. I'm struggling to get the values of the following:

<feed sc:countryCodes="US"  # need US

  <category term="Wanted" label="Wanted" url="http://otherurl.com/pathto/FINDME" />  # need value of term where value of url contains FINDME

The values of the 2 entry/title fields, while knowing which one has type="text" (value = fileName.pdf)
<point xmlns="http://www.opengis.net/gml" srsName="urn:ogc:crs:EPSG:4326">  # need value of srsName




perl script

 
#!C:\strawberry\perl\bin\perl.exe

use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;
use Data::Dump qw(dump);


my $filename = 'georss.xml';

my $dom = XML::LibXML->load_xml(location => $filename);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft => "http://www.w3.org/2005/Atom");
$xpc->registerNs(georss => "http://www.georss.org/georss");

my $title =  $xpc->findnodes('//dft:feed/dft:title');
print "title $title\n"; # GOOD
#my $point = $xpc->findnodes('//dft:feed/georss:where/dft:Point/dft:pos'); ## this doesn't find anything
my $point = $xpc->findnodes('//dft:feed/georss:where');
   $point =~ s/^\s*//;  # clean white space unsure why but had loads
   $point =~ s/\s*$//;
   print "point $point\n"; # GOOD

# Prefixed paths go through $xpc, where the dft prefix is registered
foreach my $Etitle ($xpc->findnodes('//dft:feed/dft:entry/dft:title')) {
    print "Etitle $Etitle\n";  # prints <title type="text">fileName.pdf</title>
    my $EtitleVal = $Etitle->findvalue('./title');
    if($Etitle =~ m/jpg/){
      print "Image $EtitleVal\n";  # prints Image
     } 
    elsif($Etitle =~ m/pdf/){
      print "PDF $EtitleVal\n";  # prints <title type="text">fileName.pdf</title>
     } 
    
}



georss.xml

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml" sc:countryCodes="US">
  <title type="text">Earthquakes</title>
  <subtitle>International earthquake observation labs</subtitle>
  <link href="http://example.org/" />
  <updated>2005-12-13T18:30:02Z</updated>
  <category term="Note" label ="Note" url="http://exampleurl.com/pathto/" />
  <category term="Wanted" label ="Wanted" url="http://otherurl.com/pathto/FINDME" />
  <author>
    <name>Dr. Thaddeus Remor</name>
    <email>tremor@quakelab.edu</email>
  </author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <georss:where>
     <point xmlns="http://www.opengis.net/gml" srsName="urn:ogc:crs:EPSG:4326">
      <pos>45.256 -71.92</pos>
     </point>
  </georss:where>

  <entry>
    <title type="text">fileName.pdf</title>
   
  </entry>
  <entry>
    <title>fileName.jpg</title>
  </entry>
</feed>


Perl, XML


8/22/2022 - Mon
ASKER CERTIFIED SOLUTION
Gertone (Geert Bormans)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
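
One way the lookups in the question could be approached (a minimal sketch, not necessarily what was suggested in this thread). It assumes the sc prefix is bound to a namespace in the real file; the URI http://example.org/sc used here is a made-up placeholder. It also registers a gml prefix for the point element, which lives in the http://www.opengis.net/gml namespace rather than the Atom one, and whose name is lowercase point, not Point. That mismatch is a likely reason the commented-out dft:Point/dft:pos path found nothing.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed copy of the feed. The sc namespace URI is a hypothetical placeholder.
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"
      xmlns:sc="http://example.org/sc"
      sc:countryCodes="US">
  <category term="Note" label="Note" url="http://exampleurl.com/pathto/" />
  <category term="Wanted" label="Wanted" url="http://otherurl.com/pathto/FINDME" />
  <georss:where>
    <point xmlns="http://www.opengis.net/gml" srsName="urn:ogc:crs:EPSG:4326">
      <pos>45.256 -71.92</pos>
    </point>
  </georss:where>
</feed>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft    => 'http://www.w3.org/2005/Atom');
$xpc->registerNs(georss => 'http://www.georss.org/georss');
$xpc->registerNs(gml    => 'http://www.opengis.net/gml');
$xpc->registerNs(sc     => 'http://example.org/sc');   # placeholder URI

# Attribute in the sc namespace on the root element
my $country = $xpc->findvalue('/dft:feed/@sc:countryCodes');

# term of the category whose url attribute contains FINDME
my $term = $xpc->findvalue('/dft:feed/dft:category[contains(@url, "FINDME")]/@term');

# point and pos are in the gml namespace, so they need the gml prefix
my $srs = $xpc->findvalue('/dft:feed/georss:where/gml:point/@srsName');
my $pos = $xpc->findvalue('/dft:feed/georss:where/gml:point/gml:pos');

print "country $country\n";
print "term    $term\n";
print "srsName $srs\n";
print "pos     $pos\n";
```

findvalue returns the string value directly, so no whitespace trimming is needed for the attribute lookups.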
trevor1940

ASKER
Thank you for that

Any idea about the other issues?
SOLUTION
Gertone (Geert Bormans)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
SOLUTION
Gertone (Geert Bormans)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
trevor1940

ASKER
Thanks.

"(the snippet you posted is invalid, since the sc namespace is not declared)"

I'll double-check when I'm back in the office, but I'm pretty sure this is how it is in the actual file. And no, I cannot post the actual file, before I'm asked.
SOLUTION
Gertone (Geert Bormans)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
trevor1940

ASKER
Apologies for the errors in the XML; they were caused by fat fingers while retyping from a closed system onto an internet PC. All of the above worked.
One last question, given this:

foreach my $Etitle ($xpc->findnodes('//dft:feed/dft:entry/dft:title')) {
    print "Etitle $Etitle\n";  # prints <title type="text">fileName.pdf</title>
    my $EtitleVal = $Etitle->findvalue('./title');  ## Fails
    if($Etitle =~ m/jpg/){
      print "Image $EtitleVal\n";  # prints Image
     } 
    elsif($Etitle =~ m/pdf/){
      print "PDF $EtitleVal\n";  # prints <title type="text">fileName.pdf</title>
     } 
    
}



First, why is this printing the whole tag, e.g. "<title type="text">fileName.pdf</title>", rather than just the text?
Secondly, how do I tell the difference between the two titles?
Your help has saved me hundreds of hours of internet surfing.
SOLUTION
Gertone (Geert Bormans)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
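
A possible explanation with a small sketch (not necessarily the reply given here). Interpolating an element node into a string serializes the node, tags included, which matches the output reported above. The './title' lookup comes back empty for two reasons: the context node is already the title element, and an unprefixed name in XPath only matches elements in no namespace. Reading textContent for the text and getAttribute for the type attribute avoids both problems. The inline XML is a trimmed stand-in for georss.xml.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed stand-in for the two entries in georss.xml
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title type="text">fileName.pdf</title></entry>
  <entry><title>fileName.jpg</title></entry>
</feed>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft => 'http://www.w3.org/2005/Atom');

my @seen;
for my $title ($xpc->findnodes('/dft:feed/dft:entry/dft:title')) {
    my $text = $title->textContent;           # text only, no markup
    my $type = $title->getAttribute('type');  # undef when the attribute is absent

    # Distinguish the two titles by the attribute rather than by the file name
    my $kind = (defined $type && $type eq 'text') ? 'PDF' : 'Image';

    print "$kind $text\n";
    push @seen, "$kind $text";
}
```

With the two sample entries this prints "PDF fileName.pdf" followed by "Image fileName.jpg".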
trevor1940

ASKER
Yes, those were copy/paste errors.

So are you saying that getting the value with my $EtitleVal = $Etitle->data; and then pattern matching it against m/jpg/ is the way to do this, rather than testing for <title type="text">?
SOLUTION
Gertone (Geert Bormans)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
trevor1940

ASKER
Yes, I was using a regex. I like your way better, thank you.

It seems that within a foreach loop I need to do something like this:

foreach my $Etitle ($xpc->findnodes('//dft:feed/dft:entry/dft:title')) {
         my $EtitleVal = $Etitle->textContent;
}


whereas if you go after a single entry you don't:

  my $Etitle = $xpc->findnodes('//dft:feed/dft:entry/dft:title');



Given
  <entry>
    <title>fileName.jpg</title>
     <link href="PathTo/fileName.jpg" />
  </entry>



Can I go after '<entry>' and get its children, in order to keep title and link together?
SOLUTION
Gertone (Geert Bormans)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
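
A sketch of one way to keep title and link together (an illustration under assumptions, not necessarily the approach suggested here): select each entry first, then evaluate relative paths with that entry as the context node. The two-argument forms of findnodes and findvalue belong to XML::LibXML::XPathContext. The inline XML is a trimmed stand-in for the real feed.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed stand-in for the feed, with a link added to each entry
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title type="text">fileName.pdf</title>
    <link type="application/pdf" href="PathTo/fileName.pdf" />
  </entry>
  <entry>
    <title>fileName.jpg</title>
    <link href="PathTo/fileName.jpg" />
  </entry>
</feed>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft => 'http://www.w3.org/2005/Atom');

my @pairs;
for my $entry ($xpc->findnodes('/dft:feed/dft:entry')) {
    # Relative paths (no leading slash) are resolved against $entry
    my $title = $xpc->findvalue('dft:title', $entry);
    my $href  = $xpc->findvalue('dft:link/@href', $entry);

    print "$title -> $href\n";
    push @pairs, [ $title, $href ];
}
```

Because both lookups use the same context node, a title can never be paired with a link from a different entry.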
trevor1940

ASKER
Thank you for your continued help.

This seems to be how to get the child nodes:

foreach my $Entry ($xpc->findnodes("//dft:feed/dft:entry")) {

     # Relative path: a leading // would search the whole document, not just this entry
     foreach my $Images ($xpc->findnodes("./dft:title[not(\@type='text')]", $Entry)) {
         my $ImageVal = $Images->textContent;
          ####  This finds the image titles of the current entry

     }

}



Given this

  <entry xmlns:georss="http://www.georss.org/georss/10" xsi:schemaLocation ="http://www.url1.net/path/ http://www.url2.net/path/11  http://www.url3.net/path/23" >
    <title>fileName.jpg</title>
     <link href="PathTo/fileName.jpg" />
  </entry>
  <entry>
    <title type="text">fileName.pdf</title>
     <link type="application/pdf"  href="PathTo/fileName.pdf" />
  </entry>



Is there a way of testing whether <entry> carries a namespace declaration or an xsi:schemaLocation attribute? I searched Google but found nothing, possibly because I'm not sure what to search for (e.g. "XPath node has namespace").
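
A hedged sketch of how such a test might look. It assumes the xsi prefix is bound to the standard XML Schema instance namespace, http://www.w3.org/2001/XMLSchema-instance, somewhere above the entry (the snippet above does not show that declaration). hasAttributeNS checks for the attribute, and getNamespaces returns only the namespaces declared on the element itself, not everything in scope.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $XSI = 'http://www.w3.org/2001/XMLSchema-instance';

# Stand-in feed; declaring xsi on the root element is an assumption
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <entry xmlns:georss="http://www.georss.org/georss/10"
         xsi:schemaLocation="http://www.url1.net/path/ http://www.url2.net/path/11">
    <title>fileName.jpg</title>
  </entry>
  <entry>
    <title type="text">fileName.pdf</title>
  </entry>
</feed>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft => 'http://www.w3.org/2005/Atom');
$xpc->registerNs(xsi => $XSI);

my @report;
for my $entry ($xpc->findnodes('/dft:feed/dft:entry')) {
    # True when the entry carries an xsi:schemaLocation attribute
    my $has_schema = $entry->hasAttributeNS($XSI, 'schemaLocation') ? 1 : 0;

    # Namespace declarations written on this element only
    my @declared = $entry->getNamespaces;

    my $title = $xpc->findvalue('dft:title', $entry);
    print "$title: schemaLocation=$has_schema, declarations=", scalar(@declared), "\n";
    push @report, [ $title, $has_schema, scalar @declared ];
}

# The same attribute test expressed purely in XPath
my @with_schema = $xpc->findnodes('/dft:feed/dft:entry[@xsi:schemaLocation]');
```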
trevor1940

ASKER
Thank You
Gertone (Geert Bormans)

You're welcome.

I just noticed that I apparently missed one follow-up question.
I am not sure how to test for namespace nodes.
For a parser, it is only relevant to know which namespace is the default and which prefixes are bound to which namespace at a specific location, regardless of the level at which the binding is declared.

Note that XPath allows you to look for all namespace nodes:
//namespace::*
That could help you get the namespace nodes on your current node.
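
A minimal sketch of the namespace axis applied to a single node, assuming XML::LibXML returns XML::LibXML::Namespace objects for such a query. It lists the bindings in scope for each entry, which illustrates the point above: the result reflects what is in scope, whichever level declared it. The sample XML is a stand-in, and the "(default)" key is just a label chosen for the unprefixed namespace.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Stand-in feed: the first entry rebinds georss, the second inherits it
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <entry xmlns:georss="http://www.georss.org/georss/10">
    <title>fileName.jpg</title>
  </entry>
  <entry>
    <title type="text">fileName.pdf</title>
  </entry>
</feed>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft => 'http://www.w3.org/2005/Atom');

my @scopes;
for my $entry ($xpc->findnodes('/dft:feed/dft:entry')) {
    my %in_scope;
    # namespace::* selects every namespace binding in scope on the context node
    for my $ns ($xpc->findnodes('namespace::*', $entry)) {
        my $prefix = $ns->declaredPrefix // '(default)';
        $in_scope{$prefix} = $ns->declaredURI;
    }
    print join(', ', map { "$_=$in_scope{$_}" } sort keys %in_scope), "\n";
    push @scopes, \%in_scope;
}
```

Comparing the georss binding of the two entries shows which entry overrides the feed-level declaration.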
Gertone (Geert Bormans)

stackoverflow.com/questions/7388555/xmllibxml-find-and-register-namespaces-used-in-a-document

for inspiration
trevor1940

ASKER
Thanks, that was one of the few links I had found.

I closed this because the task has been pulled; however, I may need to revisit it.

For my own interest:

If you can't test for namespace nodes directly, could you find the child via "(\@type='text')", then get the parent <entry>, then search back down for <link>, thus ensuring <title> and its siblings are dealt with together?
Gertone (Geert Bormans)

Yes, you can do that.
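
A short sketch of that round trip, under the assumption that the entries look like the samples earlier in the thread: match the title by its type attribute, step up with parentNode, then look up the link relative to the parent entry.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Stand-in feed based on the entries shown earlier
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>fileName.jpg</title>
    <link href="PathTo/fileName.jpg" />
  </entry>
  <entry>
    <title type="text">fileName.pdf</title>
    <link type="application/pdf" href="PathTo/fileName.pdf" />
  </entry>
</feed>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(dft => 'http://www.w3.org/2005/Atom');

my @found;
for my $title ($xpc->findnodes('/dft:feed/dft:entry/dft:title[@type="text"]')) {
    my $entry = $title->parentNode;                        # up to the enclosing entry
    my $href  = $xpc->findvalue('dft:link/@href', $entry); # back down to its link

    print $title->textContent, " -> $href\n";
    push @found, [ $title->textContent, $href ];
}
```

An equivalent alternative is to select the entry directly with a predicate, for example /dft:feed/dft:entry[dft:title[@type="text"]], and read both children from it.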