lucavilla asked:

XML extraction (XQuery): how to add a root element and avoid adding namespaces?

With the Saxon XML processor (plus TagSoup, and with a declared default namespace), if I run an extraction with the XPath "//span" over an HTML web page, I get this correct result:

<?xml version="1.0" encoding="UTF-8"?>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">My Stuff</span>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">Search</span>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">What's Hot</span>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">Live Action</span>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">Community</span>

However, the problem is that I want to parse this XML further with a parser/visualizer that does not use TagSoup, and I can't, because this is not well-formed XML: the parser reports that it has multiple root (span) elements.
How can I tell the XQuery to add a predefined root element?

Furthermore, to make the XML more compact and easier for humans to read, is it possible to avoid the automatic addition of all those namespace declarations? (They are not present in the original HTML.)

Gertone (Geert Bormans)

Do something like this:

return
<spans xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
{
(: your original return expression goes here :)
}

</spans>
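
The fragment above begins with `return`, which in XQuery is only valid as the final clause of a FLWOR expression (`for`/`let` ... `return`). As a rough sketch of how the fragment might sit inside a complete query, assuming the spans come from the `doc(...)` call the asker uses in this thread and that a `let` clause precedes the `return` (the variable name `$spans` is hypothetical):

```xquery
declare default element namespace 'http://www.w3.org/1999/xhtml';

(: Assumption: the spans are selected in a let clause, so "return" has a FLWOR to close. :)
let $spans := doc('FILE:///C:/original.html')//span
return
<spans xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
{
  $spans
}
</spans>
```

If the query has no `for` or `let` clause at all, the `return` keyword should simply be left out and the element constructor used as the whole query body.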

lucavilla (ASKER)

Do you mean I should modify the XQuery like this?

declare default element namespace 'http://www.w3.org/1999/xhtml';
declare option saxon:output 'method=xml';
return
<root>
{
doc('FILE:///C:/original.html')//span
}
</root>

Is my syntax correct?

Gertone (Geert Bormans)

That should work, but you should put the namespace declarations on the root element so that the namespace nodes are not repeated on every span.

lucavilla (ASKER)

Ah, now I understand why you put it there again.

I tried it like this:
declare default element namespace 'http://www.w3.org/1999/xhtml';
declare option saxon:output 'method=xml';
return
<root xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
{
doc('FILE:///C:/original.html')//span
}
</root>

However, I get this error:
"Error on line 4 column 7 of xquery.xq:
  XPST0003 XQuery syntax error near #...thod=xml'; return <root xmlns=#:
    Unexpected token name "xmlns" beyond end of query
Static error(s) in query"
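
One likely reading of this error: `return` is only a keyword at the end of a FLWOR expression. At the start of the query body it is parsed as a path step selecting a child element named `return`, the following `<` as a comparison operator, and `root` as another element name, so the parser considers the query finished before it reaches `xmlns`. A minimal sketch of the same query without the leading `return`, reusing the file path and declarations from the attempt above; it is an illustration under that assumption, not a quotation of the accepted answer:

```xquery
declare default element namespace 'http://www.w3.org/1999/xhtml';
declare option saxon:output 'method=xml';

(: No leading "return": a direct element constructor is a complete query body on its own. :)
<root xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
{
  doc('FILE:///C:/original.html')//span
}
</root>
```

Because both namespaces are declared on `root`, the copied `span` elements should not need to repeat them when the result is serialized.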
ASKER CERTIFIED SOLUTION
Gertone (Geert Bormans)

This solution is only available to members.

lucavilla (ASKER)

Fabulous! It works! Thanks, Gertone!

Result:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:html="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">
   <span>My Stuff</span>
   <span>Search</span>
   <span>What's Hot</span>
   <span>Live Action</span>
   <span>Community</span>
</root>
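
The result above still places `root` and every `span` in the XHTML namespace; the declarations are just written once. If the goal is output with no namespaces at all, as raised in the original question, one possible approach is to rebuild each element under its local name only. The sketch below assumes the same input file as in the thread; the helper `local:strip-ns` and the prefix `h` are hypothetical names introduced for illustration. Note that it deliberately does not declare a default element namespace, so that `root` is created in no namespace.

```xquery
declare namespace h = 'http://www.w3.org/1999/xhtml';
declare option saxon:output 'method=xml';

(: Hypothetical helper: rebuilds an element tree with every element moved
   into no namespace; attributes are copied and other nodes returned as-is. :)
declare function local:strip-ns($n as node()) as node() {
  typeswitch ($n)
    case element() return
      element { QName('', local-name($n)) } {
        $n/@*,
        for $c in $n/node() return local:strip-ns($c)
      }
    default return $n
};

<root>
{
  for $s in doc('FILE:///C:/original.html')//h:span
  return local:strip-ns($s)
}
</root>
```

The trade-off is that the output is no longer XHTML in the namespace sense, so any later XPath over it would select `span` without a namespace.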
