lucavilla
asked on
XML extraction (XQuery): how to add a root element and avoid adding namespaces?
Using the Saxon processor (with TagSoup and a declared default namespace), if I run the XPath "//span" against an HTML web page, I get this correct result:
<?xml version="1.0" encoding="UTF-8"?>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">My Stuff</span>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">Search</span>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">What's Hot</span>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">Live Action</span>
<span xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">Community</span>
However, I want to parse this XML further with a parser/visualizer that does not use TagSoup, and I can't, because the output is not well-formed XML: the parser complains that it has multiple root (span) elements.
How can I make the XQuery add a predefined root element?
Also, to make the XML more compact and easier for humans to read, is it possible to avoid the automatic addition of all those namespace declarations? (They are not in the original HTML.)
ASKER
Do you mean I should modify the XQuery like this?
declare default element namespace 'http://www.w3.org/1999/xhtml';
declare option saxon:output 'method=xml';
return
<root>
{
doc('FILE:///C:/original.html')//span
}
</root>
Is my syntax correct?
That should work, but you should put the namespace declarations on the root element so that they are not repeated on every span.
ASKER
Ah, now I understand why you put it there again.
I tried it:
declare default element namespace 'http://www.w3.org/1999/xhtml';
declare option saxon:output 'method=xml';
return
<root xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
{
doc('FILE:///C:/original.html')//span
}
</root>
However, I get this error:
"Error on line 4 column 7 of xquery.xq:
XPST0003 XQuery syntax error near #...thod=xml'; return <root xmlns=#:
Unexpected token name "xmlns" beyond end of query
Static error(s) in query"
ASKER CERTIFIED SOLUTION
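The text of the accepted solution is not reproduced in this thread. As a hedged sketch only, and not necessarily what was posted: the error message is consistent with the stray `return` keyword, which XQuery accepts only as part of a FLWOR (or similar) expression. Without a preceding `for` or `let` clause, the parser appears to read `return <root` as a comparison between two path steps and then stops at `xmlns`. Dropping `return` leaves a plain element constructor as the query body. The file path and the `saxon:output` option are copied unchanged from the query above, and the sketch assumes the `saxon` prefix is bound by the processor, as it evidently was there.

```xquery
declare default element namespace 'http://www.w3.org/1999/xhtml';
declare option saxon:output 'method=xml';

(: No "return" keyword here: it is only valid inside a FLWOR expression. :)
(: The namespaces are declared once on the wrapper element, so the       :)
(: serializer does not need to repeat them on each copied span.          :)
<root xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
{
  doc('FILE:///C:/original.html')//span
}
</root>
```

Because the prolog already sets the default element namespace, the `xmlns` attribute on `root` is redundant for the query itself; it is kept here to match the attempted query and the result shown below.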
ASKER
Fabulous! It works! Thanks, Gertone!
Result:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:html="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">
<span>My Stuff</span>
<span>Search</span>
<span>What's Hot</span>
<span>Live Action</span>
<span>Community</span>
</root>
welcome