I would like to extract the element matched by the XPath //DIV[@id="ps-content"] from this web page: http://www.amazon.com/dp/1449319432
(saved as a local file)
I would like to do it in a single command line, using one of the best parsers, Saxon-PE.
So far, the shortest solution I have found (or seem to have found) uses these two lines:
java -jar tagsoup-1.2.1.jar <page.html >page.xhtml
java -cp saxon9pe.jar net.sf.saxon.Query -s:"page.xhtml" -qs:"//*:div[@id='ps-content']"
The first line (TagSoup) is necessary to convert the original malformed HTML into well-formed XML. However, I read that Saxon-PE has embedded TagSoup capability (see http://saxonica.com/documentation9.4-demo/html/extensions/functions/parse-html.html). How can I combine my two lines into a single line?