I am trying to extract the tags from HTML files, specifically the <a> and <p> elements.
I noticed several similar questions in the group that recommended using HTML Tidy to convert the HTML to XML first.
However, when I parse the tidied file, all I see is a single
NODE_DOCUMENT_TYPE node with the name "html".
FWIW, Internet Explorer displays the XML file fine.
Am I just forgetting to do something really obvious?
I use Delphi, but the code should be fairly similar in other languages:
Work_DOMDocument := nil;
// create an MSXML 4.0 DOM document instance
OleCheck(CoCreateInstance(Class_DOMDocument40, nil, CLSCTX_ALL,
  IXMLDOMDocument, Work_DOMDocument));
if not Work_DOMDocument.load('test.xml') then
  ShowMessage('Error loading DOMDocument')
else
  DisplayXMLStructure(Work_DOMDocument); // simple routine that walks the nodes
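For comparison, here is a minimal sketch of a load-and-extract routine. It assumes the MSXML 4.0 type-library import unit (called MSXML2_TLB here; that unit name is an assumption) supplies Class_DOMDocument40 as in the code above, and it assumes Tidy was run with its XHTML output option so the elements sit in the http://www.w3.org/1999/xhtml namespace. It sets async to False, because MSXML's async property defaults to True and load can then return before parsing has finished. It then selects the <a> and <p> elements with a namespace-qualified XPath query. If the tidied file has no xmlns attribute on <html>, the query would simply be '//a | //p'.

```pascal
program ExtractLinksAndParagraphs;

{$APPTYPE CONSOLE}

uses
  SysUtils, ActiveX, ComObj,
  MSXML2_TLB; // assumption: unit generated by importing the MSXML 4.0 type library

var
  Doc: IXMLDOMDocument2;   // IXMLDOMDocument2 adds setProperty
  Nodes: IXMLDOMNodeList;
  I: Integer;
begin
  CoInitialize(nil);
  try
    OleCheck(CoCreateInstance(Class_DOMDocument40, nil, CLSCTX_INPROC_SERVER,
      IXMLDOMDocument2, Doc));

    // async defaults to True; without this, load may return before parsing completes
    Doc.async := False;
    if not Doc.load('test.xml') then
      raise Exception.Create('Error loading test.xml: ' + Doc.parseError.reason);

    // assumption: Tidy's XHTML output declares the default XHTML namespace,
    // so XPath needs a prefix bound to it to match the elements
    Doc.setProperty('SelectionLanguage', 'XPath');
    Doc.setProperty('SelectionNamespaces',
      'xmlns:x=''http://www.w3.org/1999/xhtml''');

    // select every <a> and <p> element and print its name and text content
    Nodes := Doc.selectNodes('//x:a | //x:p');
    for I := 0 to Nodes.length - 1 do
      Writeln(Nodes.item[I].nodeName, ': ', Nodes.item[I].text);
  finally
    // release the COM interfaces before uninitializing COM
    Nodes := nil;
    Doc := nil;
    CoUninitialize;
  end;
end.
```

Newer Delphi versions may need the unit-scoped names (Winapi.ActiveX, System.Win.ComObj, System.SysUtils) unless the project's unit scope names cover them.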