Parsing HTML - extracting <title> contents
Posted on 2002-06-27
I need to extract the contents of the <title> tag from several HTML documents. Up to now I've been using a self-made construct, which is starting to give me grief (especially when the title tag has attributes, which it seems to have on microsoft.com). Anyway, I came across the HTMLEditorKit.Parser class, which might be able to do the job. However, from what I understand, the parse() method runs in a separate thread. I need it to run in sequence (my getTitle() method returns the title as a String) so that it fits in with the rest of the program. Can someone give an example of fetching the HTML title from a document, given an InputStreamReader on the URL as input?
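
For reference, here is a minimal sketch of the kind of thing being asked for. It uses javax.swing.text.html.parser.ParserDelegator (the stock subclass of HTMLEditorKit.Parser) with a ParserCallback that collects text between the start and end of the title tag. As far as I can tell, ParserDelegator.parse() invokes the callback on the calling thread and only returns once the reader is exhausted, so no extra synchronization should be needed. The class name TitleExtractor and the choice to return an empty string when no title is found are assumptions made for illustration, not part of any existing API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; the name is illustrative only.
final class TitleExtractor {

    // Returns the text of the <title> element, or "" if none was seen.
    static String getTitle(Reader in) {
        final StringBuilder title = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Attributes on the title tag arrive in 'a' and are simply ignored.
                if (HTML.Tag.TITLE.equals(t)) {
                    inTitle = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (HTML.Tag.TITLE.equals(t)) {
                    inTitle = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Text may arrive in more than one chunk, so append rather than assign.
                if (inTitle) {
                    title.append(data);
                }
            }
        };

        try {
            // 'true' ignores charset declarations in the document instead of
            // throwing ChangedCharSetException when a <meta> charset is found.
            new ParserDelegator().parse(in, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return title.toString().trim();
    }
}
```

With a URL in hand, the call would look something like TitleExtractor.getTitle(new InputStreamReader(url.openStream())). Note that this sketch reads the whole document even though the title normally appears near the top; stopping early would need something extra, such as throwing from the callback.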