I want to analyze some HTML files (stored locally) and retrieve all the links, images, or any other tags.
Currently I do this by loading the file into a WebBrowser control and then reading the loaded document.
The problem is that many files reference images with absolute http:// paths, try to connect to other websites (visit counters, etc.), or display alerts, prompts, confirmation boxes, script errors, and so on.
I want to analyze the file without the user seeing anything. But if the user is offline, anything that tries to access the web may have undesired results, such as launching the phone dialer or showing error messages. If the user is online, the file loads slowly because it fetches online resources.
Also, there is no way to prevent alerts and prompts from appearing. Setting the Offline and Silent properties to True has no effect.
So my question is: can I analyze an HTML file without loading it into a WebBrowser control (or, if I do use one, how can I avoid the problems above)?
Of course, I don't mean a substring-search solution, like looking for "<A HREF" and then for the closing ">", etc.
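For comparison, here is a minimal sketch of the parser-based kind of solution being asked about. An HTML parser only tokenizes the markup. It never downloads images, never contacts counters, and never runs scripts, so no alerts or dialers can be triggered. The question doesn't name a language, so this sketch uses Python's standard-library html.parser module. The class LinkCollector, the helper functions, the tag-to-attribute table, and the UTF-8 default encoding are all hypothetical choices for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects link-like attribute values from start tags.

    Nothing is fetched and no script is executed; the parser only reads text.
    """

    # Illustrative choice of which attribute to read from which tag; extend as needed.
    WANTED = {"a": "href", "img": "src", "script": "src", "link": "href", "iframe": "src"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []  # (tag, value) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches "a"/"href".
        wanted = self.WANTED.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>
            if name == wanted and value is not None:
                self.found.append((tag, value))

    # The default handle_startendtag calls handle_starttag, so <img ... /> is covered too.


def collect_links(html_text):
    """Parse an HTML string and return the collected (tag, value) pairs."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.found


def collect_links_from_file(path, encoding="utf-8"):
    """Read a local file (encoding is an assumption) and collect its links."""
    with open(path, encoding=encoding, errors="replace") as f:
        return collect_links(f.read())


if __name__ == "__main__":
    sample = """<html><body>
    <A HREF="page2.html">next</A>
    <img src="http://example.com/counter.gif">
    <script>alert("hi")</script>
    <img src="logo.png" />
    </body></html>"""
    for tag, url in collect_links(sample):
        print(tag, url)
```

The inline script in the sample is treated as plain text, and since it has no src attribute it is simply skipped. The http:// image URL is recorded as a string but never requested. html.parser tolerates sloppy markup, but it does not build a DOM. If a document tree is needed, a DOM-building parser would be the next step.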