I am currently working on a web crawler.
It's a WinForms application developed with .NET 3.5.
I want to implement the following feature:
Tag the pages that have a specified string in their source code.
How to use it:
The user provides the string he is looking for, and the application
tags the pages that include the provided string.
For now, the user has a text box to type the word he is looking for.
He also has a combo box to choose whether the page's source code must
include the specified word or not.
And here is the issue:
Sometimes the query is more complex than a single string.
Let's say the user wants to find pages that contain an img tag
with attribute src equal to "foo.jpg" and do not contain a link tag
with attribute href equal to "http://domain.com".
I do not know how to implement a search on multiple criteria.
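To make the question concrete, here is a rough sketch of the kind of thing I have in mind. It is written in Python only for brevity (the real application is .NET), and the names Criterion and page_matches are hypothetical, made up for this illustration. It assumes each criterion is a tag/attribute/value triple plus an include/exclude flag, and that all criteria are combined with AND. I am not sure this is the right design.

```python
from html.parser import HTMLParser
from typing import NamedTuple


class Criterion(NamedTuple):
    """Hypothetical shape of one search criterion."""
    tag: str            # e.g. "img"
    attr: str           # e.g. "src"
    value: str          # e.g. "foo.jpg"
    must_contain: bool  # True = page must include it, False = must not


class _TagCollector(HTMLParser):
    """Records every (tag, attribute, value) triple seen in the page."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        for name, value in attrs:
            self.seen.add((tag, name, value))


def page_matches(html, criteria):
    """Return True only if every criterion holds (criteria are ANDed)."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return all(
        ((c.tag.lower(), c.attr.lower(), c.value) in collector.seen)
        == c.must_contain
        for c in criteria
    )


# The example from above: img src="foo.jpg" present, link href absent.
criteria = [
    Criterion("img", "src", "foo.jpg", must_contain=True),
    Criterion("link", "href", "http://domain.com", must_contain=False),
]
print(page_matches('<body><img src="foo.jpg"></body>', criteria))
```

With this shape, the text box and combo box would become a list of rows, one row per criterion, but I do not know whether parsing the page like this is better than plain string or regex matching.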
Any ideas?