A few years back there used to be an application called "Simple Find". It was a nice Windows-based app that could query search engines by subject and then give you a list of results.
I have been trying to duplicate this feature in my own application and am having some difficulty.
In my application the user can request a file type by extension.
I started out using WinInet (wininet.dll) with a VC++ 6.0 sample called Tear that could download an HTML page.
My thinking was to download the page, then parse it for either the locations of files or links to other HTML pages, then recursively parse those pages, and so on until the user's depth level had been reached or no more links were found.
What I found was that not all the links are so straightforward. Some of them are full paths ("http://www.xxx.yyy.zzzz.com/apple.jpg") while others are relative ("apple.jpg"). Besides that, there seem to be many more page types than just .htm or .html.
So I went looking to see if someone had already built a parsing mechanism.
Then I was directed to WGet.
WGet is a great tool, but it still does not quite do what I want it to do. I use WGet with ShellExecute and just pass parameters to Wget.exe from my program.
This only sort of works. It seems that WGet has just as much trouble with the parsing as I do.
On top of that, I wanted the ability to query search engines; as things stand now, users have to enter a starting web page address.
I also notice that when I look at some of the pages where WGet missed files, I can see both absolute- and relative-path links to the files it skipped, which puzzles me.
Then one day I was chatting and someone suggested that I use a common server with a PHP script. The users would all go to that one site (I am not too sure how well that would perform), each of their applications would query it, and the PHP script would return results it obtained from the search engines.
His thinking was that you can query search engines, but you have to be careful because from time to time they change their format.
But it sounded as though he was only guessing and had never done the scripting himself.
I know from studying the address bar when I search on some of the less popular search engines that I could adjust the query variables to change the search content and the starting result page. That seemed hopeful.
But then I noticed that Google, and perhaps Yahoo, have some kind of restriction, because I would get "page forbidden" or "no access" (I can't remember exactly), and I was not allowed in. Somehow they could detect that it was not an original query but a machine-generated one.
I noticed that some of the less popular engines did not do this.
So I am here fishing for guidance.