Hi Guys,
I am a VB.NET newbie and am learning by building applications I can actually use. My current application is designed to crawl certain URLs and extract the information I need from them.
It works fine in single-threaded mode, but with 17k URLs to process and information to extract from 3 key seed URLs (Yahoo and Google), it needs multithreading.
For the parsing we are currently using the WebBrowser control, but I realize now that this is definitely not going to work for multithreading.
We are currently using Document.GetAttribute-type calls to extract the information we need (href attributes, InnerHtml and so on from links within the document).
What I need is to be able to start, say, 10 threads (or a number of threads entered in a box on the main GUI form, though I can add that later once the threading part is worked out).
The threads should read from the dataset containing the URLs sequentially: the first thread grabs the first URL from the first row, the second thread knows that the first URL has been taken and grabs the next one in line, and so on.
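To make the hand-off concrete, here is a rough sketch of what I am picturing, not working code from my application. It assumes the URLs have already been copied out of the dataset into a plain list, and the names CrawlQueueSketch, TakeNextUrl and Worker, as well as the example.com addresses, are made up for illustration. A shared index guarded by SyncLock should mean that no two threads can claim the same row.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module CrawlQueueSketch
    ' Guards nextIndex so only one thread at a time can claim a URL.
    Private ReadOnly queueLock As New Object()
    Private nextIndex As Integer = 0
    Private urls As New List(Of String)

    ' Hands out the next unclaimed URL, or Nothing once the list is exhausted.
    Private Function TakeNextUrl() As String
        SyncLock queueLock
            If nextIndex >= urls.Count Then Return Nothing
            Dim url As String = urls(nextIndex)
            nextIndex += 1
            Return url
        End SyncLock
    End Function

    ' Each worker keeps claiming URLs until none are left.
    Private Sub Worker()
        Dim url As String = TakeNextUrl()
        While url IsNot Nothing
            Console.WriteLine("Thread {0} took {1}", Thread.CurrentThread.ManagedThreadId, url)
            ' Downloading and parsing of this URL would go here.
            url = TakeNextUrl()
        End While
    End Sub

    Sub Main()
        ' Placeholder URLs; the real ones would come from the dataset rows.
        For i As Integer = 1 To 25
            urls.Add("http://example.com/page" & i.ToString())
        Next

        Dim threadCount As Integer = 10 ' assumed default; could come from the GUI later
        Dim workers As New List(Of Thread)
        For i As Integer = 1 To threadCount
            Dim worker As New Thread(AddressOf Worker)
            worker.Start()
            workers.Add(worker)
        Next

        ' Wait for every worker to finish before exiting.
        For Each running As Thread In workers
            running.Join()
        Next
    End Sub
End Module
```

The order of the printed lines will vary between runs, but each URL should appear exactly once.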
Each thread should then fetch its URL and parse the contents so we can extract the HTML links and related data, and finally write the extracted data back to the table.
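For the fetch, parse and write-back step, this is the kind of thing I imagine could replace the WebBrowser control. It is only a sketch under assumptions: it swaps Document.GetAttribute for a crude regular expression that only handles quoted href values, the table layout (SourceUrl and Href columns) is invented, and the names LinkExtractSketch, DownloadHtml, ExtractLinks and SaveLinks are hypothetical. Since a DataTable is not safe for concurrent writes, the write-back is wrapped in SyncLock.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Net
Imports System.Text.RegularExpressions

Module LinkExtractSketch
    ' Serializes writes, because DataTable is not safe for concurrent writers.
    Private ReadOnly tableLock As New Object()

    ' Crude href matcher; assumes the attribute value is quoted.
    Private ReadOnly hrefPattern As New Regex( _
        "<a\s[^>]*?href\s*=\s*[""']([^""'>]+)[""']", RegexOptions.IgnoreCase)

    ' Downloads a page as text. A new WebClient is created per call
    ' so that no instance is shared between threads.
    Function DownloadHtml(ByVal url As String) As String
        Using client As New WebClient()
            Return client.DownloadString(url)
        End Using
    End Function

    ' Returns the href value of every anchor tag the pattern matches.
    Function ExtractLinks(ByVal html As String) As List(Of String)
        Dim links As New List(Of String)
        For Each m As Match In hrefPattern.Matches(html)
            links.Add(m.Groups(1).Value)
        Next
        Return links
    End Function

    ' Adds one row per extracted link while holding the lock.
    Sub SaveLinks(ByVal results As DataTable, ByVal sourceUrl As String, _
                  ByVal links As List(Of String))
        SyncLock tableLock
            For Each link As String In links
                results.Rows.Add(sourceUrl, link)
            Next
        End SyncLock
    End Sub

    Sub Main()
        Dim results As New DataTable("Links")
        results.Columns.Add("SourceUrl", GetType(String))
        results.Columns.Add("Href", GetType(String))

        ' A fixed HTML string stands in for DownloadHtml(url) so the demo needs no network.
        Dim html As String = "<a href=""http://example.com/a"">A</a> <a class='x' href='/b'>B</a>"
        SaveLinks(results, "http://example.com/", ExtractLinks(html))

        For Each row As DataRow In results.Rows
            Console.WriteLine("{0} -> {1}", row("SourceUrl"), row("Href"))
        Next
    End Sub
End Module
```

In the real crawler each worker thread would call DownloadHtml on the URL it claimed, pass the result to ExtractLinks, and then call SaveLinks. A regular expression will miss unquoted or unusual markup, so a proper HTML parser is probably the more robust choice.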
So, can anyone help me set up multithreaded code that parses multiple URLs, so I can get moving?
I would be greatly appreciative.
Thanks