Does anyone have multi-threaded Spider source code?

Does anyone have Spider source code? (I need Delphi source.)
Thanks!
yuwang asked:
djmcrae commented:
Damn - I do not have a server at the moment, but I can send you an application. It is not a spider but a site-saver: it starts at the home page and follows all links within a dynamic (.asp or .php) site, saving the raw HTML and the renamed graphical links to the hard drive, so that a dynamic site can be burnt to CD for demo purposes. Send your email address to dmcrae@hotmail.com and I'll email you the source.

But for others who may be following, and for yuwang in the meantime, I hope this helps (this is basically the guts of it).

You need a TWebBrowser (WB1), a button (btnGo) and an edit box (edWebAddress) on your form.

procedure TForm1.btnGoClick(Sender: TObject);
var Flags: OLEVariant;
  URL: string;
begin
  //start processing the web site
  HTMLLinkCount:= 0;
  Flags:= 4; //navNoReadFromCache=4
  URL:= edWebAddress.Text;
  bFirstFire:= false; //global var
  WB1.Navigate(URL,Flags);
end;

This is the OnDocumentComplete event handler of the TWebBrowser:

procedure TForm1.WB1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  if not(bFirstFire) then
  begin
    //set this boolean to prevent multiple firings in frames (we'll get the
    // frame contents separately)
    if bInFrame then
      bInFrame:= false
    else
    begin
      bFirstFire:= true;
      WB1.Stop;
      ProcessPage;
    end;
  end;
end;

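procedure TForm1.ProcessPage;
// IHTMLDocument2 and IHTMLElementCollection are declared in the MSHTML unit
// (MSHTML_TLB in older Delphi versions, generated by importing the
// Microsoft HTML Object Library); it must be in the unit's uses clause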
var Doc: IHTMLDocument2;
  PageAll: IHTMLElementCollection;
  pageItem: OLEVariant;
  k: integer;
begin
  Doc:= wb1.document as IHTMLDocument2;
  PageAll:= Doc.all;
  //showmessage(PageAll.toString);
  //showmessage(IntToStr(iCurrentParentLink)+' '+Doc.url);
  //this delay is a bodge to stop some framesets refiring;
  //it may not be necessary
  Delay(300); //Delay is a very handy utility from RxLib
  if CompareText(Copy(Doc.URL,1,6),'res://')=0 then
  begin
    //page is busted - not found
  end
  else begin
    bInFrame:= false;
    for k:= 0 to PageAll.Length-1 do
    begin
      pageItem:= pageAll.item(k, varEmpty);
      if pageItem.tagname='FRAME' then
      begin
        ProcessPageSaveLink(iCurrentParentLink, pageItem.src, 'FRAME');
        bInFrame:= true;
      end;
      if pageItem.tagname='IMG' then
      begin
        ProcessPageSaveLink(iCurrentParentLink, pageItem.src, 'IMG');
      end;
      // ....
    end;
  end;
  //HTMLLinkArray[iCurrentParentLink].URLtoFollow := Doc.URL;
  HTMLLinkArray[iCurrentParentLink].isProcessed:= true;
  ProcessNextLink;
end;

ProcessPageSaveLink just adds the link and its link type to a dynamic array.
ProcessNextLink finds an unfollowed link in the array, sets its followed flag, and points the TWebBrowser at that link to fetch it. If the link is an image or a document, I use a TNMHTTP component instead to save it to the hard drive (TNMHTTP may not be in D5 Pro - I have D5 Enterprise - in which case the Indy HTTP component may even be a better choice).
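For anyone who wants to see the bookkeeping behind ProcessPageSaveLink and ProcessNextLink, here is a minimal console sketch of the link array. It assumes a record with the URLtoFollow and isProcessed fields seen in ProcessPage, plus two assumed fields (LinkType, ParentLink). FindNextLink is a hypothetical helper: in the real form, ProcessNextLink would call something like it and then navigate WB1 or download the file. The sketch uses the single isProcessed flag for both "followed" and "processed", and the duplicate-URL check is an assumption, since the post does not say how repeated links are handled.

```pascal
program LinkArraySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // URLtoFollow and isProcessed appear in the post; LinkType and
  // ParentLink are assumed fields for this sketch.
  TLinkRecord = record
    URLtoFollow: string;
    LinkType: string;    // e.g. 'FRAME', 'IMG', 'A'
    ParentLink: Integer; // index of the page the link was found on
    isProcessed: Boolean;
  end;

var
  HTMLLinkArray: array of TLinkRecord;
  HTMLLinkCount: Integer;

// Appends a link to the dynamic array unless its URL is already there
// (the duplicate check is an assumption, not taken from the post).
procedure ProcessPageSaveLink(Parent: Integer; const AURL, ALinkType: string);
var
  i: Integer;
begin
  for i := 0 to HTMLLinkCount - 1 do
    if SameText(HTMLLinkArray[i].URLtoFollow, AURL) then
      Exit;
  if HTMLLinkCount = Length(HTMLLinkArray) then
    SetLength(HTMLLinkArray, HTMLLinkCount * 2 + 16); // grow in chunks
  HTMLLinkArray[HTMLLinkCount].URLtoFollow := AURL;
  HTMLLinkArray[HTMLLinkCount].LinkType := ALinkType;
  HTMLLinkArray[HTMLLinkCount].ParentLink := Parent;
  HTMLLinkArray[HTMLLinkCount].isProcessed := False;
  Inc(HTMLLinkCount);
end;

// Hypothetical helper: marks the first unprocessed link as processed and
// returns its index, or -1 when every link has been followed.
function FindNextLink: Integer;
var
  i: Integer;
begin
  Result := -1;
  for i := 0 to HTMLLinkCount - 1 do
    if not HTMLLinkArray[i].isProcessed then
    begin
      HTMLLinkArray[i].isProcessed := True;
      Result := i;
      Exit;
    end;
end;

var
  Idx: Integer;

begin
  ProcessPageSaveLink(-1, 'http://example.com/', 'A');
  ProcessPageSaveLink(0, 'http://example.com/logo.gif', 'IMG');
  ProcessPageSaveLink(0, 'http://example.com/', 'A'); // duplicate: ignored
  Idx := FindNextLink;
  while Idx >= 0 do
  begin
    // A real ProcessNextLink would navigate WB1 here for pages and frames,
    // or download the file with an HTTP component for images/documents.
    Writeln(Idx, ' ', HTMLLinkArray[Idx].LinkType, ' ',
      HTMLLinkArray[Idx].URLtoFollow);
    Idx := FindNextLink;
  end;
end.
```

Run as a console program it prints `0 A http://example.com/` and `1 IMG http://example.com/logo.gif`. The linear scans keep the sketch short; with thousands of links, a sorted TStringList or a hash table would make the duplicate check cheaper.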

Anything that needs clearing up, just yell.
Apologies for the delays; being in Australia, I'm probably asleep when you're up and vice versa.


yuwang (author) commented:
Does no one know???
rondi commented:
What is a Spider?
djmcrae commented:
Rondi - a spider (web robot, bot, etc.) is simply a program that visits a number of Web sites. Some search sites use them to index pages; others use them to target shopping sites, etc.

yuwang - I once wrote a little one for fun, but all the code was lifted from this excellent article: http://www.inprise.com/delphi/news/delphi_developer/bolton/ It has everything you need, including downloadable code.
yuwang (author) commented:
It is very good!
But I want to get a web page, search it for URLs, then
get the pages those URLs point to, search them for URLs, and so on.
How do I do that?
Thanks again!
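What yuwang describes (fetch a page, collect its URLs, fetch those pages, collect their URLs, and so on) is a breadth-first crawl. Below is a minimal single-threaded console sketch of that loop. FetchPage is a hypothetical stand-in that serves three fake pages instead of downloading anything, and ExtractLinks is a naive text scan that only finds double-quoted href values (no relative URLs, no single quotes). A visited list stops the crawler looping between pages that link to each other, and MaxPages caps the run.

```pascal
program CrawlSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes, StrUtils;

// Hypothetical stand-in for a real download (TWebBrowser, TNMHTTP, Indy).
// It serves three fake pages so the sketch runs without a network.
function FetchPage(const URL: string): string;
begin
  if URL = 'http://site/a' then
    Result := '<a href="http://site/b">B</a> <a href="http://site/c">C</a>'
  else if URL = 'http://site/b' then
    Result := '<A HREF="http://site/a">A</A> <a href="http://site/c">C</a>'
  else
    Result := '<p>no links here</p>';
end;

// Adds every double-quoted href value found in HTML to Links. Naive text
// scan: single-quoted/unquoted values and relative URLs are not handled.
procedure ExtractLinks(const HTML: string; Links: TStrings);
const
  Marker = 'href="';
var
  Lower: string;
  Start, Stop: Integer;
begin
  Lower := LowerCase(HTML); // ASCII-only lowercasing keeps positions aligned
  Start := PosEx(Marker, Lower, 1);
  while Start > 0 do
  begin
    Inc(Start, Length(Marker));
    Stop := PosEx('"', HTML, Start);
    if Stop = 0 then
      Exit; // unterminated attribute value
    Links.Add(Copy(HTML, Start, Stop - Start));
    Start := PosEx(Marker, Lower, Stop + 1);
  end;
end;

// Breadth-first crawl: take the oldest queued URL, fetch it, queue every
// link not seen before, and stop when the queue is empty or MaxPages
// pages have been fetched. Visited receives every URL seen.
procedure Crawl(const StartURL: string; MaxPages: Integer; Visited: TStrings);
var
  Queue, Found: TStringList;
  URL: string;
  i, Fetched: Integer;
begin
  Queue := TStringList.Create;
  Found := TStringList.Create;
  try
    Queue.Add(StartURL);
    Visited.Add(StartURL);
    Fetched := 0;
    while (Queue.Count > 0) and (Fetched < MaxPages) do
    begin
      URL := Queue[0];
      Queue.Delete(0); // FIFO order gives breadth-first traversal
      Inc(Fetched);
      Writeln('Fetched ', URL);
      Found.Clear;
      ExtractLinks(FetchPage(URL), Found);
      for i := 0 to Found.Count - 1 do
        if Visited.IndexOf(Found[i]) < 0 then // skip URLs already seen
        begin
          Visited.Add(Found[i]);
          Queue.Add(Found[i]);
        end;
    end;
  finally
    Found.Free;
    Queue.Free;
  end;
end;

var
  Seen: TStringList;

begin
  Seen := TStringList.Create;
  try
    Crawl('http://site/a', 10, Seen);
    Writeln(Seen.Count, ' distinct URLs seen');
  finally
    Seen.Free;
  end;
end.
```

It fetches site/a, then site/b, then site/c, and reports 3 distinct URLs; the link from b back to a is skipped because a is already in the visited list. For a multi-threaded spider, the queue and visited list would have to be shared between worker threads and protected by a lock.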
geobul commented:
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

accept djmcrae's comment as answer

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Thanks,

geobul
EE Cleanup Volunteer
Question has a verified solution.
