Solved

Who has multi-threaded spider source code?

Posted on 2001-08-25
Last Modified: 2010-04-06
Does anyone have spider source code? (I need Delphi source.)
Thanks!
Question by:yuwang
6 Comments
 

Author Comment

by:yuwang
ID: 6427980
Does no one know?

Expert Comment

by:rondi
ID: 6428053
What is a spider?

Expert Comment

by:djmcrae
ID: 6428663
Rondi - A spider (web robot, bot, etc.) is simply a program that visits a number of web sites. Some search sites use them to index pages; others use them to target shopping sites, and so on.

yuwang - I once wrote a little one for fun, but all the code was lifted from this excellent article: http://www.inprise.com/delphi/news/delphi_developer/bolton/ It has everything you need, including downloadable code.

Author Comment

by:yuwang
ID: 6430581
It is very good!
But I want to fetch a web page, search it for URLs, then
fetch the pages those URLs point to, search them for URLs, and so on.
How do I do that?
Thanks again!

Accepted Solution

by:
djmcrae earned 100 total points
ID: 6434711
Damn - I do not have a server at the moment, but I can send you an application. It is not a spider but a site-saver: it starts at the home page and follows all links within a dynamic (.asp or .php) site, saving the raw HTML and the renamed graphical links to the hard drive so that a dynamic site can be burnt to CD for demo purposes. Send your email address to dmcrae@hotmail.com and I'll email the source.

For others who may be following, and for yuwang in the meantime, I hope this helps (it is basically the guts of it).

Put a TWebBrowser (WB1), an edit box for the start URL (edWebAddress) and a button on your form:

procedure TForm1.btnGoClick(Sender: TObject);
var Flags: OLEVariant;
  URL: string;
begin
  //start processing the web site
  HTMLLinkCount:= 0;
  Flags:= 4; //navNoReadFromCache=4
  URL:= edWebAddress.Text;
  bFirstFire:= false; //global var
  WB1.Navigate(URL,Flags);
end;

This is the OnDocumentComplete event handler of the TWebBrowser:

procedure TForm1.WB1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  if not(bFirstFire) then
  begin
    //use these booleans to prevent multiple firings in frames (we'll get the
    // frame contents separately)
    if bInFrame then
      bInFrame:= false
    else
    begin
      bFirstFire:= true;
      WB1.Stop;
      ProcessPage;
    end;
  end;
end;

//needs MSHTML (IHTMLDocument2, IHTMLElementCollection) in the uses clause
procedure TForm1.ProcessPage;
var Doc: IHTMLDocument2;
  PageAll: IHTMLElementCollection;
  pageItem: OLEVariant;
  k: integer;
begin
  Doc:= wb1.document as IHTMLDocument2;
  PageAll:= Doc.all;
  //showmessage(PageAll.toString);
  //showmessage(IntToStr(iCurrentParentLink)+' '+Doc.url);
  //this delay is a bodge to stop some framesets refiring;
  //it may not be necessary
  Delay(300); //Delay is a very handy utility from RxLib
  if CompareText(Copy(Doc.URL,1,6),'res://')=0 then
  begin
    //page is busted - not found
  end
  else begin
    bInFrame:= false;
    for k:= 0 to PageAll.Length-1 do
    begin
      pageItem:= pageAll.item(k, varEmpty);
      if pageItem.tagname='FRAME' then
      begin
        ProcessPageSaveLink(iCurrentParentLink, pageItem.src, 'FRAME');
        bInFrame:= true;
      end;
      if pageItem.tagname='IMG' then
      begin
        ProcessPageSaveLink(iCurrentParentLink, pageItem.src, 'IMG');
      end;
....
    end;
  end;
  //HTMLLinkArray[iCurrentParentLink].URLtoFollow := Doc.URL;
  HTMLLinkArray[iCurrentParentLink].isProcessed:= true;
  ProcessNextLink;
end;

ProcessPageSaveLink just adds the link and its link type to a dynamic array.
ProcessNextLink finds the next unfollowed link in the array, sets its followed flag, and points the TWebBrowser at that link to fetch it. If the link is an image or document, I use a TNMHTTP component instead to save it to the hard drive. (TNMHTTP may not be in D5 Pro, as I have D5 Enterprise; in that case the Indy HTTP component may even be a better choice.)
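
For anyone following along, here is a rough sketch of how that link array and the two routines could look. It is not the code from my application: the record layout, the duplicate check and the TakeNextLink helper are assumptions made for illustration, and the WriteLn calls stand in for the real WB1.Navigate call and the HTTP download. In the real form the loop at the bottom would not exist, because each DocumentComplete event ends by calling ProcessNextLink.

```pascal
program LinkQueueSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical record layout; the original application's record is not shown.
  THTMLLink = record
    URL: string;
    LinkType: string;     // 'FRAME', 'IMG', ...
    ParentLink: Integer;  // index of the page the link was found on, -1 for the start page
    IsFollowed: Boolean;  // True once the link has been handed out for fetching
    IsProcessed: Boolean; // True once the fetched page has been scanned
  end;

var
  HTMLLinkArray: array of THTMLLink;
  HTMLLinkCount: Integer;

// Adds a link to the array. Skipping URLs that are already queued is an
// assumption of this sketch; it keeps the crawl from looping forever.
procedure ProcessPageSaveLink(ParentLink: Integer; const URL, LinkType: string);
var
  i: Integer;
begin
  for i := 0 to HTMLLinkCount - 1 do
    if HTMLLinkArray[i].URL = URL then
      Exit;
  // Grow the dynamic array in chunks rather than one element at a time.
  if HTMLLinkCount >= Length(HTMLLinkArray) then
    SetLength(HTMLLinkArray, HTMLLinkCount + 64);
  HTMLLinkArray[HTMLLinkCount].URL := URL;
  HTMLLinkArray[HTMLLinkCount].LinkType := LinkType;
  HTMLLinkArray[HTMLLinkCount].ParentLink := ParentLink;
  HTMLLinkArray[HTMLLinkCount].IsFollowed := False;
  HTMLLinkArray[HTMLLinkCount].IsProcessed := False;
  Inc(HTMLLinkCount);
end;

// Hypothetical helper: returns the index of the first unfollowed link and
// marks it as followed, or -1 when every link has been handed out.
function TakeNextLink: Integer;
var
  i: Integer;
begin
  Result := -1;
  for i := 0 to HTMLLinkCount - 1 do
    if not HTMLLinkArray[i].IsFollowed then
    begin
      HTMLLinkArray[i].IsFollowed := True;
      Result := i;
      Exit;
    end;
end;

var
  Next: Integer;
begin
  HTMLLinkCount := 0;
  // Placeholder URLs; in the form these come from ProcessPage.
  ProcessPageSaveLink(-1, 'http://example.com/', 'PAGE');
  ProcessPageSaveLink(0, 'http://example.com/menu.asp', 'FRAME');
  ProcessPageSaveLink(0, 'http://example.com/logo.gif', 'IMG');
  ProcessPageSaveLink(0, 'http://example.com/menu.asp', 'FRAME'); // duplicate, skipped

  Next := TakeNextLink;
  while Next >= 0 do
  begin
    if HTMLLinkArray[Next].LinkType = 'IMG' then
      WriteLn('download: ', HTMLLinkArray[Next].URL)  // real code: save via HTTP component
    else
      WriteLn('navigate: ', HTMLLinkArray[Next].URL); // real code: WB1.Navigate
    Next := TakeNextLink;
  end;
end.
```

Scanning the array from the start each time is simple and fine for a small site; for a large crawl you would probably want to remember the index of the last link handed out.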

If anything needs clearing up, just yell.
Apologies for the delays; being in Australia, I'm probably asleep when you're up and vice versa.



Expert Comment

by:geobul
ID: 9288324
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

accept djmcrae's comment as answer

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Thanks,

geobul
EE Cleanup Volunteer