Solved

Who has multi-threaded spider source code?

Posted on 2001-08-25
Last Modified: 2010-04-06
Who has spider source code? (I need Delphi source.)
Thanks!
Question by:yuwang
6 Comments
 

Author Comment

by:yuwang
ID: 6427980
Does no one know?
0
 
LVL 3

Expert Comment

by:rondi
ID: 6428053
What is a spider?
0
 

Expert Comment

by:djmcrae
ID: 6428663
Rondi - A spider (web robot, bot, etc.) is simply a program that visits a number of web sites. Some search sites use them to index pages; others use them to target shopping sites, etc.

yuwang - I once did a little one for fun, but all the code was lifted from this excellent article: http://www.inprise.com/delphi/news/delphi_developer/bolton/ It has everything you need, including downloadable code.
 

Author Comment

by:yuwang
ID: 6430581
It is very good!
But I want to fetch a web page and find the URLs in it, then
fetch the pages those URLs point to and find their URLs, and so on.
How can I do that?
Thanks again!
0
 

Accepted Solution

by:
djmcrae earned 100 total points
ID: 6434711
Damn - I do not have a server at the moment, but I can send you an application. It is not a spider but a site-saver. It was developed to start at the home page and follow all links within a dynamic (.asp or .php) site, saving the raw HTML and the renamed graphical links to the hard drive so that a dynamic site could be burnt to CD for demo purposes. Send your email address to dmcrae@hotmail.com and I'll email the source.

But for others who may be following, and for yuwang in the meantime, I hope this helps (it is basically the guts of it).

Put a TWebBrowser (WB1), an edit box (edWebAddress) and a button (btnGo) on your form:

procedure TForm1.btnGoClick(Sender: TObject);
var Flags: OLEVariant;
  URL: string;
begin
  //start processing the web site
  HTMLLinkCount:= 0;
  Flags:= 4; //navNoReadFromCache=4
  URL:= edWebAddress.Text;
  bFirstFire:= false; //global var
  WB1.Navigate(URL,Flags);
end;

This is the OnDocumentComplete event handler of the TWebBrowser:

procedure TForm1.WB1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  if not(bFirstFire) then
  begin
    //set this boolean to prevent multiple firings in frames (we'll get the
    // frame contents separately)
    if bInFrame then
      bInFrame:= false
    else
    begin
      bFirstFire:= true;
      WB1.Stop;
      ProcessPage;
    end;
  end;
end;

procedure TForm1.ProcessPage;
var Doc: IHTMLDocument2;
  PageAll: IHTMLElementCollection;
  pageItem: OLEVariant;
  k: integer;
begin
  Doc:= wb1.document as IHTMLDocument2;
  PageAll:= Doc.all;
  //showmessage(PageAll.toString);
  //showmessage(IntToStr(iCurrentParentLink)+' '+Doc.url);
  //this delay is a bodge to stop some framesets refiring
  //it may not be necessary
  Delay(300); //Delay is a very handy utility from RxLib
  if CompareText(Copy(Doc.URL,1,6),'res://')=0 then
  begin
    //page is busted - not found
  end
  else begin
    bInFrame:= false;
    for k:= 0 to PageAll.Length-1 do
    begin
      pageItem:= pageAll.item(k, varEmpty);
      if pageItem.tagname='FRAME' then
      begin
        ProcessPageSaveLink(iCurrentParentLink, pageItem.src, 'FRAME');
        bInFrame:= true;
      end;
      if pageItem.tagname='IMG' then
      begin
        ProcessPageSaveLink(iCurrentParentLink, pageItem.src, 'IMG');
      end;
....
    end;
  end;
  //HTMLLinkArray[iCurrentParentLink].URLtoFollow := Doc.URL;
  HTMLLinkArray[iCurrentParentLink].isProcessed:= true;
  ProcessNextLink;
end;

ProcessPageSaveLink just adds the link and the link type to a dynamic array.
ProcessNextLink reads the next unfollowed link in the array, sets its followed flag, and points the TWebBrowser at that link to fetch it. If the link is an image or a document, I use a TNMHTTP component instead to save it to the hard drive. (TNMHTTP may not be in D5 Pro - I have D5 Enterprise - and in that case the Indy one may even be a better choice.)
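
For anyone following along, the link array described above could look roughly like the sketch below. This is not the original source. The names HTMLLinkArray, HTMLLinkCount, URLtoFollow, isProcessed and ProcessPageSaveLink come from the post; the LinkType and ParentIndex fields, the duplicate check, the chunked growth and the NextUnprocessedLink helper are assumptions made for illustration.

```pascal
// Sketch only: the record layout and the helpers are assumptions,
// not the original author's code.
type
  TLinkRec = record
    URLtoFollow: string;   // name taken from the post
    LinkType: string;      // hypothetical field: 'FRAME', 'IMG', ...
    ParentIndex: Integer;  // hypothetical field: index of the page that held the link
    isProcessed: Boolean;  // name taken from the post
  end;

var
  HTMLLinkArray: array of TLinkRec;  // dynamic array, grown on demand
  HTMLLinkCount: Integer = 0;        // number of slots actually in use

// Adds a link and its type to the array unless the same URL is already stored.
procedure ProcessPageSaveLink(ParentIndex: Integer;
  const LinkURL, LinkType: string);
var
  i: Integer;
begin
  // Assumed duplicate check: skip URLs we already know, so pages that
  // link to each other are not queued again and again.
  for i := 0 to HTMLLinkCount - 1 do
    if HTMLLinkArray[i].URLtoFollow = LinkURL then
      Exit;
  // Grow the array in chunks rather than one element at a time.
  if HTMLLinkCount >= Length(HTMLLinkArray) then
    SetLength(HTMLLinkArray, HTMLLinkCount + 64);
  HTMLLinkArray[HTMLLinkCount].URLtoFollow := LinkURL;
  HTMLLinkArray[HTMLLinkCount].LinkType := LinkType;
  HTMLLinkArray[HTMLLinkCount].ParentIndex := ParentIndex;
  HTMLLinkArray[HTMLLinkCount].isProcessed := False;
  Inc(HTMLLinkCount);
end;

// Hypothetical helper: index of the first link not yet processed,
// or -1 when every stored link has been handled.
function NextUnprocessedLink: Integer;
var
  i: Integer;
begin
  Result := -1;
  for i := 0 to HTMLLinkCount - 1 do
    if not HTMLLinkArray[i].isProcessed then
    begin
      Result := i;
      Exit;
    end;
end;
```

A ProcessNextLink built on this would call NextUnprocessedLink, stop when it returns -1, and otherwise store the index in iCurrentParentLink and either navigate the TWebBrowser to that URL or download the file over HTTP, depending on the link type.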

If anything needs clearing up, just yell.
Apologies for the delays. Being in Australia, I'm probably asleep when you're up and vice versa.


0
 
LVL 17

Expert Comment

by:geobul
ID: 9288324
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

accept djmcrae's comment as answer

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Thanks,

geobul
EE Cleanup Volunteer
0
