• Status: Solved
  • Priority: Medium
  • Security: Public

Delphi spider website

I need code or a component (free or commercial) that:
- copies an entire website (dynamic or not) to the local hard disk
- does the copying with multiple threads

The structure of the website should be preserved.
Asked:
yarek
1 Solution
 
TheRealLoki (Senior Developer) commented:
You can do this by using a TWebBrowser to save the page as an .mht file (it includes the images, so you can email the page as one file, etc.).
(The example is halfway down the page.)
http://delphi.about.com/od/internetintranet/l/aa062904a.htm

There is an Internet Explorer call that does the same thing as "Save As" in IE (it creates a directory and puts all the images etc. in it, along with the main HTML).
I can't find the code until I get home, but maybe someone else here has it.
 
yarek (Author) commented:
Not good: it must be multithreaded and grab a WHOLE website, not only a single web page.
 
Eddie Shipman (All-around developer) commented:
Well, you can use the Chilkat Spider ActiveX, a free ActiveX component, and their sample:
http://www.example-code.com/vb/spiderSite.asp (VB sample source)

Or Felix Colibri's fine Delphi code sample and article:
http://www.felix-colibri.com/papers/web/web_spider/web_spider.html
 
yarek (Author) commented:
Yes, I know this link, but it is not MULTITHREADED: it spiders page after page.
 
Eddie Shipman (All-around developer) commented:
That is the ONLY Delphi spider code I have found in 5 years of looking.
If you want it multithreaded, do it yourself.
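
For anyone who does want to roll their own, a common shape is a shared work queue guarded by a lock, with several TThread workers pulling URLs from it. The following is only a minimal sketch of that idea, not code from any component mentioned in this thread. TFetchThread, PushUrl, PopUrl and the example.com URLs are made-up names, and the actual download step is left as a placeholder comment.

```pascal
program SpiderQueueSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, SyncObjs;

type
  // Hypothetical worker thread: takes URLs until the shared queue is empty.
  TFetchThread = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Lock: TCriticalSection;   // guards Pending, Seen and Taken
  Pending: TStringList;     // URLs waiting for a worker
  Seen: TStringList;        // every URL ever queued, used to skip duplicates
  Taken: Integer = 0;       // number of URLs handed to workers so far

// Queue a URL unless it has been queued before.
procedure PushUrl(const Url: string);
begin
  Lock.Enter;
  try
    if Seen.IndexOf(Url) < 0 then
    begin
      Seen.Add(Url);
      Pending.Add(Url);
    end;
  finally
    Lock.Leave;
  end;
end;

// Take the next URL; returns False when nothing is waiting.
function PopUrl(var Url: string): Boolean;
begin
  Lock.Enter;
  try
    Result := Pending.Count > 0;
    if Result then
    begin
      Url := Pending[0];
      Pending.Delete(0);
      Inc(Taken);
    end;
  finally
    Lock.Leave;
  end;
end;

procedure TFetchThread.Execute;
var
  Url: string;
begin
  while (not Terminated) and PopUrl(Url) do
  begin
    // Placeholder (assumption): download Url with an HTTP client of your
    // choice, save it to disk, then call PushUrl for every link found.
  end;
end;

var
  Workers: array[0..3] of TFetchThread;
  I: Integer;
begin
  Lock := TCriticalSection.Create;
  Pending := TStringList.Create;
  Seen := TStringList.Create;
  try
    PushUrl('http://www.example.com/');
    PushUrl('http://www.example.com/about.html');
    PushUrl('http://www.example.com/');  // duplicate, ignored by PushUrl
    // The shared state exists already, so the threads may start at once.
    for I := Low(Workers) to High(Workers) do
      Workers[I] := TFetchThread.Create(False);
    for I := Low(Workers) to High(Workers) do
    begin
      Workers[I].WaitFor;
      Workers[I].Free;
    end;
    Writeln('URLs taken by workers: ', Taken);
  finally
    Seen.Free;
    Pending.Free;
    Lock.Free;
  end;
end.
```

Note one simplification: a worker exits as soon as the queue is empty, even if another worker is still downloading a page that would add more links. A real spider would also need to track pages in progress before letting its workers stop.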
 
yarek (Author) commented:
I have tried the Chilkat Spider ActiveX: it saves some .DAT files with corrupted headers and works pretty badly: it freezes the PC for a while...
 
ginsonic commented:
Take a look at www.torry.net for ALWebSpider.
 
yarek (Author) commented:
ALWebSpider is excellent... except that it does not run with Delphi 6!
 
ginsonic commented:
Have you tested it? I know it is specified as being for D7, but that doesn't mean it can't work on D6.
 
yarek (Author) commented:
I have tested it, and there are some functions that cannot be compiled,
like ValueFromIndex...
Maybe you can try... +2000 pts more
 
TheRealLoki (Senior Developer) commented:
Can you run the demo\ALWebSpider\ALWebSpider.exe? Does it do everything you need?
 
yarek (Author) commented:
Yes: the .EXE demo is good.
But the ALWebSpider component does not install in D6: there are a few D7-specific functions inside. I tried to adapt it (very quickly) to D6, and it did install, but when running the demo only the first page was spidered. I believe I did it too quickly.

There are mainly 2 errors:

- an error about TDateTime with a regional parameter (I do not remember the exact error)
- an error about the TStringList.ValueFromIndex property, which does not exist in D6 and which I tried to translate into nested .Values properties that are recognized in D6...

-> BUT I FAILED

I think it must be a piece of cake for someone who has already translated some D7 code to D6.
Thanks
 
TheRealLoki (Senior Developer) commented:
The ValueFromIndex implementation in Delphi 7 looks like the following:

function TStrings.GetValueFromIndex(Index: Integer): string;
begin
  if Index >= 0 then
    Result := Copy(Get(Index), Length(Names[Index]) + 2, MaxInt) else
    Result := '';
end;

procedure TStrings.SetValueFromIndex(Index: Integer; const Value: string);
begin
  if Value <> '' then
  begin
    if Index < 0 then Index := Add('');
    Put(Index, Names[Index] + NameValueSeparator + Value);
  end
  else
    if Index >= 0 then Delete(Index);
end;


For Delphi 5, 6, etc. you could write functions like the following to do the same thing
(put these just after the "implementation" line in any units that need them, or use a shared unit):

function GetValueFromIndex(S: TStrings; Index: Integer): string;
begin
  if Index >= 0 then
    Result := Copy(S[Index], Length(S.Names[Index]) + 2, MaxInt) else
    Result := '';
end;

procedure SetValueFromIndex(S: TStrings; Index: Integer; const Value: string);
begin
  if Value <> '' then
  begin
    if Index < 0 then Index := S.Add('');
    S[Index] := S.Names[Index] + '='{NameValueSeparator} + Value;
  end
  else
    if Index >= 0 then S.Delete(Index);
end;

So instead of saying
label1.caption := S.ValueFromIndex[2];
you use
label1.caption := GetValueFromIndex(S, 2);

Instead of saying
S.ValueFromIndex[2] := 'hello';
you use
SetValueFromIndex(S, 2, 'hello');

Hope this helps.
Let me know in more detail what the other error was.
 
yarek (Author) commented:
OK, I will try to compile it again and will send ALL the ERRORS.
Maybe the simplest would be for you to compile it in D6.
 
TheRealLoki (Senior Developer) commented:
I can't even run the .exe demo on my PC. It complains about a missing winhttp.dll, which is why I asked if you could :-) I'd prefer to just help you port it to D6, if that's OK.
 
ginsonic commented:
I have started to modify it. I still get problems with GetLocaleFormatSettings.
 
Eddie Shipman (All-around developer) commented:
What problems are you having with GetLocaleFormatSettings?
 
yarek (Author) commented:
GetLocaleFormatSettings is not a Delphi 6 function.

I simply deleted this line, and that is maybe why the component does not work properly anymore.
 
Eddie Shipman (All-around developer) commented:
It was added in D7, in SysUtils. It sets the format settings, e.g. ShortDateFormat,
based on the Windows locale settings.

It is not difficult to write this function yourself.
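
As a rough illustration of what "writing it yourself" could look like on Delphi 6, the sketch below reads a few locale values through GetLocaleChar and GetLocaleStr, which SysUtils already provides there. TMiniFormatSettings and GetMiniLocaleSettings are made-up names: this is a cut-down stand-in, not the Delphi 7 TFormatSettings record or the real GetLocaleFormatSettings. Which fields ALWebSpider actually needs is not stated in this thread, so the field list is an assumption.

```pascal
program LocaleShimSketch;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // Hypothetical, cut-down stand-in for Delphi 7's TFormatSettings.
  TMiniFormatSettings = record
    DecimalSeparator: Char;
    ThousandSeparator: Char;
    DateSeparator: Char;
    TimeSeparator: Char;
    ShortDateFormat: string;
    LongDateFormat: string;
  end;

// Fill FS from the Windows locale identified by LCID. The last argument
// of each call is the fallback used when the lookup fails.
procedure GetMiniLocaleSettings(LCID: Integer; var FS: TMiniFormatSettings);
begin
  FS.DecimalSeparator := GetLocaleChar(LCID, LOCALE_SDECIMAL, '.');
  FS.ThousandSeparator := GetLocaleChar(LCID, LOCALE_STHOUSAND, ',');
  FS.DateSeparator := GetLocaleChar(LCID, LOCALE_SDATE, '/');
  FS.TimeSeparator := GetLocaleChar(LCID, LOCALE_STIME, ':');
  // Raw Windows patterns; they are not translated to Delphi format
  // specifiers here.
  FS.ShortDateFormat := GetLocaleStr(LCID, LOCALE_SSHORTDATE, 'm/d/yy');
  FS.LongDateFormat := GetLocaleStr(LCID, LOCALE_SLONGDATE, 'mmmm d, yyyy');
end;

var
  FS: TMiniFormatSettings;
begin
  GetMiniLocaleSettings(LOCALE_USER_DEFAULT, FS);
  Writeln('Short date format: ', FS.ShortDateFormat);
  Writeln('Decimal separator: ', FS.DecimalSeparator);
end.
```

Any D7 code that passes the record on to the formatting overloads that accept format settings would still need its own D6 replacement, so this covers only the first step of such a port.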
 
ginsonic commented:
I have already solved this function. But after that, a lot of other functions are still waiting, all from D7. As I see it, D7 got a lot of functions from the C++ version.
