Solved

URL Spider

Posted on 2004-03-27
Medium Priority
Last Modified: 2010-04-05
I need a method to input a URL, search out all of the HTML files in that directory and all its subdirectories (online), and make a list of these files.
Then load each of these HTML files in the default browser, using the same browser window for all of them.
Question by:aliahmedali
8 Comments
 
LVL 8

Accepted Solution

by:
gmayo earned 216 total points
ID: 10694002
Most web servers won't allow you to browse their directory contents.

What Google-type spiders do is start from a given page and find the links on it. For each of those links, they download the linked page and look for the links on that page, and so on.

Geoff M.
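
Since the question allows Java, here is a minimal sketch (Java 9 or later) of the crawl Geoff describes: a breadth-first loop with a set of already-seen URLs that follows only links inside the starting directory and collects every .html/.htm page it reaches. SiteSpider, crawl, fetch and extractLinks are hypothetical names, and the downloading and link-parsing steps are assumptions passed in as functions, so the loop can be tried on a fake in-memory site without network access.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper (not from the original answers): breadth-first crawl
// restricted to one "directory" of a site.
final class SiteSpider {
    private SiteSpider() {}

    /**
     * Crawls breadth-first from startUrl and returns every .html/.htm URL reached.
     * Only links that start with startDir are followed.
     *
     * @param fetch        assumed: returns a page body, or null if it can't be loaded
     * @param extractLinks assumed: returns the absolute URLs linked from a page body
     */
    static List<String> crawl(String startUrl, String startDir,
                              Function<String, String> fetch,
                              Function<String, List<String>> extractLinks) {
        List<String> htmlFiles = new ArrayList<>();
        Set<String> seen = new HashSet<>();      // URLs already queued
        Deque<String> queue = new ArrayDeque<>();
        seen.add(startUrl);
        queue.add(startUrl);

        while (!queue.isEmpty()) {
            String url = queue.poll();
            String body = fetch.apply(url);
            if (body == null) {
                continue;                        // dead link: skip it
            }
            if (isHtml(url)) {
                htmlFiles.add(url);
            }
            for (String link : extractLinks.apply(body)) {
                // Follow only HTML pages and directory URLs inside startDir,
                // and queue each URL at most once.
                if (link.startsWith(startDir)
                        && (isHtml(link) || link.endsWith("/"))
                        && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return htmlFiles;
    }

    static boolean isHtml(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        return lower.endsWith(".html") || lower.endsWith(".htm");
    }

    public static void main(String[] args) {
        // A fake in-memory "site" so the loop runs without a network:
        // each body is simply its links, one per line.
        Map<String, String> site = Map.of(
            "http://example.com/docs/index.html",
                "http://example.com/docs/a.html\nhttp://example.com/other/x.html",
            "http://example.com/docs/a.html",
                "http://example.com/docs/sub/b.html\nhttp://example.com/docs/index.html",
            "http://example.com/docs/sub/b.html", "");
        List<String> found = crawl("http://example.com/docs/index.html",
            "http://example.com/docs/", site::get,
            body -> body.isEmpty() ? List.<String>of() : List.of(body.split("\n")));
        found.forEach(System.out::println);
    }
}
```

In a real program, fetch would download the page (for example with java.net.HttpURLConnection) and extractLinks would resolve each href against the page URL. The seen set is what keeps the spider from looping forever on pages that link back to each other.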
 
LVL 5

Assisted Solution

by:Jeff_2
Jeff_2 earned 212 total points
ID: 10694071
I don't know of an easy Delphi-based solution for this, but there are
some command-line tools that have somewhat similar functionality:
  http://www.gnu.org/software/wget/wget.html
  http://www.w3.org/Robot/
 

Author Comment

by:aliahmedali
ID: 10694102
Thanks for caring, but

I want the code, not an external application.

The code may be in another language, such as C++, Java or Delphi.

Thanks again.
 
LVL 5

Expert Comment

by:Jeff_2
ID: 10694122
Both of the links I posted are open-source projects, so their source code is available.
 
LVL 11

Assisted Solution

by:shaneholmes
shaneholmes earned 212 total points
ID: 10694781
Here's a procedure that takes a URL and downloads its contents into a TMemoryStream that you pass in. You can save the stream to a file, parse it for links to HTML files, and then repeat the process for each of those files.

Call it like this, for example:

  aMS := TMemoryStream.Create;
  try
    getURLOnStream('http://www.somewhere.com/datafile.xyz', aMS);
    aMS.Position := 0;
    aMS.SaveToFile('c:\myapp\datafile.xyz');
  finally
    aMS.Free;
  end;

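Don't worry about the custom exception classes below (EInetStreamError, EInetCrackURLError and so on); change them to Exception if you like. You'll also need WinInet, SysUtils, Classes and Forms in your uses clause, and a few global variables that the code refers to: useProxyServer and proxyServer (whether to use an explicit proxy server, and its address, if you need one), timeOutMS (the connect timeout in milliseconds) and inetBufferSize (the size of each read).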

Hope this helps.

Shane

procedure getURLOnStream(const aURL: string; aMS: TMemoryStream);
// Go get the URL aURL and write it to the stream aMS.
// General version that can use either GET or POST
const
  sMethod = 'GET';
  // sMethod = 'POST';
var
  aHi, aHConnect, aHFile: HInternet;
  bytesRead: DWORD;
  aBuf: PByteArray;
  s, t, u: string;
  gotIt: boolean;
  aURLc: TURLComponents;
begin
  // Initialization, fall-through
  aHi := nil;
  aHConnect := nil;
  aHFile := nil;

  // Bail out if no stream
  if not assigned(aMS) then
    raise EInetStreamError.create('No stream passed');

  // Crack the incoming URL
  setLength(s, INTERNET_MAX_PATH_LENGTH);
  setLength(t, INTERNET_MAX_PATH_LENGTH);
  setLength(u, INTERNET_MAX_PATH_LENGTH);

  //Clear the structure
  FillChar(aURLC, sizeOf(TURLComponents), 0);
  with aURLC do
  begin
    dwStructSize := sizeOf(TURLComponents);
    lpSzExtraInfo := PChar(s);
    dwExtraInfoLength := INTERNET_MAX_PATH_LENGTH;
    lpSzHostName := PChar(t);
    dwHostNameLength := INTERNET_MAX_PATH_LENGTH;
    lpszUrlPath := PChar(u);
    dwUrlPathLength := INTERNET_MAX_PATH_LENGTH;
  end;

  // Attempt to crack the URL
  if not InternetCrackUrl(PChar(aURL), 0, ICU_ESCAPE, aURLC) then
    raise EInetCrackURLError.createFmt('Could not crack URL - error %d: %s',
      [GetLastError, SysErrorMessage(GetLastError)]);

  // Get hold of a buffer that'll be used over and over for each read
  GetMem(aBuf, inetBufferSize);

  // Now go do it
  try
    // Open the internet
    if useProxyServer then // explicitly use the proxy server
      aHi := InternetOpen(PChar(Application.Name), INTERNET_OPEN_TYPE_PROXY,
        PChar(proxyServer), nil, 0)
    else  // do default.  May still use a proxy server if one is set up
      aHi := InternetOpen(PChar(Application.Name), INTERNET_OPEN_TYPE_PRECONFIG,
        nil, nil, 0);
    if (aHi = nil) then
      raise EInetOpenError.create('Could not open Internet');

    // Set options for the internet handle
    InternetSetOption(aHi, INTERNET_OPTION_CONNECT_TIMEOUT, @timeOutMS,
      sizeOf(timeOutMS));

    // Make a connection to that host, raising an exception if no connection.
    // nPort was filled in by InternetCrackUrl, so a port given in the URL
    // (e.g. :8080) is used instead of always the default one.
    aHConnect := InternetConnect(aHI, aURLc.lpSzHostName, aURLc.nPort,
      nil, nil, INTERNET_SERVICE_HTTP, 0, 0);
    if (aHConnect = nil) then
      raise EInetConnectError.createFmt('Could not connect to server %s',
        [aURLc.lpSzHostName]);

    // Open a request to get ready to GET data, raising an exception if not successful
    aHFile := HTTPOpenRequest(aHConnect, PChar(sMethod), aURLc.lpSzUrlPath,
      HTTP_VERSION, nil, nil, INTERNET_FLAG_DONT_CACHE, 0);
    if (aHFile = nil) then
      raise(EHTTPOpenReqError.create('Could not open HTTP request'));

    // Add any extra headers to the request, raising an exception if not successful
    //   if not HTTPAddRequestHeaders(aHFile, PChar(s), length(s), HTTP_ADDREQ_FLAG_ADD) then
    //     raise(EHTTPAddReqError.create('Could not add HTTP request header'));

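    // Send the request, raising an exception if not successful.
    // Note: the URL's extra info (the "?query" part) is passed here as optional
    // request data (a body), which suits POST; for GET it would have to be
    // appended to the path given to HTTPOpenRequest instead.
    if not HTTPSendRequest(aHFile, nil, 0, aURLc.lpSzExtraInfo,
      aURLc.dwExtraInfoLength) then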
      raise(EHTTPSendReqError.create('Could not send HTTP request'));

    // Loop to read the content from the URL in chunks of size inetBufferSize.
    repeat
      // Let the program do other things
      Application.processMessages;

      // Get the next chunk
      gotIt := InternetReadFile(aHFile, aBuf, inetBufferSize, bytesRead);

      // Pass it along to the stream
      if (gotIt and (bytesRead <> 0)) then
        aMS.WriteBuffer(aBuf^, bytesRead);

      // Repeat until we get no more data
    until (gotIt and (bytesRead = 0)) or (not gotIt);

  finally
    // Clean up memory
    FreeMem(aBuf, inetBufferSize);

    // Clean up by closing the handles.
    // According to the docs, we only need to close aHI,
    // which should automatically close the other ones that descend from it
    InternetCloseHandle(aHFile);
    InternetCloseHandle(aHConnect);
    InternetCloseHandle(aHI);
  end;
end;
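
The "parse for all html files" step could be sketched in Java (9 or later) as below. LinkExtractor and htmlLinks are hypothetical names, and using a regular expression instead of a real HTML parser is a simplifying assumption. Each href is resolved against the page's own URL with java.net.URI.resolve, the #fragment is dropped, and only .html/.htm links under the starting directory are kept.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original answers): pull .html/.htm links
// out of a downloaded page.
final class LinkExtractor {
    // href="...", href='...' or unquoted href=...; a regex is a simplification:
    // it ignores <base> tags and also matches hrefs inside comments or scripts.
    private static final Pattern HREF = Pattern.compile(
        "href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    private LinkExtractor() {}

    /** Returns the absolute .html/.htm URLs in html that lie under baseDir, in page order. */
    static List<String> htmlLinks(String pageUrl, String html, String baseDir) {
        URI page = URI.create(pageUrl);
        Set<String> links = new LinkedHashSet<>();   // drops duplicates, keeps order
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(1) != null ? m.group(1)
                       : m.group(2) != null ? m.group(2)
                       : m.group(3);
            URI resolved;
            try {
                resolved = page.resolve(raw.trim());   // relative -> absolute
            } catch (IllegalArgumentException e) {
                continue;                              // malformed link: skip it
            }
            String path = resolved.getPath();          // null for e.g. mailto: links
            if (path == null) {
                continue;
            }
            String lowerPath = path.toLowerCase(Locale.ROOT);
            if (!lowerPath.endsWith(".html") && !lowerPath.endsWith(".htm")) {
                continue;
            }
            String url = resolved.toString();
            int hash = url.indexOf('#');               // "#section" is the same file
            if (hash >= 0) {
                url = url.substring(0, hash);
            }
            if (url.startsWith(baseDir)) {
                links.add(url);
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        String html = "<a href=\"a.html\">A</a> <a href='sub/b.htm#top'>B</a>"
                    + " <a href=../other/c.html>C</a>";
        htmlLinks("http://example.com/docs/index.html", html, "http://example.com/docs/")
            .forEach(System.out::println);
    }
}
```

Running the downloaded text of each page through a function like this, and queuing any URL not seen before, gives the repeat-the-process loop described at the top of this answer.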
