Solved

WEB BOT

Posted on 1997-08-02
Last Modified: 2010-04-04
I asked this question a few months ago, but the answer I got was not very clear, so here I go again.  I am building a Web Bot that I want to extract images, links, email addresses, movies, etc., and to do it several levels deep.  I have everything working now, except that it will not extract emails or links and will not go further than the first page.  Can someone point me in the right direction this time?  I know I need the program to parse the information, and I have created a parser.  However, I apparently have the wrong code for it to pull the mailto's and links.  Please answer as soon as possible.  Also, if someone knows of a better way to do this, I am open to suggestions.  I currently use Delphi 2.0.  I know of only one book on this topic, and I currently have it, but it is not very clear either.

Thank you,
Tony
Question by:aj85
11 Comments
 

Author Comment

by:aj85
ID: 1340376
Edited text of question
 

Expert Comment

by:kimfriis
ID: 1340377
I am not sure this is what you are looking for, but is it the tags for the links and mailto's, like <A HREF ...>?
This should be easy if you already know how to extract images and so on: if you find an <A HREF=mailto:...>, then it is a mailto.
Please clarify if this is not what you want.
 

Author Comment

by:aj85
ID: 1340378
Actually, I have figured out the problem of extracting links and mailto's since I posted this question.  However, I can't separate the two, i.e. the links and mailto's come in on the same page.  Also, I still need the answer on how to make the program go beyond the first page.  I will be waiting for an answer.
LVL 1

Accepted Solution

by:
kyriacos earned 250 total points
ID: 1340379
I am not sure either exactly what information you want.
So you made a parser.
That's good... Parsers work in an intelligent way, so they are not confused by multiple versions of the same meaning.

So if you want to separate links, emails, and images in HTML content (the source HTML), you will have to read the whole tag with its parameters, and then YOU decide what the <A HREF ...> tag will do.

To make things more clear.

Read the HTML until you find the string "<A HREF"
This is necessary.
Then read any parameter until you find the ">".
This is also necessary.

So whatever the HTML command will do will be enclosed between the mentioned "<A HREF" and ">" pair.
If you encounter:
 a MAILTO parameter read the address
 a .jpg read the image location
 a .gif read the image location as well
 a .wav also read the location
 an .html -> read the new page location AND ALSO SAVE THE URL in a linked list of URLs because you will need this to find other links to DEPTH 2.
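
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this thread: it is written in Python rather than Delphi purely for brevity, and the names `HREF_RE`, `classify`, and `extract_targets` are hypothetical. It assumes each link sits in a single `<A ... HREF=...>` tag, and it also treats `.htm` as a page, which the list above does not mention.

```python
import re

# Finds "<A ... HREF=" and captures the value up to the closing quote,
# whitespace, or ">" (quoted or unquoted values both work).
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def classify(target):
    """Decide what kind of item an HREF value points at."""
    lower = target.lower()
    if lower.startswith("mailto:"):
        return "email"
    if lower.endswith((".jpg", ".gif")):
        return "image"
    if lower.endswith(".wav"):
        return "sound"
    # ".htm" is an assumption added here; the answer only lists ".html".
    if lower.endswith((".html", ".htm")):
        return "page"
    return "other"


def extract_targets(html):
    """Return the HREF values found in html, grouped by kind."""
    found = {"email": [], "image": [], "sound": [], "page": [], "other": []}
    for target in HREF_RE.findall(html):
        kind = classify(target)
        if kind == "email":
            # Keep only the address, without the "mailto:" prefix.
            target = target[len("mailto:"):]
        found[kind].append(target)
    return found
```

Keeping the emails and the pages in separate lists is what separates the mailto's from the ordinary links, and the `page` list is the one to keep for the deeper levels.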

After you finish parsing, call your main parse procedure with each link you found in page 1. Then do the same with the links from the depth 2 pages to find the depth 3 pages, and so on...
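
One possible shape for that level-by-level loop is sketched below, again in Python rather than Delphi and under stated assumptions. The `crawl` function and its `fetch` parameter are hypothetical: `fetch` stands for whatever routine downloads a page and returns its HTML as a string (or `None` on failure). Only links ending in `.html` or `.htm` are followed, and a record of visited URLs stops pages that link back to each other from being processed twice.

```python
import re
from urllib.parse import urljoin

HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def crawl(start_url, fetch, max_depth):
    """Visit start_url and the pages it links to, max_depth levels deep.

    fetch is a hypothetical callable: URL in, HTML string (or None) out.
    Returns a dict mapping each visited URL to the depth it was reached at.
    """
    visited = {}
    current_level = [start_url]
    for depth in range(1, max_depth + 1):
        next_level = []
        for url in current_level:
            if url in visited:
                continue  # already processed via another link
            visited[url] = depth
            html = fetch(url)
            if html is None:
                continue  # page could not be retrieved
            for target in HREF_RE.findall(html):
                # Simplification: follow only targets that look like pages.
                if not target.lower().endswith((".html", ".htm")):
                    continue
                link = urljoin(url, target)  # resolve relative links
                if link not in visited:
                    next_level.append(link)
        current_level = next_level  # these become the next depth
    return visited
```

The per-page parsing (images, emails, and so on) would go where `fetch` returns the HTML, so every page at every depth passes through the same parse procedure.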

I can write you some code for this for free, on request, if I have understood the spirit of your question.
 

Author Comment

by:aj85
ID: 1340380
Yes you have got a good idea for what I am looking for, if you could write the code with an example I will increase my points to 220.  Also can you tell me how to get a count of the number of emails, images, etc. that have been collected.  I will give a bonus of 50 points if you can answer this.  Please answer as soon as possible.

Thanks
Tony
 

Author Comment

by:aj85
ID: 1340381
Follow up to comment added.  I want to get an automatic count of the emails, images, etc., as they come in.
 

Author Comment

by:aj85
ID: 1340382
Adjusted points to 250
 
LVL 1

Expert Comment

by:kyriacos
ID: 1340383
hello,
  sorry if i'm late
have a look at
http://members.tripod.com/~kyriacos/htmlparcer.zip

It contains a sample program that adds the links, images and email addresses.

NOTE: There are 2 buttons in the program. Before you press the "Process" button, you must press the button "Save..."

How it works for your needs...


 
LVL 1

Expert Comment

by:kyriacos
ID: 1340384
NOTE: This program works by searching in the source file for the keywords:
HREF - which indicates a link
IMG - which indicates an image and
MAILTO - which indicates a MAILTO

Then it updates the variables used to count the instances of each one...
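
A rough sketch of that keyword-counting idea follows. It is not the code from the linked zip file; it is written in Python rather than Delphi for brevity, and `count_items` and `TAG_RE` are hypothetical names. Unlike a plain keyword search, it counts a mailto tag as an email only, not also as a link, so that the two totals stay separate.

```python
import re
from collections import Counter

# Matches one complete tag, from "<" to the next ">".
TAG_RE = re.compile(r"<[^>]+>")


def count_items(html, totals):
    """Add the images, emails and links found in html to the running totals."""
    for tag in TAG_RE.findall(html):
        upper = tag.upper()
        if upper.startswith("<IMG"):
            totals["images"] += 1
        elif "MAILTO:" in upper:
            totals["emails"] += 1
        elif "HREF" in upper:
            totals["links"] += 1
    return totals


# Usage: keep one Counter for the whole run and pass it in for every page,
# so the totals update automatically as each page comes in.
totals = Counter()
count_items('<IMG SRC="a.gif"><A HREF="mailto:t@example.com">m</A>', totals)
count_items('<a href="x.html">x</a><img src="b.jpg">', totals)
```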
 

Author Comment

by:aj85
ID: 1340385

The sample code you wrote was fine, except that I already have a parser.  However, the count part of the program gives me some insight.  What I need to know is how to make the program go several levels deep and get an automatic count as it finds the images, etc.  I am not sure this can be accomplished in Delphi 2.0. Do you think there is another direction I should be headed in?  Perhaps another language?  Please answer at your earliest convenience.

Thanks
Tony
 
LVL 1

Expert Comment

by:kyriacos
ID: 1340386
Delphi is fine... just fine... it will do anything you want.
I will give you an example later tomorrow...