
Solved

WEB BOT

Posted on 1997-08-02
Last Modified: 2010-04-04
I asked this question a few months ago, but the answer I got was not very clear, so here I go again.  I am building a Web Bot that I want to extract images, links, emails, movies, etc., several levels deep.  I have everything working now, except that it will not extract emails or links and will not go further than the first page.  Can someone point me in the right direction this time?  I know I need the program to parse the information, and I have created a parser.  However, I apparently have the wrong code for pulling out the mailto's and links.  Please answer as soon as possible.  Also, if someone knows of a better way to do this, I am open to suggestions.  I currently use Delphi 2.0.  I know of only one book on this topic, and I have it, but it is not very clear either.

Thank you,
Tony
Question by:aj85
11 Comments
 

Author Comment

by:aj85
ID: 1340376
Edited text of question
 

Expert Comment

by:kimfriis
ID: 1340377
I am not sure this is what you are looking for, but is it the tags for the links and mailto's, like <A HREF ...>?
This should be easy if you already know how to extract images and so on: if you find an <A HREF=mailto:...>, then it is a mailto.
Please clarify if this is not what you want.
 

Author Comment

by:aj85
ID: 1340378
Actually, I have figured out how to extract links and mailto's since I posted this question.  However, I can't separate the two, i.e. the links and mailto's come in on the same page.  Also, I still need an answer on how to make the program go beyond the first page.  I will be waiting for an answer.
LVL 1

Accepted Solution

by:
kyriacos earned 500 total points
ID: 1340379
I am not entirely sure either what information you want.
So you made a parser.
That's good... Parsers work in an intelligent way, so that they are not confused by different ways of writing the same thing.

So if you want to separate links, emails, and images in HTML content (the HTML source), you will have to read the whole tag with its parameters, and then YOU decide what the <A HREF ...> tag will do.

To make things more clear.

Read the HTML until you find the string "<A HREF"
This is necessary.
Then read any parameter until you find the ">".
This is also necessary.

So whatever the HTML command does will be enclosed between the "<A HREF" and ">" pair mentioned above.
If you encounter:
 a MAILTO parameter read the address
 a .jpg read the image location
 a .gif read the image location as well
 a .wav also read the location
 an .html -> read the new page location AND ALSO SAVE THE URL in a linked list of URLs because you will need this to find other links to DEPTH 2.
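
The scanning and sorting steps above can be sketched as follows. This is only an illustration in Python rather than Delphi, and not the code offered in this thread. The names `ANCHOR`, `MEDIA`, `classify`, and `extract` are made up for the example, and it assumes that HREF values contain no spaces and that any target which is not a mailto or a known media file is a page to visit later.

```python
import re

# Finds "<A HREF", skips to the value, and captures it up to a quote, space or ">".
# Assumption: the HREF value is optionally quoted and contains no spaces.
ANCHOR = re.compile(r'<a\s+href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)

# File extensions named in the answer above, mapped to a kind.
MEDIA = {".jpg": "image", ".gif": "image", ".wav": "sound"}


def classify(target):
    """Return (kind, value) for one HREF target."""
    lower = target.lower()
    if lower.startswith("mailto:"):
        # Strip the "mailto:" prefix so only the address is kept.
        return "email", target[len("mailto:"):]
    for ext, kind in MEDIA.items():
        if lower.endswith(ext):
            return kind, target
    # Assumption: everything else is a page whose URL is saved for the next depth.
    return "page", target


def extract(html):
    """Group every HREF target found in `html` by kind."""
    found = {"email": [], "image": [], "sound": [], "page": []}
    for match in ANCHOR.finditer(html):
        kind, value = classify(match.group(1))
        found[kind].append(value)
    return found
```

Because each target goes into its own list, links and mailto's end up separated, and `len(found["email"])` gives a count at any time. Images embedded with an IMG tag instead of a link would need a second, similar pattern.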

After you finish parsing, call your main parse procedure with each link you found on page 1. Then do the same with the links from the depth-2 pages to find the depth-3 pages, and so on...
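
A minimal sketch of that level-by-level traversal, again in Python for illustration only. It uses a queue plus a table of already-seen URLs in place of the linked list mentioned above; the seen-table is an addition of this sketch so that pages linking back to each other are not visited twice. `crawl`, `fetch`, and `max_depth` are hypothetical names: `fetch` stands for whatever routine downloads a page and returns its HTML as a string.

```python
import re
from collections import deque
from urllib.parse import urljoin

HREF = re.compile(r'<a\s+href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def crawl(start_url, fetch, max_depth):
    """Visit pages level by level, following links up to `max_depth` levels.

    `fetch` is a caller-supplied function (hypothetical) that takes a URL
    and returns that page's HTML. Returns a dict of visited URL -> depth,
    where the start page is depth 1.
    """
    seen = {start_url: 1}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        html = fetch(url)  # the page would also be parsed for images etc. here
        if depth == max_depth:
            continue  # deepest level reached: do not follow its links
        for target in HREF.findall(html):
            if target.lower().startswith("mailto:"):
                continue  # an email address, not a page
            link = urljoin(url, target)  # resolve relative links
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

Calling `crawl(url, fetch, 3)` visits the first page, the pages it links to, and the pages those link to, then stops.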

I can write you some code for this on request, for free, if I have understood the spirit of your question.
 

Author Comment

by:aj85
ID: 1340380
Yes, you have a good idea of what I am looking for. If you could write the code with an example, I will increase my points to 220.  Also, can you tell me how to get a count of the number of emails, images, etc. that have been collected?  I will give a bonus of 50 points if you can answer this.  Please answer as soon as possible.

Thanks
Tony
 

Author Comment

by:aj85
ID: 1340381
Follow up to comment added.  I want to get an automatic count of the emails, images, etc., as they come in.
 

Author Comment

by:aj85
ID: 1340382
Adjusted points to 250
 
LVL 1

Expert Comment

by:kyriacos
ID: 1340383
hello,
  sorry if i'm late
have a look at
http://members.tripod.com/~kyriacos/htmlparcer.zip

it contains a sample program that adds the links, images and email addresses.

NOTE: There are 2 buttons in the program. Before you press the "Process" button, you must press the button "Save..."

How it works for your needs...


 
LVL 1

Expert Comment

by:kyriacos
ID: 1340384
NOTE: This program works by searching in the source file for the keywords:
HREF - which indicates a link
IMG - which indicates an image and
MAILTO - which indicates a MAILTO

then it updates the variables used to count the instances of each one...
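
The keyword counting described here might look roughly like the sketch below. It is written in Python for illustration and is not the code from the zip file above; `KEYWORDS` and `count_keywords` are hypothetical names. Passing the same `totals` object in for every page keeps a running count as pages come in.

```python
from collections import Counter

# The three keywords named in the note above.
KEYWORDS = ("HREF", "IMG", "MAILTO")


def count_keywords(html, totals=None):
    """Add the keyword occurrences in `html` to `totals` and return it."""
    totals = Counter() if totals is None else totals
    upper = html.upper()  # make the search case-insensitive
    for word in KEYWORDS:
        totals[word] += upper.count(word)
    return totals
```

Two caveats with plain keyword search: a mailto link also contains HREF, so the number of ordinary links is the HREF count minus the MAILTO count, and a keyword that appears in the visible text of a page is counted as well.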
 

Author Comment

by:aj85
ID: 1340385

The sample code you wrote was fine, except that I already have a parser.  However, the counting part of the program gives me some insight.  What I still need to know is how to make the program go several levels deep and keep an automatic count as it finds the images, etc.  I am not sure this can be accomplished in Delphi 2.0. Do you think there is another direction I should be headed in?  Perhaps another language?  Please answer at your earliest convenience.

Thanks
Tony
 
LVL 1

Expert Comment

by:kyriacos
ID: 1340386
delphi is fine...just fine... it will do anything you want.
i will give you an example later tomorrow...