Posted on 1997-08-02
Medium Priority
Last Modified: 2010-04-04
I asked this question a few months ago, but the answer I got was not very clear, so here I go again. I am building a Web bot that should extract images, links, email addresses, movies, etc., and do so several levels deep. Everything works now except that it will not extract emails or links, and it will not go further than the first page. Can someone point me in the right direction this time? I know the program needs to parse the information, and I have created a parser. However, I apparently have the wrong code for pulling the mailto's and links. Please answer as soon as possible. Also, if someone knows of a better way to do this, I am open to suggestions. I currently use Delphi 2.0. I know of only one book on this topic, and I have it, but it is not very clear either.

Thank you,
Question by: aj85

Author Comment

ID: 1340376
Edited text of question

Expert Comment

ID: 1340377
I am not sure this is what you are looking for, but do you mean the tags for links and mailto's, like <A HREF ...>?
This should be easy if you already know how to extract images and so on: if you find an <A HREF=mailto:...>, then it is a mailto.
Please clarify if this is not what you want.

Author Comment

ID: 1340378
Actually, I have figured out how to extract links and mailto's since I posted this question. However, I can't separate the two, i.e. the links and mailto's come in mixed together from the same page. Also, I still need the answer on how to make the program go beyond the first page. I will be waiting for an answer.

Accepted Solution

kyriacos earned 500 total points
ID: 1340379
I am not sure either what information you want.
So you made a parser.
That's good. Parsers work in an intelligent way, so they are not confused by multiple versions of the same meaning.

So if you want to separate links, emails and images in HTML content (the source HTML), you have to read the whole tag with its parameters, and then YOU decide what the <A HREF ...> tag will do.

To make things clearer:

Read the HTML until you find the string "<A HREF".
This is necessary.
Then read every parameter until you find the ">".
This is also necessary.

So whatever the HTML tag will do is enclosed between the "<A HREF" and ">" pair.
If you encounter:
 a MAILTO parameter read the address
 a .jpg read the image location
 a .gif read the image location as well
 a .wav also read the location
 an .html -> read the new page location AND ALSO SAVE THE URL in a linked list of URLs because you will need this to find other links to DEPTH 2.
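
The classification above could be sketched roughly as follows. This is only an illustration written in Python rather than Delphi, to keep it short; the names `HREF_RE`, `classify` and `extract` are made up for this example, and treating `.jpeg` and `.htm` like `.jpg` and `.html` is an added assumption. The same logic should port to a Delphi parser that scans for "<A HREF" and ">".

```python
import re

# Matches <A ... HREF=value>, quoted or unquoted, in any letter case.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def classify(target):
    """Return the kind of thing an HREF value points at."""
    lower = target.lower()
    if lower.startswith("mailto:"):
        return "email"
    if lower.endswith((".jpg", ".jpeg", ".gif")):  # .jpeg is an assumption
        return "image"
    if lower.endswith(".wav"):
        return "sound"
    if lower.endswith((".html", ".htm")):  # .htm is an assumption
        return "page"
    return "other"


def extract(html):
    """Sort every HREF target found in the HTML into separate lists."""
    found = {"email": [], "image": [], "sound": [], "page": [], "other": []}
    for match in HREF_RE.finditer(html):
        target = match.group(1)
        kind = classify(target)
        if kind == "email":
            target = target[len("mailto:"):]  # keep only the address
        found[kind].append(target)
    return found
```

Because emails and pages end up in different lists, this also keeps links and mailto's apart, and the "page" list is what a deeper crawl would follow.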

After you finish parsing, call your main parse procedure with each link you found in page 1. Then do the same with the links from the depth 2 pages to find the depth 3 pages, and so on.
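
A minimal sketch of that depth-by-depth traversal, again in Python for illustration only. The `crawl` function and its `fetch` parameter (any function that returns the HTML for a URL, or None on failure) are hypothetical names for this example. The `seen` set is an addition not mentioned above: without it, two pages that link to each other would be visited forever.

```python
import re
from collections import deque
from urllib.parse import urljoin

LINK_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def crawl(start_url, fetch, max_depth):
    """Visit pages breadth-first down to max_depth; return URLs in visit order."""
    queue = deque([(start_url, 1)])  # the start page is depth 1
    seen = {start_url}               # URLs already queued, to avoid loops
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:             # page could not be loaded
            continue
        visited.append(url)
        if depth == max_depth:       # do not follow links any deeper
            continue
        for match in LINK_RE.finditer(html):
            link = urljoin(url, match.group(1))  # resolve relative links
            if link.lower().endswith((".html", ".htm")) and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

The queue plays the role of the linked list of saved URLs: each page adds its links to the end, tagged with a depth one greater than its own.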

I can write you some code for this for free, on request, if I have understood the spirit of your question.

Author Comment

ID: 1340380
Yes, you have a good idea of what I am looking for. If you could write the code with an example, I will increase the points to 220. Also, can you tell me how to get a count of the number of emails, images, etc. that have been collected? I will give a bonus of 50 points if you can answer this. Please answer as soon as possible.


Author Comment

ID: 1340381
Follow up to comment added.  I want to get an automatic count of the emails, images, etc., as they come in.

Author Comment

ID: 1340382
Adjusted points to 250

Expert Comment

ID: 1340383
Sorry if I'm late.
Have a look at

It contains a sample program that adds the links, images and email addresses.

NOTE: There are 2 buttons in the program. Before you press the "Process" button, you must press the button "Save..."

How it works for your needs...


Expert Comment

ID: 1340384
NOTE: This program works by searching in the source file for the keywords:
HREF - which indicates a link
IMG - which indicates an image and
MAILTO - which indicates a MAILTO

Then it updates the variables used to count the instances of each one.
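
The sample program itself is not reproduced in this thread, so the following is only a guess at the idea, sketched in Python rather than Delphi. `KEYWORDS`, `KEYWORD_RE` and `count_keywords` are names invented for this example. Passing the same `totals` object back in on each call gives a running count as pages come in.

```python
import re
from collections import Counter

KEYWORDS = ("HREF", "IMG", "MAILTO")
KEYWORD_RE = re.compile("|".join(KEYWORDS), re.IGNORECASE)


def count_keywords(html, totals=None):
    """Add the keyword occurrences in html to totals and return totals."""
    if totals is None:
        totals = Counter()
    for match in KEYWORD_RE.finditer(html):
        totals[match.group(0).upper()] += 1  # count regardless of letter case
    return totals
```

Note that a plain keyword search counts a mailto link under both HREF and MAILTO, and it also counts keywords that appear in ordinary page text, so the numbers are approximate unless the search is limited to the inside of tags.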

Author Comment

ID: 1340385

The sample code you wrote was fine, except that I already have a parser. However, the counting part of the program gives me some insight. What I need to know is how to make the program go several levels deep and get an automatic count as it finds the images, etc. I am not sure this can be accomplished in Delphi 2.0. Do you think there is another direction I should be headed in, perhaps another language? Please answer at your earliest convenience.


Expert Comment

ID: 1340386
Delphi is fine, just fine. It will do anything you want.
I will give you an example later tomorrow.

Question has a verified solution.
