Solved

How to write a website crawler (mirror) in Java?

Posted on 2011-09-20
Last Modified: 2012-05-12
Hi,

I need to write a simple web crawler to download (mirror) a website and the contents of its subfolders.
I have googled many Java web crawlers, but I am having a hard time understanding them.

I was told to use the HttpClient API, but I don't know how to use it.
I need to find something simple so I can test it out.
Can you help me start this off?
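Since the question asks how to get started with an HTTP client, here is a minimal sketch of downloading a single URL. It assumes Java 11 or later and uses the JDK's built-in `java.net.http.HttpClient` in place of the Apache HttpClient library that was the usual choice when this question was asked. The class name `FetchPage`, the 30-second timeout, and the placeholder URL are illustrative assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class FetchPage {

    // Downloads the resource at `uri` and returns its body as raw bytes,
    // so the same method works for HTML pages, images, PDFs, etc.
    static byte[] fetch(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(30)) // assumed timeout, adjust as needed
                .GET()
                .build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Follow ordinary redirects (e.g. a page that moved) instead of failing on them.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        // Placeholder URL; pass the real start page as the first argument.
        URI start = URI.create(args.length > 0 ? args[0] : "http://example.com/");
        byte[] body = fetch(client, start);
        System.out.println("Downloaded " + body.length + " bytes from " + start);
    }
}
```

Returning bytes rather than a `String` is a deliberate choice here: a mirror has to save images and other binary files unchanged, not just HTML.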

I played with the Lucene indexer and was able to index the website.
But how can I download files from the subfolders and keep the directory tree structure?
I have never done this kind of thing, so I am lost.
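Keeping the tree structure mostly comes down to turning each URL path into a local file path and creating the parent folders before writing. A minimal sketch under these assumptions: the mirror lives in a local `root` directory, files are grouped by host name, a URL ending in `/` is saved as `index.html`, query strings are ignored, and only absolute http(s) URLs are passed in. `MirrorPaths`, `toLocalPath`, and `save` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

public class MirrorPaths {

    // Maps an absolute http(s) URL onto a file under `root`, mirroring the folder tree:
    //   http://example.com/docs/a/page.html -> root/example.com/docs/a/page.html
    //   http://example.com/docs/            -> root/example.com/docs/index.html
    // Saving directory URLs as "index.html" is an assumption of this sketch.
    // Query strings and fragments are ignored.
    static Path toLocalPath(Path root, URI uri) {
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (path.isEmpty() || path.endsWith("/")) {
            path = path + "index.html";
        }
        Path local = root.resolve(uri.getHost());
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // skip the leading "" and any "." segments
            }
            if (segment.equals("..")) {
                // Never let a crafted URL write outside the mirror directory.
                throw new IllegalArgumentException("Path escapes mirror root: " + uri);
            }
            local = local.resolve(segment);
        }
        return local;
    }

    // Writes `body` to the mirrored location, creating sub-folders as needed.
    static Path save(Path root, URI uri, byte[] body) throws IOException {
        Path target = toLocalPath(root, uri);
        Files.createDirectories(target.getParent());
        Files.write(target, body);
        return target;
    }
}
```

For example, `save(Paths.get("mirror"), URI.create("http://example.com/docs/a/page.html"), bytes)` writes `mirror/example.com/docs/a/page.html`, creating `docs` and `a` if they do not exist yet.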

Question by: dkim18
 
Assisted Solution

by: gurvinder372
 
Accepted Solution

by: for_yan

If you search Google for
"how to write web crawler in java"
you will find lots of links, some of which contain code, such as

http://forums.techarena.in/technology-internet/1297810.htm

which they say is working code.

Others are tutorials that cover much more.

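HttpClient is only part of the problem, and in my view the simpler part: the crawler also has to extract the links from each page, stay within the site, and map each URL to a local file.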

I would start by looking at these recommendations; quite probably some of them
are reasonable.
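The HTTP download is the simpler half; a crawler also has to find links, stay inside the site, and avoid fetching a page twice. The sketch below (hypothetical class `SiteCrawler`) does a breadth-first crawl restricted to the start page's host and folder. It assumes links can be found with a simple `href`/`src` regular expression, which is only a rough stand-in for a real HTML parser such as jsoup, and it takes the download and save steps as functions so they can be backed by any HTTP client and any storage scheme.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SiteCrawler {

    // Rough link finder: the value of every href="..." or src='...' attribute,
    // cut off at '#' so page fragments are ignored. A real HTML parser is more robust.
    private static final Pattern LINK =
            Pattern.compile("(?:href|src)\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Returns the links found in `html`, resolved against the page's own URL.
    static List<URI> extractLinks(URI base, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // Skip values that are not valid URIs (e.g. unescaped spaces).
            }
        }
        return links;
    }

    // A link is in scope if it is http(s), on the start host, and under the start folder.
    static boolean inScope(URI start, String folder, URI link) {
        String scheme = link.getScheme();
        return ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme))
                && start.getHost() != null
                && start.getHost().equalsIgnoreCase(link.getHost())
                && link.getPath() != null
                && link.getPath().startsWith(folder);
    }

    // Breadth-first crawl from `start`. `fetcher` downloads a URL (returning null on
    // failure) and `sink` receives every downloaded body, e.g. to write it to disk.
    static Set<URI> crawl(URI start, int maxPages,
                          Function<URI, byte[]> fetcher, BiConsumer<URI, byte[]> sink) {
        String startPath = start.getPath() == null ? "" : start.getPath();
        String folder = startPath.substring(0, startPath.lastIndexOf('/') + 1);
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(start.normalize());
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI uri = queue.poll();
            if (!visited.add(uri)) {
                continue; // the same URL may be queued twice before it is visited
            }
            byte[] body = fetcher.apply(uri);
            if (body == null) {
                continue; // download failed; it stays in `visited` so it is not retried
            }
            sink.accept(uri, body);
            // Every body is scanned as text; a real crawler would check Content-Type first.
            for (URI link : extractLinks(uri, new String(body, StandardCharsets.UTF_8))) {
                URI next = link.normalize();
                if (inScope(start, folder, next) && !visited.contains(next)) {
                    queue.add(next);
                }
            }
        }
        return visited;
    }
}
```

In a real mirror, `fetcher` would wrap an HTTP GET (for instance with `java.net.http.HttpClient`, returning `null` for non-200 responses) and `sink` would write each body to a local file whose path follows the URL path. The `maxPages` cap is a simple safeguard against crawling an unexpectedly large site.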


