Solved

How to write a website crawler (mirror) in Java?

Posted on 2011-09-20
Last Modified: 2012-05-12
Hi,

I need to write a simple web crawler to download (mirror) a website and the contents of its subfolders.
I have googled many Java web crawlers, but I am having a hard time understanding them.
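As a starting point, the core of a mirroring crawler is a loop over a queue of URLs still to visit plus a set of URLs already seen, so that each page is processed once and the crawl stays inside the starting folder. The sketch below shows only that loop, under the assumption that fetching a page and pulling its links out are handled by a `linksOn` function you supply, which returns absolute URIs. `CrawlSketch`, `linksOn` and `maxPages` are hypothetical names, not part of any library.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical skeleton of a breadth-first mirroring crawl. Fetching and link
 * parsing are delegated to linksOn, which must return absolute URIs.
 */
final class CrawlSketch {

    private CrawlSketch() {}

    /** Visits pages reachable from start that stay under start's folder. */
    static List<URI> crawl(URI start, Function<URI, List<URI>> linksOn, int maxPages) {
        List<URI> visitedInOrder = new ArrayList<>();
        Set<URI> seen = new HashSet<>();          // every URL ever queued
        Deque<URI> frontier = new ArrayDeque<>(); // URLs waiting to be visited
        frontier.add(start);
        seen.add(start);

        while (!frontier.isEmpty() && visitedInOrder.size() < maxPages) {
            URI page = frontier.poll();
            visitedInOrder.add(page); // a real mirror would download and save the page here
            for (URI link : linksOn.apply(page)) {
                // Queue each in-scope URL only once, however many pages link to it.
                if (isUnder(start, link) && seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visitedInOrder;
    }

    /** True if candidate is on start's host and inside start's folder. */
    static boolean isUnder(URI start, URI candidate) {
        if (candidate.getHost() == null || !candidate.getHost().equalsIgnoreCase(start.getHost())) {
            return false;
        }
        String startPath = start.getPath() == null ? "" : start.getPath();
        // Folder part of the start URL: "/docs/" for both "/docs/" and "/docs/index.html".
        String folder = startPath.substring(0, startPath.lastIndexOf('/') + 1);
        String candidatePath = candidate.getPath() == null ? "" : candidate.getPath();
        return candidatePath.startsWith(folder);
    }
}
```

Breadth-first order means pages close to the start URL are mirrored first, and `maxPages` is a simple safety cap against sites that generate endless links (calendars, session IDs in URLs).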

I was told to use the HttpClient API, but I don't know how to use it.
I need something simple that I can test.
Can you help me get started?
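For the download step itself, here is a hedged illustration using the `java.net.http.HttpClient` that ships with Java 11 and later. Note that this is the JDK's built-in client, not the Apache HttpClient library the question refers to, whose API is different. `Downloader` and `download` are hypothetical names; the choice to save only `200 OK` responses is an assumption of this sketch.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

/**
 * Hypothetical helper that downloads one URL to one local file using the
 * JDK's built-in java.net.http.HttpClient (Java 11+).
 */
final class Downloader {

    // One client can be reused for every request in the crawl.
    private final HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)        // plain HTTP/1.1 keeps things simple
            .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, except https -> http
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /**
     * Fetches url and, only if the server answers 200 OK, writes the body to
     * target (creating missing parent folders). Returns the HTTP status code.
     */
    int download(URI url, Path target) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(url).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() == 200) {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(target, response.body());
        }
        return response.statusCode();
    }
}
```

A call might look like `new Downloader().download(URI.create("http://www.example.com/docs/index.html"), Paths.get("mirror/www.example.com/docs/index.html"))`. The body is held in memory before writing; for very large files, `HttpResponse.BodyHandlers.ofFile(target)` streams straight to disk instead.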

I played with the Lucene indexer and was able to index the website.
But how can I download the files, including those in subfolders, and keep the folder tree structure?
I have never done this kind of thing before, so I am lost.
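Keeping the tree structure mostly comes down to turning each URL's path into a matching path under a local folder. A minimal sketch, assuming one folder per host and `index.html` as the file name for URLs that end in `/` (both are conventions chosen here, not requirements); `MirrorPaths.localPathFor` is a hypothetical helper.

```java
import java.net.URI;
import java.nio.file.Path;

/**
 * Hypothetical helper: maps a crawled URL onto a file under a local mirror
 * directory so the site's folder tree is reproduced on disk.
 */
final class MirrorPaths {

    private MirrorPaths() {}

    static Path localPathFor(Path mirrorRoot, URI url) {
        // Start from <mirrorRoot>/<host>, e.g. mirror/www.example.com
        Path target = mirrorRoot.resolve(url.getHost());

        String path = url.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        // A URL naming a folder ("/docs/") has no file name of its own,
        // so this sketch assumes "index.html".
        if (path.endsWith("/")) {
            path = path + "index.html";
        }
        // Append each non-empty path segment; "." and ".." are skipped so a
        // crafted URL cannot escape the mirror directory.
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".") || segment.equals("..")) {
                continue;
            }
            target = target.resolve(segment);
        }
        return target;
    }
}
```

For example, `http://www.example.com/docs/guide/intro.html` maps to `mirror/www.example.com/docs/guide/intro.html`. The query string is ignored, so pages that differ only by `?page=2` and similar would overwrite each other; a fuller mirror would need to encode the query into the file name.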

Question by:dkim18

Assisted Solution

by:Gurvinder Pal Singh
Gurvinder Pal Singh earned 400 total points
ID: 36568727

Accepted Solution

by:for_yan
for_yan earned 1600 total points
ID: 36568741

If you type this into Google:
"how to write web crawler in java"
you will find lots of links; some of them contain code, like

http://forums.techarena.in/technology-internet/1297810.htm

which is claimed to work.

Others are tutorials with much more material.

HttpClient is only part of the problem, and it seems to me the simpler part.

I would start by looking at these recommendations; it is quite probable that some of them
are reasonable.
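To illustrate what makes the rest harder than the HTTP part: every fetched page has to be scanned for links, and relative links such as `../index.html` must be resolved against the page's own URL before they can be fetched or mapped to a file. A rough sketch using `java.net.URI.resolve`; the regular expression is an assumption standing in for a proper HTML parser and will miss unusual markup, and `LinkExtractor` is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical link extractor: finds href/src values and makes them absolute. */
final class LinkExtractor {

    // Rough pattern for href="..." or src='...' attributes (case-insensitive).
    private static final Pattern LINK_ATTR = Pattern.compile(
            "(?:href|src)\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private LinkExtractor() {}

    /** Returns absolute http(s) links found in html, in document order, without duplicates. */
    static Set<URI> extractLinks(URI pageUrl, String html) {
        Set<URI> links = new LinkedHashSet<>();
        Matcher m = LINK_ATTR.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative references ("next.html", "../x.html", "/img/a.png")
                // against the page URL, then collapse any "." and ".." segments.
                URI resolved = pageUrl.resolve(new URI(m.group(1).trim())).normalize();
                String scheme = resolved.getScheme();
                if (scheme == null) {
                    continue;
                }
                scheme = scheme.toLowerCase(Locale.ROOT);
                if (!scheme.equals("http") && !scheme.equals("https")) {
                    continue; // skip mailto:, javascript:, ftp:, ...
                }
                // Drop the #fragment: it points inside a page, not at a new file.
                links.add(new URI(resolved.getScheme(), resolved.getAuthority(),
                        resolved.getPath(), resolved.getQuery(), null));
            } catch (URISyntaxException e) {
                // Malformed link (e.g. an unescaped space): ignore it.
            }
        }
        return links;
    }
}
```

Whether each resulting link is then followed is a separate decision (same host, same starting folder, not already visited), which is where most of a crawler's logic ends up.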




Question has a verified solution.