Solved

How to write a website crawler (mirror) in Java?

Posted on 2011-09-20
Last Modified: 2012-05-12
Hi,

I need to write a simple web crawler to download (mirror) a website and the contents of its subfolders.
I have googled many Java web crawlers, but I am having a hard time understanding them.

I have been told to use the HttpClient API, but I don't know how to use it.
I need to find something simple so I can test it out.
Can you help me start this off?
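For the HttpClient part, a minimal starting point might look like the sketch below. Note the assumption: it uses the `java.net.http.HttpClient` that ships with Java 11 and later instead of the Apache HttpClient library you were told about; the idea (build a request, send it, read the body) is the same. The class name `PageFetcher` is hypothetical. It returns raw bytes so the same call works for HTML pages and for binary files such as images.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical minimal downloader built on the JDK's HTTP client (Java 11+). */
final class PageFetcher {
    // A single client can be reused for every request in the crawl.
    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL) // follow ordinary 3xx redirects
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /** Downloads {@code url} and returns the response body as raw bytes. */
    byte[] fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        // Treat anything other than 200 OK as a failure in this simple sketch.
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

For example, `new PageFetcher().fetch("http://example.com/")` gives you the bytes of the home page, which you can then write to disk with `java.nio.file.Files.write`.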

I played with the Lucene indexer and was able to index the website.
But how can I download the files from the subfolders and keep the tree structure?
I have never done this kind of thing, so I am lost.
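Keeping the tree structure mostly comes down to mapping each URL's host and path onto a folder under a local mirror directory. A sketch under these assumptions: the names `MirrorPaths`, `localPathFor` and `save` are hypothetical, a URL ending in `/` is saved as `index.html`, and query strings are ignored (so URLs that differ only in their query would overwrite each other).

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that maps URLs onto a local mirror directory. */
final class MirrorPaths {
    /**
     * Maps an absolute URL to a file under {@code root}, keeping the folder tree:
     * http://example.com/docs/a/b.html -> root/example.com/docs/a/b.html
     * An empty path or a path ending in "/" is stored as index.html.
     * The query string is ignored.
     */
    static Path localPathFor(Path root, URI url) {
        URI normalized = url.normalize(); // collapse "." and "a/.." segments
        if (normalized.getHost() == null) {
            throw new IllegalArgumentException("Need an absolute URL: " + url);
        }
        String path = normalized.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            path += "index.html";
        }
        Path target = root.resolve(normalized.getHost());
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the leading "/" and any "//"
            }
            if (segment.equals("..")) {
                // A leftover ".." would escape the mirror folder.
                throw new IllegalArgumentException("Unsafe path: " + url);
            }
            target = target.resolve(segment);
        }
        return target;
    }

    /** Writes {@code content} to {@code file}, creating parent folders as needed. */
    static void save(Path file, byte[] content) throws IOException {
        Files.createDirectories(file.getParent());
        Files.write(file, content);
    }
}
```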

Question by:dkim18

Assisted Solution

by:gurvinder372
gurvinder372 earned 100 total points

Accepted Solution

by:for_yan
for_yan earned 400 total points

If you type this into Google:
"how to write web crawler in java"
you will find lots of links. Some of them contain code, such as

http://forums.techarena.in/technology-internet/1297810.htm

which they claim works.

Others are tutorials with much more material.

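HttpClient (downloading the pages) is only part of the problem, and it seems to me the simpler part; the crawler also has to find the links in each page and keep track of which ones it has already visited.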
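Finding the links in a downloaded page can be sketched as pulling the `href`/`src` attribute values out of the HTML and resolving them against the page's own URL. This is a rough heuristic: the regular expression is an assumption that works for simple, well-formed pages, not a real HTML parser (a parser library such as jsoup copes with messy markup much better). `LinkExtractor` is a hypothetical name.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based link finder; a heuristic, not a full HTML parser. */
final class LinkExtractor {
    // href="..." or src='...'; stops at a quote or at a "#fragment".
    private static final Pattern LINK =
            Pattern.compile("(?i)\\b(?:href|src)\\s*=\\s*[\"']([^\"'#]+)");

    /** Returns the absolute http(s) links found in {@code html}, in page order. */
    static Set<URI> extractLinks(URI pageUrl, String html) {
        Set<URI> links = new LinkedHashSet<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative links such as "../up.html" against the page URL.
                URI resolved = pageUrl.resolve(m.group(1).trim()).normalize();
                String scheme = resolved.getScheme();
                // Drop mailto:, javascript: and similar non-web links.
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved);
                }
            } catch (IllegalArgumentException e) {
                // Malformed URL in the page: skip it.
            }
        }
        return links;
    }
}
```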

I would start by looking at these recommendations; quite probably some of them
are reasonable.
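Once you can download a page and find its links, the crawl itself is essentially a breadth-first search over URLs with a "seen" set, restricted to the folder of the start URL so that only that part of the site is mirrored. In this sketch the download and link-extraction steps are passed in as functions, so any HTTP client or HTML parser can be plugged in. `MirrorCrawler`, `crawl` and the `maxPages` safety limit are hypothetical names and choices, and the sketch assumes the link extractor returns absolute, normalized URIs.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical breadth-first crawler that stays inside the start URL's folder. */
final class MirrorCrawler {
    /**
     * @param start        first page to fetch; its folder defines the crawl scope
     * @param fetch        downloads a page; Optional.empty() means the fetch failed
     * @param extractLinks returns the absolute links found in a page body
     * @param maxPages     safety limit on the number of URLs fetched
     * @return the URLs fetched (or attempted), in crawl order
     */
    static List<URI> crawl(URI start,
                           Function<URI, Optional<String>> fetch,
                           BiFunction<URI, String, Set<URI>> extractLinks,
                           int maxPages) {
        String scope = scopeOf(start);
        Set<URI> seen = new HashSet<>();       // every URL ever queued
        Deque<URI> queue = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        List<URI> visited = new ArrayList<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI current = queue.poll();
            Optional<String> body = fetch.apply(current);
            visited.add(current);
            if (!body.isPresent()) {
                continue; // broken link: skip it but keep crawling
            }
            for (URI link : extractLinks.apply(current, body.get())) {
                // Stay inside the start folder and never queue a URL twice.
                if (link.toString().startsWith(scope) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    /** "http://host/docs/index.html" becomes "http://host/docs/". */
    private static String scopeOf(URI start) {
        String path = start.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String folder = path.substring(0, path.lastIndexOf('/') + 1);
        return start.getScheme() + "://" + start.getRawAuthority() + folder;
    }
}
```

For an actual mirror, the `fetch` function would also save each downloaded body into the local folder tree, and links would only be extracted from HTML responses, not from images or other binary files.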




Question has a verified solution.
