?
Solved

How to write a Website crawler (mirror) in java?

Posted on 2011-09-20
2
Medium Priority
?
371 Views
Last Modified: 2012-05-12
Hi,

I need to write a simple web crawler to download (mirror) web site and its subfolder contents.
I googled many java web crawler but I am having a hard time understand them.

I am told to use the httpclient api but I don't know how to use it.
I need to find something simple so I can test it out.
Can you help me start this off?

I played with the lucene indexer. I was able to index the website.
But how can i download files and from subfolders and keep the tree strucutres?
I never done this kind of thing so I am lost.

0
Comment
Question by:dkim18
2 Comments
 
LVL 40

Assisted Solution

by:Gurvinder Pal Singh
Gurvinder Pal Singh earned 400 total points
ID: 36568727
0
 
LVL 47

Accepted Solution

by:
for_yan earned 1600 total points
ID: 36568741

If you type this in Google:
"how to write web crawler in java"
you find lots of links, some  of them conatin codes like

http://forums.techarena.in/technology-internet/1297810.htm

which they claim are working

others are tutorials with much more stuff

httpclient is only part of the issue - seems to me the simpler part

I would start looking at these recommendations - it is quite probable some of them
are quite reasonable



0

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This was posted to the Netbeans forum a Feb, 2010 and I also sent it to Verisign. Who didn't help much in my struggles to get my application signed. ------------------------- Start The idea here is to target your cell phones with the correct…
Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
Viewers will learn about basic arrays, how to declare them, and how to use them. Introduction and definition: Declare an array and cover the syntax of declaring them: Initialize every index in the created array: Example/Features of a basic arr…
This tutorial explains how to use the VisualVM tool for the Java platform application. This video goes into detail on the Threads, Sampler, and Profiler tabs.
Suggested Courses
Course of the Month14 days, 10 hours left to enroll

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question