
How to write a website crawler (mirror) in Java?

Posted on 2011-09-20
Last Modified: 2012-05-12
Hi,

I need to write a simple web crawler to download (mirror) a website and the contents of its subfolders.
I have googled many Java web crawlers, but I am having a hard time understanding them.

I was told to use the HttpClient API, but I don't know how to use it.
I need something simple that I can test.
Can you help me get started?
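For the fetching part, here is a minimal sketch. It assumes Java 11 or later and uses the JDK's built-in `java.net.http.HttpClient` instead of the Apache HttpClient library mentioned above (the two APIs are different); the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper class for illustration, not part of any library.
final class PageDownloader {
    // One shared client. HTTP/1.1 is forced to keep the sketch simple;
    // redirects (e.g. /docs -> /docs/) are followed automatically.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /**
     * Fetches url and, on HTTP 200, writes the raw bytes to target,
     * creating any missing parent folders. Returns the HTTP status code.
     */
    static int download(String url, Path target) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        // Read the body as bytes so images, PDFs, etc. are saved unchanged.
        HttpResponse<byte[]> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() == 200) {
            Path parent = target.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(target, response.body());
        }
        return response.statusCode();
    }
}
```

For example, `PageDownloader.download("http://example.com/docs/index.html", Path.of("mirror/docs/index.html"))` would save that one page under `mirror/docs/` (example.com is a placeholder). It only handles a single URL; finding the other pages to download is a separate step.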

I played with the Lucene indexer and was able to index the website.
But how can I download the files, including those in subfolders, and keep the directory tree structure?
I have never done this kind of thing, so I am lost.
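To keep the tree structure, one common approach is to turn each URL's path into a local file path under a mirror folder. The helper below is hypothetical: it assumes absolute http(s) URLs, puts each host in its own top-level folder, saves folder URLs (ending in `/`) as `index.html`, and ignores query strings.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical helper for illustration.
final class MirrorPaths {
    /**
     * Maps a URL to a file under root, mirroring the site's folder tree:
     *   http://example.com/docs/a/page.html -> root/example.com/docs/a/page.html
     *   http://example.com/docs/            -> root/example.com/docs/index.html
     * Query strings and fragments are ignored in this sketch.
     */
    static Path localPath(Path root, URI uri) {
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            path += "index.html"; // a "folder" URL needs a concrete file name on disk
        }
        Path target = root.resolve(uri.getHost());
        for (String segment : path.split("/")) {
            // Skip empty segments and "." / ".." so a URL cannot escape root.
            if (segment.isEmpty() || segment.equals(".") || segment.equals("..")) {
                continue;
            }
            target = target.resolve(segment);
        }
        return target;
    }
}
```

Combined with a downloader, each fetched URL is written to `localPath(root, url)`, so the subfolders on disk follow the subfolders on the site.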

Question by:dkim18

Assisted Solution

by:gurvinder372
ID: 36568727
 

Accepted Solution

by: for_yan
ID: 36568741

If you type this into Google:
"how to write web crawler in java"
you will find lots of links, some of which contain code, like

http://forums.techarena.in/technology-internet/1297810.htm

which they claim is working.

Others are tutorials with much more material.

HttpClient is only part of the problem (fetching the pages), and it seems to me the simpler part.
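Beyond fetching, a crawler has to pull the links out of each page, resolve them against the page's URL, and decide which ones to follow. The sketch below does a breadth-first crawl that stays on the same host and under the starting folder. It is hypothetical code: it finds `href`/`src` values with a regular expression, which is a simplification (a real HTML parser such as jsoup would be more robust), and it takes the fetch step as a function so it can be wired to an HttpClient-based downloader or tested offline.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical crawler sketch for illustration.
final class SiteCrawler {
    // Simplification: regex over href/src attributes instead of a real HTML parser.
    // The value stops at a quote or '#', so fragments are dropped.
    private static final Pattern LINK = Pattern.compile(
            "(?:href|src)\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Resolves every href/src value in html against the page's own URL. */
    static List<URI> extractLinks(URI page, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            try {
                links.add(page.resolve(m.group(1).trim()).normalize());
            } catch (IllegalArgumentException e) {
                // Skip values that are not valid URIs (e.g. unescaped spaces).
            }
        }
        return links;
    }

    /** True if candidate uses the same scheme and host and lies under the start folder. */
    static boolean inScope(URI start, URI candidate) {
        if (candidate.getScheme() == null || candidate.getHost() == null || candidate.getPath() == null) {
            return false; // relative or opaque URIs such as mailto: are never followed
        }
        String startPath = start.getPath();
        // The "folder" of the start URL: everything up to and including the last '/'.
        String folder = startPath.substring(0, startPath.lastIndexOf('/') + 1);
        return candidate.getScheme().equalsIgnoreCase(start.getScheme())
                && candidate.getHost().equalsIgnoreCase(start.getHost())
                && candidate.getPath().startsWith(folder);
    }

    /**
     * Breadth-first crawl from start. fetchHtml returns a page's HTML, or null if
     * the fetch failed or the resource is not HTML (nothing to follow).
     * Returns the URLs attempted, in crawl order, up to maxPages.
     */
    static List<URI> crawl(URI start, Function<URI, String> fetchHtml, int maxPages) {
        Set<URI> seen = new HashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        List<URI> visited = new ArrayList<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI url = queue.poll();
            String html = fetchHtml.apply(url);
            visited.add(url);
            if (html == null) {
                continue;
            }
            for (URI link : extractLinks(url, html)) {
                // seen.add returns false for URLs already queued, so each is fetched once.
                if (inScope(start, link) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

Start from a folder URL ending in `/` (for example `http://example.com/docs/`, a placeholder) so that everything under that folder counts as part of the mirror. In a real mirror, the fetch function would download each URL, save the bytes to a local path derived from the URL, and return the text only for HTML pages.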

I would start by looking at these recommendations; quite probably some of them
are reasonable.


