Solved

How to download a web page including images, applets, etc.

Posted on 2004-09-15
Last Modified: 2012-05-05
Hi,

How can I write a routine that downloads a web page and stores it on my hard disk? It also has to download all associated objects, including images, applets, etc.

For example, if I type "www.yahoo.com", it should download the complete web page, including everything it references.

I have tried the following code, but I am not able to get the images:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    URL url = new URL("http://www.kumudam.com");
    // Reads only the HTML text of the page; images and applets are not fetched
    try (BufferedReader input = new BufferedReader(new InputStreamReader(url.openStream()))) {
        String line;
        while ((line = input.readLine()) != null) {
            System.out.println(line);
        }
    }
Thanks in advance
Question by:vihar123
 
LVL 92

Accepted Solution

by:objects
objects earned 1000 total points
ID: 12071200
You'll need to parse the HTML. Here's an example showing how to extract all the links:

http://www.javaalmanac.com/egs/javax.swing.text.html/GetLinks.html

You can then download the data each link points to.
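As a rough sketch of that idea (not the code from the linked example), the Swing HTML parser can be asked to report every tag, and the `src` of each `<img>` and the `code` of each `<applet>` can be collected along the way. The class name `ResourceLinks`, the method `extract`, and the `example.com` addresses are made up for illustration, and the sketch assumes the applet's `codebase` and `archive` attributes can be ignored.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: lists the images and applets a page refers to.
final class ResourceLinks {

    /** Returns absolute URIs for every <img src> and <applet code> in the HTML. */
    static List<URI> extract(Reader html, URI base) {
        List<URI> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs); // empty elements such as <img>
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs); // elements with content such as <applet>
            }

            private void collect(HTML.Tag tag, MutableAttributeSet attrs) {
                Object ref = null;
                if (tag == HTML.Tag.IMG) {
                    ref = attrs.getAttribute(HTML.Attribute.SRC);
                } else if (tag == HTML.Tag.APPLET) {
                    // Assumption: codebase and archive are ignored in this sketch
                    ref = attrs.getAttribute(HTML.Attribute.CODE);
                }
                if (ref != null) {
                    try {
                        // Resolve relative references against the page address
                        found.add(base.resolve(ref.toString().trim()));
                    } catch (IllegalArgumentException e) {
                        // Not a valid URI reference; skip it
                    }
                }
            }
        };
        try {
            // true = ignore charset directives inside the page
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "<html><body><img src=\"a/b.gif\">"
                + "<applet code=\"Clock.class\" width=\"10\" height=\"10\"></applet>"
                + "</body></html>";
        URI base = URI.create("http://example.com/dir/page.html");
        for (URI uri : extract(new StringReader(page), base)) {
            System.out.println(uri);
        }
    }
}
```

For a live page, the `Reader` would wrap the stream opened from the page's URL, and each returned URI would then be fetched with its own request.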
 
LVL 21

Expert Comment

by:MogalManic
ID: 12071580
objects is right.

When a browser accesses a page, it loads the page and all of its associated objects through a series of GETs. The first GET is for the HTML, of course. Then the browser parses the HTML, and whenever it encounters a tag that refers to an external resource, it issues another GET to retrieve and render that resource. Your code would have to do something similar.
 
LVL 18

Expert Comment

by:armoghan
ID: 12071997
And once you have the image links, extracted as in the example objects linked to,
you can save the images to the local file system, as in

http://forum.java.sun.com/thread.jsp?thread=433352&forum=31&message=1940583

Basically you would be looking for
HTML.Tag.IMG (the image URL is in its HTML.Attribute.SRC attribute)

You will also need to find the <APPLET> tag to download applets.
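A minimal sketch of the saving step, assuming the resource addresses have already been extracted. This is not the code from the forum thread linked above; `ResourceSaver`, `localPathFor`, and `save` are hypothetical names, and mirroring the URL path under a target directory is only one possible layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies one remote resource to a local file.
final class ResourceSaver {

    /** Maps a resource URI to a file under targetDir, mirroring the URI's path. */
    static Path localPathFor(URI resource, Path targetDir) {
        String path = resource.getPath() == null ? "" : resource.getPath();
        if (path.isEmpty() || path.endsWith("/")) {
            path += "index.html"; // assumed name for directory-style addresses
        }
        while (path.startsWith("/")) {
            path = path.substring(1); // keep the path relative to targetDir
        }
        Path root = targetDir.normalize();
        Path target = root.resolve(path).normalize();
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes target directory: " + resource);
        }
        return target;
    }

    /** Downloads the resource and writes it to target; returns the byte count. */
    static long save(URI resource, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Raw bytes are copied, so images and class files are not corrupted
        try (InputStream in = resource.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ResourceSaver <url> <target-dir>");
            return;
        }
        URI resource = URI.create(args[0]);
        Path target = localPathFor(resource, Paths.get(args[1]));
        System.out.println(save(resource, target) + " bytes -> " + target);
    }
}
```

The copy works on bytes rather than on lines of text, which is why it suits binary content; reading an image through a `BufferedReader` would mangle it.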


 

Author Comment

by:vihar123
ID: 12083034
hi objects,

I have seen the same code and tried it, but I am getting javax.swing.text.ChangedCharSetException and I am not able to fix it.

Please help me :)
 
LVL 92

Expert Comment

by:objects
ID: 12089306
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
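For context, here is a sketch of where such a property is set when reading a page through `HTMLEditorKit`. The key that `HTMLDocument` checks is `"IgnoreCharsetDirective"`, set through `Document.putProperty` before the read. The class name `CharsetTolerantLinks` and the method `hrefs` are hypothetical, and the link extraction only loosely follows the GetLinks approach.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: reads a page and lists the href of every <a> tag.
final class CharsetTolerantLinks {

    static List<String> hrefs(Reader html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Without this, a <meta ... charset=...> tag in the page can make
        // kit.read throw ChangedCharSetException
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(html, doc, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }

        List<String> links = new ArrayList<>();
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs == null ? null : attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=utf-8\"></head>"
                + "<body><a href=\"http://example.com/a.html\">A</a></body></html>";
        System.out.println(hrefs(new StringReader(page)));
    }
}
```

Ignoring the directive means the page is decoded with whatever charset the `Reader` was created with, so that charset should be chosen with care when it is known (for example from the HTTP response headers).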
 

Author Comment

by:vihar123
ID: 12224618
hi,
This HTML parser is not working for all websites. Please help me out :)
 
LVL 92

Expert Comment

by:objects
ID: 12224634
Yes, if the page is not standard HTML, or uses recent features, the parser will have problems.
 

Author Comment

by:vihar123
ID: 12228311
hi objects,
What should I do in this case? Any ideas?
 
LVL 92

Expert Comment

by:objects
ID: 12232177
You need to look at a different parser, perhaps a commercial one.
 
LVL 18

Assisted Solution

by:armoghan
armoghan earned 1000 total points
ID: 12234430