Solved

Read from URL

Posted on 2007-03-29
22
273 Views
Last Modified: 2013-11-23
How do I download a web page into a string buffer from its URL, without it taking a minute or so per 100 KB?

The following will work, but takes way too long.
private static String downloadURL(String url)
{
      URL u;
      InputStream is = null;
      DataInputStream dis;
      String s;
      StringBuffer output = new StringBuffer(500000);
      try
      {
            u = new URL(url);
            is = u.openStream();
            // note: readLine() on DataInputStream is deprecated and drops line separators
            dis = new DataInputStream(new BufferedInputStream(is));
            while ((s = dis.readLine()) != null)
            {
                  output.append(s);
            }
      }
      catch (MalformedURLException mue)
      {
            System.out.println("Ouch - a MalformedURLException happened: " + url);
            mue.printStackTrace();
            System.exit(1);
      }
      catch (IOException ioe)
      {
            System.out.println("Oops - an IOException happened: " + url);
            ioe.printStackTrace();
            System.exit(1);
      }
      finally
      {
            try
            {
                  if (is != null) is.close();
            }
            catch (IOException ioe)
            {}
      }
      return output.toString();
}
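For reference, the read loop above relies on the deprecated DataInputStream.readLine() and silently drops line separators. A minimal sketch of the same loop written with BufferedReader instead; an in-memory stream stands in for u.openStream() so the sketch runs without a network connection:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ReadLoopSketch {
      // Reads an entire stream into a String, preserving line breaks.
      static String readAll(InputStream is) throws IOException {
            BufferedReader reader = new BufferedReader(new InputStreamReader(is));
            StringBuffer output = new StringBuffer();
            String line;
            while ((line = reader.readLine()) != null) {
                  if (output.length() > 0) {
                        output.append('\n');
                  }
                  output.append(line);
            }
            return output.toString();
      }

      public static void main(String[] args) throws IOException {
            // An in-memory stream stands in for u.openStream() here.
            InputStream fake = new ByteArrayInputStream("line one\nline two".getBytes());
            System.out.println(readAll(fake));
      }
}
```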
Question by:Titanium_Sniper

22 Comments
 
LVL 23

Expert Comment

by:Ajay-Singh
ID: 18821572
Are you sure the web page is text?

For binary content, the above code may not work.
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 18821614
It's just a normal web page of product lists that I want to read and extract information from, to calculate which is the best deal.
 
LVL 23

Expert Comment

by:Ajay-Singh
ID: 18821626
> I want to read, and extract the information from to calculate which is the best deal.

Use httpunit for simplicity: http://httpunit.sourceforge.net
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 18821643
Where is the code to download a web page in that zip file? It is really big and I am lost.
 
LVL 23

Expert Comment

by:Ajay-Singh
ID: 18821668
It's as simple as:

WebConversation conversation = new WebConversation();
String server = "http://www.dilbert.com/";

WebRequest request = new GetMethodWebRequest( server );
WebResponse page = conversation.getResponse( request );
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 18821686
So how do I get my program to recognize things like WebConversation and WebRequest? I am new to Java.
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 18821698
Wait, I found it; I was using the wrong jar file. Now how do I use the methods in the jar file? I am new.
 
LVL 23

Expert Comment

by:Ajay-Singh
ID: 18821794
Keep the jar file on the classpath.
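For example, assuming the jar is named httpunit.jar and sits in the current directory (both names are illustrative), compiling and running against it would look like:

```shell
# Compile with the jar on the classpath (jar name is illustrative)
javac -cp httpunit.jar classname.java

# Run with both the current directory and the jar on the classpath
# (use ; instead of : as the path separator on Windows)
java -cp .:httpunit.jar classname
```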
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 18822780
I have the following and it says it can't find the new methods:

import com.meterware.httpunit.*;
import java.io.*;
import java.net.*;
import java.util.*;
public class classname {
...
...
...
 private static StringBuffer downloadURL(String link) throws Exception
    {
        StringBuffer output = new StringBuffer();
        WebConversation conversation = new WebConversation();
        // note: this still fetches the hardcoded server, not the 'link' parameter
        String server = "http://www.dilbert.com/";

        WebRequest request = new GetMethodWebRequest( server );
        WebResponse page = conversation.getResponse( request );
        // getResponse can also throw SAXException, hence "throws Exception";
        // the original was missing a return statement
        output.append( page.getText() );
        return output;
    }
...
...
...
}
 
LVL 30

Expert Comment

by:mayankeagle
ID: 18825792
Did you add the JAR file to the classpath?

http://www.mindprod.com/jgloss/classpath.html

 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 18834252
OK, I added ALL of them, and now it gives an IOException on the line where it downloads the page.
 
LVL 30

Expert Comment

by:mayankeagle
ID: 19047230
Can you post the updated code and the line where it occurs? Maybe you are using the wrong method (does it require a POST?). Perhaps there is a connection problem or some data-transfer problem.
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 19058892
Sorry it took so long to respond; my motherboard died a few days ago.

I cannot access that code, as I had not backed it up after I wrote it, and I will not be able to access that RAID 0 array until I receive my replacement motherboard. So much for fault tolerance when the motherboard goes kaput.

And yes I backup regularly, but my Domain controller died and made it kind of hard to do much of anything, including logging on, let alone having my profile automatically backed up.

I found that C# is way easier to program in than Java or even VB, and I like it much better now.

Do you have any code that will download web pages that need no POST, because all the data sent is included in the URL? If it is not just a few lines of code and it uses external files, please explain where to get them and how to include them in my project, as I am new to C#. Also, I want it to load the page in at most a couple of seconds; the way I had it working before would take 30 seconds to a minute per page.

If it is not too hard, how do you do POSTs, so I could do better stuff? For example, the code I would need to send the data in these cases: get the search results from Google, check my bank account balance from inside a program, or log into my school web page and try to add a full class to my schedule every 5 minutes.

The best way would be a website with a tutorial, or simple instructions copy-pasted here.
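Since all the data is in the URL, a GET boils down to building an encoded query string. A minimal sketch in Java (the thread's language; the base URL and parameter names are made up for illustration) using the standard URLEncoder:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryBuilder {
      // Builds "base?key1=value1&key2=value2" with names and values URL-encoded.
      static String buildGetUrl(String base, String[][] params) throws UnsupportedEncodingException {
            StringBuffer url = new StringBuffer(base);
            for (int i = 0; i < params.length; i++) {
                  url.append(i == 0 ? '?' : '&');
                  url.append(URLEncoder.encode(params[i][0], "UTF-8"));
                  url.append('=');
                  url.append(URLEncoder.encode(params[i][1], "UTF-8"));
            }
            return url.toString();
      }

      public static void main(String[] args) throws Exception {
            String url = buildGetUrl("http://www.google.com/search",
                                     new String[][] {{"q", "best deal"}});
            System.out.println(url); // spaces are encoded as '+'
      }
}
```

For a POST, roughly the same encoded string is written to the connection's output stream after calling setDoOutput(true) on an HttpURLConnection; logging into sites additionally involves cookies and sessions, which is where a library such as HttpUnit helps.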
 
LVL 1

Accepted Solution

by:
spice_stellina earned 250 total points
ID: 19110235
Sorry, I don't know C#. Here is a piece of Java code (Sun JDK 1.5). It works for getting text and XML (which is what I used it for): try to take a look. It doesn't seem to take long to read content from URLs, but I haven't tested it with heavy pages. Sorry for my bad English; I hope it helps.
Greets, Anna


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

/**
 * Please pass:<br>
 * <code>java -Dsun.net.client.defaultReadTimeout=60000 -Dsun.net.client.defaultConnectTimeout=60000</code><br>
 * in the options of your application server in order to avoid waiting indefinitely.
 */
public class UrlUtil {
      /**
       * Check if the URL is online/reachable
       * @param urlValue String
       * @throws Exception
       */
      public void checkURL(String urlValue) throws Exception {
            if (urlValue == null || urlValue.equalsIgnoreCase("")) {
                  throw new Exception("Input parameters: not valid");
            }
            try {
                  URL theURL = new URL(urlValue);
                  try {
                        URLConnection con = theURL.openConnection();
                        // note: getContent() downloads the whole body just to check reachability
                        con.getContent();
                        String temp = con.getHeaderField(0);
                        // the status line may be null if the server sent no headers
                        if (temp != null && temp.indexOf("Not Found") >= 0)
                              throw new Exception("urlValue: [" + urlValue + "] --- " + "URL not found, bad link");
                  } catch (IOException ioe) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "IOException, bad link: " + ioe.toString());
                  } catch (Exception e) {
                        throw new Exception(e.toString());
                  }
            } catch (MalformedURLException mue) {
                  throw new Exception("urlValue: [" + urlValue + "] --- " + "MalformedURLException: " + mue.toString());
            } catch (Exception e) {
                  throw e;
            }
      }



      /**
       * Reading from an URL
       * @param urlValue String
       * @throws Exception
       */
      public String fetchURL(String urlValue) throws Exception {
            String ret = "";

            if (urlValue == null || urlValue.equalsIgnoreCase("")) {
                  throw new Exception("Input parameters: not valid.");
            }
            try {
                  checkURL(urlValue);
                  try {
                        URL theURL = new URL(urlValue);
                        URLConnection con = theURL.openConnection();

                        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
                        String inputLine;

                        while ((inputLine = in.readLine()) != null) {
                              if (!ret.equalsIgnoreCase("")){
                                    ret += "\n";
                              }
                              ret += inputLine;
                        }
                        in.close();
                  } catch (MalformedURLException mue) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "MalformedURLException: " + mue.toString());
                  } catch (IOException ioe) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "IOException: " + ioe.toString());
                  }
            } catch (Exception e) {
                  throw e;
            }
            return ret;
      }

}
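On the speed concern: since Java 5, the same timeouts can also be set per connection with setConnectTimeout/setReadTimeout, instead of the global system properties mentioned in the javadoc above, so a dead server fails fast rather than hanging the whole read. A minimal sketch (the URL is illustrative; openConnection() does not touch the network yet):

```java
import java.net.URL;
import java.net.URLConnection;

public class TimeoutSketch {
      public static void main(String[] args) throws Exception {
            URL url = new URL("http://www.example.com/");
            URLConnection con = url.openConnection(); // no network traffic yet
            con.setConnectTimeout(5000);  // fail if no TCP connection within 5 s
            con.setReadTimeout(10000);    // fail if a single read blocks longer than 10 s
            System.out.println(con.getConnectTimeout() + " " + con.getReadTimeout());
      }
}
```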



 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 19120323
Thanks a bunch, I will test as soon as I fix my main computer.
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 19456068
Sorry, I completely forgot about this project of mine, and no, that code is not even near the speed a web browser reads at; it has taken a few minutes so far and is still not done.
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 19456163
Actually, it was just a small bug in your code: you concatenated Strings in the loop, so every append copied the whole buffer and used up too much memory. Here is a version of your code that will work better:

/**
       * Reading from an URL
       * @param urlValue String
       * @throws Exception
       */
      public String fetchURL(String urlValue) throws Exception {
            StringBuffer ret = new StringBuffer();

            if (urlValue == null || urlValue.equalsIgnoreCase("")) {
                  throw new Exception("Input parameters: not valid.");
            }
            try {
                  checkURL(urlValue);
                  try {
                        URL theURL = new URL(urlValue);
                        URLConnection con = theURL.openConnection();

                        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
                        String inputLine;

                        while ((inputLine = in.readLine()) != null) {
                              if (!(ret.length()==0)){
                                    ret.append("\n");
                              }
                              ret.append(inputLine);
                        }
                        in.close();
                  } catch (MalformedURLException mue) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "MalformedURLException: " + mue.toString());
                  } catch (IOException ioe) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "IOException: " + ioe.toString());
                  }
            } catch (Exception e) {
                  throw e;
            }
            return ret.toString();
      }
 
LVL 30

Assisted Solution

by:mayankeagle
mayankeagle earned 250 total points
ID: 19458187
Better optimizations:

* use StringBuilder instead of StringBuffer

* in the constructor, initialize it with the approximate number of characters expected (for a URL, URLConnection.getContentLength() gives the expected size; File.length() would do the same for a local file).
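A minimal sketch of both suggestions combined, assuming Java 5+: a pre-sized StringBuilder avoids repeated reallocation as the page text grows (the capacity here is just a guess, as it would be when taken from a Content-Length header):

```java
public class BuilderSketch {
      public static void main(String[] args) {
            // Capacity guess, e.g. from URLConnection.getContentLength()
            int expectedChars = 500000;
            StringBuilder out = new StringBuilder(expectedChars);
            out.append("line one").append('\n').append("line two");
            System.out.println(out.toString());
      }
}
```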
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 19459089
Thanks. Can you tell me when you would use StringBuilder over StringBuffer, for future reference?
 
LVL 30

Expert Comment

by:mayankeagle
ID: 19484318
I would always use StringBuilder, since it is unsynchronized and performs better, unless I have multiple threads accessing it at the same time. It is available in Java 5.0 and later.
 
LVL 5

Author Comment

by:Titanium_Sniper
ID: 19485363
thanks a bunch