  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 289

Read from URL

How do I download a web page to a string buffer using its URL, without it taking something like a minute per 100 KB?

The following will work, but takes way too long.
private static String downloadURL(String url) {
    URL u;
    InputStream is = null;
    DataInputStream dis;
    String s;
    StringBuffer output = new StringBuffer(500000);
    try {
        u = new URL(url);
        is = u.openStream();
        dis = new DataInputStream(new BufferedInputStream(is));
        while ((s = dis.readLine()) != null) {
            output.append(s);
        }
    } catch (MalformedURLException mue) {
        System.out.println("Ouch - a MalformedURLException happened." + url);
        mue.printStackTrace();
        System.exit(1);
    } catch (IOException ioe) {
        System.out.println("Oops- an IOException happened." + url);
        ioe.printStackTrace();
        System.exit(1);
    } finally {
        try {
            is.close();
        } catch (IOException ioe) {
        }
    }
    return output.toString();
}
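A note on the loop above: DataInputStream.readLine() is deprecated, and because it strips line terminators the result concatenates all lines together. Below is a minimal sketch that reads the stream in fixed-size character chunks instead (the class name PageFetcher is only for the sketch; the network may still dominate the time, but this keeps the Java-side work and allocations small, and it uses Java 5's StringBuilder - use StringBuffer on older JDKs):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class PageFetcher {
    // Read the whole page in fixed-size character chunks instead of line by line.
    // Chunked reads avoid per-line overhead and keep the original line terminators.
    static String fetch(String url) throws IOException {
        StringBuilder out = new StringBuilder(500000);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()));
        try {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } finally {
            in.close();
        }
        return out.toString();
    }
}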
0
Titanium_Sniper Asked:
2 Solutions
 
Ajay-SinghCommented:
Are you sure the web page is text (a text/* content type)?

For binary content, the above code may not work.
0
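If the content does turn out to be binary, here is a minimal sketch that reads raw bytes instead of lines (assuming the whole response fits comfortably in memory; the class name BinaryFetcher is only for the sketch):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class BinaryFetcher {
    // Copy the raw response bytes into memory without any character decoding.
    static byte[] fetchBytes(String url) throws IOException {
        InputStream in = new URL(url).openStream();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            in.close();
        }
        return out.toByteArray();
    }
}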
 
Titanium_SniperAuthor Commented:
It's just a normal web page of product lists that I want to read and extract information from, to calculate which is the best deal.
0
 
Ajay-SinghCommented:
> I want to read and extract information from, to calculate which is the best deal.

Use HttpUnit for simplicity: http://httpunit.sourceforge.net
0

 
Titanium_SniperAuthor Commented:
Where is the code to download a web page in that zip file? It is really big and I am lost.
0
 
Ajay-SinghCommented:
It's as simple as:

WebConversation conversation = new WebConversation();
String server = "http://www.dilbert.com/";

WebRequest request = new GetMethodWebRequest( server );
WebResponse page = conversation.getResponse( request );
0
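To get the page contents as a String from that response, HttpUnit's WebResponse exposes getText(). A minimal sketch, assuming the HttpUnit jar and its dependencies are on the classpath (the class name HttpUnitFetch is only for the sketch):

import com.meterware.httpunit.GetMethodWebRequest;
import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebRequest;
import com.meterware.httpunit.WebResponse;

public class HttpUnitFetch {
    // Fetch a page with HttpUnit and return its body as text.
    static String fetch(String url) throws Exception {
        WebConversation conversation = new WebConversation();
        WebRequest request = new GetMethodWebRequest(url);
        WebResponse page = conversation.getResponse(request);
        return page.getText(); // the response body as a String
    }
}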
 
Titanium_SniperAuthor Commented:
So how do I get my program to recognize things like WebConversation and WebRequest? I am new to Java.
0
 
Titanium_SniperAuthor Commented:
Wait, I found it; I was using the wrong jar file. Now how do I use the methods in the jar file? I am new.
0
 
Ajay-SinghCommented:
Keep the jar file on the classpath.
0
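For example, from the command line (httpunit.jar here is a placeholder for whatever jar your HttpUnit download ships, and MyDownloader stands in for your own class; on Windows use ; instead of : as the path separator):

javac -cp httpunit.jar MyDownloader.java
java -cp httpunit.jar:. MyDownloader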
 
Titanium_SniperAuthor Commented:
I have the following and it says it can't find the new methods:

import com.metaware.httpunit.*;
import java.io.*;
import java.net.*;
import java.util.*;
public class classname {
...
...
...
 private static StringBuffer downloadURL(String link) throws IOException
    {
        StringBuffer output = new StringBuffer();
        WebConversation conversation = new WebConversation();
        String server = "http://www.dilbert.com/";

        WebRequest request = new GetMethodWebRequest( server );
        WebResponse page = conversation.getResponse( request );
    }
...
...
...
}
0
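Two things stand out in that snippet besides the classpath: the package is com.meterware.httpunit (the post has com.metaware), and the method is declared to return a StringBuffer but never returns one, so it will not compile even with the jar in place. A minimal corrected sketch of the method, assuming the standard HttpUnit API (getResponse can also throw a SAXException, hence the broader throws clause):

import com.meterware.httpunit.GetMethodWebRequest;
import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebRequest;
import com.meterware.httpunit.WebResponse;

// ... inside the class ...
private static StringBuffer downloadURL(String link) throws Exception {
    WebConversation conversation = new WebConversation();
    // Use the link parameter rather than a hard-coded URL.
    WebRequest request = new GetMethodWebRequest(link);
    WebResponse page = conversation.getResponse(request);
    // Return the page body as a StringBuffer.
    return new StringBuffer(page.getText());
}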
 
Mayank S (Associate Director - Product Engineering) Commented:
Did you add the JAR file to the classpath?

http://www.mindprod.com/jgloss/classpath.html
0
 
Titanium_SniperAuthor Commented:
OK, I added ALL of them, and now it gives an IOException on the line where it downloads the page.
0
 
Mayank S (Associate Director - Product Engineering) Commented:
Can you post the updated code and the line where it occurs? Maybe you are using the wrong method (does it require a POST?). Perhaps there is a connection problem or some data-transfer problem.
0
 
Titanium_SniperAuthor Commented:
Sorry it took so long to respond; my motherboard died a few days ago.

I cannot access that code as I had not backed up after I wrote it and I will not be able to access that RAID 0 array until I receive my replacement Motherboard. So much for fault tolerance when the motherboard goes kaput.

And yes I backup regularly, but my Domain controller died and made it kind of hard to do much of anything, including logging on, let alone having my profile automatically backed up.

I found that C# is way easier to program in than java or even VB, and I like it much better now.

Do you have any code that will download web pages that need no POST, because all the data sent is included in the URL? If it is not just a few lines of code and uses external files, please explain where to get them and how to include them in my project, as I am new to C#. Also, I want it to load the page in at most a couple of seconds; the way I had it working before would take 30 seconds to a minute per page.

If it is not too hard, how do you do POSTs, so I could do better stuff? For example, the code I would need to send the data in the following cases:
get the search results from Google, check my bank account balance from inside a program, or log into my school web page and try to add a full class to my schedule every 5 minutes.

The best way would be a website with a tutorial, or simple instructions copy-pasted here.
0
 
spice_stellinaCommented:
Sorry, I don't know C#, but here is a piece of Java code (Sun JDK 1.5). It works for getting text and XML (that is what I used it for); take a look. It doesn't seem to take long to read content from URLs, but I haven't tested it with heavy pages. Sorry for my bad English; hope it helps.
Greets, Anna


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

/**
 * Please put:<br>
 * <code>java -Dsun.net.client.defaultReadTimeout=60000 -Dsun.net.client.defaultConnectTimeout=60000</code><br>
 * in the options of your application server, in order to avoid waiting indefinitely.
 */
public class UrlUtil {
      /**
       * Check if the URL is online/reachable
       * @param urlValue String
       * @throws Exception
       */
      public void checkURL(String urlValue) throws Exception {
            if (urlValue == null || urlValue.equalsIgnoreCase("")) {
                  throw new Exception("Input parameters: not valid");
            }
            try {
                  URL theURL = new URL(urlValue);
                  try {
                        URLConnection con = theURL.openConnection();
                        con.getContent();
                        String temp = con.getHeaderField(0);
                        int fileNotFound = temp.indexOf("Not Found");
                        if (fileNotFound >= 0)
                              throw new Exception("urlValue: [" + urlValue + "] --- " + "URL not found, bad link");
                  } catch (IOException ioe) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "IOException, bad link: " + ioe.toString());
                  } catch (Exception e) {
                        throw new Exception(e.toString());
                  }
            } catch (MalformedURLException mue) {
                  throw new Exception("urlValue: [" + urlValue + "] --- " + "MalformedURLException: " + mue.toString());
            } catch (Exception e) {
                  throw e;
            }
      }



      /**
       * Reading from an URL
       * @param urlValue String
       * @throws Exception
       */
      public String fetchURL(String urlValue) throws Exception {
            String ret = "";

            if (urlValue == null || urlValue.equalsIgnoreCase("")) {
                  throw new Exception("Input parameters: not valid.");
            }
            try {
                  checkURL(urlValue);
                  try {
                        URL theURL = new URL(urlValue);
                        URLConnection con = theURL.openConnection();

                        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
                        String inputLine;

                        while ((inputLine = in.readLine()) != null) {
                              if (!ret.equalsIgnoreCase("")){
                                    ret += "\n";
                              }
                              ret += inputLine;
                        }
                        in.close();
                  } catch (MalformedURLException mue) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "MalformedURLException: " + mue.toString());
                  } catch (IOException ioe) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "IOException: " + ioe.toString());
                  }
            } catch (Exception e) {
                  throw e;
            }
            return ret;
      }

}



0
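A minimal usage sketch of that class (the dilbert.com URL from earlier in the thread stands in for the real page, and UrlUtilDemo is just a sketch name):

public class UrlUtilDemo {
    public static void main(String[] args) throws Exception {
        UrlUtil util = new UrlUtil();
        // fetchURL() calls checkURL() internally, so one call both validates and downloads.
        String html = util.fetchURL("http://www.dilbert.com/");
        System.out.println("Downloaded " + html.length() + " characters");
    }
}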
 
Titanium_SniperAuthor Commented:
Thanks a bunch, I will test as soon as I fix my main computer.
0
 
Titanium_SniperAuthor Commented:
Sorry, I completely forgot about this project of mine. And no, that code is not even near the speed a web browser reads at; it has taken a few minutes so far and is still not done.
0
 
Titanium_SniperAuthor Commented:
Actually, it was just a small bug in your code: you used String concatenation, so it used up too much memory. Here is a version of your code that works better:

/**
       * Reading from an URL
       * @param urlValue String
       * @throws Exception
       */
      public String fetchURL(String urlValue) throws Exception {
            StringBuffer ret = new StringBuffer();

            if (urlValue == null || urlValue.equalsIgnoreCase("")) {
                  throw new Exception("Input parameters: not valid.");
            }
            try {
                  checkURL(urlValue);
                  try {
                        URL theURL = new URL(urlValue);
                        URLConnection con = theURL.openConnection();

                        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
                        String inputLine;

                        while ((inputLine = in.readLine()) != null) {
                              if (!(ret.length()==0)){
                                    ret.append("\n");
                              }
                              ret.append(inputLine);
                        }
                        in.close();
                  } catch (MalformedURLException mue) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "MalformedURLException: " + mue.toString());
                  } catch (IOException ioe) {
                        throw new Exception("urlValue: [" + urlValue + "] --- " + "IOException: " + ioe.toString());
                  }
            } catch (Exception e) {
                  throw e;
            }
            return ret.toString();
      }
0
 
Mayank S (Associate Director - Product Engineering) Commented:
Better optimizations:

* use StringBuilder instead of StringBuffer

* in the constructor, initialize it with the approximate number of characters expected (you could use File.length() to calculate the number of characters in a file, for example).
0
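A minimal sketch of both suggestions applied to the loop above; since the source is a URL rather than a local file, URLConnection.getContentLength() is used here as the size hint in place of File.length() (an adaptation - the server may not send a Content-Length, in which case it returns -1; the class name FastFetch is only for the sketch):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class FastFetch {
    public static String fetchURL(String urlValue) throws Exception {
        URL theURL = new URL(urlValue);
        URLConnection con = theURL.openConnection();

        // Pre-size the builder from the Content-Length header when the server sends one.
        int length = con.getContentLength();
        StringBuilder ret = new StringBuilder(length > 0 ? length : 16384);

        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
        try {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                if (ret.length() > 0) {
                    ret.append('\n');
                }
                ret.append(inputLine);
            }
        } finally {
            in.close();
        }
        return ret.toString();
    }
}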
 
Titanium_SniperAuthor Commented:
Thanks. For future reference, can you tell me when you would use StringBuilder over StringBuffer?
0
 
Mayank S (Associate Director - Product Engineering) Commented:
I would always use StringBuilder since it is unsynchronized and better in performance, unless I have multiple threads accessing it at the same time. It is available in Java 5.0 and later.
0
 
Titanium_SniperAuthor Commented:
thanks a bunch
0
