
Need to get web content from a Java servlet

lcor asked
Last Modified: 2013-11-24
I have a Java servlet application running under Tomcat 5.x.  This web application essentially has to act like a browser, serving GET requests from a separate Java client.

1.  Client makes a series of GET requests to the server
2.  Server goes out to the internet to get the content and converts it to a byte array
3.  Server returns the content to the client for display

I know this sounds bizarre, but it's legacy code that I have no choice but to get working.  I need to come up with the best and most efficient Java solution for step 2.

The legacy code opens a raw Socket and reads the content, but there are problems with this: it's very slow and sometimes returns only a small portion of the content.

I was thinking of replacing Socket with the HttpURLConnection class.  First, does anyone have any ideas, in general, why the Socket class could have issues?  Second, does anyone have working, robust code that does step 2?


CERTIFIED EXPERT
Top Expert 2016
Commented:
>>First, does anyone have any ideas, in general, why the Socket class could have issues?

Using a plain socket is so far apart from using a browser that we have a chalk/cheese situation.
The socket will return (if you're lucky) one reply from a web server. When a browser makes a request, it makes numerous further requests for the many bits of content contained in a page and then integrates them all.
In short, it's a complex piece of software - even at its simplest.

The nearest you'll get to emulating a browser is to use a specialised API such as Jakarta HttpClient, which I'd recommend you get if you need to do this:

http://hc.apache.org/httpclient-3.x/

Mick Barry, Java Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:
Make sure you're using some buffering to pull the pages. Can you post your code? I'll see if I can work out why it's so slow.

Author

Commented:
objects,
sorry for the delay, but the Presidents' Day holiday got in the way. I'll provide code snippets tomorrow.

Mick Barry, Java Developer
CERTIFIED EXPERT
Top Expert 2010
Commented:
There's a simple example and discussion of reading a GET response here:

http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html

If you just want the byte stream returned, use a BufferedInputStream instead of a BufferedReader.
A ByteArrayOutputStream can be used to write the response to a byte array.
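
As a minimal sketch of that suggestion, the helper below drains any InputStream into a byte array through a BufferedInputStream and a ByteArrayOutputStream. The class name StreamBytes, the method name readAll, and the 4096-byte buffer size are illustrative assumptions, not anything from the tutorial or the legacy code, and StandardCharsets needs Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption for illustration.
class StreamBytes {

    // Reads the stream to its end and returns everything as one byte array.
    static byte[] readAll(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096]; // buffer size is an arbitrary choice
        try {
            int n;
            // read() returns -1 once the stream is exhausted
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            in.close(); // also closes the wrapped stream
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a network response here.
        byte[] sample = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] copy = readAll(new ByteArrayInputStream(sample));
        System.out.println(copy.length + " bytes read");
    }
}
```

The same method should work unchanged on the stream returned by url.openStream() or conn.getInputStream(), since it depends only on the InputStream interface.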

Author

Commented:
Here's the raw socket attempt.  It tends to pause while trying to acquire the web content.

Author

Commented:
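// rHost, rPort, html (the raw request bytes) and content are declared elsewhere.
Socket sock = null;
ByteArrayOutputStream bc = new ByteArrayOutputStream();

try {
    sock = new Socket(rHost.toString(), Integer.parseInt(rPort));
    OutputStream outs = sock.getOutputStream();
    outs.write(html);
    outs.write('\r');
    outs.write('\n');
    outs.flush();
    // Half-close: signals that no more request data will be sent.
    sock.shutdownOutput();

    byte[] b = new byte[1024];
    InputStream ins = sock.getInputStream();
    int i = ins.read(b);

    // Loops until the server closes its side of the connection.
    while (i != -1) {
        bc.write(b, 0, i);
        i = ins.read(b);
    }

    // Everything the server sent, including the status line and headers.
    content = bc.toByteArray();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    // Closing the socket also closes its input and output streams.
    if (sock != null) {
        try {
            sock.close();
        } catch (IOException ignored) {
        }
    }
}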
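
One thing a raw socket read always involves: the bytes collected are the whole HTTP response (status line, headers, a blank line, then the body), not just the page content. The sketch below shows one way to find where the body starts. The class and method names are hypothetical, and it assumes a plain response with no chunked transfer encoding, which a real server may well use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; names are assumptions for illustration only.
class RawHttpResponse {

    // Returns the index of the first body byte, or -1 if the blank line
    // (CR LF CR LF) that ends the headers is not present.
    static int bodyOffset(byte[] raw) {
        for (int i = 0; i + 3 < raw.length; i++) {
            if (raw[i] == '\r' && raw[i + 1] == '\n'
                    && raw[i + 2] == '\r' && raw[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    // Returns the bytes after the headers, or an empty array if the
    // end of the headers was never received.
    static byte[] body(byte[] raw) {
        int start = bodyOffset(raw);
        if (start < 0) {
            return new byte[0];
        }
        return Arrays.copyOfRange(raw, start, raw.length);
    }

    public static void main(String[] args) {
        // A hand-written response stands in for bytes read from a socket.
        byte[] raw = ("HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html\r\n"
                + "\r\n"
                + "<p>hi</p>").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(body(raw), StandardCharsets.ISO_8859_1));
    }
}
```

Handling redirects, chunked bodies, and keep-alive connections by hand takes considerably more code than this, which is part of why HttpURLConnection or an HTTP client library is usually the easier route.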


Author

Commented:
Here's the HttpURLConnection version.  It tends to return garbled content, but it's faster than the Socket approach.

// reqURL and content are declared elsewhere.
ByteArrayOutputStream bc = new ByteArrayOutputStream();
HttpURLConnection conn = null;

try {
    URL url = new URL(reqURL);
    conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.connect();
    InputStream ins = conn.getInputStream();

    byte[] b = new byte[1024];
    int i = ins.read(b);

    // Copy the response body into the byte array, up to 1 KB at a time.
    while (i != -1) {
        bc.write(b, 0, i);
        i = ins.read(b);
    }

    ins.close();
    content = bc.toByteArray();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    // Release the connection even if reading failed.
    if (conn != null) {
        conn.disconnect();
    }
}
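
A somewhat more defensive variant might set timeouts and check the status code before reading, so that a stalled server or an error response is reported instead of being passed on as page content. This is only a sketch: the class name PageFetcher, the method name fetch, and the timeout parameter are assumptions, and nothing in this thread confirms that either issue is what causes the garbled content.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper; names and the timeout parameter are assumptions.
class PageFetcher {

    // Fetches reqURL with GET and returns the response body as bytes.
    // Throws IOException on a timeout or on any status other than 200.
    static byte[] fetch(String reqURL, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(reqURL).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(timeoutMillis); // limit on connecting
        conn.setReadTimeout(timeoutMillis);    // limit on each blocking read
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException(
                        "Unexpected HTTP status " + status + " for " + reqURL);
            }
            InputStream in = conn.getInputStream();
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } finally {
                in.close();
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.println(fetch(args[0], 10000).length + " bytes");
    }
}
```

If the bytes are later turned into a String on the client, decoding them with a charset other than the one the server used is another possible source of garbled text, so it may be worth comparing the response's Content-Type header with whatever the client assumes.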

Mick Barry, Java Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:
You can simplify and buffer the input using:

    URL url = new URL(reqURL);
    InputStream in = new BufferedInputStream(url.openStream());
    byte[] b = new byte[1024];
    int n;
    // read() returns -1 at the end of the stream
    while ((n = in.read(b)) != -1) {
      bc.write(b, 0, n);
    }
    in.close();

Author

Commented:
I'm leaning towards the Jakarta solution, since my research and prototyping suggest that java.net has issues.  But I'm awarding points for the java.net solution, which was a good example of how it works.
CERTIFIED EXPERT
Top Expert 2016

Commented:
:-)
Mick Barry, Java Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:
> shows that java.net has issues.

We use it in hundreds of production applications :)
