lcor

Need to get web content from a Java servlet

I have a Java servlet application running under Tomcat 5.x.  This web application essentially has to act like a browser, handling GET requests from a separate Java client.

1.  Client makes a series of GET requests to the server
2.  Server goes out to the internet to get the content and converts it to a byte array
3.  Server returns the content to the client for display

I know this sounds bizarre, but it's legacy code that I have no choice but to get working.  I need to come up with the best and most efficient Java solution for step #2.

The legacy code opens up a raw Socket and gets the content.  But there are problems with this: it's very slow and sometimes returns only a small portion of the content.

I was thinking of replacing Socket with the HttpURLConnection class.  First, does anyone have any ideas, in general, why the Socket class could have issues?  Second, does anyone have working, robust code that does item #2?

ASKER CERTIFIED SOLUTION
CEHJ

Make sure you're using some buffering to pull the pages. Can you post your code, and I'll see if I can tell why it's so slow?

lcor

ASKER

objects,  
sorry for the delay, but the Prez Day holiday got in the way... I'll provide code snippets tomorrow.


ASKER

Here's the raw socket attempt.  Tends to pause when trying to acquire web content.

Socket sock;
ByteArrayOutputStream bc = new ByteArrayOutputStream();
            
try {
    sock = new Socket(rHost.toString(), Integer.parseInt(rPort));                              
    outs = sock.getOutputStream();
    outs.write( html );
    outs.write('\r');
    outs.write('\n');
    outs.flush();
    sock.shutdownOutput();            

    byte[] b = new byte[1024];
    InputStream ins = sock.getInputStream();
    int i = ins.read(b);

    while(i != -1) {
        bc.write(b, 0, i);
        i = ins.read(b);
    }
                  
    content = bc.toByteArray();            

    bc.close();
    ins.close();
    outs.close();

} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
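
A note on the pause: a read loop that waits for -1 only ends when the server closes the connection. If the bytes in html are an HTTP/1.1 request without a "Connection: close" header, the server is allowed to keep the connection open after sending the page, and the loop would sit there until the server's idle timeout fires. Some servers may also cut a response short when they see the half-close from shutdownOutput(). Whether either of these is what happens here is a guess, since the thread never shows the request itself. The sketch below is one hedged way to rule both out: it sends an HTTP/1.0 request that asks the server to close, sets timeouts, and skips shutdownOutput(). The class name RawHttpGet, the method signature, and the timeout parameter are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the legacy code from this thread.
final class RawHttpGet {

    // Returns the raw response: status line, headers, and body.
    static byte[] get(String host, int port, String path, int timeoutMillis)
            throws IOException {
        try (Socket sock = new Socket()) {
            // Bound both the connect and every read so a stalled server
            // raises SocketTimeoutException instead of hanging forever.
            sock.connect(new InetSocketAddress(host, port), timeoutMillis);
            sock.setSoTimeout(timeoutMillis);

            // HTTP/1.0 plus "Connection: close" asks the server to close the
            // connection after the response, so the read loop sees -1.
            String request = "GET " + path + " HTTP/1.0\r\n"
                    + "Host: " + host + ":" + port + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            OutputStream out = sock.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = sock.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }
}
```

A call would look like RawHttpGet.get("example.com", 80, "/", 10000). The result still contains the response headers, so the caller has to split them from the body before handing content to the client.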


Avatar of lcor

ASKER

Here's the version using HttpURLConnection.  It tends to return messed up content, but it's faster than the Socket way.

Socket sock;
ByteArrayOutputStream bc= new ByteArrayOutputStream();

try {
    URL url = new URL(reqURL);
    HttpURLConnection conn = (HttpURLConnection)url.openConnection();
    conn.setRequestMethod("GET");
    conn.connect();      
    InputStream ins = conn.getInputStream();

    byte[] b = new byte[1024];
    int i = ins.read(b);

    while (i != -1) {
        bc.write(b, 0, i);
        i = ins.read(b);
    }

    ins.close();
    conn.disconnect();

    content = bc.toByteArray();
    bc.close();

} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
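
The thread does not say what "messed up" means, so the following is only a sketch of things worth checking, not a diagnosis. It wraps the same HttpURLConnection calls in a helper that sets timeouts, looks at the status code before trusting the body, and hands back the Content-Type header so the client can decode the bytes with the charset the server declared. The class name HttpFetch, its fields, and the timeout parameter are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical result holder plus fetch helper.
final class HttpFetch {
    final int status;          // HTTP status code, e.g. 200 or 404
    final String contentType;  // Content-Type header, may be null
    final byte[] body;         // response body exactly as received

    private HttpFetch(int status, String contentType, byte[] body) {
        this.status = status;
        this.contentType = contentType;
        this.body = body;
    }

    static HttpFetch get(String reqURL, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(reqURL).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            int status = conn.getResponseCode();
            // For 4xx/5xx responses getInputStream() throws, so read the
            // error stream instead; it can be null when there is no body.
            InputStream raw = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            if (raw != null) {
                try (InputStream in = raw) {
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                }
            }
            return new HttpFetch(status, conn.getContentType(), buf.toByteArray());
        } finally {
            conn.disconnect();
        }
    }
}
```

With this shape the servlet can log status and contentType next to the byte count, which makes it easier to tell an error page or a charset mismatch apart from a truncated download.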

You can simplify and buffer the input using:

    URL url = new URL(reqURL);
    InputStream in = new BufferedInputStream(url.openStream());
    byte[] b = new byte[1024];
    int n = 0;
    while (-1 != (n = in.read(b))) {
      bc.write(b, 0, n);
    }
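
For anyone who wants to run that snippet as is, here is a minimal self-contained version. It assumes bc is the ByteArrayOutputStream from the earlier posts and adds the one thing the fragment leaves out, closing the stream. The class and method names (UrlBytes, read) are made up, and try-with-resources needs Java 7 or later, which is newer than what Tomcat 5.x typically ran on.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical wrapper around the buffered read loop shown above.
final class UrlBytes {

    // Reads everything the URL returns into a byte array.
    static byte[] read(URL url) throws IOException {
        ByteArrayOutputStream bc = new ByteArrayOutputStream();
        // try-with-resources closes the stream even if a read fails.
        try (InputStream in = new BufferedInputStream(url.openStream())) {
            byte[] b = new byte[1024];
            int n;
            while (-1 != (n = in.read(b))) {
                bc.write(b, 0, n);
            }
        }
        return bc.toByteArray();
    }
}
```

Usage would be byte[] content = UrlBytes.read(new URL(reqURL)); url.openStream() works for any scheme the JDK handles, so the same method can be tried against a local file: URL before pointing it at a live site.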

ASKER

I'm leaning toward the Jakarta solution since my research/prototyping shows that java.net has issues.  But I'm awarding points for the java.net solution, which was a good example of how it works.
:-)
> shows that java.net has issues.

We use it in hundreds of production applications :)