Solved

Reading an HTML page

Posted on 2003-11-17
Last Modified: 2010-04-01
I currently have a system which works through JSPs. I wish to integrate this system with a third-party system. Basically, I 'call' a URL, e.g. www.test.com?AD=1&AB=2, and it returns an HTML page containing a table. I want to read the contents of this table because it contains parameters I need for further processing. The table contains a number of columns, and each cell holds one piece of information, e.g. AB=5. From that cell I would want to know that AB is 5.

Any ideas how I go about this?

Thanks
Question by:Ktoshni
 
LVL 14

Accepted Solution

by:
kennethxu earned 250 total points
ID: 9763967
You'll need to use URLConnection to get the HTML from the other site, and then parse the HTML to extract what you need.

Sample of using URLConnection:

        URL url = new URL( "http://www.test.com?AD=1&AB=2" );
        URLConnection conn = url.openConnection();
        InputStream in = new BufferedInputStream( conn.getInputStream() );
        // read html content from the input stream.

You can parse the HTML content manually by searching for a particular string pattern, or you can make use of the javax.swing.text.html and javax.swing.text.html.parser packages. I'm not an expert on those packages, but there are plenty of examples available on Google.
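As a rough illustration of the javax.swing.text.html.parser route, here is a minimal sketch that collects the text of every table cell and splits it on the first '='. The class name TableCellParser and the method parseCells are made up for this example, and it assumes each cell holds plain text of the form NAME=VALUE, as described in the question; a real page may need more careful handling.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not from the thread): extracts NAME=VALUE pairs from <td> cells.
class TableCellParser {

    static Map<String, String> parseCells(Reader html) throws IOException {
        final Map<String, String> params = new LinkedHashMap<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inCell = false;
            private final StringBuilder cellText = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TD) {
                    inCell = true;
                    cellText.setLength(0); // start collecting a fresh cell
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inCell) {
                    cellText.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD) {
                    inCell = false;
                    String text = cellText.toString().trim();
                    int eq = text.indexOf('=');
                    // Assumption: cells without '=' carry no parameter and are skipped.
                    if (eq > 0) {
                        params.put(text.substring(0, eq).trim(),
                                   text.substring(eq + 1).trim());
                    }
                }
            }
        };

        // 'true' tells the parser to ignore any charset declared inside the document.
        new ParserDelegator().parse(html, callback, true);
        return params;
    }
}
```

Given a Reader over the downloaded page, parseCells(reader).get("AB") would then give the value stored in the AB cell, if such a cell exists. Manual string searching would also work for a very regular page; the parser approach is simply less sensitive to whitespace and attribute changes in the markup.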
 
LVL 15

Expert Comment

by:dualsoul
ID: 9769092
I suggest you use the HttpUnit or HtmlUnit package (you can find both on sourceforge.net) to parse it. These packages were designed for testing, but they are very good for working with HTML structure: they can load an HTML page from a specified URL and give you a very clear and simple object model of that page, so you can easily get the values you want.
 

Author Comment

by:Ktoshni
ID: 9769502
Thanks for the help, kennethxu. I tried the code you gave me, but I forgot to mention that I need to connect to a secure server through HTTPS. When I try to create the URL instance, it fails with the error: 'Malformed URL Exception unknown protocol: https'

Is there a way I can set URL to accept https or must I use something else?

On another note, thanks for the info dualsoul but the system I am working on is a production system and I can only use programs accepted by my company.
 

Author Comment

by:Ktoshni
ID: 9770021
Hi, I sorted out the https problem by using HttpsURLConnection. Forgive my ignorance, but how do I go about reading the HTML content from the input stream?
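One common way to read the stream is to wrap it in a Reader and copy the characters into a StringBuilder. The sketch below is only an illustration: the class name HtmlFetcher and its methods are invented for this example, and it assumes the page is encoded as UTF-8 (check the Content-Type header of the real response) and that the address passed to fetch uses the https scheme.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.HttpsURLConnection;

// Hypothetical helper (not from the thread): downloads a page and returns it as a String.
class HtmlFetcher {

    // Reads every character from the stream using the given charset, then closes it.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                html.append(buffer, 0, n);
            }
        }
        return html.toString();
    }

    // Assumption: 'address' is an https URL; any other scheme makes the cast below fail.
    static String fetch(String address) throws IOException {
        HttpsURLConnection conn =
            (HttpsURLConnection) URI.create(address).toURL().openConnection();
        try {
            // Assumption: the server sends UTF-8; adjust if the response says otherwise.
            return readAll(conn.getInputStream(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

The resulting String can then be searched directly, or wrapped in a java.io.StringReader and handed to an HTML parser.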