Solved

Reading an HTML page

Posted on 2003-11-17
Last Modified: 2010-04-01
I currently have a system that works through JSPs, and I want to integrate it with a third-party system. Basically, I 'call' a URL, e.g. www.test.com?AD=1&AB=2, and it returns an HTML page containing a table. I want to read the contents of this table because it holds parameters I need for further processing. The table has a number of columns, and each cell contains a piece of information such as AB=5, from which I would want to learn that AB is 5.

Any ideas how I go about this?

Thanks
Question by:Ktoshni
 
LVL 14

Accepted Solution

by:
kennethxu earned 750 total points
ID: 9763967
You'll need to use URLConnection to get the HTML from the other site, and then parse the HTML to extract what you need.

Sample of using URLConnection:

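        import java.io.BufferedInputStream;
        import java.io.InputStream;
        import java.net.URL;
        import java.net.URLConnection;
        URL url = new URL( "http://www.test.com?AD=1&AB=2" );
        URLConnection conn = url.openConnection();
        InputStream in = new BufferedInputStream( conn.getInputStream() );
        // read html content from the input stream.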
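The "read html content" step can be sketched roughly as follows. This is only an illustration: the class name HtmlReader and the method readAll are made up for this example, and UTF-8 is an assumed encoding (the real page may declare a different one). A ByteArrayInputStream stands in for conn.getInputStream() so the sketch runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
class HtmlReader {
    // Reads every character from the stream and returns it as one String.
    // The stream is closed when reading finishes.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                html.append(buffer, 0, n);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(); the charset is an assumption.
        InputStream in = new ByteArrayInputStream(
                "<table><tr><td>AB=5</td></tr></table>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in, StandardCharsets.UTF_8));
    }
}
```

With a real connection, the call would be readAll(conn.getInputStream(), charset), where the charset should match what the server actually sends.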

You can parse the HTML content manually by searching for a particular string pattern, or you can use the javax.swing.text.html and javax.swing.text.html.parser packages. I'm not an expert in those packages, but there are plenty of examples available on Google.
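As a rough illustration of the manual "string pattern" approach, the sketch below pulls NAME=VALUE pairs out of table cells with a regular expression. The class name TableParams and the sample HTML are invented for this example, and it assumes each cell holds a single pair inside a plain <td> element. Regular expressions are fragile against irregular HTML, so treat this as a starting point rather than a robust parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class TableParams {
    // Matches the content of one <td> cell; attributes on the tag are allowed.
    private static final Pattern CELL =
            Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the NAME=VALUE pairs found in table cells, in document order.
    static Map<String, String> parse(String html) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // Drop any nested tags, then trim surrounding whitespace.
            String text = m.group(1).replaceAll("<[^>]+>", "").trim();
            int eq = text.indexOf('=');
            if (eq > 0) { // skip cells without a NAME=VALUE shape
                params.put(text.substring(0, eq).trim(), text.substring(eq + 1).trim());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>AB=5</td><td> AD = 1 </td><td>n/a</td></tr></table>";
        System.out.println(parse(html)); // prints {AB=5, AD=1}
    }
}
```

After parsing, params.get("AB") gives "5", which is the kind of lookup the question asks for.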
 
LVL 15

Expert Comment

by:dualsoul
ID: 9769092
I suggest using HttpUnit or HtmlUnit (you can find both on sourceforge.net) to parse it. These packages were designed for testing, but they are very good for working with HTML structure: they can load an HTML page from a specified URL and give you a clear, simple object model of the page, so you can easily get the values you want.
 

Author Comment

by:Ktoshni
ID: 9769502
Thanks for the help, kennethxu. I tried the code you gave me, but I forgot to mention that I need to connect to a secure server through HTTPS. When I try to create the URL instance, it fails with the error: 'Malformed URL Exception: unknown protocol: https'

Is there a way I can set URL to accept https or must I use something else?

On another note, thanks for the info, dualsoul, but the system I am working on is a production system and I can only use software approved by my company.
 

Author Comment

by:Ktoshni
ID: 9770021
Hi, I sorted out the https problem by using HttpsURLConnection. Forgive my ignorance, but how do I go about reading the HTML content from the input stream?
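For reference, an HttpsURLConnection setup along these lines might look like the sketch below. It assumes a JDK where the https protocol handler is built in (J2SE 1.4 and later). The class name SecureFetch and the timeout values are illustrative, not taken from the thread. Nothing is sent over the network until getInputStream() or connect() is called, so the demo only opens the connection object.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;

// Hypothetical helper, not part of the original thread.
class SecureFetch {
    // Creates (but does not yet connect) an HTTPS connection for the given address.
    static HttpsURLConnection open(String address) throws IOException {
        URL url = new URL(address);
        URLConnection conn = url.openConnection();
        if (!(conn instanceof HttpsURLConnection)) {
            throw new IOException("Not an https URL: " + address);
        }
        HttpsURLConnection https = (HttpsURLConnection) conn;
        https.setConnectTimeout(10_000); // milliseconds; illustrative value
        https.setReadTimeout(10_000);    // milliseconds; illustrative value
        return https;
    }

    public static void main(String[] args) throws IOException {
        HttpsURLConnection conn = open("https://www.test.com?AD=1&AB=2");
        System.out.println(conn.getURL().getProtocol());
    }
}
```

The stream returned by conn.getInputStream() is an ordinary InputStream, so it can be read the same way as one from a plain URLConnection.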