Solved

Reading an HTML page

Posted on 2003-11-17
Last Modified: 2010-04-01
I currently have a system which works through JSPs. I wish to integrate this system with a third-party system. Basically, I 'call' a URL, e.g. www.test.com?AD=1&AB=2, and it returns an HTML page containing a table. I want to read the contents of this table, as it contains parameters I need for further processing. The table has a number of columns, and each cell contains a piece of information such as AB=5. From that cell I would want to learn that AB is 5.

Any ideas how I go about this?

Thanks
Question by:Ktoshni

Accepted Solution

by:
kennethxu earned 250 total points
You'll need to use URLConnection to get the HTML from the other site, and then parse the HTML to extract what you need.

Sample of using URLConnection:

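        // requires: import java.io.*; import java.net.*;
        URL url = new URL( "http://www.test.com?AD=1&AB=2" );
        // open a connection to the remote page (the request is sent on first read)
        URLConnection conn = url.openConnection();
        InputStream in = new BufferedInputStream( conn.getInputStream() );
        // read html content from the input stream.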

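The "read html content" step in the snippet above could look roughly like the sketch below. It is only an illustration: the class name HtmlStreamReader and the method readAll are made up for this example, UTF-8 is an assumption about the page's encoding, and it needs Java 7 or later. A ByteArrayInputStream stands in for conn.getInputStream() so the sketch runs without a network.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class HtmlStreamReader {

    // Reads every character from the stream into one String and closes the stream.
    // UTF-8 is an assumption; the real page may declare a different charset.
    static String readAll(InputStream in) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int count;
            // read() returns -1 once the end of the stream is reached
            while ((count = reader.read(buffer)) != -1) {
                html.append(buffer, 0, count);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(), so no network access is needed here.
        InputStream in = new ByteArrayInputStream(
                "<table><tr><td>AB=5</td></tr></table>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
    }
}
```

With a real connection, the call would be readAll(conn.getInputStream()); the same helper should work for an HttpsURLConnection, since it only sees an InputStream.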
You can parse the HTML content manually by searching for a particular string pattern, or you can make use of the javax.swing.text.html and javax.swing.text.html.parser packages. I'm not an expert in those packages, but there are a lot of examples available on Google.
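As a rough illustration of the manual "search for a string pattern" approach, the sketch below pulls NAME=VALUE pairs out of table cells with a regular expression. It assumes the page has already been read into a String and that each cell holds plain text such as AB=5 with no nested tags; the class name TableParamExtractor and the method extract are made up for this example. For messier HTML a real parser would be the safer choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TableParamExtractor {

    // Matches a <td> cell whose text is NAME=VALUE, e.g. <td>AB=5</td>.
    // Assumption: cells contain plain text only, with no nested tags.
    private static final Pattern CELL = Pattern.compile(
            "<td\\b[^>]*>\\s*(\\w+)\\s*=\\s*([^<]*?)\\s*</td>",
            Pattern.CASE_INSENSITIVE);

    // Returns the pairs in the order they appear in the page.
    static Map<String, String> extract(String html) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            params.put(m.group(1), m.group(2)); // group 1 = name, group 2 = value
        }
        return params;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>AB=5</td><td> AD = 1 </td></tr></table>";
        Map<String, String> params = extract(html);
        System.out.println(params.get("AB")); // prints 5
    }
}
```

Cells that do not contain a NAME=VALUE pair are simply skipped, so header or filler cells should not end up in the map.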

Expert Comment

by:dualsoul
I suggest you use the HttpUnit or HtmlUnit package (you can find both on sourceforge.net) to parse it. These packages were designed for testing, but they are very good for working with HTML structure: they can load an HTML page from a specified URL and give you a clear and simple object model of that page, so you can easily get the values you want.

Author Comment

by:Ktoshni
Thanks for the help, kennethxu. I tried using the code you gave me, but I forgot to mention that I need to connect to a secure server through HTTPS. When I try to create the URL instance, it fails with the error: 'Malformed URL Exception unknown protocol: https'

Is there a way I can set URL to accept https or must I use something else?

On another note, thanks for the info dualsoul but the system I am working on is a production system and I can only use programs accepted by my company.

Author Comment

by:Ktoshni
Hi, I sorted out the HTTPS problem by using HttpsURLConnection. Forgive my ignorance, but how do I go about reading the HTML content from the input stream?