
How to Parse an HTML File?

PremkumarAC asked
Last Modified: 2008-02-26
I need to parse an HTML file and insert values at a particular place in the HTML document. I also need to show the updated HTML file in a frame/panel.

<html>
  <head>
    <title>Test Program</title>
  </head>
  <body>
    <p>
      My name is ...
    </p>
  </body>
</html>

I want to insert "Prem" after "My name is ...".

Can anyone help me with an example of how to parse an HTML file and edit its contents?

Regards,
Prem.

Author

Commented:
I need to parse an HTML file and insert values (objects) at a particular place in the HTML document.
I also need to show the updated HTML file in a frame/panel.

<html>
 <head>
   <title>Test Program</title>
 </head>
 <body>
   <p>
     My name is ...
   </p>
 </body>
</html>

I want to insert "Prem" after "My name is ...". The user will enter the name in a text field. Once they click a button, the HTML file shown in the JEditorPane / JTextPane has to be updated.

Can anyone help me with an example of how to parse an HTML file and edit its contents?

Regards,
Prem.

The easiest way to go about it is to load the whole HTML into a String/StringBuffer (you could do this by reading the HTML contents from a stream, say a file).

Look up the index of the "My name is ..." substring and insert your name right after it.
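
A minimal sketch of this string-based approach, not a definitive implementation. The class name StringInsertExample, the helper insertAfter, the UTF-8 encoding, and the single space placed before the inserted value are all assumptions made for illustration; the marker text comes from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name, used only for this illustration.
class StringInsertExample {

    // Inserts " " + value right after the first occurrence of marker.
    // Returns the input unchanged when the marker is not found.
    static String insertAfter(String html, String marker, String value) {
        int index = html.indexOf(marker);
        if (index < 0) {
            return html;
        }
        int insertAt = index + marker.length();
        return new StringBuilder(html).insert(insertAt, " " + value).toString();
    }

    public static void main(String[] args) throws IOException {
        String html;
        if (args.length > 0) {
            // Read the whole file into a String (UTF-8 is an assumption).
            html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        } else {
            // Fallback sample so the sketch runs without a file.
            html = "<html><body><p>My name is ...</p></body></html>";
        }
        System.out.println(insertAfter(html, "My name is ...", "Prem"));
    }
}
```

This works on the raw markup, so it only finds the marker if the text appears in the file exactly as typed (same spacing, no tags or entities in the middle of it).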

Commented:
There's an excellent article in JavaPro (September 2001): "A Tokenizer for Your Collection". The FlexTokenizer described there "can be used to tokenize virtually any text source". The source is available at www.javapro.com

Author

Commented:
I think the efficient way is to use HTMLEditorKit, HTMLDocument, DocumentParser, etc. to parse an HTML file. I am unable to find a good example that uses these classes to insert a String into an HTML file. I could find some examples where the user is allowed to edit the HTML text in a JEditorPane, but that is not what I need. I have to load an HTML file from the hard disk, and it will contain some queries. I have the answers in my Java objects, and I have to fill in the answers at the suitable places in the HTML document and show it to the user.

Any example file or link would be very useful...

Thanks in advance.

Regards,
Prem.
Ovi

Commented:
Use the javax.swing.text package.

Author

Commented:
Hi Ovi,
  Do you have any examples?

Regards,
Prem.
Ovi

Commented:
Sorry, what I meant was the javax.swing.text.html package. There are some Swing components (such as JEditorPane) which accept HTML files, parse them, and display them. I have examples of JEditorPane, but none of parsing HTML. I can try one for you ...

To begin with, compile this:

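import java.awt.*;
import javax.swing.*;

public class TestHTML {
  public static void main(String[] args) {
    JFrame f = new JFrame();
    f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    f.setSize(400, 400);
    f.setLocation(150, 100);
    f.getContentPane().setLayout(new BorderLayout());
    JEditorPane pane = new JEditorPane();
    pane.setEditable(false); // display only
    try {
      // JEditorPane loads the page and renders it as HTML
      pane.setPage("http://www.google.com");
    } catch (Exception e) {
      // Report the failure instead of silently showing an empty pane
      e.printStackTrace();
    }
    f.getContentPane().add(new JScrollPane(pane), BorderLayout.CENTER);
    f.setVisible(true);
  }
}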

Commented:
import java.io.*;
import javax.swing.text.html.*;

public class HTMLParser {
  public static void main(String[] args) {
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = new HTMLDocument();
    // Avoid a ChangedCharSetException if the page declares a charset in a <meta> tag
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    File page = new File("D:\\Projects\\OviProjects\\Tests\\classes\\example.html");
    // try-with-resources closes the stream after parsing
    try (InputStream is = new FileInputStream(page)) {
      kit.read(is, doc, 0);
    } catch (Exception e) {
      e.printStackTrace();
      return;
    }
    // Print the parsed element structure
    doc.dump(System.out);
  }
}

Author

Commented:
Hi Ovi,
    Fine.
    I am now able to parse the HTML and display the tags, etc., but I don't know how to insert/delete strings at a particular place.

Here is the code which I have tried.

import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

class ParseHTML {
  public static void main(String[] args) {
    EditorKit kit = new HTMLEditorKit();
    Document doc = kit.createDefaultDocument();

    // The Document class does not yet handle charsets properly.
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

    String fn = System.getProperty("user.dir") + File.separator + "test_program.htm";
    // Create a reader on the HTML content; try-with-resources closes it.
    try (BufferedReader reader = new BufferedReader(new FileReader(fn))) {
      // Parse the HTML.
      kit.read(reader, doc, 0);

      // Iterate through the elements of the HTML document.
      ElementIterator it = new ElementIterator(doc);
      javax.swing.text.Element elem;
      while ((elem = it.next()) != null) {
        // Skip the root element, which has no parent.
        if (elem.getParentElement() != null) {
          int start = elem.getStartOffset();
          int length = elem.getEndOffset() - start;
          System.out.println(" Element = " + elem.getName() + " " + doc.getText(start, length));
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}


Regards,
Prem.
Ovi

Commented:
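HTMLEditorKit has an insertHTML() method for inserting HTML content into an HTMLDocument, and HTMLDocument itself offers methods such as insertBeforeEnd() and insertAfterEnd(). HTMLDocument also inherits an insertString() method from Document.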
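
A minimal sketch of the insertString() route, under stated assumptions: the class name InsertIntoHtmlDocument and the helpers parse and insertAfter are hypothetical, the HTML is parsed from an inline string rather than a file to keep the example self-contained, and a single space is added before the inserted value. It searches the document's text content (not the raw markup) for the marker and inserts at that offset.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical class name, used only for this illustration.
class InsertIntoHtmlDocument {

    // Parses HTML markup into an HTMLDocument.
    static HTMLDocument parse(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Same workaround as in the code earlier in the thread.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    // Inserts " " + value after the first occurrence of marker in the
    // document's text. Returns false when the marker is not present.
    static boolean insertAfter(HTMLDocument doc, String marker, String value)
            throws BadLocationException {
        String text = doc.getText(0, doc.getLength());
        int index = text.indexOf(marker);
        if (index < 0) {
            return false;
        }
        // Passing null attributes lets the document apply its default content attributes.
        doc.insertString(index + marker.length(), " " + value, null);
        return true;
    }

    public static void main(String[] args) throws Exception {
        HTMLDocument doc = parse(
            "<html><head><title>Test Program</title></head>"
            + "<body><p>My name is ...</p></body></html>");
        insertAfter(doc, "My name is ...", "Prem");
        // Print the text content after the insertion.
        System.out.println(doc.getText(0, doc.getLength()));
    }
}
```

To show the result, the same kit and document can be handed to a JEditorPane via setEditorKit() and setDocument(). Because the pane displays the document it is given, a button's action listener that calls insertAfter() with the text field's value should update the view; Swing changes like this are normally made on the event dispatch thread.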

Commented:
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

- Points for Ovi

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Venabili
EE Cleanup Volunteer
Hi PremkumarAC,

 Thank you very much. Your program helped me at just the right time...
