Solved

HTMLDocument problem. How can I get the HTML body?

Posted on 2002-05-10
Last Modified: 2008-03-10
I have an HTML document. How can I get the content between <BODY> and </BODY> as a String?
For example, if I have <BODY>Bla Bla</BODY>, then I need "Bla Bla".
Thanks in advance!
Best Regards,
Valeri
Question by:Valeri
10 Comments
 
LVL 35

Expert Comment

by:girionis
ID: 7001121
First, load the document into a String variable. If "body" holds the string "<HTML><BODY>blah blah</BODY></HTML>", the following will do:

System.out.println(body.substring((body.toLowerCase().indexOf("<body>") + 6), body.toLowerCase().lastIndexOf("</body>")));

  Hope it helps.
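
The one-liner above only matches a bare "<body>" tag. As a minimal sketch (the class name BodyText and the method between are hypothetical, not from this thread), the same idea can be written with a case-insensitive regular expression that also tolerates attributes on the opening tag. It is still plain string matching, not HTML parsing, so it assumes no attribute value contains a ">" character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the string-based approach. */
final class BodyText {
    // Opening tag with optional attributes, any letter case.
    // DOTALL lets the captured content span several lines; the greedy (.*)
    // runs to the LAST closing tag, like lastIndexOf in the one-liner.
    private static final Pattern BODY = Pattern.compile(
            "<body\\b[^>]*>(.*)</body\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the markup between the body tags, or null if none is found. */
    static String between(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints "Bla Bla"
        System.out.println(between("<HTML><BODY bgcolor=\"white\">Bla Bla</BODY></HTML>"));
    }
}
```

Like the substring version, this returns the raw markup inside the body, including any nested tags.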
 
LVL 9

Expert Comment

by:Ovi
ID: 7003099
HTMLDocument (like all implementations of the Document interface) stores the structure of the HTML as a tree. All you have to do is navigate through that tree until you find the body element. I will post the methods for that soon.
 
LVL 16

Author Comment

by:Valeri
ID: 7003130
Hi Ovi,
You are right, but I was unable to navigate through that tree...
I'm waiting for your post! :-)
 
LVL 9

Accepted Solution

by:
Ovi earned 100 total points
ID: 7003136
This test class expects an "x.html" file in the same directory as the compiled code. Apart from that, it works perfectly.

import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

public class HTMLDocUtils {
 
  // Returns the BODY element of the document, or null if there is none.
  public static final Element getBodyElement(HTMLDocument doc) {
    return(findElement(doc.getRootElements()[0], HTML.Tag.BODY));
  }
 
  // Depth-first search for the first element whose tag is "kind".
  public static final Element findElement(Element root, HTML.Tag kind) {
    if(root == null) return(null);
    if(matchElementType(root,  kind)) {
      return(root);
    }
    int count = root.getElementCount();
    for(int i = 0; i<count; i++) {
      Element child = root.getElement(i);
      Element e = findElement(child, kind);
      if(e != null)
        return(e);
    }
    return(null);
  }
 
  // An element's tag is stored in its attribute set under NameAttribute.
  public static final boolean matchElementType(Element e, HTML.Tag type) {
    return(e.getAttributes().getAttribute(StyleConstants.NameAttribute) == type);
  }
 
  public static void main(String[] args) {
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
    try {
      // x.html is looked up next to the compiled class file.
      URL file = HTMLDocUtils.class.getResource("x.html");
      InputStream is = file.openStream();
      // Parse the stream into the document, starting at position 0.
      kit.read(is, doc, 0);
    } catch(Exception e) { e.printStackTrace(); }
    System.out.println("Document content : ");
    doc.dump(System.out);
    Element body = HTMLDocUtils.getBodyElement(doc);
    if(body != null) {
      System.out.println("Body element detected ************************************************** :");
      // Offsets are positions in the document text, not in the HTML source.
      System.out.println("Starts at : " + body.getStartOffset());
      System.out.println("Ends at : " + body.getEndOffset());
    } else
      System.out.println("Body element not defined! ************************************************");
  }
}
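
The class above prints the offsets of the body element; turning them into the String the question asks for takes one more step. The following is a minimal sketch, not Ovi's code: the class name BodyTextDemo and its methods are hypothetical, and it reads the HTML from a String instead of x.html so that it is self-contained. It returns the text content of the body as stored in the document (tags stripped), not the original markup.

```java
import java.io.StringReader;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

/** Hypothetical demo: body element offsets -> body text. */
final class BodyTextDemo {
    /** Depth-first search for the first element carrying the given tag. */
    static Element find(Element root, HTML.Tag tag) {
        if (root.getAttributes().getAttribute(StyleConstants.NameAttribute) == tag) {
            return root;
        }
        for (int i = 0; i < root.getElementCount(); i++) {
            Element hit = find(root.getElement(i), tag);
            if (hit != null) return hit;
        }
        return null;
    }

    /** Returns the trimmed text inside the body element, or null if there is no body. */
    static String bodyText(String html) {
        try {
            HTMLEditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
            kit.read(new StringReader(html), doc, 0);
            Element body = find(doc.getDefaultRootElement(), HTML.Tag.BODY);
            if (body == null) return null;
            int start = body.getStartOffset();
            // The last element's end offset can lie one past the document
            // length (implicit trailing newline), so clamp it.
            int end = Math.min(body.getEndOffset(), doc.getLength());
            return doc.getText(start, end - start).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bodyText("<HTML><BODY>Bla Bla</BODY></HTML>"));
    }
}
```

The trim() call removes the newlines the document model adds around block elements; drop it if that whitespace matters.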
 
LVL 16

Author Comment

by:Valeri
ID: 7003234
Hi Ovi :-)
I will test this class and will probably give you the points! For now, I want to leave the question open.
Thanks a lot!
Valeri
 
LVL 9

Expert Comment

by:Ovi
ID: 7020766
Did you test it?
 

Expert Comment

by:samuelvd
ID: 8145579
I did test your code, the output is:

Document content :
Body element detected ************************************************** :
Starts at : 1
Ends at : 85

I wrote a BeanShell script to summarize it and to use it myself:

import javax.swing.text.html.*;
import javax.swing.text.Element;

HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();

fr = new FileReader("Test.html");

/*
 * Insert the contents of fr into the document doc, starting at position 0.
 */
kit.read(fr, doc, 0);

// doc.dump(System.out); // Dump the document contents to standard output.

// Get the root element
element = doc.getDefaultRootElement();

// Print the name of root and, recursively, of every element below it.
void muestraElementos(Element root) {
     print(root.getName());
     // Use the child count of the current element, not of the top-level one.
     int count = root.getElementCount();
     for (int i=0; i<count; i++) {
          Element child = root.getElement(i);
          if (child != null)
               muestraElementos(child);
     }
}

// Start the walk at the root element.
muestraElementos(element);


/*****************************************************/

The output of this script is:
html
head
p-implied
content
body
p-implied
content
content
table
tr
td
p-implied
content
content
td
p-implied
content
content
tr
td
p-implied
content
content
td
p-implied
content
content


I'm a little confused by the "p-implied" elements (javax.swing.text.Element). What do they mean? Apart from these elements, the output seems fine.
 
LVL 9

Expert Comment

by:Ovi
ID: 8151542
p-implied behaves like a normal <p> (paragraph) element, but HTMLDocument generates it internally under some conditions. If you save the HTML again, or simply read it back with editorPane.getText(), you'll see that the p-implied elements are omitted.
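
This behaviour can be checked with a small experiment. The sketch below is only an illustration (the class ImpliedParagraphDemo and its methods are hypothetical names): it parses a body that contains bare text, lists the element names found in the tree, and then writes the document back out through HTMLEditorKit.write, which is assumed here to stand in for "saving the HTML again".

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

/** Hypothetical demo of where p-implied shows up and where it does not. */
final class ImpliedParagraphDemo {
    static HTMLDocument parse(HTMLEditorKit kit, String html) throws Exception {
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    /** Collects element names in document order (depth-first). */
    static void collect(Element e, List<String> names) {
        names.add(e.getName());
        for (int i = 0; i < e.getElementCount(); i++) {
            collect(e.getElement(i), names);
        }
    }

    static List<String> elementNames(String html) {
        try {
            List<String> names = new ArrayList<>();
            collect(parse(new HTMLEditorKit(), html).getDefaultRootElement(), names);
            return names;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Serializes the parsed document back to HTML text. */
    static String writeBack(String html) {
        try {
            HTMLEditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = parse(kit, html);
            StringWriter out = new StringWriter();
            kit.write(out, doc, 0, doc.getLength());
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body>Bla Bla</body></html>";
        System.out.println(elementNames(html)); // tree view, includes p-implied
        System.out.println(writeBack(html));    // serialized view
    }
}
```

The element tree contains a "p-implied" wrapper around the bare text, while the written HTML has no such tag.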
 

Expert Comment

by:samuelvd
ID: 8156195
Ovi, one more question: I have programmed XML DOM documents using Xerces. What issues would be involved in implementing an XHTMLDocument class using the W3C DOM API?

Would this require a lot of effort?

Regards!
 
LVL 9

Expert Comment

by:Ovi
ID: 8157222
Yes and no; it depends on what you really want to achieve and on whether you are prepared for a considerable effort. The text package is the largest one, and the most complicated too. As a starting point, I suggest reading the articles from Sun about the text package, especially the one called "Customizing a Text Editor" or something similar, in which Tim Yates (the guru of the text package there) implements a Java code editor.

http://java.sun.com/products/jfc/tsc/articles/text/editor_kit/index.html
http://java.sun.com/products/jfc/tsc/articles/

I implemented a WYSIWYG HTML editor myself, but it was hard work, and the result is not very competitive.
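
If the goal is only to read well-formed XHTML through the W3C DOM API, rather than to plug a new Document implementation into the Swing text package, much less effort is needed. A minimal sketch, assuming the JDK's built-in JAXP parser instead of a standalone Xerces install (the class XhtmlBody and its method are hypothetical names):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: body text of a well-formed XHTML string via W3C DOM. */
final class XhtmlBody {
    /** Returns the concatenated text of the first body element, or null if absent. */
    static String bodyText(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xhtml)));
            // XML is case-sensitive, so the tag must be lowercase "body".
            Node body = dom.getElementsByTagName("body").item(0);
            return body == null ? null : body.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bodyText("<html><body>Bla <b>Bla</b></body></html>"));
    }
}
```

Unlike the Swing HTML parser, an XML parser rejects markup that is not well-formed (unclosed tags, unquoted attributes), so this only suits genuine XHTML input.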
Question has a verified solution.
