Solved

HTMLDocument problem: how can I get the HTML body?

Posted on 2002-05-10
Last Modified: 2008-03-10
I have an HTMLDocument. How can I get the content between <BODY> and </BODY> as a String?
For example, if I have <BODY>Bla Bla</BODY>, then I need "Bla Bla".
Thanks in advance!
Best Regards,
Valeri
Question by:Valeri
10 Comments
 
LVL 35

Expert Comment

by:girionis
ID: 7001121
First, load the document into a String. If "body" is a String variable that holds "<HTML><BODY>blah blah</BODY></HTML>", then the following will do:

System.out.println(body.substring((body.toLowerCase().indexOf("<body>") + 6), body.toLowerCase().lastIndexOf("</body>")));

  Hope it helps.
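
The one-liner above only works when the opening tag is exactly "<body>" with no attributes, and it fails with an exception when either tag is missing. A minimal sketch of a more forgiving string-based variant follows, assuming a regular expression is acceptable for simple pages. The class name BodyExtractor and the method extractBody are hypothetical names chosen for illustration, and the pattern will still misbehave if an attribute value itself contains ">".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyExtractor {

    // Matches <body ...> up to the LAST </body>, ignoring case and spanning line breaks.
    private static final Pattern BODY = Pattern.compile(
            "<body\\b[^>]*>(.*)</body\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the markup between the body tags, or null if no body element is found.
    public static String extractBody(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints: Bla Bla
        System.out.println(extractBody("<HTML><BODY bgcolor=\"white\">Bla Bla</BODY></HTML>"));
    }
}
```

Like the substring version, this returns the raw markup between the tags, so any nested tags are kept as they are.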
 
LVL 9

Expert Comment

by:Ovi
ID: 7003099
HTMLDocument (like every implementation of the Document interface) stores the logical structure of the HTML as a tree. All you have to do is navigate through that tree until you find the body element. I will post the methods for that soon.
 
LVL 16

Author Comment

by:Valeri
ID: 7003130
Hi Ovi,
You are right, but I was unable to navigate through that tree...
I'm waiting for your post! :-)
 
LVL 9

Accepted Solution

by:
Ovi earned 100 total points
ID: 7003136
This test class expects you to put an "x.html" file in the same directory as the compiled code. Otherwise it works as is.

import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

public class HTMLDocUtils {
 
  public static final Element getBodyElement(HTMLDocument doc) {
    return(findElement(doc.getRootElements()[0], HTML.Tag.BODY));
  }
 
  public static final Element findElement(Element root, HTML.Tag kind) {
    if(root == null) return(null);
    if(matchElementType(root,  kind)) {
      return(root);
    }
    int count = root.getElementCount();
    if(count > 0) {
      for(int i = 0; i<count; i++) {
        Element child = root.getElement(i);
        Element e = findElement(child, kind);
        if(e != null)
          return(e);
      }
    }
    return(null);
  }
 
  public static final boolean matchElementType(Element e, HTML.Tag type) {
    return(e.getAttributes().getAttribute(StyleConstants.NameAttribute) == type);
  }
 
  public static void main(String[] args) {
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
    // x.html is looked up next to the compiled HTMLDocUtils class
    URL file = HTMLDocUtils.class.getResource("x.html");
    if(file == null) {
      System.out.println("x.html not found next to HTMLDocUtils.class");
      return;
    }
    // try-with-resources closes the stream even if parsing fails
    try(InputStream is = file.openStream()) {
      kit.read(is, doc, 0);
    } catch(Exception e) {
      e.printStackTrace();
      return;
    }
    System.out.println("Document content : ");
    doc.dump(System.out);
    Element body = HTMLDocUtils.getBodyElement(doc);
    if(body != null) {
      System.out.println("Body element detected ************************************************** :");
      System.out.println("Starts at : " + body.getStartOffset());
      System.out.println("Ends at : " + body.getEndOffset());
    } else
      System.out.println("Body element not defined! ************************************************");
  }
}
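
The class above prints only the start and end offsets of the body element, while the original question asks for the body as a String. One way to close that gap, sketched under a few assumptions, is to pass those offsets to Document.getText. The class name BodyText and the method bodyText are hypothetical, the HTML is read from a String rather than from x.html so that the example is self-contained, and the "IgnoreCharsetDirective" property is set only so that a charset <meta> tag in the input does not abort parsing.

```java
import java.io.StringReader;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class BodyText {

  // Depth-first search for the first element tagged with the given HTML tag.
  static Element find(Element root, HTML.Tag tag) {
    if (root == null) return null;
    if (root.getAttributes().getAttribute(StyleConstants.NameAttribute) == tag) return root;
    for (int i = 0; i < root.getElementCount(); i++) {
      Element e = find(root.getElement(i), tag);
      if (e != null) return e;
    }
    return null;
  }

  // Returns the text content (not the markup) of the body, or null if there is no body element.
  static String bodyText(String html) {
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    try {
      kit.read(new StringReader(html), doc, 0);
      Element body = find(doc.getDefaultRootElement(), HTML.Tag.BODY);
      if (body == null) return null;
      int start = body.getStartOffset();
      // Clamp the end offset so the document's implicit trailing newline is left out.
      int end = Math.min(body.getEndOffset(), doc.getLength());
      return doc.getText(start, end - start);
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse HTML", e);
    }
  }

  public static void main(String[] args) {
    String text = bodyText("<HTML><BODY>Bla Bla</BODY></HTML>");
    System.out.println(text == null ? "no body" : text.trim());
  }
}
```

Note that getText returns the document's text with the tags stripped, which matches the "Bla Bla" example in the question but is not the same as the raw markup between the body tags.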
 
LVL 16

Author Comment

by:Valeri
ID: 7003234
Hi Ovi :-)
I will test this class and will probably give you the points! For now, though, I want to leave the question open.
Thanks a lot!
Valeri
 
LVL 9

Expert Comment

by:Ovi
ID: 7020766
Did you test it?
 

Expert Comment

by:samuelvd
ID: 8145579
I did test your code, the output is:

Document content :
Body element detected ************************************************** :
Starts at : 1
Ends at : 85

I wrote a BeanShell script to summarize it and to use it myself:

import java.io.FileReader;
import javax.swing.text.html.*;
import javax.swing.text.Element;

HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();

fr = new FileReader("Test.html");

/*
 * Insert the contents of fr into the document doc, starting at position 0.
 */
kit.read(fr, doc, 0);

// doc.dump(System.out); // Dump the document contents to standard output.

// Get the root element
element = doc.getDefaultRootElement();

// Print the name of root and, recursively, the names of all its descendants
void muestraElementos(Element root) {
     print(root.getName());
     // Count the children of the current node, not of the global root element
     int count = root.getElementCount();
     for (int i=0; i<count; i++) {
          Element child = root.getElement(i);
          if (child != null)
               muestraElementos(child);
     }
}

muestraElementos(element);


/*****************************************************/

The output of this script is
html
head
p-implied
content
body
p-implied
content
content
table
tr
td
p-implied
content
content
td
p-implied
content
content
tr
td
p-implied
content
content
td
p-implied
content
content


I'm a little confused by the javax.swing.text.Element named "p-implied". What do these elements mean? Apart from them, the output seems fine.
 
LVL 9

Expert Comment

by:Ovi
ID: 8151542
p-implied behaves like a normal <p> (paragraph) element, but it is generated internally by HTMLDocument under some conditions. If you save the HTML again, or simply read it using editorPane.getText(), you'll see that the p-implied elements are omitted.
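
To see this behaviour in isolation, here is a small sketch that parses a body containing bare text, prints the element tree with indentation, and then writes the document back out with the same editor kit. The class name ImpliedParagraphDemo and its methods are hypothetical names for illustration. The exact tree printed may vary between JDK versions, so no expected output is given.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class ImpliedParagraphDemo {

  // Parse an HTML string into a fresh HTMLDocument.
  static HTMLDocument parse(HTMLEditorKit kit, String html) {
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    try {
      kit.read(new StringReader(html), doc, 0);
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse HTML", e);
    }
    return doc;
  }

  // Append the name of e and of all its descendants, two spaces of indent per level.
  static void dump(Element e, int depth, StringBuilder out) {
    for (int i = 0; i < depth; i++) out.append("  ");
    out.append(e.getName()).append('\n');
    for (int i = 0; i < e.getElementCount(); i++) {
      dump(e.getElement(i), depth + 1, out);
    }
  }

  // Returns the indented element tree of the parsed document.
  static String elementTree(String html) {
    StringBuilder out = new StringBuilder();
    dump(parse(new HTMLEditorKit(), html).getDefaultRootElement(), 0, out);
    return out.toString();
  }

  // Parses the HTML and writes it back out through the editor kit.
  static String roundTrip(String html) {
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = parse(kit, html);
    StringWriter w = new StringWriter();
    try {
      kit.write(w, doc, 0, doc.getLength());
    } catch (Exception e) {
      throw new IllegalStateException("Could not write HTML", e);
    }
    return w.toString();
  }

  public static void main(String[] args) {
    String html = "<html><body>Bla Bla</body></html>";
    System.out.println(elementTree(html)); // the tree contains p-implied nodes
    System.out.println(roundTrip(html));   // the written HTML does not
  }
}
```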
 

Expert Comment

by:samuelvd
ID: 8156195
Ovi, one more question: I have programmed XML DOM documents using Xerces. What issues would be involved in implementing an XHTMLDocument class using the W3C DOM API?

Would this require a lot of effort?

Regards!
 
LVL 9

Expert Comment

by:Ovi
ID: 8157222
Yes and no, depending on what you really want to achieve and whether you are open to a considerable effort. The text package is the biggest and most complicated one. As a starting point, I suggest reading the Sun articles on the text package, especially the one called "Customizing a Text Editor" or something similar, in which Tim Prinzing (the guru of the text package there) implements a Java code editor.

http://java.sun.com/products/jfc/tsc/articles/text/editor_kit/index.html
http://java.sun.com/products/jfc/tsc/articles/

I implemented a WYSIWYG HTML editor myself, but it was hard work, and the result is not very competitive.
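
For the narrower task in the original question, no custom Document class is needed when the input is well-formed XHTML: the W3C DOM API that ships with the JDK can read the body directly. The following is only a sketch of that idea, not an XHTMLDocument implementation. The class name XhtmlBody and the method bodyText are hypothetical, and the input is assumed to have no DOCTYPE declaration, so the parser does not try to fetch a DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlBody {

  // Returns the text content of the first <body> element, or null if there is none.
  // The input must be well-formed XML; ordinary tag-soup HTML will be rejected.
  static String bodyText(String xhtml) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document dom = builder.parse(new InputSource(new StringReader(xhtml)));
      NodeList bodies = dom.getElementsByTagName("body");
      if (bodies.getLength() == 0) return null;
      return bodies.item(0).getTextContent();
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse XHTML", e);
    }
  }

  public static void main(String[] args) {
    // prints: Bla Bla
    System.out.println(bodyText("<html><body><p>Bla Bla</p></body></html>"));
  }
}
```

Element names in XML are case-sensitive, so this looks for a lowercase body element, as XHTML requires.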