Solved

HTMLDocument problem: how can I get the HTML body?

Posted on 2002-05-10
Last Modified: 2008-03-10
I have an HTML document. How can I get the content between <BODY> and </BODY> as a String?
For example, if I have <BODY>Bla Bla</BODY>, then I need "Bla Bla".
Thanks in advance!
Best Regards,
Valeri
Question by:Valeri
10 Comments
 
LVL 35

Expert Comment

by:girionis
ID: 7001121
First of all, load the document into a variable. If "body" is a String variable that holds "<HTML><BODY>blah blah</BODY></HTML>", then the following will do:

System.out.println(body.substring((body.toLowerCase().indexOf("<body>") + 6), body.toLowerCase().lastIndexOf("</body>")));

  Hope it helps.
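
The one-liner above only matches a bare <body> tag, so a tag with attributes such as <BODY bgcolor="white"> would be missed. A minimal sketch of a more tolerant variant, assuming the markup is already in a String (BodyExtractor and extractBody are made-up names for illustration, and a regular expression like this is not a full HTML parser):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BodyExtractor {
    // Case-insensitive; DOTALL lets '.' span line breaks. The greedy (.*)
    // runs up to the last </body>, mirroring lastIndexOf in the one-liner.
    private static final Pattern BODY = Pattern.compile(
            "<body(?:\\s[^>]*)?>(.*)</body\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the markup between the body tags, or null if there is no body. */
    static String extractBody(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<HTML><BODY bgcolor=\"white\">Bla Bla</BODY></HTML>";
        System.out.println(extractBody(html)); // prints "Bla Bla"
    }
}
```

Returning null when no body is found avoids the odd substring ranges that indexOf(...) + 6 would produce for input without a <body> tag.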
 
LVL 9

Expert Comment

by:Ovi
ID: 7003099
HTMLDocument (like all implementations of the Document interface) stores the logical structure of the HTML as a tree. All you have to do is navigate through that tree until you find the body element. I will post the methods for that soon.
 
LVL 16

Author Comment

by:Valeri
ID: 7003130
Hi Ovi,
You are right, but I was unable to navigate through that tree...
I'm waiting for your post! :-)
 
LVL 9

Accepted Solution

by:
Ovi earned 100 total points
ID: 7003136
This test class expects an "x.html" file in the same directory as the compiled code. Apart from that, it works as is.

import java.awt.*;
import java.io.*;
import java.net.*;
import java.util.*;
import javax.swing.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

public class HTMLDocUtils {
 
  public static final Element getBodyElement(HTMLDocument doc) {
    return(findElement(doc.getRootElements()[0], HTML.Tag.BODY));
  }
 
  public static final Element findElement(Element root, HTML.Tag kind) {
    if(root == null) return(null);
    if(matchElementType(root,  kind)) {
      return(root);
    }
    int count = root.getElementCount();
    if(count > 0) {
      for(int i = 0; i<count; i++) {
        Element child = root.getElement(i);
        Element e = findElement(child, kind);
        if(e != null)
          return(e);
      }
    }
    return(null);
  }
 
  public static final boolean matchElementType(Element e, HTML.Tag type) {
    return(e.getAttributes().getAttribute(StyleConstants.NameAttribute) == type);
  }
 
  public static void main(String[] args) {
    HTMLEditorKit kit;
    HTMLDocument doc;
    kit = new HTMLEditorKit();
    doc = (HTMLDocument)kit.createDefaultDocument();
    try {
      URL file = (new HTMLDocUtils()).getClass().getResource("x.html");
      InputStream is = file.openStream();
      kit.read(is, doc, 0);
    } catch(Exception e) { e.printStackTrace(); }
    System.out.println("Document content : ");
    doc.dump(System.out);
    Element body = HTMLDocUtils.getBodyElement(doc);
    if(body != null) {
      System.out.println("Body element detected ************************************************** :");
      System.out.println("Starts at : " + body.getStartOffset());
      System.out.println("Ends at : " + body.getEndOffset());
    } else
      System.out.println("Body element not defined! ************************************************");
  }
}
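
The class above reports where the body element starts and ends, while the original question asks for the text itself. A sketch of that last step, assuming the same Swing text classes; BodyText and its methods are illustrative names, and the HTML is read from a String instead of x.html to keep the example self-contained. Note that Document.getText returns the document's text content without any markup.

```java
import java.io.StringReader;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class BodyText {
    // Depth-first search for the first element whose name attribute is the tag.
    static Element find(Element root, HTML.Tag tag) {
        if (root.getAttributes().getAttribute(StyleConstants.NameAttribute) == tag) {
            return root;
        }
        for (int i = 0; i < root.getElementCount(); i++) {
            Element hit = find(root.getElement(i), tag);
            if (hit != null) return hit;
        }
        return null;
    }

    /** Returns the plain text of the body element, or null if none is found. */
    static String bodyText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoids a ChangedCharSetException if the page has a charset <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
            Element body = find(doc.getDefaultRootElement(), HTML.Tag.BODY);
            if (body == null) return null;
            int start = body.getStartOffset();
            // Clamp: the last element can extend one past getLength().
            int end = Math.min(body.getEndOffset(), doc.getLength());
            return doc.getText(start, end - start);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read HTML", e);
        }
    }

    public static void main(String[] args) {
        String text = bodyText("<html><body>Bla Bla</body></html>");
        // The document may add line breaks around paragraphs, hence trim().
        System.out.println(text == null ? "no body" : text.trim());
    }
}
```

If the markup inside the body is needed rather than the plain text, this approach is not enough on its own, because the tags are no longer part of the document text.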
 
LVL 16

Author Comment

by:Valeri
ID: 7003234
Hi Ovi :-)
I will test this class and will probably give you the points! For now, though, I want to leave the question open.
Thanks a lot!
Valeri
 
LVL 9

Expert Comment

by:Ovi
ID: 7020766
Did you test it?
 

Expert Comment

by:samuelvd
ID: 8145579
I did test your code, the output is:

Document content :
Body element detected ************************************************** :
Starts at : 1
Ends at : 85

I wrote a BeanShell script to summarize it and to use it myself:

import javax.swing.text.html.*;
import javax.swing.text.Element;

HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();

fr = new FileReader("Test.html");

/*
 * Insert the content of fr into the document doc, starting at position 0.
 */
kit.read(fr, doc, 0);

// doc.dump(System.out); // Dump the document content to standard output.

// Get the root element
element = doc.getDefaultRootElement();

// Print the name of every element in the tree, depth first.
void muestraElementos(Element root) {
     print(root.getName());
     // Use the child count of the current element, not of the global root.
     int count = root.getElementCount();
     for (int i=0; i<count; i++) {
          Element child = root.getElement(i);
          if (child != null)
               muestraElementos(child);
     }
}

muestraElementos(element);


/*****************************************************/

The output of this script is
html
head
p-implied
content
body
p-implied
content
content
table
tr
td
p-implied
content
content
td
p-implied
content
content
tr
td
p-implied
content
content
td
p-implied
content
content


I'm a little confused by the javax.swing.text.Element named "p-implied". What do these elements mean? Other than these elements, the output looks fine.
 
LVL 9

Expert Comment

by:Ovi
ID: 8151542
p-implied behaves like a normal <p> (paragraph) element, but HTMLDocument generates it internally under some conditions. If you save the HTML again, or simply read it using editorPane.getText(), you'll see that the p-implied elements are omitted.
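
A small sketch of that behaviour, assuming the standard Swing classes: it counts the elements named "p-implied" (HTML.Tag.IMPLIED) in a parsed document and then writes the document back out with HTMLEditorKit.write. ImpliedDemo and its method names are hypothetical, chosen only for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.Element;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class ImpliedDemo {
    static HTMLDocument parse(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read HTML", e);
        }
        return doc;
    }

    // Counts the elements under root whose name is "p-implied".
    static int countImplied(Element root) {
        int n = HTML.Tag.IMPLIED.toString().equals(root.getName()) ? 1 : 0;
        for (int i = 0; i < root.getElementCount(); i++) {
            n += countImplied(root.getElement(i));
        }
        return n;
    }

    // Serializes the whole document back to HTML markup.
    static String write(HTMLDocument doc) {
        StringWriter out = new StringWriter();
        try {
            new HTMLEditorKit().write(out, doc, 0, doc.getLength());
        } catch (Exception e) {
            throw new IllegalStateException("Could not write HTML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        HTMLDocument doc = parse("<html><body>Bla Bla</body></html>");
        System.out.println("p-implied elements in tree: "
                + countImplied(doc.getDefaultRootElement()));
        System.out.println(write(doc)); // the markup has no p-implied tag
    }
}
```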
 

Expert Comment

by:samuelvd
ID: 8156195
Ovi, one more question: I have programmed XML DOM documents using Xerces. What issues would be involved in implementing an XHTMLDocument class using the W3C DOM API?

Would this require a lot of effort?

Regards!
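
For the narrower task in this thread (reading the body text), well-formed XHTML can already be handled with the W3C DOM API alone. A rough sketch, assuming the JAXP parser bundled with the JDK and input without a DOCTYPE (a DOCTYPE may make the parser try to fetch the DTD); XhtmlBody is an illustrative name, and this is not the XHTMLDocument class asked about:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XhtmlBody {
    /** Returns the text content of the first body element, or null if absent. */
    static String bodyText(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xhtml)));
            // XML names are case-sensitive, so this looks for a lowercase "body".
            Node body = dom.getElementsByTagName("body").item(0);
            return body == null ? null : body.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body>Bla <b>Bla</b></body></html>";
        System.out.println(bodyText(xhtml)); // prints "Bla Bla"
    }
}
```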
 
LVL 9

Expert Comment

by:Ovi
ID: 8157222
Yes and no, depending on what you really want to achieve and whether you are open to a considerable effort. The text package is the biggest one and the most complicated too. As a starting point, I suggest reading the articles from Sun about the text package, especially the one called "customizing a text editor" or something similar, in which Tim Yates (the guru of the text package there) implements a Java code editor.

http://java.sun.com/products/jfc/tsc/articles/text/editor_kit/index.html
http://java.sun.com/products/jfc/tsc/articles/

I've implemented a WYSIWYG HTML editor myself, but it took a lot of hard work, and the result is not very competitive.