Solved

HTMLDocument problem. How can I get the HTML body?

Posted on 2002-05-10
Last Modified: 2008-03-10
I have an HTMLDocument. How can I get, as a String, the content between <BODY> and </BODY>?
For example, if I have <BODY>Bla Bla</BODY>, then I need "Bla Bla".
Thanks in Advance!
Best Regards,
Valeri
0
Question by:Valeri
 
LVL 35

Expert Comment

by:girionis
ID: 7001121
First of all, load the document into a String variable. If "body" is a variable that holds the "<HTML><BODY>blah blah</BODY></HTML>" string, then the following will do:

System.out.println(body.substring((body.toLowerCase().indexOf("<body>") + 6), body.toLowerCase().lastIndexOf("</body>")));

  Hope it helps.
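
The one-liner above assumes the opening tag is exactly "<body>" with no attributes. A minimal sketch of the same substring idea that also tolerates attributes on the tag is below; the class and method names (BodySubstring, between) are made up for illustration, and it still assumes no '>' character appears inside an attribute value.

```java
import java.util.Locale;

final class BodySubstring {

  // Returns the text between the opening <body ...> tag and the last </body>,
  // or null if either tag is missing. Hypothetical helper, not part of any library.
  static String between(String html) {
    // Lower-case copy used only for searching; for ASCII markup the indexes
    // line up with the original string.
    String lower = html.toLowerCase(Locale.ROOT);
    int open = lower.indexOf("<body");
    if (open < 0) return null;
    int start = lower.indexOf('>', open);   // end of the opening tag
    int end = lower.lastIndexOf("</body>");
    if (start < 0 || end < start) return null;
    return html.substring(start + 1, end);
  }

  public static void main(String[] args) {
    System.out.println(between("<HTML><BODY bgcolor=\"white\">Bla Bla</BODY></HTML>"));
  }
}
```

This is plain string handling, so it does not validate the markup in any way.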
0
 
LVL 9

Expert Comment

by:Ovi
ID: 7003099
The HTMLDocument (like every implementation of the Document interface) stores the structure of the HTML as a tree. All you have to do is navigate through that tree until you find the body element. I will post the methods for that soon.
0
 
LVL 16

Author Comment

by:Valeri
ID: 7003130
Hi Ovi,
You are right, but I was unable to navigate through that tree...
I'm waiting for your post! :-)
0
 
LVL 9

Accepted Solution

by:
Ovi earned 400 total points
ID: 7003136
This test class expects you to put an "x.html" file in the same directory as the compiled code. Apart from that, it works as is.

import java.awt.*;
import java.io.*;
import java.net.*;
import java.util.*;
import javax.swing.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

public class HTMLDocUtils {
 
  public static final Element getBodyElement(HTMLDocument doc) {
    return(findElement(doc.getRootElements()[0], HTML.Tag.BODY));
  }
 
  public static final Element findElement(Element root, HTML.Tag kind) {
    if(root == null) return(null);
    if(matchElementType(root,  kind)) {
      return(root);
    }
    int count = root.getElementCount();
    if(count > 0) {
      for(int i = 0; i<count; i++) {
        Element child = root.getElement(i);
        Element e = findElement(child, kind);
        if(e != null)
          return(e);
      }
    }
    return(null);
  }
 
  public static final boolean matchElementType(Element e, HTML.Tag type) {
    return(e.getAttributes().getAttribute(StyleConstants.NameAttribute) == type);
  }
 
  public static void main(String[] args) {
    HTMLEditorKit kit;
    HTMLDocument doc;
    kit = new HTMLEditorKit();
    doc = (HTMLDocument)kit.createDefaultDocument();
    try {
      URL file = (new HTMLDocUtils()).getClass().getResource("x.html");
      InputStream is = file.openStream();
      kit.read(is, doc, 0);
    } catch(Exception e) { e.printStackTrace(); }
    System.out.println("Document content : ");
    doc.dump(System.out);
    Element body = HTMLDocUtils.getBodyElement(doc);
    if(body != null) {
      System.out.println("Body element detected ************************************************** :");
      System.out.println("Starts at : " + body.getStartOffset());
      System.out.println("Ends at : " + body.getEndOffset());
    } else
      System.out.println("Body element not defined! ************************************************");
  }
}
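
The class above prints only the offsets of the body element. To get the body text as a String, as the question asks, those offsets can be passed to Document.getText. The following is a minimal sketch of that step; the class and method names (BodyTextDemo, bodyText) are assumptions for illustration, and the HTML is read from a String rather than from x.html. Note that this returns the document's text content, not the original markup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class BodyTextDemo {

  // Depth-first search for the first element whose name attribute is the given tag.
  static Element find(Element root, HTML.Tag tag) {
    if (root == null) return null;
    if (root.getAttributes().getAttribute(StyleConstants.NameAttribute) == tag) return root;
    for (int i = 0; i < root.getElementCount(); i++) {
      Element e = find(root.getElement(i), tag);
      if (e != null) return e;
    }
    return null;
  }

  // Parses the HTML and returns the text covered by the body element, or null if there is none.
  static String bodyText(String html) {
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    // Avoids a ChangedCharSetException when the page declares a charset in a <meta> tag.
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    try {
      kit.read(new StringReader(html), doc, 0);
      Element body = find(doc.getDefaultRootElement(), HTML.Tag.BODY);
      if (body == null) return null;
      int start = body.getStartOffset();
      // The element's end offset can lie past the document length, so clamp it.
      int end = Math.min(body.getEndOffset(), doc.getLength());
      return doc.getText(start, end - start);
    } catch (IOException | BadLocationException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(bodyText("<html><body>Bla Bla</body></html>"));
  }
}
```

The returned string may carry leading or trailing newlines that the document model inserts, so calling trim() on it is usually sensible.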
0
 
LVL 16

Author Comment

by:Valeri
ID: 7003234
Hi Ovi :-)
I will test this class and will probably give you the points! But for now I want to leave the question open.
Thanks a lot!
Valeri
0
 
LVL 9

Expert Comment

by:Ovi
ID: 7020766
Did you test it?
0
 

Expert Comment

by:samuelvd
ID: 8145579
I tested your code; the output is:

Document content :
Body element detected ************************************************** :
Starts at : 1
Ends at : 85

I wrote a BeanShell script to summarize it and to use it myself:

import javax.swing.text.html.*;
import javax.swing.text.Element;

HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();

fr = new FileReader("Test.html");

/*
 * Insert the contents of fr into the document doc, starting at position 0.
 */
kit.read(fr, doc, 0);

// doc.dump(System.out); // Dump the document contents to standard output.

// Get the root element
element = doc.getDefaultRootElement();

// Recursively print the name of every element in the tree.
void muestraElementos(Element root) {
     print(root.getName());
     // Use the current node's child count, not the root element's.
     int count = root.getElementCount();
     for (int i=0; i<count; i++) {
          Element child = root.getElement(i);
          if (child != null)
               muestraElementos(child);
     }
}

// Walk the tree starting from the root element.
muestraElementos(element);


/*****************************************************/

The output of this script is
html
head
p-implied
content
body
p-implied
content
content
table
tr
td
p-implied
content
content
td
p-implied
content
content
tr
td
p-implied
content
content
td
p-implied
content
content


I'm a little confused by the javax.swing.text.Element named "p-implied". What do these elements mean? Other than these elements, the output seems fine.
0
 
LVL 9

Expert Comment

by:Ovi
ID: 8151542
p-implied behaves like a normal <p> (paragraph) element, but it is generated internally by the HTMLDocument under some conditions, typically when text appears where a paragraph is expected but no <p> tag was written. If you save the HTML again, or simply read it using editorPane.getText(), you'll see that the p-implied elements are omitted.
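
As a rough illustration, the sketch below parses a body that contains bare text with no <p> tag and collects the element names from the resulting tree. The class and method names (ImpliedParagraphDemo, elementNames) are made up for this example; the list it builds should include a "p-implied" entry under "body", matching the tree printed earlier in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class ImpliedParagraphDemo {

  // Appends the name of e and of all its descendants, in document order.
  static void collect(Element e, List<String> names) {
    names.add(e.getName());
    for (int i = 0; i < e.getElementCount(); i++) {
      collect(e.getElement(i), names);
    }
  }

  // Parses the HTML string and returns the names of all elements in the document tree.
  static List<String> elementNames(String html) {
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    try {
      kit.read(new StringReader(html), doc, 0);
    } catch (IOException | BadLocationException ex) {
      throw new IllegalStateException(ex);
    }
    List<String> names = new ArrayList<>();
    collect(doc.getDefaultRootElement(), names);
    return names;
  }

  public static void main(String[] args) {
    // The body text is not wrapped in <p>, so the document adds an implied paragraph.
    System.out.println(elementNames("<html><body>Bla Bla</body></html>"));
  }
}
```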
0
 

Expert Comment

by:samuelvd
ID: 8156195
Ovi, one more question: I have programmed XML DOM documents using Xerces. What issues would be involved in implementing an XHTMLDocument class using the W3C DOM API?

Would this require a lot of effort?

Regards!
0
 
LVL 9

Expert Comment

by:Ovi
ID: 8157222
Yes and no, depending on what you really want to achieve and whether you are prepared for a considerable effort. The text package is the largest one in Swing and the most complicated too. As a starting point, I suggest you read the articles from Sun on the text package, especially the one called "Customizing a Text Editor" or something similar, in which Tim Prinzing (the guru of the text package there) implements a Java code editor.

http://java.sun.com/products/jfc/tsc/articles/text/editor_kit/index.html
http://java.sun.com/products/jfc/tsc/articles/

I implemented a WYSIWYG HTML editor myself, but it took a lot of hard work, and the result is not very competitive.
0
