Solved

HTMLEditorKit with a String of html text

Posted on 2004-10-23
Last Modified: 2012-08-14
I am a little confused about how to use this class: javax.swing.text.html.HTMLEditorKit

If I have the html text stored in a string, like this:

String html = "<html><script blah blah>function blah blah{}</script><body>this</body></html>";

and I want to get the parsed text, String parsedText = "this";

I've seen the example at http://www.javaalmanac.com/egs/javax.swing.text.html/GetText.html, but I'm not sure how to apply it to my application, which seems much simpler.
Question by:polkadot
    11 Comments
     
    LVL 8

    Assisted Solution

    by:sigmacon
    The implementation of the kit essentially requires the approach shown in the code sample. To get it working for your case, you need to provide a reader over your string.

    Replace the line

    Reader rd = new InputStreamReader(conn.getInputStream());

    with

    Reader rd = new StringReader(html);

    and delete the three lines in front of it.

    Try the result:

    public static String parseText(String html) throws Exception {
            final StringBuffer buf = new StringBuffer(4096);
       
            try {
                // Create an HTML document that appends all text to buf
                HTMLDocument doc = new HTMLDocument() {
                    public HTMLEditorKit.ParserCallback getReader(int pos) {
                        return new HTMLEditorKit.ParserCallback() {
                            // This method is called whenever text is encountered in the HTML file
                            public void handleText(char[] data, int pos) {
                                buf.append(data);
                                buf.append('\n');
                            }
                        };
                    }
                };
       
                // Create a reader on the HTML content
                Reader rd = new StringReader(html);
       
                // Parse the HTML
                EditorKit kit = new HTMLEditorKit();
                kit.read(rd, doc, 0);
            } catch (BadLocationException e) {
                // Thrown when the insert position is past the end of the document
                e.printStackTrace();
            } catch (IOException e) {
                // Covers read failures, including ChangedCharSetException
                e.printStackTrace();
            }
       
            // Return the text
            return buf.toString();
        }
    LVL 8

    Expert Comment

    by:sigmacon
    Sorry, either take off the throws Exception or the try / catch. I did not test the code, so there may be minor syntax errors:


    public static String parseText(String html) {
            final StringBuffer buf = new StringBuffer(4096);
       
            try {
                // Create an HTML document that appends all text to buf
                HTMLDocument doc = new HTMLDocument() {
                    public HTMLEditorKit.ParserCallback getReader(int pos) {
                        return new HTMLEditorKit.ParserCallback() {
                            // This method is called whenever text is encountered in the HTML file
                            public void handleText(char[] data, int pos) {
                                buf.append(data);
                                buf.append('\n');
                            }
                        };
                    }
                };
       
                // Create a reader on the HTML content
                Reader rd = new StringReader(html);
       
                // Parse the HTML
                EditorKit kit = new HTMLEditorKit();
                kit.read(rd, doc, 0);
            } catch (BadLocationException e) {
                // Thrown when the insert position is past the end of the document
                e.printStackTrace();
            } catch (IOException e) {
                // Covers read failures, including ChangedCharSetException
                e.printStackTrace();
            }
       
            // Return the text
            return buf.toString();
        }

    Author Comment

    by:polkadot
    Some problems:

    When I use your code as is, with kit.read(rd, doc, 0), I get:
    javax.swing.text.ChangedCharSetException


    When I change that call to kit.read(rd, doc, 1), I get:
    javax.swing.text.BadLocationException: Invalid location


    What do the errors mean and how can I fix them?

    Author Comment

    by:polkadot
    The code compiles, but when it runs it produces an empty string (return buf.toString()) and reports the exceptions above.

    Author Comment

    by:polkadot
    Also, I have verified that String html = "<html .... >"
    LVL 8

    Expert Comment

    by:sigmacon
    The kit is very picky about parsing. Since the sample code actually tries to create a document, you get:

    javax.swing.text.ChangedCharSetException -- This is usually thrown if there is a content-type attribute or a charset attribute, so I don't know why it's thrown here. My interpretation is that the HTMLDocument could not determine the character set of the HTML you're trying to parse. Make sure the HTML is well-formed (it probably needs a head tag, and so on) and declares which character set it uses.

    javax.swing.text.BadLocationException -- The last parameter to read determines where in the document you want to insert text. Since the document is empty, 1 is not valid, only 0.

    Please be aware that, AFAIK, HTMLEditorKit targets HTML 3.2, so it may not help with current HTML versions.
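
    To see the two exceptions discussed above in isolation, here is a minimal sketch. The class name ReadErrorsDemo and the helper tryRead are hypothetical, and the claim that a meta tag with http-equiv="Content-Type" and a charset is what triggers ChangedCharSetException is my reading of the JDK behaviour rather than something stated in this thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, used only to illustrate the two failure modes.
final class ReadErrorsDemo {

    /** Calls kit.read(...) and returns the simple name of the exception thrown, or "none". */
    static String tryRead(String html, int pos) {
        // Same trick as in the thread: a document whose reader is a bare callback,
        // so nothing is actually inserted into the document.
        HTMLDocument doc = new HTMLDocument() {
            @Override
            public HTMLEditorKit.ParserCallback getReader(int p) {
                return new HTMLEditorKit.ParserCallback();
            }
        };
        try (Reader rd = new StringReader(html)) {
            new HTMLEditorKit().read(rd, doc, pos);
            return "none";
        } catch (IOException | BadLocationException e) {
            // ChangedCharSetException is a subclass of IOException
            return e.getClass().getSimpleName();
        }
    }
}
```

    With an empty document, any pos greater than 0 is rejected before parsing starts, which matches the "Invalid location" message reported earlier.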
    LVL 92

    Expert Comment

    by:objects
    > when I use  your code as is with:  kit.read(rd, doc, 0);
    > javax.swing.text.ChangedCharSetException

    add the following:

    doc.putProperties("IgnoreCharacterSet", Boolean.TRUE);

    Author Comment

    by:polkadot
    Actually, my string is just the HTML source behind a URL; I'm using a NASA web page to test.
    It works on some pages, but others produce that error.

    Here is the problem I have with it: it didn't filter out all the JavaScript functions. Am I doing something wrong?




    objects, I'm sorry, but your snippet of code with no explanation isn't any help; putProperties is not a method of HTMLDocument.
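
    On the JavaScript question: one possible approach, sketched here under assumptions, is to track when the parser is inside a script or style element and drop any text reported there. The class name VisibleTextExtractor and the method extract are hypothetical. Whether script content reaches handleText at all may depend on the JDK version (some versions appear to report it through handleComment instead), so treat this as a defensive sketch, not a definitive fix. Note that title text is still collected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects text while skipping <script> and <style> content.
final class VisibleTextExtractor {

    static String extract(String html) throws IOException {
        final StringBuilder buf = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Greater than zero while inside a script or style element
            private int skipDepth = 0;

            private boolean isSkipped(HTML.Tag t) {
                return t == HTML.Tag.SCRIPT || t == HTML.Tag.STYLE;
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (isSkipped(t)) {
                    skipDepth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (isSkipped(t) && skipDepth > 0) {
                    skipDepth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (skipDepth == 0) {
                    buf.append(data).append('\n');
                }
            }
        };

        // The third argument (true) tells the parser to ignore charset declarations
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return buf.toString();
    }
}
```

    This uses ParserDelegator directly instead of going through HTMLDocument, which avoids building a document that is never used.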
    LVL 30

    Accepted Solution

    by:
    First you must make sure that the HTML page is well formed.
    To do that, you can use something like "Tidy".
    There is an implementation for Java, but it is still under development :-(
    http://jtidy.sourceforge.net/
    LVL 92

    Expert Comment

    by:objects
    Sorry for the typo, it should have been:

    doc.putProperty("IgnoreCharacterSet", Boolean.TRUE);
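
    For context, here is a minimal sketch of where that line would go in the parseText method from earlier in the thread. The class name HtmlTextExtractor is hypothetical, and the sketch declares the checked exceptions instead of catching them, which is a simplification on my part.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical wrapper around the thread's parseText, with the property set.
final class HtmlTextExtractor {

    static String parseText(String html) throws IOException, BadLocationException {
        final StringBuilder buf = new StringBuilder(4096);

        // A document whose reader appends every text run to buf
        HTMLDocument doc = new HTMLDocument() {
            @Override
            public HTMLEditorKit.ParserCallback getReader(int pos) {
                return new HTMLEditorKit.ParserCallback() {
                    @Override
                    public void handleText(char[] data, int pos) {
                        buf.append(data).append('\n');
                    }
                };
            }
        };

        // Must be set before read(); the kit looks it up to decide
        // whether charset declarations in the page should be ignored.
        doc.putProperty("IgnoreCharacterSet", Boolean.TRUE);

        try (Reader rd = new StringReader(html)) {
            new HTMLEditorKit().read(rd, doc, 0); // 0: the document is empty
        }
        return buf.toString();
    }
}
```

    Because the input is already a decoded String, ignoring the page's charset declaration should be harmless here; that would not necessarily hold when reading raw bytes from a connection.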
    LVL 30

    Expert Comment

    by:GrandSchtroumpf
    <:°)