• Status: Solved

Is there a way to convert an HTML string to text using plain Java?

I need to take an HTML-formatted string and convert it to plain text. I need to do this in Java and would like to do it natively (no extra jars, no GUI) if possible.

Any suggestions? The stub below 'works' but the resulting string is not accessible outside the class.

Reader reader = new StringReader(
      "  <html><p>A <foo>xx</foo><a href=test>link</a>");

String yo = "";
try
{
   HTMLEditorKit.ParserCallback callback =
      new HTMLEditorKit.ParserCallback()
      {
         public void handleText(char[] data, int pos)
         {
            System.out.println(data);
            // yo = data.toString();
         }
      };
   new ParserDelegator().parse(reader, callback, false);
}
catch (IOException e)
{
   e.printStackTrace();
}
 
Asked by: Sarge516
1 Solution
 
a_b Commented:
"not accessible outside the class." I am not sure I follow. Can you please explain?
 
Sarge516 (Author) Commented:
If I try to assign to the "yo" string inside the HTMLEditorKit.ParserCallback inner class, I get an error in Eclipse:

Cannot access a non-final variable from an inner-class ....
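
For context on the error above: an anonymous inner class may only capture local variables that are final (or, from Java 8 on, effectively final), so reassigning a local String from inside the callback is rejected. One possible workaround, sketched below, is to capture a final StringBuilder and mutate it instead of reassigning anything. The class name LocalBuilderSketch and the method name htmlToText are hypothetical and used only for illustration; this is not the approach the thread settles on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical class name, used only for this sketch.
class LocalBuilderSketch {

    static String htmlToText(String html) {
        // The reference is final, so the anonymous class is allowed to capture it.
        // The builder it points to can still be appended to.
        final StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Mutating the captured object is fine; only reassignment is forbidden.
                out.append(data).append(' ');
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, false);
        } catch (IOException e) {
            // A StringReader should not fail, but parse() declares IOException.
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(htmlToText("<html><p>A <foo>xx</foo><a href=test>link</a>"));
    }
}
```

Keeping the builder local means each call starts with an empty buffer, which avoids sharing state between calls.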

 
a_b Commented:
public class Test2 {
      String test = "TESTING";

      public static void main(String args[]) {
            new Test2().hello();
      }

      private  void hello() {
            new InnerClass().sayHello();
            
      }

      class InnerClass {
            public void sayHello() {
                  System.out.println(Test2.this.test);
            }
      }

}
 
Sarge516 (Author) Commented:
Sorry if I was vague. Here is my complete code, with the error I get when I run it. I can't seem to return a value this way: the compiler rejects return data.toString(); because the callback's handleText method returns void, and I'm not sure how to work around that.

The task is to convert the HTML line to text. I am open to other approaches if this route is not possible.


import java.io.Reader;
import java.io.StringReader;

import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlToText {

    /**
     * @param args
     */

    private String getText(String yo) {
        Reader reader = new StringReader(yo);
       
       
        try {
            {

                HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {

                    public void handleText(char[] data, int pos) {
                        System.out.println(data);
                        return data.toString();
                    //    return "OK";
                    }
                };
                new ParserDelegator().parse(reader, callback, false);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "OK1";
    }

    public static void main(String[] args) {

        HtmlToText ht = new HtmlToText();
        System.out.println(ht.getText("<html><p>A <foo>xx</foo><a href=test>link</a>"));
    }

}

Exception in thread "main" java.lang.Error: Unresolved compilation problem:
    Void methods cannot return a value

    at HtmlToText.getText(HtmlToText.java:24)
    at HtmlToText.main(HtmlToText.java:39)
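
A side note on data.toString() in the code above: even where a return value is allowed, calling toString() on a char[] does not produce the characters. Arrays inherit Object.toString(), which yields an identity string beginning with "[C@". System.out.println(data) only appears to work because println has an overload for char[]. The small sketch below shows the usual conversions; the class name CharArrayToStringSketch and the helper asText are hypothetical.

```java
// Hypothetical class name, used only for this sketch.
class CharArrayToStringSketch {

    // Builds a String holding a copy of the characters in the array.
    static String asText(char[] data) {
        return new String(data);
    }

    public static void main(String[] args) {
        char[] data = {'l', 'i', 'n', 'k'};

        // Object.toString(): a type-and-hash identity string, not the characters.
        System.out.println(data.toString());

        // Either of these yields the actual text.
        System.out.println(asText(data));
        System.out.println(String.valueOf(data));
    }
}
```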




 
a_b Commented:

import java.io.Reader;
import java.io.StringReader;

import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlToText {
    // Instance field: unlike a local variable, the anonymous class below can assign to it.
    String text = "";

    private String getText(String yo) {
        Reader reader = new StringReader(yo);

        try {
            HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {

                public void handleText(char[] data, int pos) {
                    System.out.println(data);
                    // new String(data) copies the characters; data.toString() would not.
                    text = new String(data);
                }
            };
            new ParserDelegator().parse(reader, callback, false);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return text;
    }

    public static void main(String[] args) {

        HtmlToText ht = new HtmlToText();
        System.out.println(ht.getText("<html><p>A <foo>xx</foo><a href=test>link</a>"));
    }

}

 
Sarge516 (Author) Commented:
Raising the variable's visibility to a field does solve the major problem I was having. Thanks for a creative solution!
The code still needed a couple more tweaks, as the result was only the final word in the HTML. I'm posting the final version to help others in the future.

import java.io.Reader;
import java.io.StringReader;

import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HTML_to_Text {

      // Field rather than a local variable, so the anonymous callback can use it.
      StringBuilder text = new StringBuilder();

      public String getText(String yo) {
            Reader reader = new StringReader(yo);
            text.setLength(0); // start fresh if getText is called more than once

            try {
                  HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {

                        // Called once per run of text: append each run instead of overwriting.
                        public void handleText(char[] data, int pos) {
                              text.append(new String(data).trim()).append(" ");
                        }
                  };
                  new ParserDelegator().parse(reader, callback, false);
            } catch (Exception e) {
                  e.printStackTrace();
            }
            return text.toString().trim();
      }

      public static void main(String[] args) {

            HTML_to_Text ht = new HTML_to_Text();
            System.out.println(ht
                        .getText("<html><p>A <foo>xx</foo><a href=test>link</a>"));
      }

}
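
The "final word" symptom mentioned above comes from the parser invoking handleText once per run of text rather than once for the whole document, so a plain assignment keeps only the last run. The sketch below records each run separately to make that visible. The class name TextChunkSketch and the method textChunks are hypothetical, and the exact boundaries and whitespace of each run depend on the parser, so no particular output is claimed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical class name, used only for this sketch.
class TextChunkSketch {

    // Returns every run of text the parser reports, in document order.
    static List<String> textChunks(String html) {
        final List<String> chunks = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                chunks.add(new String(data));
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Brackets make any leading or trailing whitespace in a run visible.
        for (String chunk : textChunks("<html><p>A <foo>xx</foo><a href=test>link</a>")) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```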



 
Sarge516 (Author) Commented:
Thanks for the help! Another set of eyes and brains is what I needed. :}