Sarge516
asked on
Is there a way to convert an HTML string to text using plain Java?
I need the ability to take an HTML-formatted string and convert it to plain text. I need to do this in Java and would like to do it natively (no extra JARs, no GUI) if possible.
Any suggestions? The stub below 'works' but the resulting string is not accessible outside the class.
Reader reader = new StringReader(
        " <html><p>A <foo>xx</foo><a href=test>link</a>");
String yo = "";
try
{
    HTMLEditorKit.ParserCallback callback =
            new HTMLEditorKit.ParserCallback()
    {
        public void handleText(char[] data, int pos)
        {
            System.out.println(data);
            // yo = data.toString();
        }
    };
    new ParserDelegator().parse(reader, callback, false);
}
catch (IOException e)
{
    e.printStackTrace();
}
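To see what the stub is actually receiving, the sketch below (class name TextRuns is hypothetical) records every run of text the parser passes to handleText, one callback call per run. It also shows a detail relevant to the commented-out line: System.out.println(data) prints the characters because println has a char[] overload, but data.toString() on a char[] returns the array's identity string (something like "[C@..."), so new String(data) is the usual way to copy the characters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: records every text run the parser reports via handleText
class TextRuns {
    static List<String> runs(String html) throws IOException {
        final List<String> runs = new ArrayList<>(); // mutated inside the callback, never reassigned
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // new String(data) copies the characters; data.toString() would not
                runs.add(new String(data));
            }
        };
        // parse() is synchronous, so all callbacks have run when it returns
        new ParserDelegator().parse(new StringReader(html), callback, false);
        return runs;
    }

    public static void main(String[] args) throws IOException {
        for (String run : runs("<html><p>A <foo>xx</foo><a href=test>link</a>")) {
            System.out.println("[" + run + "]"); // brackets make surrounding whitespace visible
        }
    }
}
```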
"not accessible outside the class." I am not sure I follow. Can you please explain?
ASKER
If I try to assign the "yo" string inside the anonymous HTMLEditorKit.ParserCallback class, Eclipse reports a compile error:
Cannot access a non-final variable from an inner-class ....
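This error comes from a Java language rule: a local variable used inside an anonymous (or local) class must be final, or effectively final since Java 8, so "yo = ..." inside handleText cannot compile. A common workaround is to keep the reference final but point it at a mutable object such as a StringBuilder. The minimal sketch below uses a hypothetical Sink interface that stands in for a void callback like handleText; it is an illustration of the rule, not code from this thread.

```java
// Illustrates the capture rule behind the "non-final variable ... inner class" error
class CaptureDemo {
    // Hypothetical void callback, standing in for handleText
    interface Sink {
        void accept(String s);
    }

    static String collect(String... parts) {
        // String yo = "";  // reassigning yo inside the anonymous class would not compile
        final StringBuilder yo = new StringBuilder(); // the reference never changes...
        Sink sink = new Sink() {
            @Override
            public void accept(String s) {
                yo.append(s); // ...but the object it refers to can still be modified
            }
        };
        for (String p : parts) {
            sink.accept(p);
        }
        return yo.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect("A", "xx", "link")); // prints Axxlink
    }
}
```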
public class Test2 {
    String test = "TESTING";

    public static void main(String[] args) {
        new Test2().hello();
    }

    private void hello() {
        new InnerClass().sayHello();
    }

    class InnerClass {
        public void sayHello() {
            System.out.println(Test2.this.test);
        }
    }
}
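The capture rule applies only to local variables; an inner or anonymous class may both read and assign fields of its enclosing instance, as Test2.this.test suggests. The sketch below (class FieldDemo and interface Sink are hypothetical) shows that, and also a pitfall: assigning a field on every callback keeps only the last value, while appending to a builder keeps them all.

```java
// Hypothetical demo: an anonymous class writing to fields of the enclosing instance
class FieldDemo {
    String last = "";                        // reassigned on every callback
    StringBuilder all = new StringBuilder(); // accumulates every callback

    // Hypothetical void callback, standing in for handleText
    interface Sink {
        void accept(String s);
    }

    void feed(String... parts) {
        Sink sink = new Sink() {
            @Override
            public void accept(String s) {
                FieldDemo.this.last = s;      // legal: fields are not captured local variables
                all.append(s).append(' ');    // keeps every value, separated by spaces
            }
        };
        for (String p : parts) {
            sink.accept(p);
        }
    }

    public static void main(String[] args) {
        FieldDemo d = new FieldDemo();
        d.feed("A", "xx", "link");
        System.out.println(d.last);                  // prints link
        System.out.println(d.all.toString().trim()); // prints A xx link
    }
}
```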
ASKER
Sorry if I was vague. Here is my complete code, along with the error it produces. I can't seem to return a value this way: the compiler rejects "return data.toString();" because handleText returns void, and I'm not sure how to work around that.
The task is to convert the HTML line to text. I am open to other approaches if this route is not possible.
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlToText {
    private String getText(String yo) {
        Reader reader = new StringReader(yo);
        try {
            HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
                public void handleText(char[] data, int pos) {
                    System.out.println(data);
                    return data.toString(); // compile error: handleText is declared void
                    // return "OK";
                }
            };
            new ParserDelegator().parse(reader, callback, false);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "OK1";
    }

    public static void main(String[] args) {
        HtmlToText ht = new HtmlToText();
        System.out.println(ht.getText("<html><p>A <foo>xx</foo><a href=test>link</a>"));
    }
}
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
    Void methods cannot return a value
    at HtmlToText.getText(HtmlToText.java:24)
    at HtmlToText.main(HtmlToText.java:39)
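Since handleText is void, one way around the error is to have the callback collect text while getText returns the result after parse() finishes; parse() runs synchronously, so every callback has fired by then. The sketch below uses an effectively final local StringBuilder instead of a field; the class name HtmlToTextLocal is hypothetical, and this is an alternative to, not a copy of, the certified solution.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant: the result lives in a final local, not a field
class HtmlToTextLocal {
    static String getText(String html) throws IOException {
        // Effectively final: the callback appends to it but never reassigns it
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // handleText cannot return anything, so record the text instead
                text.append(new String(data).trim()).append(' ');
            }
        };
        // Blocks until the whole input is parsed, so text is complete afterwards
        new ParserDelegator().parse(new StringReader(html), callback, false);
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getText("<html><p>A <foo>xx</foo><a href=test>link</a>"));
    }
}
```

Because the builder is created inside getText, each call starts with empty output, so the same method can be called repeatedly without leftover text.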
ASKER CERTIFIED SOLUTION
ASKER
Raising the variable's visibility a level does solve the main problem I was having. Thanks for a creative solution!
The code still needed a couple more tweaks, as the result was only returning the final word in the HTML. I'm posting the final version to help others in the future.
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HTML_to_Text {
    // A field rather than a local, so the anonymous callback can append to it
    StringBuilder text = new StringBuilder();

    public String getText(String yo) {
        text.setLength(0); // discard text left over from a previous call
        Reader reader = new StringReader(yo);
        try {
            HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleText(char[] data, int pos) {
                    // Append each trimmed text run followed by a separating space
                    text.append(new String(data).trim()).append(" ");
                }
            };
            new ParserDelegator().parse(reader, callback, false);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        HTML_to_Text ht = new HTML_to_Text();
        System.out.println(ht.getText("<html><p>A <foo>xx</foo><a href=test>link</a>"));
    }
}
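The version above joins all text onto a single line. If paragraph structure matters, one possible extension, sketched here under the assumption that HTML.Tag.breaksFlow() is a good enough signal for a line break (it is true for tags such as p, div, br and li), is to also override the tag callbacks and emit a newline there. The class name HtmlToLines is hypothetical, and the exact whitespace of the output has not been checked against every kind of input.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical extension: one output line per block of text
class HtmlToLines {
    static String getText(String html) throws IOException {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Start a new line when a tag breaks the flow, avoiding blank lines
            private void lineBreak(HTML.Tag t) {
                if (t.breaksFlow() && out.length() > 0
                        && out.charAt(out.length() - 1) != '\n') {
                    out.append('\n');
                }
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                lineBreak(t);
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                lineBreak(t);
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                lineBreak(t); // empty tags such as <br> arrive here
            }

            @Override
            public void handleText(char[] data, int pos) {
                String s = new String(data).trim();
                if (s.isEmpty()) {
                    return;
                }
                // Separate words on the same line with a single space
                if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
                    out.append(' ');
                }
                out.append(s);
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, false);
        return out.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getText("<p>First paragraph<br>Second line</p><p>Third</p>"));
    }
}
```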
ASKER
Thanks for the help! Another set of eyes and brains is what I needed. :}