  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 3385

Java regular expressions - stripping HTML tags

OK, I am trying to strip all HTML tags, but this doesn't work. Why not?
lines[i].replaceAll("\\<.*\\>", "");

Assuming I have a string called htmlPage, how do I convert the <p> and <br> tags to new lines? htmlPage is a multiline string containing the whole HTML page.

Asked by: rukiman
2 Solutions
 
cmalakar Commented:
htmlString = htmlString.replaceAll("<p>", "\n");

will replace all <p> tags with new lines.

Similarly, you can do the same for the <BR> tag.
 
cmalakar Commented:
You can also replace all tags by using:

htmlString = htmlString.replaceAll("<.*>", "");

Don't forget that replaceAll returns the resulting string; it does not modify the original.
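
A minimal sketch of that last point. Java strings are immutable, so replaceAll leaves the receiver untouched and hands back a new string that has to be assigned. The class name, helper method, and sample text below are made up for illustration.

```java
class ReplaceAllReturnsDemo {
    // Hypothetical helper: returns a copy of the input with each <p> replaced by a newline.
    static String paragraphsToNewlines(String html) {
        return html.replaceAll("<p>", "\n");
    }

    public static void main(String[] args) {
        String htmlString = "one<p>two";

        // The result is discarded here, so htmlString still contains the tag.
        htmlString.replaceAll("<p>", "\n");
        System.out.println(htmlString.contains("<p>")); // prints "true"

        // Assigning the returned value is what actually changes the variable.
        htmlString = paragraphsToNewlines(htmlString);
        System.out.println(htmlString.contains("<p>")); // prints "false"
    }
}
```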
 
objects Commented:
You're using a greedy quantifier. Try the reluctant form:

lines[i] = lines[i].replaceAll("\\<.*?\\>", "");

> how do I convert the <p> and <br> to new lines?

line = line.replaceAll("\\<p\\>", "\n").replaceAll("\\<br\\>", "\n");
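
To see why the quantifier matters, here is a small sketch comparing the greedy and reluctant patterns on one sample line. The class and method names are hypothetical. The breaksToNewlines pattern is an assumption that goes beyond the thread: it adds (?i) for case-insensitive matching and also accepts <br/> and <br />, but it still does not handle tags with attributes.

```java
class QuantifierDemo {
    // Greedy: .* runs on to the LAST '>' in the line, swallowing the text between tags.
    static String stripGreedy(String line) {
        return line.replaceAll("<.*>", "");
    }

    // Reluctant: .*? stops at the FIRST '>' after each '<', so only the tags go.
    static String stripReluctant(String line) {
        return line.replaceAll("<.*?>", "");
    }

    // Assumption (not from the thread): ignore case and allow an optional " /" before '>'.
    static String breaksToNewlines(String line) {
        return line.replaceAll("(?i)<(p|br)\\s*/?>", "\n");
    }

    public static void main(String[] args) {
        String line = "<b>bold</b> and <i>italic</i>";
        System.out.println("[" + stripGreedy(line) + "]");    // prints "[]"
        System.out.println("[" + stripReluctant(line) + "]"); // prints "[bold and italic]"

        String converted = breaksToNewlines("a<BR>b<p>c<br />d");
        System.out.println(converted.equals("a\nb\nc\nd"));   // prints "true"
    }
}
```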
 
cmalakar Commented:
Sorry, that was a typo.

htmlString = htmlString.replaceAll("<.*>", "");

should be

htmlString = htmlString.replaceAll("<[a-z]*>", "");
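
A sketch of what the <[a-z]*> pattern actually matches: only lowercase tags with no slash and no attributes. For comparison it also shows a negated character class, <[^>]*>, which is a substitute technique that nobody in this thread proposed. That class matches any character except '>', including line breaks, so it also handles a tag that is split across lines in a multiline string. Class and method names are hypothetical.

```java
class TagPatternDemo {
    // Pattern from the comment above: lowercase letters only between '<' and '>'.
    static String stripLowercaseOnly(String html) {
        return html.replaceAll("<[a-z]*>", "");
    }

    // Substitute technique (an assumption, not from the thread): anything up to the next '>'.
    static String stripAnyTag(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<p>Hi</p><a href=\"x\">link</a>";

        // Closing tags and tags with attributes are left behind.
        System.out.println(stripLowercaseOnly(html)); // prints: Hi</p><a href="x">link</a>

        // Every tag is removed.
        System.out.println(stripAnyTag(html));        // prints: Hilink

        // A tag broken across two lines is still removed, because [^>] matches '\n'.
        System.out.println(stripAnyTag("<a\nhref='x'>t</a>")); // prints: t
    }
}
```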
 
ysnky Commented:
What you are looking for is:
lines[i] = lines[i].replaceAll("</.*?>", "").replaceAll("<.*?>", "\n");
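
A sketch of what this two-step chain does, using a hypothetical class and made-up sample strings: closing tags are deleted first, then every remaining tag becomes a newline. Note that this applies to all opening tags, not only <p> and <br>.

```java
class ClosingThenOpeningDemo {
    // Step 1 removes closing tags; step 2 turns each remaining tag into a newline.
    static String tagsToNewlines(String line) {
        return line.replaceAll("</.*?>", "").replaceAll("<.*?>", "\n");
    }

    public static void main(String[] args) {
        // Each <p> becomes a newline and each </p> disappears.
        System.out.println(tagsToNewlines("<p>one</p><p>two</p>").equals("\none\ntwo")); // prints "true"

        // Inline tags such as <b> are converted to newlines as well.
        System.out.println(tagsToNewlines("<b>x</b>").equals("\nx")); // prints "true"
    }
}
```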
 
rukiman (Author) Commented:
I accepted cmalakar's comment as the solution because I was completely unaware that replaceAll returns the resulting string.
Question has a verified solution.
