Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

java regular expressions - stripping html tags

Posted on 2007-11-18
6
Medium Priority
?
3,379 Views
Last Modified: 2012-08-13
OK I am trying to strip off all html tags but this doesn't work...why not?
lines[i].replaceAll("\\<.*\\>", "");

Asusming I have a string called htmlPage, how do I convert the <p> and <br> to new lines? htmlPage is a string containing the whole html page and is multiline.

0
Comment
Question by:rukiman
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
6 Comments
 
LVL 23

Expert Comment

by:cmalakar
ID: 20310205
htmlString.replaceAll("<p>", "\n");

will replace all <p> tags into new lines...

Similary you can do for <BR> tag
0
 
LVL 23

Assisted Solution

by:cmalakar
cmalakar earned 240 total points
ID: 20310208
Also you can replace all tags by using..

htmlString = htmlString.replaceAll("<.*>", "");

dont forget, that replaceAll returns the resultant string..
0
 
LVL 92

Accepted Solution

by:
objects earned 260 total points
ID: 20310215
you're using a greedy quantifier, try:

lines[i].replaceAll("\\<.*?\\>", "");

> how do I convert the <p> and <br> to new lines?

line.replaceAll("\\<p\\>", "\n").replaceAll("\\<br\\>", "\n");
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 23

Expert Comment

by:cmalakar
ID: 20310265
Sorry...

typo mistake.

htmlString = htmlString.replaceAll("<.*>", "");

should be htmlString = htmlString.replaceAll("<[a-z]*>", "");
0
 
LVL 9

Expert Comment

by:ysnky
ID: 20311268
what you look for is;
lines[i].replaceAll("</.*?>", "").replaceAll("<.*?>", "\n");
0
 

Author Comment

by:rukiman
ID: 20325330
I accepted cmalakar as a solution as I was completely unaware that replaceAll returned the resultant string.
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

After being asked a question last year, I went into one of my moods where I did some research and code just for the fun and learning of it all.  Subsequently, from this journey, I put together this article on "Range Searching Using Visual Basic.NET …
Java functions are among the best things for programmers to work with as Java sites can be very easy to read and prepare. Java especially simplifies many processes in the coding industry as it helps integrate many forms of technology and different d…
Viewers learn about the third conditional statement “else if” and use it in an example program. Then additional information about conditional statements is provided, covering the topic thoroughly. Viewers learn about the third conditional statement …
Viewers learn how to read error messages and identify possible mistakes that could cause hours of frustration. Coding is as much about debugging your code as it is about writing it. Define Error Message: Line Numbers: Type of Error: Break Down…
Suggested Courses

715 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question