Solved

extract only <p>..</p> from web page in java

Posted on 2009-07-08
Last Modified: 2012-05-07
I fetched a web page's contents using java.net.URL in Java, so I have all the tags and their contents. However, I only want the text inside the <p> tags.

Can I use a regular expression for that? Please let me know if there is an example.


Thanks!!
Question by:Juuno
 
LVL 86

Assisted Solution

by:CEHJ
CEHJ earned 75 total points
ID: 24806199
You'd be better off using an HTML parser. See

http://exampledepot.com/egs/javax.swing.text.html/GetLinks.html?l=rel

and use HTML.Tag.P instead, or use a high-level API such as HttpUnit.
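
A minimal sketch of the parser approach suggested above, assuming the page has already been read into a String. It uses the JDK's Swing HTML parser (ParserDelegator with an HTMLEditorKit.ParserCallback) and collects text only while inside a <p> element. The class name ParagraphExtractor and the method extractParagraphs are made up for illustration and do not come from the linked example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named here only for illustration.
class ParagraphExtractor {

    /** Returns the text of each <p> element, in document order. */
    static List<String> extractParagraphs(String html) {
        final List<String> paragraphs = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inParagraph = false;
            private final StringBuilder current = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.P) {
                    inParagraph = true;      // start collecting text
                    current.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inParagraph) {
                    current.append(data);    // text of nested inline tags lands here too
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.P && inParagraph) {
                    paragraphs.add(current.toString());
                    inParagraph = false;
                }
            }
        };

        try {
            // Third argument true: ignore any charset directive in the page
            // instead of throwing ChangedCharSetException.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>one</p><p>two</p></body></html>";
        for (String p : extractParagraphs(html)) {
            System.out.println(p);
        }
    }
}
```

Tracking a single boolean is enough here because <p> elements cannot legally nest; a depth counter would be needed for tags that can.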
 
LVL 15

Assisted Solution

by:fsze88
fsze88 earned 75 total points
ID: 24806413
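Try this:

        // requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
        String beTestString = "<p>abcxyz</p>";
        Pattern p = Pattern.compile("<p>(.*)</p>");
        Matcher m = p.matcher(beTestString);
        // matches() (or find()) must succeed before group() is called,
        // otherwise group() throws IllegalStateException
        if (m.matches()) {
            System.out.println("m.group(1)  : " + m.group(1));
        }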
 
LVL 86

Expert Comment

by:CEHJ
ID: 24806430
Well, a simple multiline paragraph would break that, wouldn't it? Not to mention nesting...
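
To illustrate the multiline concern, here is a small sketch comparing the pattern from the earlier comment with two variants. By default `.` does not match line terminators, so `<p>(.*)</p>` finds nothing when the paragraph spans lines; with Pattern.DOTALL a greedy `.*` runs across several paragraphs, while a reluctant `.*?` stops at the first closing tag. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
class MultilineDemo {

    // Pattern from the earlier comment: greedy, '.' stops at line breaks.
    static final Pattern GREEDY = Pattern.compile("<p>(.*)</p>");
    // Same pattern, but '.' also matches line terminators.
    static final Pattern GREEDY_DOTALL = Pattern.compile("<p>(.*)</p>", Pattern.DOTALL);
    // Reluctant quantifier: stops at the first </p>.
    static final Pattern RELUCTANT_DOTALL = Pattern.compile("<p>(.*?)</p>", Pattern.DOTALL);

    /** Returns group 1 of the first match, or null if nothing matches. */
    static String firstMatch(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String multiline = "<p>line one\nline two</p>";
        String twoParagraphs = "<p>a</p><p>b</p>";

        System.out.println(firstMatch(GREEDY, multiline));               // no match
        System.out.println(firstMatch(GREEDY_DOTALL, twoParagraphs));    // spans both paragraphs
        System.out.println(firstMatch(RELUCTANT_DOTALL, twoParagraphs)); // first paragraph only
    }
}
```

None of these variants copes with malformed or unusual markup, which is the case a parser is better suited to.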
 
LVL 15

Expert Comment

by:fsze88
ID: 24806497
I have not tried it on multiline input, hmm... I don't think it's a problem.
So we can use m.groupCount() to get the number of groups and use a for loop to take the text of every <p> tag.
Does that make sense?
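
A note on the groupCount() idea, as a hedged sketch: Matcher.groupCount() reports the number of capturing groups in the pattern (one here), not how many <p> elements were found. Collecting every paragraph means calling find() repeatedly. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
class GroupCountDemo {

    static final Pattern P_TAG =
            Pattern.compile("<p[^>]*>(.*?)</p>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Collects group 1 of every match by advancing the matcher with find(). */
    static List<String> allParagraphs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = P_TAG.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p>a</p><p>b</p><p>c</p>";
        // groupCount() depends only on the pattern, not on the input.
        System.out.println("groups in pattern: " + P_TAG.matcher(html).groupCount());
        System.out.println("paragraphs found:  " + allParagraphs(html).size());
    }
}
```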
 
LVL 27

Assisted Solution

by:ddrudik
ddrudik earned 75 total points
ID: 24807575
Here's starter code:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
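class Module1{
  public static void main(String[] asd){
    String sourcestring = "source string to match with pattern";
    // [^>]* allows attributes on the opening tag; .*? is reluctant, so each
    // <p>...</p> pair is matched separately; DOTALL lets '.' span line breaks
    Pattern re = Pattern.compile("<p[^>]*>(.*?)</p>",Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    // each find() call advances to the next <p>...</p> match
    while (m.find()){
      // group 0 is the whole match, group 1 is the text between the tags
      for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
        System.out.println( "[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}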

 
LVL 86

Expert Comment

by:CEHJ
ID: 24808386
There's no need to reinvent the wheel that is an HTML parser.
 
LVL 92

Accepted Solution

by:objects
objects earned 275 total points
ID: 24809496
Here's what you need:

http://helpdesk.objects.com.au/java/how-do-i-extract-just-the-text-form-a-html-document-ie-strip-out-all-the-html-tags

You just need to tweak it to track when you are inside a <p> element.

Let me know if you need any help.
 

Author Comment

by:Juuno
ID: 24810396
@ objects

> heres what you need
http://helpdesk.objects.com.au/java/how-do-i-extract-just-the-text-form-a-html-document-ie-strip-out-all-the-html-tags

I got this error: "TestCallBack cannot be resolved to a type", even though I have that class.

 
LVL 92

Assisted Solution

by:objects
objects earned 275 total points
ID: 24810428
Sorry, that's a typo. It should be a lowercase b.
 

Author Comment

by:Juuno
ID: 24810657
I got an exception: javax.swing.text.ChangedCharSetException at this line: editorKit.read(reader, htmlText, 0);

Thanks!!
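
For context on this exception, here is a hedged sketch of one commonly used workaround; it is not confirmed anywhere in this thread. HTMLEditorKit.read raises ChangedCharSetException when the page declares a charset in a <meta> tag, unless the target document has the "IgnoreCharsetDirective" property set to Boolean.TRUE. The class and method names are hypothetical, and the sketch assumes the page is already available as a String.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class, named here only for illustration.
class CharsetSafeRead {

    /** Parses the HTML into an HTMLDocument and returns its plain text. */
    static String readText(String html) {
        HTMLEditorKit editorKit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) editorKit.createDefaultDocument();
        // Tell the parser to ignore <meta ... charset=...> instead of throwing
        // ChangedCharSetException partway through the read.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            editorKit.read(new StringReader(html), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("could not parse HTML", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
                + "</head><body><p>hello</p></body></html>";
        System.out.println(readText(html));
    }
}
```

Ignoring the directive is reasonable when the input has already been decoded into characters, as it is with a Reader; if the bytes were decoded with the wrong charset, the text itself may still be garbled.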
 
LVL 92

Assisted Solution

by:objects
objects earned 275 total points
ID: 24810686