Java code to detect the HTML tag <P>

Is there any method in Java to detect HTML tags in an article that contains markup? For example, I have two paragraphs here, and I need a function that returns each paragraph. I want to put a picture beside each paragraph on the next page. There will be more than two paragraphs, with other HTML tags such as <Strong>, <BR>, etc. in between.

Example:
<P><B>Genting Highlands</B> Resort offers the bountiful harvest of nature together with a staggering potpourri of international standard facilities. Besides the cool air and scenic surroundings, this hilltop city also houses the country's one and only casino.<BR></P>
<P>However, one should not underestimate its value as this casino has been featured in many films produced by Hong Kong, Taiwan, and even Hollywood. Besides the casino, this hilltop resort also has a theme park of its own and an entertainment center.<BR></P>

Thanks in advance.

rgds,
joe
jo_eAsked:
 
OviCommented:
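Use the HTML parser from Swing: build an HTMLDocument object, traverse its structure, and make the changes you want.

HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
// read() is an instance method of the kit, so it needs a kit object
kit.read(new FileReader("x.html"), doc, 0);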

... and traverse the document to look for the <p> tag. The document structure is a tree made up of Element objects. An Element object in an HTMLDocument represents an HTML tag. To get the type of the tag you do something like:

AttributeSet attr = aElement.getAttributes();
if(attr.getAttribute(StyleConstants.NameAttribute).equals(HTML.Tag.P))
  System.out.println("Paragraph detected");
else
  System.out.println("Not a paragraph");

If you are sure that you have paragraphs at only one level (as children of the BODY element), all you have to do is search for the BODY element, retrieve its children with Element's methods, and test them for paragraphs. If there are deeper paragraphs, you should traverse the document tree recursively to find them.
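The one-level case described above can be sketched roughly as follows. This is only an illustration: the class name TopLevelParagraphs and the methods isTag and paragraphs are made up for this example, and it assumes the HTML is already available as a String.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper, not part of the original answer.
final class TopLevelParagraphs {

    // True when the element represents the given HTML tag.
    static boolean isTag(Element e, HTML.Tag tag) {
        return tag.equals(e.getAttributes().getAttribute(StyleConstants.NameAttribute));
    }

    // Returns the trimmed text of every <P> that is a direct child of BODY.
    static List<String> paragraphs(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);

        List<String> result = new ArrayList<>();
        Element root = doc.getDefaultRootElement();
        for (int i = 0; i < root.getElementCount(); i++) {
            Element child = root.getElement(i);
            if (!isTag(child, HTML.Tag.BODY)) {
                continue; // skip HEAD and anything else that is not BODY
            }
            for (int j = 0; j < child.getElementCount(); j++) {
                Element p = child.getElement(j);
                if (isTag(p, HTML.Tag.P)) {
                    int start = p.getStartOffset();
                    // Clamp to the document length to stay inside the text.
                    int end = Math.min(p.getEndOffset(), doc.getLength());
                    result.add(doc.getText(start, end - start).trim());
                }
            }
        }
        return result;
    }
}
```

Calling TopLevelParagraphs.paragraphs(html) on the two example paragraphs from the question should give a list with one entry per paragraph, with the inline tags such as <B> and <BR> stripped from the text.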
 
stephendlCommented:
You need something along the lines of this...

String text = (your text to be searched);
String searchString = "</P>";
String insertString = "<img src=\"my image url\">";
int position = text.indexOf(searchString);

while (position != -1) {
  // insert the string into the text, just before the match
  text = text.substring(0, position) +
     insertString +
     text.substring(position);
  // resume the search after the string just added and the
  // string searched for (so we don't find it again!)
  int from = position +
     insertString.length() +
     searchString.length();
  // search for the string again
  position = text.indexOf(searchString, from);
}

There are probably some errors with the exact indexes (0 based Vs 1 based indexes) so you might have to add or minus a 1 here or there. There are probably some special cases that you might have to look out for too. But this should give you a good base to start from. Hope it helps.

Stephen
 
stephendlCommented:
Sorry, I forgot to point out that this is obviously VERY inefficient. It would be better and faster to build the new string in a StringBuffer.
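A buffer-based version might look roughly like the sketch below. It uses StringBuilder, the unsynchronized counterpart of StringBuffer available since Java 5; the class name ImageInserter and the method insertBeforeEach are invented for this illustration.

```java
// Hypothetical helper, not part of the original answer.
final class ImageInserter {

    // Returns a copy of text with insertString placed before
    // every occurrence of searchString.
    static String insertBeforeEach(String text, String searchString, String insertString) {
        if (searchString.isEmpty()) {
            return text; // nothing sensible to search for
        }
        StringBuilder out = new StringBuilder(text.length() + 64);
        int from = 0;
        int position = text.indexOf(searchString, from);
        while (position != -1) {
            // copy everything up to the match, then the insert, then the match
            out.append(text, from, position);
            out.append(insertString).append(searchString);
            from = position + searchString.length();
            position = text.indexOf(searchString, from);
        }
        // copy whatever follows the last match
        out.append(text, from, text.length());
        return out.toString();
    }
}
```

For example, ImageInserter.insertBeforeEach(text, "</P>", "<img src=\"my image url\">") makes a single pass over the text. Note that the match is case sensitive, so "</p>" would need a separate pass or a normalised input.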
 
shyamkumarreddyCommented:
Jo_e

I can give you a high-level way of doing it.
You can use XML parsing, or
use the javax.swing.text.html and javax.swing.text.html.parser packages, which have a lot of classes for this. Have a look at the documentation.

Thanks
Shyam
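As one possible reading of the parser-package suggestion, the sketch below uses ParserDelegator with an HTMLEditorKit.ParserCallback to collect the text of each <P> without building a document. The class name ParagraphTextCollector and the method collect are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the original answer.
final class ParagraphTextCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> paragraphs = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <P>

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (HTML.Tag.P.equals(t)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (current != null) {
            current.append(data); // text of nested tags such as <B> lands here too
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (HTML.Tag.P.equals(t) && current != null) {
            paragraphs.add(current.toString().trim());
            current = null;
        }
    }

    // Parses the HTML and returns the text of each paragraph in order.
    static List<String> collect(String html) throws IOException {
        ParagraphTextCollector callback = new ParagraphTextCollector();
        // The last argument tells the parser to ignore any charset directive.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return callback.paragraphs;
    }
}
```

This keeps only the text. If the markup inside each paragraph must be preserved, the tree-based HTMLDocument approach or plain string searching is probably a better fit.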
 
OviCommented:
To find a specific element, let's say BODY, beginning from the root of the structure you should do :

public Element getBodyElement(HTMLDocument doc) {
  Element root = doc.getRootElements()[0];
  // start the search at the root and look for the BODY tag
  return(findChild(root, HTML.Tag.BODY));
}

public Element findChild(Element root, HTML.Tag type) {
  if(matchType(root, type))
    return(root);
  int count = root.getElementCount();
  for(int i = 0; i<count; i++) {
    Element child = findChild(root.getElement(i), type);
    if(child != null)
     return(child);
  }
  return(null);
}

public boolean matchType(Element e, HTML.Tag type) {  
  return(e.getAttributes().getAttribute(StyleConstants.NameAttribute).equals(type));
}
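Since findChild stops at the first match and the question asks for every paragraph, a variant that collects all matching elements could look like the sketch below. The names ElementCollector, findAll and collect are invented here; the matching test is the same one used in matchType.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;

// Hypothetical helper, not part of the original answer.
final class ElementCollector {

    // Returns every element under root (root included) whose tag is type,
    // in document order.
    static List<Element> findAll(Element root, HTML.Tag type) {
        List<Element> found = new ArrayList<>();
        collect(root, type, found);
        return found;
    }

    private static void collect(Element e, HTML.Tag type, List<Element> found) {
        if (type.equals(e.getAttributes().getAttribute(StyleConstants.NameAttribute))) {
            found.add(e);
        }
        // descend into the children, whatever their depth
        for (int i = 0; i < e.getElementCount(); i++) {
            collect(e.getElement(i), type, found);
        }
    }
}
```

With a parsed HTMLDocument doc, ElementCollector.findAll(doc.getDefaultRootElement(), HTML.Tag.P) would return one Element per paragraph, and each element's start and end offsets can then be used to read its text from the document.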
 
OviCommented:
... and of course all the required imports:
import javax.swing.text.*;
import javax.swing.text.html.*;


Note: implementing it in this manner lets you extend your application to handle not only <p> tags, but all defined ones.
 
jo_eAuthor Commented:
Hi Ovi,

Thank you for kindly answering my question.
But how can I use this in a JSP? I am not using Swing to build the interface.

<import javax.swing.text.*;
import javax.swing.text.html.*;>

Thank you.

joe

 
OviCommented:
If you can communicate with a servlet for this purpose, yes.
 
girionisCommented:
No comment has been added lately, so it's time to clean up this TA.

I will leave a recommendation in the Cleanup topic area that this question is:

- points to Ovi

Please leave any comments here within the
next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER !

girionis
Cleanup Volunteer
Question has a verified solution.