Extract content between two tags in HTML document

Posted on 2005-03-27
Medium Priority
Last Modified: 2012-08-14

I have an HTML document, contained in a String (using this code: http://www.javaalmanac.com/egs/javax.swing.text.html/GetText.html).

Now, I would like to extract the String between the <title> tag, and the </title> tag.

(The title tags will *always* be in this document).

Is there any example code for this? Should I just use StringTokenizer with the delimiter set to "<title>", then run StringTokenizer again on the second token with the delimiter set to "</title>" and take the first token?

That's the best idea that I can come up with.

Thanks in advance,
>> IM
Question by:InteractiveMind

Accepted Solution

lhankins earned 2000 total points
ID: 13640311
Here's a working example:

      String someHtml =
              "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n" +
              "<html>\n" +
              "<head>\n" +
              "<title>this is my title</title>\n" +
              "\n" +
              "<style type=\"text/css\">\n" +
              "</style>\n" +
              "\n" +
              "<script type=\"text/javascript\">\n" +
              "</script>\n" +
              "\n" +
              "\n" +
              "</head>\n" +
              "\n" +
              "<body>\n" +
              "   <div>bleh</div>\n" +
              "</body>\n" +
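              "</html>\n";

      // Lower-case the whole document so that <TITLE> is found as well as <title>
      someHtml = someHtml.toLowerCase();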

      String titleStartTag = "<title>";
      String titleEndTag = "</title>";

      int start = someHtml.indexOf(titleStartTag);
      int end = someHtml.indexOf(titleEndTag);

      if (start != -1 && end != -1) {
         String titleText = someHtml.substring(start + titleStartTag.length(), end);

         System.out.println("title inner text is [" + titleText + "]");
      }


Output when run is:

    title inner text is [this is my title]

Expert Comment

ID: 13640319
BTW, I did a .toLowerCase() on the whole HTML string because the title tag could appear as <title> or <TITLE>. This will also make the inner text lower case. To get around this, you could find the <title> start and end via the lowercase string, then fall back to the original HTML string when doing the .substring.
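
A rough sketch of that workaround, in case it helps. The class name TitleExtractor and the method extractTitle are made up for illustration, and it assumes the opening tag is a plain <title> with no attributes and that lower-casing does not change the length of the text (true for ASCII markup). The tags are located in a lower-cased copy, and the substring is taken from the original string so the title keeps its case:

```java
import java.util.Locale;

// Hypothetical helper class, named here only for illustration.
class TitleExtractor {

    // Returns the text between <title> and </title> with its original case,
    // or null if either tag is missing.
    static String extractTitle(String html) {
        String startTag = "<title>";
        String endTag = "</title>";

        // Lower-cased copy, used only to locate the tags.
        // Assumption: lower-casing keeps every character at the same index.
        String lower = html.toLowerCase(Locale.ROOT);

        int start = lower.indexOf(startTag);
        if (start == -1) {
            return null;
        }
        int contentStart = start + startTag.length();

        // Look for the end tag only after the start tag.
        int end = lower.indexOf(endTag, contentStart);
        if (end == -1) {
            return null;
        }

        // Cut from the ORIGINAL string so the case is preserved.
        return html.substring(contentStart, end);
    }

    public static void main(String[] args) {
        String html = "<HTML><HEAD><TITLE>This Is My Title</TITLE></HEAD><BODY></BODY></HTML>";
        System.out.println("title inner text is [" + extractTitle(html) + "]");
    }
}
```

With the sample string in main this prints "title inner text is [This Is My Title]". If the title tag can carry attributes, or the markup is messy, a real HTML parser is probably a safer choice than index arithmetic.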

Author Comment

ID: 13640424
Fantastic! Thank you very much.  :)
>> IM
