Splitting a paragraph into sentences

Posted on 2009-12-30
Last Modified: 2012-05-08
I have a bit of code that takes a chunk of text and splits it into individual sentences. It works pretty well, but there are a few cases I would like to cover in the regular expression itself, without any post-processing cleanup. They involve titles like Mr., Mrs., or Dr. Right now the code splits sentences after the title, which is not desirable. Given the enclosed code, can you see how to alter the pattern to prevent this from happening?
import java.util.regex.*;

public class Test {
    public static void main(String[] args) throws Exception {
        String teststring = "This is a simple sentence. This is a sentence about Mr. Smith and Dr. Jones.  This is a rather more complicated (e.g. one that contains a clause) and holds a sentence (2.25). " +
            "And this is another sentence but finishes with a number 12. And this is another (small-sized) sentence. " +
            "Finally, this is the last sentence in this (rather short) paragraph." +
            " And what about this sentence? And of course don't forget this one!  Amen brother." +
            " Here is a bullet list test a.  one bullet; b. two bullets c. three bullets.";
        // Create a pattern to match breaks: a zero-width position right after
        // sentence-ending punctuation followed by whitespace
        Pattern p = Pattern.compile("(?<=\\w[\\w\\)\\]][\\.\\?\\!]\\s)");
        // Split the text at every break position
        String[] result = p.split(teststring);
        // Print each sentence on its own line
        for (int i = 0; i < result.length; i++) {
            System.out.println(result[i]);
        }
    }
}
Question by: efamilant

    Expert Comment

    Did you consider this?

    Accepted Solution

    Try something like

    Pattern p = Pattern.compile("(?<=\\w[\\w\\)\\]](?<!Mrs?|Dr)[\\.\\?\\!]\\s)");
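
To see what the extra negative lookbehind does, here is a small self-contained sketch. The class name `TitleAwareSplitDemo`, the method `splitSentences`, and the sample text are made up for illustration; only the pattern string comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TitleAwareSplitDemo {
    // Pattern from the answer above: a zero-width split point after
    // sentence punctuation plus whitespace, unless the text just before
    // the punctuation ends with Mr, Mrs, or Dr.
    static final Pattern SENTENCE_BREAK =
            Pattern.compile("(?<=\\w[\\w\\)\\]](?<!Mrs?|Dr)[\\.\\?\\!]\\s)");

    // Splits the text at every break position and trims each piece.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String piece : SENTENCE_BREAK.split(text)) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Illustrative sample text, not taken from the question.
        String sample = "I met Mr. Smith today. He knows Dr. Jones well. Mrs. Lee agreed.";
        for (String s : splitSentences(sample)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

On this sample the split should produce three sentences, with "Mr. Smith", "Dr. Jones", and "Mrs. Lee" each kept inside their sentence.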

    Expert Comment

    This is a hard problem to solve if you really want a complete, heuristic solution.
    You can add more words like Mr. or abbreviations like M.B.B.S. to CEHJ's solution.
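
One way to add more words without hand-editing the regex each time is to generate the lookbehind from a list. This is only a sketch: the class `AbbreviationSplitter`, the method `buildBreakPattern`, and the extra entries "Prof" and "St" are hypothetical. It also assumes every entry is plain letters, because the entries go into the pattern unescaped; dotted abbreviations would need escaping first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class AbbreviationSplitter {
    // Builds the same kind of split pattern as CEHJ's, with the negative
    // lookbehind generated from a list. Entries are assumed to contain
    // only letters (no regex metacharacters); they are inserted unescaped.
    static Pattern buildBreakPattern(List<String> abbreviations) {
        String alternatives = String.join("|", abbreviations);
        return Pattern.compile(
                "(?<=\\w[\\w\\)\\]](?<!" + alternatives + ")[\\.\\?\\!]\\s)");
    }

    public static void main(String[] args) {
        // Hypothetical list; extend it with whatever titles your text uses.
        List<String> titles = Arrays.asList("Mr", "Mrs", "Dr", "Prof", "St");
        Pattern p = buildBreakPattern(titles);
        String sample = "Prof. Adams met Mr. Smith. They spoke with St. John.";
        for (String sentence : p.split(sample)) {
            System.out.println("[" + sentence.trim() + "]");
        }
    }
}
```

Keep in mind that the lookbehind only checks how the preceding text ends, so a short entry also suppresses a break after any longer word that ends in the same letters with the same case.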

    Author Closing Comment

    Great.   Just what I needed.