Splitting a paragraph into sentences

I have a bit of code that takes a chunk of text and splits it into individual sentences. It works pretty well, but there are a few cases that I would like to cover in the regular expression itself, without having to do any post-processing cleanup. Those involve titles like Mr., Mrs. or Dr. Right now the code splits sentences after the title, which is not desirable. Given the enclosed code, can you see how to alter the pattern to prevent this from happening?
import java.util.regex.*;

public class Test {
    public static void main(String[] args) throws Exception {
        String teststring = "This is a simple sentence. This is a sentence about Mr. Smith and Dr. Jones.  This is a rather more complicated (e.g. one that contains a clause) and holds a sentence (2.25). " +
            "And this is another sentence but finishes with a number 12. And this is another (small-sized) sentence. " +
            "Finally, this is the last sentence in this (rather short) paragraph." +
            " And what about this sentence? And of course don't forget this one!  Amen brother." +
            " Here is a bullet list test a.  one bullet; b. two bullets c. three bullets.";

        // Create a pattern to match breaks: split after two word characters
        // (or a closing bracket), a terminator (. ? !) and one whitespace character
        Pattern p = Pattern.compile("(?<=\\w[\\w\\)\\]][\\.\\?\\!]\\s)");
        String[] result = p.split(teststring);
        for (int i = 0; i < result.length; i++) {
            System.out.println("i->" + result[i]);
        }
    }
}
efamilant Asked:
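CEHJ Commented:
Try something like this, which adds a negative lookbehind so that a terminator preceded by Mr, Mrs or Dr is not treated as a sentence break: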

Pattern p = Pattern.compile("(?<=\\w[\\w\\)\\]](?<!Mrs?|Dr)[\\.\\?\\!]\\s)");
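
For anyone who wants to try the suggestion, here is a minimal sketch that wraps the pattern above in a small class. The class name TitleAwareSplit and its split helper are made up for illustration; only the regular expression comes from this thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern itself is taken from the thread.
class TitleAwareSplit {
    // Split point: after two word characters (or a word character plus a
    // closing bracket), a terminator (. ? !) and one whitespace character,
    // unless the text right before the terminator is Mr, Mrs or Dr.
    private static final Pattern BREAK =
        Pattern.compile("(?<=\\w[\\w\\)\\]](?<!Mrs?|Dr)[\\.\\?\\!]\\s)");

    static String[] split(String text) {
        return BREAK.split(text);
    }

    public static void main(String[] args) {
        String text = "This is a sentence about Mr. Smith and Dr. Jones.  "
            + "And what about this sentence? Amen brother.";
        for (String sentence : split(text)) {
            // Each piece keeps its trailing whitespace, so trim before printing.
            System.out.println("-> " + sentence.trim());
        }
    }
}
```

Note that the lookbehind is case-sensitive and has no word boundary, so a word that merely ends in "Mr" or "Dr" would also be skipped as a split point. That should be rare in ordinary text.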
Gurvinder Pal Singh Commented:
Have you considered this?
http://stanfordparser.rubyforge.org/

Gurvinder Pal Singh Commented:
This is a hard problem to solve if you really want a complete, heuristic solution.
You can add more such words, like Mr., or abbreviations, like M.B.B.S, to CEHJ's solution.
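
One way to extend the list without editing the regex by hand is to build the negative lookbehind from a collection of abbreviations. This is only a sketch: the class AbbreviationSplitter, the buildPattern helper and the sample list of titles are assumptions for illustration, not something posted in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class: generalizes CEHJ's pattern to a list of titles.
class AbbreviationSplitter {
    // Builds the same split pattern, with the lookbehind alternation
    // generated from the given abbreviations (written without the final dot).
    static Pattern buildPattern(List<String> abbreviations) {
        String alternation = abbreviations.stream()
            .map(Pattern::quote) // treat each abbreviation as literal text
            .collect(Collectors.joining("|"));
        return Pattern.compile(
            "(?<=\\w[\\w\\)\\]](?<!" + alternation + ")[\\.\\?\\!]\\s)");
    }

    static String[] split(String text, List<String> abbreviations) {
        return buildPattern(abbreviations).split(text);
    }

    public static void main(String[] args) {
        // Sample list only; extend it with whatever titles your text uses.
        List<String> titles = Arrays.asList("Mr", "Mrs", "Ms", "Dr", "Prof");
        String text = "Prof. Lee met Ms. Chan and Dr. Jones. They talked. Then they left!";
        for (String sentence : split(text, titles)) {
            System.out.println("-> " + sentence.trim());
        }
    }
}
```

Dotted abbreviations such as M.B.B.S. or e.g. are not split by the original pattern in the first place, because it requires two word characters (or a word character and a closing bracket) directly before the terminator. The list therefore mainly needs plain titles of two or more letters.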
efamilant (Author) Commented:
Great. Just what I needed.
CEHJ Commented:
:-)
Question has a verified solution.
