Get the first sentence of a paragraph

Posted on 2006-07-24
Last Modified: 2008-02-26
Shouldn't be too tough, but I just don't have time right now to figure it out.

I need to be able to split a paragraph so that the first sentence goes into one string variable and the rest of the paragraph (sans the first sentence) goes into a second string variable. I need to take into account the common ways a sentence can end ("!","?",".").

Thank you.
Question by:stengelj
 
LVL 24

Accepted Solution

by:
Justin_W earned 500 total points
ID: 17170152
Determining natural language sentence boundaries is actually a very difficult thing to do. See the following links for additional info and reference:
http://www.cs.umd.edu/Honors/reports/Nilani.pdf
http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/sentence-end.html
http://computing.fnal.gov/docs/products/xemacs/v21_1/lispref.info,.StandardRegexps.html
http://www.codeproject.com/dotnet/RegexTutorial.asp

Your best bet for easily achieving something fairly accurate would be to use .NET's regular expressions to search for something like this:
    "[.?!][]\"')}]*\s*"
     This means a period, question mark or exclamation mark, followed
     by zero or more closing characters (], ", ', ) or }), followed by optional whitespace.

And then you would split the original string based on the index of the first match of the expression.
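
A minimal sketch of that split in VB.NET, for illustration only. The module name and sample text are made up, and the character class is written with an escaped `\]` so it does not depend on how the engine treats a leading `]`. It cuts at `Match.Index + Match.Length` so that any closing quote or bracket stays with the first sentence:

```vb
Imports System
Imports System.Text.RegularExpressions

Module SentenceSplitDemo
    Sub Main()
        ' Made-up sample text; inside a VB string literal a double quote is written as "".
        Dim paragraph As String = "He said ""Stop!"" Then he left."

        ' Terminator, then any closing quotes/brackets, then optional whitespace.
        Dim m As Match = Regex.Match(paragraph, "[.?!][\]""')}]*\s*")

        If m.Success Then
            ' m.Index is where the terminator starts; m.Length also covers the
            ' closing characters and whitespace that the pattern consumed.
            Dim cut As Integer = m.Index + m.Length
            Dim first As String = paragraph.Substring(0, cut).TrimEnd()
            Dim rest As String = paragraph.Substring(cut)

            Console.WriteLine(first) ' He said "Stop!"
            Console.WriteLine(rest)  ' Then he left.
        End If
    End Sub
End Module
```

As noted above, this is only an approximation: an abbreviation such as "Dr." would still be treated as the end of a sentence.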
 
LVL 9

Author Comment

by:stengelj
ID: 17170192
Thanks.  I'll try it out.
 
LVL 9

Author Comment

by:stengelj
ID: 17170517
Perfect. I think the only thing that might screw it up would be a punctuated abbreviation, but that shouldn't be a problem for what I'm doing. Here's my function:

Protected Function SplitAnno(ByVal myStr As String, ByVal myType As String) As String
        Dim i As Integer
        Dim s As String
        i = Regex.Match(myStr, "[.?!][]\""')}]*\s*").Index
        s = myStr.Substring(0, i + 1) '+1 to get the punctuation
        Select Case myType
            Case "Summary"
                Return Trim(s)
            Case "Body"
                Return Trim(myStr.Substring(s.Length + 1)) '+1 to go past the punctuation
            Case Else
                Return ""
        End Select

Thanks for the quick help!
    End Function
 
LVL 9

Author Comment

by:stengelj
ID: 17170523
Oops! I screwed up my function at the end with my thank you.
 
LVL 24

Expert Comment

by:Justin_W
ID: 17170571
You're welcome. However, be advised that your function may also fail for strings that don't have any matches. Also, your "'+1 to get the punctuation" strategy doesn't take the optional trailing parenthetical characters or whitespace into account.
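
One possible rework of the function along those lines, offered as a sketch rather than a definitive fix. It keeps the SplitAnno name and parameters from the post above; the module name, the Nothing guard, and the choice to treat a paragraph with no terminator as all "Summary" are assumptions, so adjust them to whatever your page needs:

```vb
Imports System
Imports System.Text.RegularExpressions

Module AnnotationSplitter
    Function SplitAnno(ByVal myStr As String, ByVal myType As String) As String
        ' Assumption: a missing paragraph yields an empty result.
        If myStr Is Nothing Then Return ""

        Dim m As Match = Regex.Match(myStr, "[.?!][\]""')}]*\s*")

        ' Cut after everything the pattern consumed (terminator, closing
        ' characters, whitespace). With no match, the whole text is the summary.
        Dim cut As Integer
        If m.Success Then
            cut = m.Index + m.Length
        Else
            cut = myStr.Length
        End If

        Select Case myType
            Case "Summary"
                Return myStr.Substring(0, cut).Trim()
            Case "Body"
                ' Substring(myStr.Length) is valid and returns an empty string.
                Return myStr.Substring(cut).Trim()
            Case Else
                Return ""
        End Select
    End Function

    Sub Main()
        ' Made-up sample text for a quick check.
        Dim text As String = "Is it done? (Yes.) The rest follows here."
        Console.WriteLine(SplitAnno(text, "Summary"))
        Console.WriteLine(SplitAnno(text, "Body"))
        Console.WriteLine(SplitAnno("No terminator here", "Body"))
    End Sub
End Module
```

Besides the two points above, this also avoids the out-of-range Substring call that the original "Body" branch would make when the first terminator is the last character of the string.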