Solved

Get the first sentence of a paragraph

Posted on 2006-07-24
Last Modified: 2008-02-26
Shouldn't be too tough, but I just don't have time right now to figure it out.

I need to be able to split a paragraph so that the first sentence goes into one string variable and the rest of the paragraph (sans the first sentence) goes into a second string variable. I need to take into account the common ways a sentence can end ("!","?",".").

Thank you.
Question by:stengelj
LVL 24

Accepted Solution

by:
Justin_W earned 500 total points
ID: 17170152
Determining natural language sentence boundaries is actually a very difficult thing to do. See the following links for additional info and reference:
http://www.cs.umd.edu/Honors/reports/Nilani.pdf
http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/sentence-end.html
http://computing.fnal.gov/docs/products/xemacs/v21_1/lispref.info,.StandardRegexps.html
http://www.codeproject.com/dotnet/RegexTutorial.asp

Your best bet for easily achieving something fairly accurate would be to use .NET's regular expressions to search for something like this:
    "[.?!][]\"')}]*\s*"
     This means a period, question mark, or exclamation mark, followed by
     zero or more closing quote or bracket characters (], ", ', ) or }),
     followed by optional whitespace.

And then you would split the original string based on the index of the first match of the expression.
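As a rough sketch of that split (not a definitive implementation), the cut point can be taken as the match's Index plus its Length, so that the terminator, any closing characters, and the trailing whitespace all stay with the first sentence. The module name, the SentenceEnd constant, and the sample paragraph are made up for illustration; the pattern is the one above, with the closing bracket escaped for readability.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FirstSentenceDemo
    ' Same pattern as above; "" is how VB escapes a quote inside a string.
    ' SentenceEnd is an illustrative name, not part of any library.
    Private Const SentenceEnd As String = "[.?!][\]""')}]*\s*"

    Sub Main()
        ' Sample text, assumed for this demo only.
        Dim paragraph As String = "Is this the end? (Not yet.) More text follows."
        Dim m As Match = Regex.Match(paragraph, SentenceEnd)

        If m.Success Then
            ' m.Index is where the terminator starts; m.Length covers the
            ' terminator plus any closing characters and whitespace.
            Dim cut As Integer = m.Index + m.Length
            Dim first As String = paragraph.Substring(0, cut).TrimEnd()
            Dim rest As String = paragraph.Substring(cut)
            Console.WriteLine(first)
            Console.WriteLine(rest)
        Else
            ' No terminator: there is nothing to split on.
            Console.WriteLine("No sentence terminator found.")
        End If
    End Sub
End Module
```

Checking m.Success before using m.Index matters, because a failed match does not give a usable position.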
0
 
LVL 9

Author Comment

by:stengelj
ID: 17170192
Thanks.  I'll try it out.
0
 
LVL 9

Author Comment

by:stengelj
ID: 17170517
Perfect. I think the only thing that might screw it up would be a punctuated abbreviation, but that shouldn't be a problem for what I'm doing. Here's my function:

Protected Function SplitAnno(ByVal myStr As String, ByVal myType As String) As String
        Dim i As Integer
        Dim s As String
        i = Regex.Match(myStr, "[.?!][]\""')}]*\s*").Index
        s = myStr.Substring(0, i + 1) '+1 to get the punctuation
        Select Case myType
            Case "Summary"
                Return Trim(s)
            Case "Body"
                Return Trim(myStr.Substring(s.Length + 1)) '+1 to go past the punctuation
            Case Else
                Return ""
        End Select
    End Function

Thanks for the quick help!
0
 
LVL 9

Author Comment

by:stengelj
ID: 17170523
Oops! When I first posted this, my thank-you ended up inside the function, just before End Function.
0
 
LVL 24

Expert Comment

by:Justin_W
ID: 17170571
You're welcome. However, be advised that your function may also fail for strings that don't have any matches. Also, your "'+1 to get the punctuation" strategy doesn't take the optional trailing parenthetical characters or whitespace into account.
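One possible way to address both points, offered as a sketch rather than the one right answer: check Match.Success, and cut at Index + Length so the closing characters and whitespace are consumed along with the punctuation. SplitAnnoSafe and AnnoSplitter are hypothetical names, and treating a string with no terminator as a single sentence is an assumption about the desired behavior. The If() operator used here needs Visual Basic 2008 or later.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AnnoSplitter
    ' Same pattern as in the accepted answer, with the bracket escaped.
    Private Const SentenceEnd As String = "[.?!][\]""')}]*\s*"

    ' Hypothetical rework of SplitAnno with the same "Summary" / "Body" contract.
    Function SplitAnnoSafe(ByVal myStr As String, ByVal myType As String) As String
        If myStr Is Nothing Then Return ""

        Dim m As Match = Regex.Match(myStr, SentenceEnd)

        ' Assumption: with no terminator, the whole string is the first sentence.
        ' Otherwise cut after the punctuation, closing characters, and whitespace.
        Dim cut As Integer = If(m.Success, m.Index + m.Length, myStr.Length)

        Select Case myType
            Case "Summary"
                Return myStr.Substring(0, cut).Trim()
            Case "Body"
                Return myStr.Substring(cut).Trim()
            Case Else
                Return ""
        End Select
    End Function
End Module
```

With this version, a sentence ending in a quoted exclamation such as He said "Stop!" keeps its closing quote in the summary, and a string with no terminator returns an empty body instead of a wrongly sliced one.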
0
