Solved

Get the first sentence of a paragraph

Posted on 2006-07-24
Last Modified: 2008-02-26
Shouldn't be too tough, but I just don't have time right now to figure it out.

I need to be able to split a paragraph so that the first sentence goes into one string variable and the rest of the paragraph (sans the first sentence) goes into a second string variable. I need to take into account the common ways a sentence can end ("!","?",".").

Thank you.
Question by:stengelj

Accepted Solution

by:Justin_W (earned 2000 total points)
ID: 17170152
Determining natural language sentence boundaries is actually a very difficult thing to do. See the following links for additional info and reference:
http://www.cs.umd.edu/Honors/reports/Nilani.pdf
http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/sentence-end.html
http://computing.fnal.gov/docs/products/xemacs/v21_1/lispref.info,.StandardRegexps.html
http://www.codeproject.com/dotnet/RegexTutorial.asp

Your best bet for easily achieving something fairly accurate would be to use .NET's regular expressions to search for something like this:
    "[.?!][]\"')}]*\s*"
     This means a period, question mark, or exclamation mark, followed
     by zero or more closing quote or bracket characters, followed by optional whitespace.

You would then split the original string at the index of the first match of the expression.
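
To make the pattern concrete, here is a minimal sketch of what its first match covers. It is written in Python rather than the .NET the answer refers to, on the assumption that the same pattern text behaves the same way in Python's `re` module; the sample sentence and the variable names are made up for illustration.

```python
import re

# Terminator, then any closing quotes/brackets, then optional whitespace.
# The leading "]" inside the second class is a literal "]".
SENTENCE_END = re.compile(r"""[.?!][]"')}]*\s*""")

text = 'She shouted "Stop!" Nobody moved.'  # illustrative sample
match = SENTENCE_END.search(text)

# The match starts at the "!" and runs through the closing quote
# and the space that follow it.
print(repr(match.group()))         # '!" '
print(match.start(), match.end())  # 17 20

# Cutting at the END of the match keeps the closing quote with the
# first sentence and drops the separating whitespace from the rest.
first = text[:match.end()].rstrip()
rest = text[match.end():]
print(first)  # She shouted "Stop!"
print(rest)   # Nobody moved.
```

Note that the match index alone only locates the punctuation mark; the length of the match is what tells you where the rest of the paragraph begins.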

Author Comment

by:stengelj
ID: 17170192
Thanks.  I'll try it out.

Author Comment

by:stengelj
ID: 17170517
Perfect. I think the only thing that might screw it up would be a punctuated abbreviation, but that shouldn't be a problem for what I'm doing. Here's my function:

Protected Function SplitAnno(ByVal myStr As String, ByVal myType As String) As String
        Dim i As Integer
        Dim s As String
        i = Regex.Match(myStr, "[.?!][]\""')}]*\s*").Index
        s = myStr.Substring(0, i + 1) '+1 to get the punctuation
        Select Case myType
            Case "Summary"
                Return Trim(s)
            Case "Body"
                Return Trim(myStr.Substring(s.Length + 1)) '+1 to go past the punctuation
            Case Else
                Return ""
        End Select

Thanks for the quick help!
    End Function

Author Comment

by:stengelj
ID: 17170523
Oops! I screwed up my function at the end with my thank you.

Expert Comment

by:Justin_W
ID: 17170571
You're welcome. However, be advised that your function may also fail for strings that don't have any matches. Also, your "'+1 to get the punctuation" strategy doesn't take the optional trailing parenthetical characters or whitespace into account.
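
One way to address both caveats is sketched below, in Python rather than the thread's VB.NET. The function name `split_first_sentence` and the sample strings are hypothetical; the idea is to check whether a match was found at all and to cut at the end of the match instead of at the index plus one.

```python
import re

# Same pattern as in the accepted solution.
SENTENCE_END = re.compile(r"""[.?!][]"')}]*\s*""")


def split_first_sentence(paragraph):
    """Return (first_sentence, rest_of_paragraph).

    If no sentence terminator is found, the whole paragraph is
    returned as the first sentence and the rest is empty.
    """
    match = SENTENCE_END.search(paragraph)
    if match is None:
        # Caveat 1: no terminator anywhere in the string.
        return paragraph.strip(), ""
    # Caveat 2: cut after the full match, so closing quotes or
    # brackets stay with the first sentence.
    cut = match.end()
    return paragraph[:cut].strip(), paragraph[cut:].strip()


print(split_first_sentence("Done (finally!) Now ship it."))
print(split_first_sentence("no terminator here"))
print(split_first_sentence("Dr. Smith arrived. He sat."))
```

The last call shows the abbreviation problem mentioned earlier in the thread: the split happens after "Dr." because the pattern cannot tell an abbreviation from a sentence end. In .NET, the equivalent checks would use the `Success`, `Index`, and `Length` properties of the `Match` object returned by `Regex.Match`.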