Convert text to Sentence Case

I need to be able to convert text to proper sentence case, i.e. from
MY NAME IS FRED I AM A NERD. I HAVE NO FRIENDS IN LONDON.
to
My name is Fred I am a nerd. I have no friends in London.

The key factor is identifying the words that must be capitalised, e.g. Fred, I and London.
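Leaving the keyword problem aside, the mechanical part of the conversion (lower-case everything, then capitalise the first letter of each sentence and the pronoun I) can be done in plain VB. The sketch below is an illustration, not code from the thread: BasicSentenceCase and IsWordBoundary are hypothetical names, and it assumes sentences end with ".", "!" or "?". It deliberately leaves proper nouns such as Fred and London in lower case, which is exactly the gap the rest of the thread tries to fill.

```vb
' Sketch: lower-cases the text, then upper-cases the first letter of each
' sentence and the standalone pronoun "i". Proper nouns are NOT handled.
Public Function BasicSentenceCase(ByVal sInput As String) As String
    Dim s As String, ch As String
    Dim i As Long
    Dim atSentenceStart As Boolean

    s = LCase$(sInput)
    atSentenceStart = True

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch Like "[a-z]" Then
            If atSentenceStart Then
                ' First letter of a sentence
                Mid$(s, i, 1) = UCase$(ch)
                atSentenceStart = False
            ElseIf ch = "i" Then
                ' Capitalise "i" only when it stands alone as a word
                If IsWordBoundary(s, i - 1) And IsWordBoundary(s, i + 1) Then
                    Mid$(s, i, 1) = "I"
                End If
            End If
        ElseIf InStr(".!?", ch) > 0 Then
            ' Sentence terminator: the next letter starts a new sentence
            atSentenceStart = True
        End If
    Next
    BasicSentenceCase = s
End Function

' True when position p is outside the string or holds a non-letter
Private Function IsWordBoundary(ByVal s As String, ByVal p As Long) As Boolean
    If p < 1 Or p > Len(s) Then
        IsWordBoundary = True
    Else
        IsWordBoundary = Not (Mid$(s, p, 1) Like "[a-zA-Z]")
    End If
End Function
```

Applied to the example above it produces "My name is fred I am a nerd. I have no friends in london.", so only the proper nouns remain to be fixed.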

One combination that would work is to use Word to convert the case and then run a spell check. However, you cannot programmatically control the spell checker in Word (except for starting it).
Suggestions, please.

Loads of points available.

P.S. I have already considered creating a list of keywords, but rejected it because of the time involved.

I will also be working with disease names and non-standard words, so a global spell check and replace may not work.
Asked by simonsabin.

Vbmaster commented:
How else can the program know that a word is to be capitalized, except by using a list of keywords? Guessing?

simonsabin (author) commented:
The key point is that I don't want to build up the keyword list myself. Word already has one in its dictionary. How can I use it?

I need to be able to spell check and know whether a word is flagged as wrong only because it should be capitalised.

waty commented:
Try this

' #VBIDEUtils#************************************************************
' * Programmer Name  : Waty Thierry
' * Web Site         : www.geocities.com/ResearchTriangle/6311/
' * E-Mail           : waty.thierry@usa.net
' * Date             : 29/06/99
' * Time             : 13:54
' **********************************************************************
' * Comments         : Changing Strings to Title Case
' *
' *
' **********************************************************************

Function TitleCaps(InString As String) As String
   ' *** Changing Strings to Title Case
   ' *** Useful for certain types of applications (e.g. for names and addresses),
   ' *** being able to Title Caps (i.e. capitalise the first letter of each word) is achieved
   ' *** with the following function.
   ' *** However, be careful with names such as McDonalds as it will become Mcdonalds.
   ' *** Perhaps it would be a good idea to edit the code to cope with this...

   Dim OutString        As String
   Dim CurrentLetter    As String
   Dim CurrentWord      As String
   Dim TCaps            As String
   Dim StrCount         As Long     ' Long avoids overflow on inputs > 32767 chars

   ' *** Converts [InString] to Title Caps (as best it can!)
   OutString = ""
   If InString = "" Then
      TitleCaps = ""
      Exit Function
   End If

   CurrentWord = ""
   For StrCount = 1 To Len(InString)
      CurrentLetter = Mid$(InString, StrCount, 1)
      CurrentWord = CurrentWord & CurrentLetter
      ' *** A delimiter (or the end of the input) closes the current word
      If InStr(" .,/\;:-!?[]()#", CurrentLetter) <> 0 Or _
         StrCount = Len(InString) Then
         ' *** Upper-case the first character, lower-case the rest
         TCaps = UCase$(Left$(CurrentWord, 1)) & _
            LCase$(Mid$(CurrentWord, 2))
         OutString = OutString & TCaps
         CurrentWord = ""
      End If
   Next
   TitleCaps = OutString

End Function

simonsabin (author) commented:
Waty,

Not what I am after; I need sentence case, not proper case.

P.S. You can use StrConv(string, vbProperCase)
to do what you have done.
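For comparison, the built-in call looks like this; ProperCaseDemo is a hypothetical wrapper for illustration. Note that, according to the VB documentation, StrConv only treats whitespace and control characters (space, tab, line feed, carriage return, null and so on) as word separators, so unlike TitleCaps it will not capitalise a letter that follows punctuation such as - or /.

```vb
' StrConv(..., vbProperCase) lower-cases the text and capitalises the first
' letter of every word: proper (title) case, not sentence case
Public Function ProperCaseDemo() As String
    ProperCaseDemo = StrConv("MY NAME IS FRED I AM A NERD.", vbProperCase)
End Function
```

The result, "My Name Is Fred I Am A Nerd.", shows why proper case does not answer the original question.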

caraf_g commented:
Forget it...

How do you know you're not dealing with a language such as German, where all nouns are capitalised?

Or with dutch where we do not capitalise anything, for example the fact that I am dutch or my first language is dutch.

Or with Scottish names like McDonalds where capitals appear in the middle of a word?

Or with some language that you've never seen before where you haven't got a clue whether a word is a name or any other type of word.

Even if the language is English, who's to say that the user is not talking about something that requires the inclusion of foreign language words? Such as "The Dutch word for difficult is moeilijk"

Etc... etc... etc...

waty commented:
In French, month names are not capitalized as they are in English:

January = janvier
February = février
....

simonsabin (author) commented:
In which case the word would not be capitalised. If you run Word's spell check on your sentence after you have changed it from all capitals to sentence case, the words dutch and moeilijk are both underlined as misspelt. The difference, however, is that the suggestion for dutch is Dutch (i.e. capitalise it), whereas for moeilijk there is either no suggestion or a different word.

I want to automatically replace the words that should be capitalised and leave the rest alone. But you have no control over the Word spell checker.
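The rule described here (accept a suggestion only when it is the same word with different capitalisation) can be isolated in a small helper. This is a sketch, and IsCaseOnlyFix is a hypothetical name. The first StrComp, with vbTextCompare, is the same test used in the code posted later in the thread; the second, with vbBinaryCompare, additionally rejects suggestions that are already identical, so no needless replacement is made.

```vb
' True when suggestion differs from flagged ONLY in letter case,
' e.g. "dutch" -> "Dutch"; any change of spelling returns False
Public Function IsCaseOnlyFix(ByVal flagged As String, ByVal suggestion As String) As Boolean
    IsCaseOnlyFix = (StrComp(flagged, suggestion, vbTextCompare) = 0) _
        And (StrComp(flagged, suggestion, vbBinaryCompare) <> 0)
End Function
```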

simonsabin (author) commented:
P.S. It is all English.

Also, I know that whatever solution is chosen, it won't be perfect.

ameba commented:
>you can not programatically control the spell checker
You can retrieve the errors, though.

I pasted your sentence (converted to lowercase) into Word 97, created a 'Custom' writing style (which only checks capitalization) in Options, and used this VBA:
    ' Read back the words Word flags as spelling errors in the current selection
    Dim i As Integer, pr1 As ProofreadingErrors, msg As String
    Set pr1 = Selection.Range.SpellingErrors
    For i = 1 To pr1.Count
        msg = msg & pr1.Item(i).Text & vbCr
    Next
    MsgBox msg
It returned:
fred
i
london

Maybe this can be a start?

simonsabin (author) commented:
Thanks ameba, just the trigger needed.

Here is the code I have used; any comments are welcome.

  ' Requires a project reference to the Microsoft Word object library
  Dim w As Word.Application, d As Word.Document, s As Word.Range, lng As Long
  Dim sugg As Word.SpellingSuggestions
  Set w = GetObject("", "Word.Application")   ' "" as path starts a new Word instance

  Set d = w.Documents.Add

  lng = Len(Text1)
  d.Content.Text = Text1

  ' Lower-case everything, then let Word capitalise the first word of each sentence
  d.Content.Case = wdLowerCase
  d.Content.Case = wdTitleSentence

  ' Accept Word's first suggestion only when it differs from the flagged word by case
  For Each s In d.SpellingErrors
    Set sugg = s.GetSpellingSuggestions
    If sugg.Count > 0 Then
      If StrComp(sugg(1).Name, s.Text, vbTextCompare) = 0 Then
        s.Text = sugg(1).Name
      End If
    End If
  Next
  ' Word appends a paragraph mark; trim back to the original length
  Text1 = Left$(d.Content.Text, lng)

  d.Close False
  w.Quit
  Set w = Nothing

ameba commented:
Sounds good, I'll check it later.
I must go now and vote for the president of my country.
(Also, I know that whichever candidate is chosen, they won't be perfect. :)

ameba commented:
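' I hope it is OK.
' Form layout: a text box control array Text1(0) (MultiLine = True) and Text1(1), plus a Command1 button.
' Requires a project reference to the Microsoft Word object library.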
Option Explicit

Private Sub Form_Load()
    Text1(0).Text = "MY NAME IS FRED I AM A NERD. I HAVE NO FRIENDS IN LONDON, GREAT Britain." _
        & vbCrLf & "Mcdonalds is in NEW york."
End Sub

Private Sub Command1_Click()
    Text1(1).Text = XCase(Text1(0).Text)
End Sub

Public Function XCase(sInput As String) As String
    Dim w As Word.Application, d As Word.Document
    Dim r As Word.Range
    Dim sugg As Word.SpellingSuggestions
   
    Set w = GetObject("", "Word.Application")
    Set d = w.Documents.Add
    d.Range.LanguageID = wdEnglishUK
    d.Content.Text = sInput
   
    d.Content.Case = wdLowerCase
    d.Content.Case = wdTitleSentence
   
    For Each r In w.ActiveDocument.SpellingErrors
        Set sugg = r.GetSpellingSuggestions
        If sugg.Count > 0 Then
            If StrComp(sugg(1), r.Text, vbTextCompare) = 0 Then
                r.Text = sugg(1)
            End If
        End If
    Next
 
    ' Word will use vbCr instead of vbCrLf, and append one extra vbCr
    '    reverse this
    XCase = Replace(w.ActiveDocument.Range.Text, vbCr, vbCrLf)
    If Right$(XCase, 2) = vbCrLf Then
        XCase = Left$(XCase, Len(XCase) - 2)
    End If
   
    d.Close False
    w.Quit
    Set w = Nothing
End Function

simonsabin (author) commented:
Yes, I noticed the extra character appearing when I put the text back into VB.

Just one more note: this will probably be called more than 1 million times. Getting the Word application and creating the document can be done once, but is the following the quickest code for the part that runs on every call?

    d.Content.Text = sInput

    d.Content.Case = wdLowerCase
    d.Content.Case = wdTitleSentence

    For Each r In w.ActiveDocument.SpellingErrors
        Set sugg = r.GetSpellingSuggestions
        If sugg.Count > 0 Then
            If StrComp(sugg(1), r.Text, vbTextCompare) = 0 Then
                r.Text = sugg(1)
            End If
        End If
    Next

    ' Word will use vbCr instead of vbCrLf, and append one extra vbCr
    '    reverse this
    XCase = Replace(w.ActiveDocument.Range.Text, vbCr, vbCrLf)
    If Right$(XCase, 2) = vbCrLf Then
        XCase = Left$(XCase, Len(XCase) - 2)
    End If

ameba commented:
I think you don't need:
    d.Content.Case = wdLowerCase

In Options, define a 'Custom' writing style, check only the Capitalization check box, and ignore other spelling errors:

    ' w and d must persist between calls (e.g. as module-level variables),
    ' so Word is started and the document created only on the first call
    If d Is Nothing Then
        Set w = GetObject("", "Word.Application")
        Set d = w.Documents.Add

        d.Range.LanguageID = wdEnglishUK
        w.Languages(wdEnglishUK).SpellingDictionaryType = wdSpelling
        w.Languages(wdEnglishUK).DefaultWritingStyle = "Custom"
        w.ActiveDocument.ActiveWritingStyle(wdEnglishUK) = "Custom"
    End If
------------

It is slow: a few KB per minute.

1. What is the average size of your records (100-200 characters or many KB)?
2. Can you ignore MixedCase words (is Mcdonalds OK)?

simonsabin (author) commented:
I found that on a second call the conversion to sentence case did not work; that is why I added the conversion to lower case.

The average size is about 30-40 words.

ameba commented:
If this is a batch job, it will run for 20 days (200 MB at 3 KB/minute).
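Taking the quoted figures at face value (200 MB of text at 3 KB per minute, with 1 MB = 1,000 KB), the running time works out as follows. At exactly that rate the job would take closer to 46 days, and a 20-day run would need roughly 7 KB per minute; either way the order of magnitude is weeks, not hours.

```latex
t = \frac{200\,000\ \text{KB}}{3\ \text{KB/min}} \approx 66\,700\ \text{min} \approx 1\,110\ \text{h} \approx 46\ \text{days}
```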

If Word 97 automation is too slow, creating a word list is not a big problem.
You'll need:
; days of the week
; months
; holidays
; planets, stars, zodiac
; computers, programs, and languages
; names from Alice in Wonderland
; famous persons, deities and related adjectives
; misc. unique entities
; nationalities, languages, religions
; places and related adjectives
; some abbreviations

see this collection:
ftp://ftp.ox.ac.uk/pub/wordlists/
dirs: places, names, science, databases
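A sketch of how such a word list might be applied, assuming the lists have already been read into memory one capitalised word at a time. AddCapWord, LookupCapWord and ApplyCapWords are hypothetical names, and a VB Collection keyed by the lower-case word stands in for whatever lookup table you would actually use.

```vb
' Lookup of lower-case word -> correctly capitalised form
Private mCapWords As Collection

' Registers one capitalised word (e.g. "London"), keyed by its lower-case form
Public Sub AddCapWord(ByVal word As String)
    If mCapWords Is Nothing Then Set mCapWords = New Collection
    On Error Resume Next            ' a duplicate key raises an error; ignore it
    mCapWords.Add word, LCase$(word)
    On Error GoTo 0
End Sub

' Returns the listed capitalised form of word, or word unchanged if not listed
Public Function LookupCapWord(ByVal word As String) As String
    Dim found As String
    LookupCapWord = word
    If mCapWords Is Nothing Then Exit Function
    On Error Resume Next            ' a missing key raises an error
    found = mCapWords.Item(LCase$(word))
    If Err.Number = 0 Then LookupCapWord = found
    On Error GoTo 0
End Function

' Replaces every listed word in s (e.g. "fred" -> "Fred"), leaving other text alone
Public Function ApplyCapWords(ByVal s As String) As String
    Dim i As Long, start As Long, result As String
    i = 1
    Do While i <= Len(s)
        If Mid$(s, i, 1) Like "[A-Za-z]" Then
            ' Collect a run of letters as one word
            start = i
            Do While i <= Len(s)
                If Not (Mid$(s, i, 1) Like "[A-Za-z]") Then Exit Do
                i = i + 1
            Loop
            result = result & LookupCapWord(Mid$(s, start, i - start))
        Else
            ' Copy spaces and punctuation through unchanged
            result = result & Mid$(s, i, 1)
            i = i + 1
        End If
    Loop
    ApplyCapWords = result
End Function
```

Run after a plain sentence-case pass, this turns "My name is fred i am a nerd." into "My name is Fred I am a nerd." once "Fred" and "I" are in the list; words not in the list, such as disease names, are left exactly as they were.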

ameba commented:
40 million words is about 40,000 distinct words.

Maybe you can pass only these distinct words to Word once, instead of processing about 1,000 times as much text.
(1 hour instead of 500-1000 hours processing in MS Word)

Of course, this requires some programming:
Create a table with only one field, 'Capword'.
Initially it will contain all words (lowercased), but after getting the information from Word, only the words with capitals (e.g. only 10,000 words).
Hmm, maybe the table needs two fields, 'Capword' and 'lcasedword'...
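The first step of that idea, reducing all records to their distinct lower-case words, might look like the sketch below. CollectDistinctWords and StripPunctuation are hypothetical names; the records are assumed to arrive as a string array, words are split on spaces, and only the letters a-z are kept. Each distinct word could then go through the Word check once, with only the words Word capitalises stored in the 'Capword'/'lcasedword' table.

```vb
' Returns each distinct lower-case word across all records, once
Public Function CollectDistinctWords(records() As String) As Collection
    Dim seen As New Collection
    Dim r As Long, w As Variant, token As String

    On Error Resume Next            ' Add raises an error on a duplicate key; skip it
    For r = LBound(records) To UBound(records)
        For Each w In Split(LCase$(records(r)), " ")
            token = StripPunctuation(CStr(w))
            If Len(token) > 0 Then seen.Add token, token
        Next
    Next
    On Error GoTo 0

    Set CollectDistinctWords = seen
End Function

' Removes leading/trailing non-letters, e.g. "london," -> "london"
Private Function StripPunctuation(ByVal s As String) As String
    Do While Len(s) > 0
        If Left$(s, 1) Like "[a-z]" Then Exit Do
        s = Mid$(s, 2)
    Loop
    Do While Len(s) > 0
        If Right$(s, 1) Like "[a-z]" Then Exit Do
        s = Left$(s, Len(s) - 1)
    Loop
    StripPunctuation = s
End Function
```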

simonsabin (author) commented:
I have just found out that the free-format text doesn't need to be converted to sentence case after all; only names do, and that can be done with StrConv.

Anyway, it was interesting. Thanks for the help.

How many points do you want?

ameba commented:
I'm glad you solved it. A 50-point question titled "For ameba" would be great.
Thanks

simonsabin (author) commented:
Adjusted points to 500

simonsabin (author) commented:
Honest chap

ameba commented:
Wow! Thanks!

simonsabin (author) commented:
Well, it's almost Christmas!!!

ameba commented:
:-)
It is for me now. Thanks!
Visual Basic Classic