simonsabin
asked on
Convert text to Sentence Case
I need to be able to convert text to proper sentence case, i.e.
MY NAME IS FRED I AM A NERD. I HAVE NO FRIENDS IN LONDON.
to
My name is Fred I am a nerd. I have no friends in London.
The key factor is identifying the keywords that need capitalising, e.g. Fred, I and London.
One combination that would work is using Word to convert the case and then running a spell check. However, you cannot programmatically control the spell checker in Word (except for starting it).
Suggestions please.
Loads of points available.
P.S. I have already considered creating a list of keywords, but rejected it because of the time involved.
I will also be working with disease names and non-standard words, so a global spell check and replace may not work.
How else can the program know that a word is to be capitalized, except by using a list of keywords? Guessing?
ASKER
The key point is that I don't want to build up the keyword list. Word already has it in its dictionary. How can I use it?
I need to be able to spell check and know whether a flagged word is wrong only because it should be capitalised.
Try this
' #VBIDEUtils#************************************************************
' * Programmer Name : Waty Thierry
' * Web Site : www.geocities.com/ResearchTriangle/6311/
' * E-Mail : waty.thierry@usa.net
' * Date : 29/06/99
' * Time : 13:54
' **************************************************************************
' * Comments : Changing Strings to Title Case
' *
' *
' **************************************************************************
Function TitleCaps(InString As String) As String
    ' *** Changing Strings to Title Case
    ' *** Useful for certain types of applications (e.g. for names and addresses),
    ' *** being able to Title Caps (i.e. capitalise the first letter of each word) is achieved
    ' *** with the following function.
    ' *** However, be careful with names such as McDonalds as it will become Mcdonalds.
    ' *** Perhaps it would be a good idea to edit the code to cope with this...
    Dim OutString As String
    Dim CurrentLetter As String
    Dim CurrentWord As String
    Dim TCaps As String
    Dim StrCount As Long    ' Long rather than Integer, so strings over 32767 characters do not overflow

    ' *** Converts [instring] to Title Caps (as best it can!)
    OutString = ""
    If InString = "" Then
        TitleCaps = ""
        Exit Function
    End If
    CurrentWord = ""
    For StrCount = 1 To Len(InString)
        CurrentLetter = Mid(InString, StrCount, 1)
        CurrentWord = CurrentWord + CurrentLetter
        ' A delimiter (or the end of the string) closes the current word
        If InStr(" .,/\;:-!?[]()#", CurrentLetter) <> 0 Or _
           StrCount = Len(InString) Then
            TCaps = UCase(Left(CurrentWord, 1)) + _
                    LCase(Right(CurrentWord, Len(CurrentWord) - 1))
            OutString = OutString + TCaps
            CurrentWord = ""
        End If
    Next
    TitleCaps = OutString
End Function
ASKER
Waty,
Not what I am after; I need sentence case, not proper case.
P.S. You can use StrConv(string, vbProperCase)
to do what you have done.
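For reference, the difference between the two cases can be sketched in plain VB. `StrConv` with `vbProperCase` capitalises every word, whereas the hypothetical `SentenceCase` helper below (not from this thread) lowercases everything and then capitalises only the first letter after a `.`, `!` or `?`. It is deliberately naive: it will trip over abbreviations and decimals, and it makes no attempt to find proper nouns such as Fred or London, which is the part that needs a dictionary.

```vb
Option Explicit

' Hypothetical helper: naive sentence case, with no proper-noun detection.
Public Function SentenceCase(ByVal sInput As String) As String
    Dim i As Long
    Dim ch As String
    Dim bStart As Boolean   ' True while waiting for the first letter of a sentence

    sInput = LCase$(sInput)
    bStart = True
    For i = 1 To Len(sInput)
        ch = Mid$(sInput, i, 1)
        If InStr(".!?", ch) > 0 Then
            ' A sentence terminator: the next letter starts a new sentence
            bStart = True
        ElseIf bStart And ch Like "[a-z]" Then
            ' Overwrite the character in place with its upper-case form
            Mid$(sInput, i, 1) = UCase$(ch)
            bStart = False
        End If
    Next i
    SentenceCase = sInput
End Function

Public Sub DemoCases()
    Const SAMPLE As String = "MY NAME IS FRED. I HAVE NO FRIENDS."
    Debug.Print StrConv(SAMPLE, vbProperCase)   ' My Name Is Fred. I Have No Friends.
    Debug.Print SentenceCase(SAMPLE)            ' My name is fred. I have no friends.
End Sub
```

Note that "fred" stays lowercase in the second result; that gap is what the rest of the thread tries to close.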
Forget it...
How do you know you're not dealing with a Language such as German, where all Nouns are capitalised?
Or with dutch where we do not capitalise anything, for example the fact that I am dutch or my first language is dutch.
Or with Scottish names like McDonalds where capitals appear in the middle of a word?
Or with some language that you've never seen before where you haven't got a clue whether a word is a name or any other type of word.
Even if the language is English, who's to say that the user is not talking about something that requires the inclusion of foreign language words? Such as "The Dutch word for difficult is moeilijk"
Etc... etc... etc...
In French, month names are not capitalized as they are in English:
January = janvier
February = février
....
ASKER
In which case the word would not be capitalised. If you run the Word spell checker on your sentence after you have changed it from all capitals to sentence case, then the words dutch and moeilijk are underlined as misspelt. The difference, however, is that the suggestion for dutch is Dutch (i.e. capitalise it), whereas for moeilijk there is either no suggestion or a different word.
I want to automatically replace the words that should be capitalised and leave the rest. But you have no control over the Word spell checker.
ASKER
P.S. It is all English.
Also, I know that whatever solution is chosen, it won't be perfect.
>you can not programatically control the spell checker
you can retrieve errors
I pasted your sentence (converted to lowercase) into Word97, created a 'custom Writing Style' (which only checks Capitalization) in Options, and used this VBA:
Dim i As Integer, pr1 As ProofreadingErrors, msg As String
Set pr1 = Selection.Range.SpellingErrors
For i = 1 To pr1.Count
    msg = msg & pr1.Item(i).Text & vbCr
Next
MsgBox msg
It returned:
fred
i
london
Maybe this can be a start?
ASKER
Thanks ameba, just the trigger needed.
Here is the code I have used, any comments will be welcomed
Dim w As Word.Application, d As Word.Document, s As Object, lng As Long
Dim sugg As Word.SpellingSuggestions, r As Range
Set w = GetObject("", "Word.Application")
Set d = w.Documents.Add
lng = Len(Text1)
w.ActiveDocument.Content = Text1
w.ActiveDocument.Content.Case = wdLowerCase
w.ActiveDocument.Content.Case = wdTitleSentence
For Each s In w.ActiveDocument.SpellingErrors
    Set sugg = s.GetSpellingSuggestions
    If sugg.Count > 0 Then
        ' Accept the first suggestion only if it differs from the word by case alone
        If StrComp(sugg(1), s.Text, vbTextCompare) = 0 Then
            s.Text = sugg(1)
        End If
    End If
Next
Text1 = Left$(w.ActiveDocument.Content.Text, lng)
d.Close False
w.Quit
Set w = Nothing
Sounds good, I'll check it later.
Must go now and vote for President of my country.
(Also, I know that whatever candidate is chosen, it won't be perfect. :)
' I hope it is OK.
' Text1(0) multiline=true, Text1(1)
Option Explicit

Private Sub Form_Load()
    Text1(0).Text = "MY NAME IS FRED I AM A NERD. I HAVE NO FRIENDS IN LONDON, GREAT Britain." _
        & vbCrLf & "Mcdonalds is in NEW york."
End Sub

Private Sub Command1_Click()
    Text1(1).Text = XCase(Text1(0).Text)
End Sub

Public Function XCase(sInput As String) As String
    Dim w As Word.Application, d As Word.Document
    Dim r As Word.Range
    Dim sugg As Word.SpellingSuggestions
    Set w = GetObject("", "Word.Application")
    Set d = w.Documents.Add
    d.Range.LanguageID = wdEnglishUK
    d.Content.Text = sInput
    d.Content.Case = wdLowerCase
    d.Content.Case = wdTitleSentence
    For Each r In w.ActiveDocument.SpellingErrors
        Set sugg = r.GetSpellingSuggestions
        If sugg.Count > 0 Then
            If StrComp(sugg(1), r.Text, vbTextCompare) = 0 Then
                r.Text = sugg(1)
            End If
        End If
    Next
    ' Word will use vbCr instead of vbCrLf, and append one extra vbCr
    ' reverse this
    XCase = Replace(w.ActiveDocument.Range.Text, vbCr, vbCrLf)
    If Right$(XCase, 2) = vbCrLf Then
        XCase = Left$(XCase, Len(XCase) - 2)
    End If
    d.Close False
    w.Quit
    Set w = Nothing
End Function
ASKER
Yes, I noticed the extra character appearing when I put the text back into VB.
Just one more note: this is likely to be called more than 1 million times. Getting the Word app and creating the document can be done once, but is
d.Content.Text = sInput
d.Content.Case = wdLowerCase
d.Content.Case = wdTitleSentence
For Each r In w.ActiveDocument.SpellingErrors
    Set sugg = r.GetSpellingSuggestions
    If sugg.Count > 0 Then
        If StrComp(sugg(1), r.Text, vbTextCompare) = 0 Then
            r.Text = sugg(1)
        End If
    End If
Next
' Word will use vbCr instead of vbCrLf, and append one extra vbCr
' reverse this
XCase = Replace(w.ActiveDocument.Range.Text, vbCr, vbCrLf)
If Right$(XCase, 2) = vbCrLf Then
    XCase = Left$(XCase, Len(XCase) - 2)
End If
the quickest code?
I think you don't need:
d.Content.Case = wdLowerCase
In Options, define a 'custom' writing style, check only the Capitalization checkbox and ignore other spelling errors.
If d Is Nothing Then
    Set w = GetObject("", "Word.Application")
    Set d = w.Documents.Add
    d.Range.LanguageID = wdEnglishUK
    w.Languages(wdEnglishUK).SpellingDictionaryType = wdSpelling
    w.Languages(wdEnglishUK).DefaultWritingStyle = "Custom"
    w.ActiveDocument.ActiveWritingStyle(wdEnglishUK) = "Custom"
End If
------------
It is slow, only a few KB/minute.
1. What is the average size of your records (100-200 characters or many KB)?
2. Can you ignore MixedCase words (is Mcdonalds OK)?
ASKER
I found that on a second call the conversion to sentence case did not work; that is why I added the conversion to lower case.
The average size is about 30-40 words.
If this is a batch job, it will run for 20 days (3K/minute, 200MB).
If Word97 automation is too slow, creating a word list is not a big problem.
You'll need:
; days of the week
; months
; holidays
; planets, stars, zodiac
; computers, programs, and languages
; names from Alice in Wonderland
; famous persons, deities and related adjectives
; misc. unique entities
; nationalities, languages, religions
; places and related adjectives
; some abbreviations
see this collection:
ftp://ftp.ox.ac.uk/pub/wordlists/
dirs: places, names, science, databases
40 million words contain only about 40,000 distinct words.
Maybe you can pass only these distinct words to Word once, instead of passing 1000 times as many.
(1 hour instead of 500-1000 hours of processing in MS Word)
Of course, this requires some programming:
Create a table with only one field, 'Capword'.
Initially it will contain all words (lowercased); after getting the info from Word, it will keep only the words with capitals (e.g. only 10,000 words).
Hm, maybe the table needs two fields, 'Capword' and 'lcasedword' ...
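A minimal sketch of how such a two-field table might be applied once it has been filled, assuming an in-memory `Scripting.Dictionary` (created late-bound, so the Microsoft Scripting Runtime must be installed) stands in for the table. The name `ApplyCapWords` and the sample entries are hypothetical; in practice the entries would come from the one-off pass through Word described above.

```vb
Option Explicit

' Hypothetical helper: replace every known word with its capitalised form.
' dictCaps maps 'lcasedword' to 'Capword', e.g. "fred" -> "Fred".
Public Function ApplyCapWords(ByVal sText As String, ByVal dictCaps As Object) As String
    Dim i As Long
    Dim lStart As Long      ' start of the current word, 0 = not inside a word
    Dim sWord As String

    ' Run one position past the end so the last word is flushed too
    ' (Mid$ returns "" there, which is not a letter).
    For i = 1 To Len(sText) + 1
        If Mid$(sText, i, 1) Like "[A-Za-z]" Then
            If lStart = 0 Then lStart = i
        ElseIf lStart > 0 Then
            sWord = LCase$(Mid$(sText, lStart, i - lStart))
            If dictCaps.Exists(sWord) Then
                ' Same length, only the case differs, so overwrite in place
                Mid$(sText, lStart, i - lStart) = dictCaps.Item(sWord)
            End If
            lStart = 0
        End If
    Next i
    ApplyCapWords = sText
End Function

Public Sub DemoCapWords()
    Dim dictCaps As Object
    Set dictCaps = CreateObject("Scripting.Dictionary")
    dictCaps.Add "fred", "Fred"
    dictCaps.Add "i", "I"
    dictCaps.Add "london", "London"
    Debug.Print ApplyCapWords("My name is fred i am a nerd. I have no friends in london.", dictCaps)
    ' Prints: My name is Fred I am a nerd. I have no friends in London.
End Sub
```

The lookup never touches Word, so each record costs only a single pass over its characters; Word is needed just once per distinct word to build the dictionary.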
ASKER
I have just found out that the free-format text doesn't need to be converted to sentence case. Only the names do, and that can be done with StrConv.
Anyway, it was interesting. Thanks for the help.
How many points do you want?
ASKER
Adjusted points to 500
ASKER
Honest chap
Wow! Thanks!
ASKER
Well, it's almost Christmas!!!
:-)
It is for me now. Thanks!