Solved

Counting words with particular formatting

Posted on 2008-10-02
Last Modified: 2010-04-21
Hi all,

I'm working with large Word documents which consist of text with a variety of formatting, including colours and font-weight. I'm writing a Macro which counts and displays the number of times particular words appear in this document, but I want to only count those occurrences which have particular formatting.

For example, I want to count the number of times the string "if" appears in a document where it is (a) a whole word, (b) in msBlue colour, (c) either begins a line or has only spaces before it, and (d) non-bold. Another example is that I want to find the string "[ADHOC]" where it is (a) a whole word, (b) in msBlue (c) is on a line of its own, and (d) bold. Occurrences of these words with any other formatting should not be counted.

Below is the code I have written so far for the "if" example. This counts the number of occurrences of the word "if" and displays it on the screen.

Any help with this would be appreciated.
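Dim ifs As Integer

' Count every whole-word occurrence of "if" in the document
With ActiveDocument.Content.Find
    Do While .Execute(FindText:="if", Forward:=True, Format:=True, _
       MatchWholeWord:=True) = True
       ifs = ifs + 1
    Loop
End With

' Str$ adds a leading space to non-negative numbers, so no extra space is needed after "found"
MsgBox "'if' was found" & Str$(ifs) & " times.", vbOKOnly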

Question by:YouGov2008
LVL 100

Expert Comment

by:mlmcc
ID: 22629839
I can't find any way to add the formatting restrictions to the Find command.

mlmcc
 
LVL 21

Expert Comment

by:EricFletcher
ID: 22630104
Add the format conditions for the Find as shown below. To manage the other conditions, you'll need to do more analysis for each found word.
With ActiveDocument.Content.Find
    .Font.Color = wdColorBlue
    Do While .Execute(FindText:="if", Forward:=True, Format:=True, _
       MatchWholeWord:=True) = True
       '-- analyze the found content for your other conditions here and only increment the count if it passes
       ifs = ifs + 1
    Loop
End With

 

Author Comment

by:YouGov2008
ID: 22632197
Thanks Eric. That helps a bit. From what you've provided, I've added an additional line to also specify the bold/non-bold formatting as well, as seen in my latest version below. But it's the ability to count them only if they start a line (ignoring any tabs or spaces) in the case of the "if" string, or only counting them if they have a line to themselves (ignoring any tabs or spaces) in the case of the "[ADHOC]" string that I'm struggling with.

I don't feel I can allocate any points yet because this is definitely the hardest part of the task and I'm still stumped. Thanks for the help so far though.

Dim ifs As Integer

With ActiveDocument.Content.Find
    .Font.Color = wdColorBlue
    .Font.Bold = False
    Do While .Execute(FindText:="if", Forward:=True, Format:=True, _
       MatchWholeWord:=True) = True
       ifs = ifs + 1
    Loop
End With

 
LVL 21

Expert Comment

by:EricFletcher
ID: 22637334
Hmm... well, you will need to add code to test for the other conditions before you increment the counter. The trick will be to figure out how to determine if a found item is on a line by itself, or is prefixed by "n" number of allowable characters (spaces, tabs, etc.) Perhaps a Select Case construct could do it.

If you tested the character before the found item ("if"), and it was a paragraph end or a new line character, you would know it started on a new line. If it was a space (or a tab?), you could continue to test for the next previous character, skipping any allowable characters until you came up with a non-allowed character or a paragraph or new line character.

I'm a bit fuzzy about how to manage this with Ranges, so I wasn't able to test it out myself.
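The backward scan described above could be sketched with Ranges roughly as follows. This is an untested sketch under some assumptions: the text is in the main body of the document (not a header, footnote, or table cell), and "line" means a paragraph or a manual line break rather than a line produced by word wrap. The names StartsLine and CountLeadingIfs are made up for illustration.

```vba
' Returns True when only spaces or tabs lie between the found range and
' the preceding paragraph mark, manual line break, or start of document.
Function StartsLine(ByVal found As Range) As Boolean
    Dim pos As Long
    Dim ch As String

    pos = found.Start
    Do While pos > 0
        ' Read the single character immediately before position pos
        ch = found.Document.Range(pos - 1, pos).Text
        Select Case ch
            Case " ", vbTab
                pos = pos - 1            ' allowed padding: keep walking back
            Case vbCr, vbVerticalTab
                StartsLine = True        ' paragraph mark or manual line break
                Exit Function
            Case Else
                StartsLine = False       ' some other text precedes the match
                Exit Function
        End Select
    Loop
    StartsLine = True                    ' reached the start of the document
End Function

Sub CountLeadingIfs()
    Dim rng As Range
    Dim ifs As Long

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Font.Color = wdColorBlue
        .Font.Bold = False
        .Text = "if"
        .MatchWholeWord = True
        .Format = True
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            ' After a successful Execute, rng covers the match itself
            If StartsLine(rng) Then ifs = ifs + 1
            rng.Collapse wdCollapseEnd   ' continue searching after this match
        Loop
    End With
    MsgBox "'if' was found " & ifs & " times."
End Sub
```

Doing the position test in code rather than in the search pattern means the Find itself stays a plain, case-insensitive, formatted search.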
 
LVL 100

Expert Comment

by:mlmcc
ID: 22638036
I tried using ^w to look for white space, but it required a blank before or after the word.

^p searches for paragraph marks.

I'm not sure whether you can do that in code using Word's search engine.

mlmcc
 
LVL 21

Accepted Solution

by:
EricFletcher earned 500 total points
ID: 22638345
You can do a lot more with the wildcard options of Find, but I assume you want a code solution.

Consider the following 3 paragraphs:

[ADHOC]
    [ADHOC]
This one is located within the [ADHOC] text

If you use Find with the Use wildcards option turned on, the following patterns will find as follows:

^0013(\[ADHOC\])^0013
-- finds [ADHOC] with a paragraph mark before and after (i.e. only #1 where it is on a line of its own)

(^0013)([ ]{1,})(\[ADHOC\])^0013
-- finds a paragraph mark followed by 1+ spaces, then [ADHOC] and another paragraph mark (i.e. only #2)

([ ^0013]{1,})(\[ADHOC\])^0013
-- finds the [ADHOC] string if it is preceded by at least one of a space OR a paragraph mark and followed by a paragraph mark (i.e. #1 and #2)

Note: The ^0013 finds the paragraph mark; the \ is needed because [ is used for defining patterns; the different parts of the expression are enclosed within parentheses for clarity (and to enable them to be dealt with separately in a Replace if needed).
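For a code version, a wildcard pattern like those above can be passed to Find with MatchWildcards turned on and the matches counted in a loop. The following is an untested sketch; the function name CountWildcardMatches is hypothetical, and the pattern is used exactly as written above. Because each pattern consumes the paragraph mark that follows the tag, the sketch restarts each search one character before the end of the previous match, so that the same paragraph mark can also open the next match.

```vba
' Counts matches of a Word wildcard pattern in the active document.
Function CountWildcardMatches(ByVal pattern As String) As Long
    Dim rng As Range
    Dim n As Long

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = pattern
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            n = n + 1
            ' Search the rest of the document, starting at the paragraph
            ' mark that ended this match so it can begin the next one
            rng.SetRange Start:=rng.End - 1, End:=rng.Document.Content.End
        Loop
    End With
    CountWildcardMatches = n
End Function

Sub ShowStandaloneAdhocCount()
    MsgBox "Found " & CountWildcardMatches("^0013(\[ADHOC\])^0013") & " times."
End Sub
```

Note that a tag in the very first paragraph of the document has no paragraph mark before it, so these patterns would not count it.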

 

Author Comment

by:YouGov2008
ID: 22650458
Hi Eric,

By coincidence, I've been working with Wildcards since posting my original request for help and I think that they might hold the key to the solution because you can incorporate wildcard searches into macros. As you assumed, I do indeed want a code solution because this needs to be run by people with minimal knowledge of such things.

I thought that maybe I could first run a script that replaced any occurrences of [ ][ADHOC] with [ADHOC] i.e. that made sure that [ADHOC] was always at the start of a line. The code I used for this is below...

    With Selection.Find
        .Text = "[ ]@(\[ADHOC\])"
        .Replacement.Text = "[ADHOC]"
        .Font.Bold = True
        .Font.Color = wdColorBlue
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
        ' Run the replacement only after all of the search options are set
        .Execute Replace:=wdReplaceAll
    End With

This works, to an extent, but it is very frustrating because once you specify that you want to use Wildcards then the search string is automatically case-sensitive (you can't turn this off), and there is no way of me knowing the case of what has been written. For example, I am just as likely to find [Adhoc] or [AdHoc] or [adhoc] or any other combination, and I need to count all of them.
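One possible workaround for the case sensitivity is to spell out both cases of every letter in the pattern, e.g. [Aa][Dd][Hh][Oo][Cc]. The helper below is a sketch of building such a pattern automatically; the name AnyCasePattern and the list of characters it escapes are assumptions for illustration, not something Word provides.

```vba
' Builds a Word wildcard pattern that matches literalText in any mix of
' upper and lower case. Letters become two-character classes such as [Aa];
' common wildcard special characters are escaped with a backslash.
Function AnyCasePattern(ByVal literalText As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(literalText)
        ch = Mid$(literalText, i, 1)
        If LCase$(ch) <> UCase$(ch) Then
            ' A letter: accept either case
            result = result & "[" & UCase$(ch) & LCase$(ch) & "]"
        ElseIf InStr("[]{}()<>*?!@\", ch) > 0 Then
            ' A wildcard operator: escape it so it is matched literally
            result = result & "\" & ch
        Else
            result = result & ch
        End If
    Next i
    AnyCasePattern = result
End Function
```

With this, AnyCasePattern("[ADHOC]") returns \[[Aa][Dd][Hh][Oo][Cc]\], which could be used as the .Text of a wildcard search in place of \[ADHOC\].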

I could write something which converted all cases of [adhoc] to [ADHOC], but I would prefer this to leave the document untouched, i.e. all I want to do is count things, not change anything.

The other frustration with Word wildcards is that, unlike proper regex, there is no syntax for "0 or more occurrences": {0,} is not accepted, which looks like an oversight on the part of the program designers. This is a problem because I want to be able to count occurrences that have 0 or more spaces before them. I guess I could count the number of [ADHOC] tags with no spaces before them, count the number with 1 or more spaces, and then add the two together, but it's annoying anyway.
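An alternative that avoids both wildcard limitations is to leave wildcards off (so the search can ignore case), let Find handle only the text and formatting, and test the "line of its own" condition in code. The sketch below is untested and makes some assumptions: a "line" is a whole paragraph, only spaces and tabs count as padding, and the macro name CountStandaloneAdhoc is made up. It only reads the document and changes nothing.

```vba
Sub CountStandaloneAdhoc()
    Dim rng As Range
    Dim paraText As String
    Dim hits As Long

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Font.Color = wdColorBlue
        .Font.Bold = True
        .Text = "[ADHOC]"            ' brackets are literal when wildcards are off
        .MatchWildcards = False
        .MatchCase = False           ' also finds [Adhoc], [adhoc], and so on
        .Format = True
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            ' Take the text of the paragraph containing the match and
            ' strip the paragraph mark, tabs, and spaces from it
            paraText = rng.Paragraphs(1).Range.Text
            paraText = Replace(paraText, vbCr, "")
            paraText = Replace(paraText, vbTab, "")
            paraText = Replace(paraText, " ", "")
            ' Count the match only if nothing but the tag remains
            If StrComp(paraText, "[ADHOC]", vbTextCompare) = 0 Then hits = hits + 1
            rng.Collapse wdCollapseEnd   ' continue searching after this match
        Loop
    End With
    MsgBox "[ADHOC] was found on a line of its own " & hits & " times."
End Sub
```

Because zero padding characters strip down to the same result as many, the "0 or more spaces" case needs no special handling here.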
 

Author Closing Comment

by:YouGov2008
ID: 31502327
No workable solution provided, though perhaps the task has no solution.