Stephen Kairys

Software to count how many instances of words

For Win10

Hello,

I am looking for free software that will provide a report on how many times each word appears in a manuscript. e.g.
"the" 100
"is"     40
etc.

The file is in Word format but can be converted to PDF. If possible, I would prefer not uploading it to any site on the Web.

For those wondering, I'm trying to identify words that I overuse in my novel. :)

Thanks,
Steve
John

What about using Word's Advanced Find to find such words? I have done this. Here is a resource to assist. I see more than one such resource, so I would stick with Word if I were you.

http://smallbusiness.chron.com/duplicate-words-ms-word-45105.html
You can try this http://download.cnet.com/Hermetic-Word-Frequency-Counter/3000-2079_4-10400779.html
It's free.

"This software scans an MS Word docx file or a text file (including HTML and XML files) with text encoded via ANSI or UTF-8 and counts the frequencies of different words. The words which are found and displayed can be ordered alphabetically or by frequency."
Joe Winograd
Hi Steve,

Go to this page:
http://word.tips.net/T001833_Generating_a_Count_of_Word_Occurrences.html

Then scroll down to this section:
If you want to determine all the unique words in a document, along with how many times each of them appears in the document, then a different approach is needed. The following VBA macro will do just that.
I haven't tried it myself, but that macro looks very promising. Let me know how it goes for you. Regards, Joe

Update: That link is for older versions of Word. This is for newer ones:
http://wordribbon.tips.net/T010761_Generating_a_Count_of_Word_Occurrences.html
Stephen Kairys (ASKER)

Hi everyone.
You've all given me food for thought. No need for further responses from anyone else unless I say otherwise :)

Thanks.
@Joe

Do I need to copy my Word DOCX file to a macro-enabled doc to run this macro? Thanks.
There are many ways to enable macros, but yes, in order to run them, macros must be enabled. Btw, I just tested the macro in Word 2016 and it worked perfectly! First, it gives this dialog:

User generated image
Then it creates a new doc with the word list and frequency (count of occurrences), sorted by either word or frequency. Here's a subset of actual results from a run:

21 in
19 cable
15 dog
11 cat
10 with
9  bag
8  usb
8  there
8  hdmi
7  power
7  on
7  cables
6  link
5  dock
5  after

Very cool! Regards, Joe
save the doc as html file

open the html file in firefox

find 'word' using Ctrl F and you will see the number of matches ;-)))
> find 'word' using Ctrl F and you will see the number of matches ;-)))

That's nice for a single word, but he wants to count the occurrences of all words in the doc.
OK, it's running, but at the top, I see it's "(Not Responding)". Is that behavior normal for a large document? (Apx. 106K words.) Thanks.
Yes! I tested it on a relatively small doc of five pages and 994 words. It sat on "Not Responding" for a minute or two. On a doc of 106K words, you may want to check it in the morning. :)  But just to make sure that it's working for you, test it on a two-page doc. Btw, what version of Office do you have?
Also, I don't notice any "in progress" display on the status bar. Then again, I'm not sure I'm even seeing the status bar itself...

User generated image
Thanks.
@Joe - Office 2016.
There is no progress bar...just a spinning circle until it finishes. Try it on a small doc so you can get a feel for it before turning it loose on the big guy.
Top four words for an apx. 1 page document. Considering the protagonist here is a woman, it's not surprising that "she" is #1! :)

29      she
19      had
16      her
10      was

Thanks. Will turn it loose on the "big guy" overnight. :)
Steve
Right, no surprise there. :)

Will be interesting to see if it croaks on the big guy. Some macros/programs/scripts work fine on small docs/files but die on big ones.
It's done! And the winners are....

2719      he
1803      she
1762      i
1743      her
1710      his
1687      you

7530  - if I recall correctly - unique words in a 106K doc. This exercise was fun! :)

Btw, what's odd is that the "Remaining" field, which seems to display the variable ttlwds started not at 106K but somewhere in the 120K's.  Even though the code says:

ttlwds = ActiveDocument.Words.Coun

Thanks,
Steve
That was much faster than I thought it would be!

You have a typo there...code says:
ttlwds = ActiveDocument.Words.Count

Anyway, same here. Word says my test doc has 994 words, but the Remaining field starts out in the 1,400 range. I can't explain that.
Good catch on the typo. Obviously a bad paste on my part. :)

Anyhow, I imported the analysis into Excel and the total is yet a third number;
90598

Accordingly, I spot-checked three words. The totals below list:
1. Data from running macro.
2. Data from replacing the word with itself (by typing the same word in the Find What and Replace With fields).

                                  Macro   Find/Replace
smiled:                             344            344
you:                               1687           1866
would:                              329            336
<name of female lead character>    1056           1150
<name of male lead character>      1231           1358

Not sure what to make of the above. I'm guessing that if I don't need an exact word count, this macro has given me an idea of what words I may be overusing; however, I'm not sure how much I can trust the results...

Thanks,
Steve
It's possible that there's a bug in the code. It also depends on the definition of a "word" and how punctuation is interpreted in determining word boundaries, which is tricky. But my guess is that you didn't have Find whole words only ticked in the Find/Replace dialog — you need to click the More>> button to see the options. So a search for "you" also finds "your", "you're", "you'd", "you'll", "young", etc. Likewise, "would" finds "wouldn't" and "would-be". But there are no other words containing "smiled". I can't see if the theory holds for your male and female leads because I don't know their names. I haven't tested this on my end yet — will do that soon.

Update: I tested my doc on the word "port" and I think my guess above is correct — the macro is finding whole words only. Here are the macro's results:

2 displayport
1 passport
3 port
1 portable

A Find (or Find/Replace) in Word for "port" with the Find whole words only option ticked finds 3 (same as the macro), but with it un-ticked, finds 7 (same as the sum of all words that the macro finds containing "port").
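
The difference between the two counts can be reproduced outside Word. The following is a minimal Python sketch, not the macro's or Word's actual logic: the sample text is made up to mirror the "port" results above, and the function names are hypothetical.

```python
import re

# Hypothetical sample text that mirrors the macro's "port" results:
# 3 x port, 2 x displayport, 1 x passport, 1 x portable.
text = "port displayport passport port portable displayport port"

def count_whole_word(word, text):
    """Count occurrences of `word` only where it stands alone as a word."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def count_substring(word, text):
    """Count every occurrence of `word`, including inside longer words."""
    return len(re.findall(re.escape(word), text, flags=re.IGNORECASE))

whole = count_whole_word("port", text)
anywhere = count_substring("port", text)
print(whole, anywhere)
```

With this sample, the whole-word count is 3 and the substring count is 7, the same relationship as Find with "Find whole words only" ticked versus un-ticked.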
Good thought. However, I do have the whole-words option checked. Maybe there is a bug....

So, at this point, I either have to find another method to count, or assume (risky word :) ) that the results are at least in the ballpark....

Thanks.
Just for kicks, search for another word (like "smiled") that is not a component of any other word (or hyphenated word).
Wow. You may have nailed it. Most of the issue seems to be b/c of Word, with the Macro also exhibiting a flaw...

Examples (where "Word" = Word's Find/Replace)

"smiled": Word find/replace = 344. Macro = 344.

"again":  Word find/replace = 141. Macro = 141.

"would":  Word find/replace = 336.  Macro: "would" = 329, "would've" = 7. Total = 336.
> Word seems to include "would've" for whole words only, which is WRONG.

"Nadine": Word find/replace = 1150.  Macro:  "Nadine" = 1056, "Nadine's" = 92, "Nadine'" = 2 (The last one is "Nadine", followed by a single apostrophe.)  Total = 1150.
> The "Nadine" + the single apostrophe is from two occurrences such as this.
   She said, "My parents named me 'Nadine' after an actress."
> For what it's worth, the macro spells "Nadine" as "nadine". Obviously, it ignores case to deal with instances such as the first word of a sentence.

Conclusions:
1. Word's whole-words-only seems to be fooled by apostrophes.
2. The macro does not know how to deal with a word enclosed in single quotes.

PS: Word apparently does not deal with hyphenated words properly. In the text below, searching on "drop" finds both "Drop" and the "Drop" in "Drop-down menu."

Drop
Down
Dropdown menu
Drop-down menu

That exercise was interesting! Nice debugging suggestion on your part! :) Thanks.
Steve
You should experiment with the other options, especially Find all word forms, Ignore punctuation characters, and Ignore white-space characters. For example, with Find all word forms ticked, Find (and Replace) finds "drop" and "drop-down", but not "dropdown", "'drop'", or "drop'" (note the single quote marks in those last two). With it un-ticked, it finds all of them, unless Find whole words only is ticked, in which case it finds all of them except "dropdown". Fascinating stuff!
I'll give the above a whirl.

Wow. Amazing how a creative work - a novel - has looped back to what I do in my day job - dealing with technology! :)
Besides a word match, there is also an exact match.
"You" and "your" will both be counted as a match for "you".
The single quote in "would've" ..... ...
The only way is to extract words based on spaces, commas, periods, colons, semicolons, quotes ....
@Arnold, by "exact match" (which I don't see on my options screen) do you mean "Find whole words only"?

@Joe - I think I have a possible explanation why adding up all the word occurrences in Excel summed to a lower total than the true word count:

 ' Set up excluded words
    Excludes = "[the][a][of][is][to][for][by][be][and][are]"
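
To illustrate why an exclusion list makes the per-word totals add up to less than the document's word count, here is a minimal Python sketch. It is not the macro's code: the sample sentence and the function name are hypothetical, and only the list of excluded words is taken from the Excludes string above.

```python
import re
from collections import Counter

# The same ten words that appear in the macro's Excludes string.
EXCLUDES = {"the", "a", "of", "is", "to", "for", "by", "be", "and", "are"}

def count_words(text, excludes):
    """Count words case-insensitively, skipping any word in `excludes`."""
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if w not in excludes)

# Hypothetical sample sentence, 13 words long.
sample = "The cat is on the mat and the dog is by the door"
all_words = re.findall(r"\w+", sample.lower())
counts = count_words(sample, EXCLUDES)
total_counted = sum(counts.values())
print(len(all_words), total_counted)
```

Here 8 of the 13 words are on the exclusion list, so the counts sum to 5. In a novel, words such as "the", "a", and "and" are frequent enough that this could plausibly account for a gap of many thousands of words.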

Also, I'm wondering if this macro was saved in NORMAL.DOT. I see it in a document on which I never ran it.

Thanks.
I'd have to check, but I think MS search has an exact-match comparison.
> Also, I'm wondering if this macro was saved in NORMAL.DOT

Probably (.DOTM). Click the Macros in drop-down, select Normal.dotm, and see if it shows up in the Macro name list:

User generated image
I wrote my own program (that I'm calling CountWords) to count the words so we'd have a confirmation on the accuracy of the macro's results. CountWords is not a macro, but a stand-alone program (written in AutoHotkey) that uses a Word COM (Component Object) call to get the text from the Word file. It doesn't perform any fancy parsing, but calls a simple RegEx (Regular Expression), namely:

\b\w+\b

If you're not familiar with RegEx, that matches one or more "word" characters (\w+, i.e., letters, digits, and the underscore) between two so-called "word boundaries" (\b). It does not have any "Exclude" words and the matching is case insensitive.
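
For readers who want to try the same idea without Word or AutoHotkey, here is a minimal Python sketch of counting with that RegEx. It is not the CountWords source: the function name and sample text are hypothetical, and Python's \w is Unicode-aware, so it may differ slightly from the RegEx engine CountWords uses.

```python
import re
from collections import Counter

# The same pattern described above: a run of word characters
# between two word boundaries.
WORD_RE = re.compile(r"\b\w+\b")

def word_counts(text):
    """Return a Counter of lower-cased words, i.e. a case-insensitive tally."""
    return Counter(match.lower() for match in WORD_RE.findall(text))

# Hypothetical sample text.
sample = "She smiled. She had smiled at 500gb drives, and the zebra smiled back."
counts = word_counts(sample)

# Print the tally sorted by frequency, highest first.
for word, n in counts.most_common():
    print(n, word)
```

Because digits count as word characters, "500gb" is tallied as a word, and nothing special happens to words beginning with "z". Note that this pattern splits at apostrophes, so "don't" would be tallied as "don" and "t".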

CountWords points out two problems with the macro. First, the macro misses words that begin with a number, even if there are letters after the numbers. For example, it misses 0, 2016, 500gb, 1TB, 230w, etc. Second, the macro misses words that begin with the letter "z". I haven't looked at the macro's code (and I am not a VB/VBA expert), but these bugs are probably easy to fix.

CountWords puts the results in a CSV file so that it is easy to open in Excel and format the results. It puts in a first row with two column headers (WORD and COUNTS) and a last row with the TOTAL words. Let me know if you'd like to give CountWords a spin on your file. Regards, Joe
@Arnold - Thanks for the response.

@Joe - Wow, that's interesting. Intuitively, it makes sense it could ignore words starting with a number, but the 'z' thing is surprising. Without looking at the code I'm wondering if there is logic that says (using C-style code).

if ((first_letter >= 'a') && (first_letter < 'z')) // note the < instead of <=.

Is your program an EXE?
Thanks.
Update: I may have found the issue:

 'Out of range?
        If SingleWord < "a" Or SingleWord > "z" Then
            SingleWord = ""

I don't know VB; however, I'm guessing that comparing a full word to the single character "z" means that any word starting with z is ignored. My hypothesis is that any word with a character after "z" is considered greater than "z", e.g.
"za" > "z"
"zap" > "z"
etc.

Per an ascii table I found, ascii 122 is "z" and ascii 123 is "{" (left curly brace).
Accordingly, maybe the logic needs to read as follows?
        If SingleWord < "a" Or SingleWord >= "{" Then
            SingleWord = ""

NOTE: The above is intended to fix the "z" issue only. it does not address the leading-digit issue.
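
The hypothesis can be checked with a minimal Python sketch. This is an analogy, not a test of the macro itself: Python compares strings character by character by code point, which matches VBA's behavior only under the default binary comparison (an assumption here, since the macro's Option Compare setting is not shown). The function names are hypothetical.

```python
def excluded_original(word):
    # Mirrors: If SingleWord < "a" Or SingleWord > "z"
    return word < "a" or word > "z"

def excluded_proposed(word):
    # Mirrors: If SingleWord < "a" Or SingleWord >= "{"
    # "{" is the character immediately after "z" in ASCII (123 vs. 122).
    return word < "a" or word >= "{"

# Show, for each sample word, whether each test would blank it out.
for w in ["z", "za", "zap", "apple", "500gb"]:
    print(w, excluded_original(w), excluded_proposed(w))
```

Under the original test, "za" and "zap" are thrown out because a longer string that starts with "z" sorts after "z" itself. Under the proposed test, they are kept. Words that start with a digit, such as "500gb", sort before "a" and are still excluded by both versions.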
> Is your program an EXE?

Yes. I actually built an installer/uninstaller for it, so you run a Setup.exe that installs the program, creates a Program Group with a shortcut to the program, and also creates an uninstaller so that you can do a proper uninstall via Programs and Features. It runs on 32-bit and 64-bit XP, Vista, W7, W8/8.1, and W10. It requires an Office install (either 32-bit or 64-bit is fine), but I haven't determined yet how modern it needs to be — I tested on Office 2016.

While testing, I found another bug in the macro. I ran it on War and Peace — big, to say the least! :) There was no progress (nothing in the status bar) after five minutes or so — it seemed to be hanging, and then I got this error dialog:

User generated image
The good news is that my program, which I decided to rename to WordCounts, handles it fine. Here are the results from War and Peace — took just 3:51 (that's mins and secs, not hrs and mins):

User generated image
When it starts up, it displays a Browse For File dialog and then this dialog box so that you know it's working and not hanging:

User generated image
After it accesses all the text in the Word file and figures out the word list, it gives an improved dialog box with percentage completion and the current word that it is processing:

User generated image
Of course, at the end, in addition to the "WordCounts finished successfully" dialog shown above, it creates the CSV file with the results. Regards, Joe
Thanks! I'll take a look at your program during the week.  (Need to concentrate on getting my latest draft to a couple of Beta readers.)

Just curious: How many words is "War and Peace"? :)
> How many words is "War and Peace?"? :)

Look at the "WordCounts finished successfully" dialog that I posted.
WordCounts Version 1.1 Release Notes

o  Enhanced name of output file to contain name of input file. Naming convention is now:

<input file name>_WordCounts_YYYY-MM-DD_hh.mm.ss.csv

o  Enhanced title bar in Browse For File dialog to contain program name, Version number, and Build number.

o  Changed output file and percentage completion dialog to have all upper case words. Since the matches are case insensitive, it is not meaningful to show mixed case words.

o  Changed progress bar dialog not to be always on top. Since this means it can now be hidden behind other windows, added a taskbar icon for it.

o  Simplified system tray (notification area) icon to have just two context menu picks — Continue Running (default) and Stop Running.

o  Enhanced system tray icon (notification area) to show Version and Build numbers when hovering.
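
The output naming convention in the first item above can be sketched in a few lines of Python. This is an illustration only, not how WordCounts builds the name: the function name is hypothetical, and it assumes that "input file name" means the name without its folder or extension, which the release notes do not spell out.

```python
from datetime import datetime
from pathlib import PureWindowsPath

def output_name(input_path, when):
    """Build <input file name>_WordCounts_YYYY-MM-DD_hh.mm.ss.csv."""
    # Assumption: use the file name without its folder and extension.
    stem = PureWindowsPath(input_path).stem
    stamp = when.strftime("%Y-%m-%d_%H.%M.%S")
    return f"{stem}_WordCounts_{stamp}.csv"

# Hypothetical input path and run time.
name = output_name(r"C:\temp\My Novel.docx", datetime(2016, 9, 19, 14, 1, 32))
print(name)
```

Putting the date first and using dots instead of colons in the time keeps the name valid on Windows and makes the files sort chronologically.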
@Joe,
And you wrote Release Notes! Impressive. :)
Thanks!
Did you expect less? :)
Are you using a dictionary object, Joe?  If not, that should speed up the processing quite a bit.
> Are you using a dictionary object, Joe?

No. I'm using an AutoHotkey Object as an associative array with key/value pairs. It's my understanding that there is little advantage in a dictionary object over an AutoHotkey Object.

> If not, that should speed up the processing quite a bit.

Processing already seems pretty fast, although it never hurts to make it faster. :)  It's taking about four minutes to process the more-than-half-million words in War and Peace. Very small documents finish instantaneously, showing 0:00 elapsed time. I just tested a doc with 612 unique words in 1,072 total words and it took five seconds.
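
As background on the dictionary suggestion, here is a minimal Python sketch that contrasts a key/value tally with a tally that scans a list for every word. It is a general illustration with hypothetical function names and sample words. It is not WordCounts code, and it makes no claim about where WordCounts actually spends its time.

```python
def tally_with_dict(words):
    """One hash lookup per word: the work grows with the number of words."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

def tally_with_list_scan(words):
    """Scans the list of known words for every word: the work grows with
    (number of words) x (number of unique words)."""
    pairs = []  # list of [word, count]
    for w in words:
        for pair in pairs:
            if pair[0] == w:
                pair[1] += 1
                break
        else:
            pairs.append([w, 1])
    return pairs

# Hypothetical sample words.
words = "he she i her his you he she he".split()
by_dict = tally_with_dict(words)
by_scan = {w: n for w, n in tally_with_list_scan(words)}
print(by_dict == by_scan, by_dict)
```

Both functions produce the same counts. The difference only shows up in run time on large inputs, where the key/value approach avoids rescanning thousands of unique words for each of several hundred thousand words.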
You can use VBS to test the speed.  You'll need to instantiate a Word.automation object and a scripting.dictionary object.
> You can use VBS to test the speed.

No I can't — I don't know VBS. :)
SOLUTION
aikimark
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
WordCounts Version 1.2 Release Notes

o  Added check for Word not being installed. It now exits gracefully with a nice error dialog (v1.1 exited ungracefully, to say the least).

o  Improved green progress bar movement while initially accessing the entire contents of the Word file.

o  Improved performance of the green progress bar while processing the words by updating it on every fiftieth word instead of every word. Words flashed so fast when doing every one that it was impossible to see the word, anyway. Experimentation showed that doing one every 50 made each word last about one second in the dialog (of course, that will vary by hardware).

o  Changed system tray (notification area) icon to an icon of a Word document.

o  Enhanced system tray (notification area) icon when hovering to show the current word being processed, its word number in the unique words list, the total unique word count, and the percentage done (i.e., the same info that is in the progress bar dialog box).

o  Enhanced the output CSV file with the same info that is in the "WordCounts finished successfully" closing dialog. Placed it in rows after the last word. For example, the last rows of the War and Peace CSV now contain this:

ZUBOV     2
ZUBOVA    3
ZUBOVSKI  2
ZUM       1
ZWECK     1

Input file: C:\temp\WAR AND PEACE.docx
Unique words: 17709
Total words: 576507
Begin time: 2016-09-19_14.01.32
End time: 2016-09-19_14.05.34
Elapsed time (minutes:seconds): 4:02

o  Digitally signed the WordCounts.exe and Setup_WordCounts.exe files with Symantec code-signing certificates for Microsoft Authenticode (dual-signed with SHA1 and SHA256).

o  Bug fixes.
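
The elapsed time in the CSV summary above can be checked from the begin and end timestamps. The following Python sketch is only an illustration of that arithmetic, using the timestamp format shown in the summary. The function name is hypothetical and this is not WordCounts code.

```python
from datetime import datetime

# Timestamp layout used in the summary rows: YYYY-MM-DD_hh.mm.ss
STAMP_FORMAT = "%Y-%m-%d_%H.%M.%S"

def elapsed_min_sec(begin, end):
    """Return the time between two summary timestamps as minutes:seconds."""
    start = datetime.strptime(begin, STAMP_FORMAT)
    stop = datetime.strptime(end, STAMP_FORMAT)
    total_seconds = int((stop - start).total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"

elapsed = elapsed_min_sec("2016-09-19_14.01.32", "2016-09-19_14.05.34")
print(elapsed)
```

For the War and Peace run above, this gives 4:02, matching the "Elapsed time" row.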
Hi aikimark,

Thank you very much for running that timing test. It made me take a hard look at my code and I discovered a serious flaw. Version 1.3 fixes it.

I suspect we're working with the same Project Gutenberg document. After the fix in V1.3, I'm now at 69 seconds:

Input file: C:\temp\WAR AND PEACE.docx
Unique words: 17709
Total words: 576507
Begin time: 2016-09-19_21.53.08
End time: 2016-09-19_21.54.17
Elapsed time (minutes:seconds): 1:09

As with your VBS code, the Word doc is not open when the program runs. My time also includes displaying a green progress bar once a second showing the current word being processed (very helpful so the user knows the program is not hanging), as well as creating the CSV file. So our times are in the same ballpark now.

Thanks again for your efforts that led me to fixing the bug. Regards, Joe
You're welcome. 3:50 seemed a bit high.
Indeed, it was!
I might have something else for you, but it's late and my mind is a bit fatigued.  Tomorrow.
Same here...I'm about to pack it in for the night. Looking forward to your ideas tomorrow. Thanks, Joe
SOLUTION
This solution is only available to members.
Great stuff, aikimark! Ironically, I went in the other direction. I started out using the clipboard, as follows:

SaveClipboard:=ClipboardAll
oDoc:=ComObjGet(InputFile)
oDoc.Range.Copy()
oDoc.Close()
InputDoc[1]:=Clipboard
Clipboard:=SaveClipboard

But then I thought that using the clipboard might have two problems. First, performance — I was thinking that a direct assignment to the array variable would be faster than going through the clipboard. Second, I became concerned that on a very large file, the Get/Copy could take a while, and the user could clobber the clipboard in another program during that time. So I switched to a direct assignment, as follows:

oDoc:=ComObjGet(InputFile)
InputDoc[1]:=oDoc.Content.Text
oDoc.close()

At the time, I didn't see a measurable performance difference, but that was when I still had the huge performance flaw, which may have been overshadowing any difference between copying to the clipboard and doing a direct assignment. So I just went back and tested with the fixed code, but the timings of the clipboard approach and the direct assignment approach were identical — to the second! It's interesting that you're seeing such a big performance gain with the clipboard approach.

Thanks for your collaboration on this — it's been very helpful! Regards, Joe
I'm not a fan of using the clipboard.  It doesn't allow your applications to 'play well with others'.  It is possible that a new (2003) version of Word won't have the same poor performance as my test did accessing the content.text property.  I tried getting at the Selection.Text, Story.text, and direct-specification Range.text with similar poor performance.  I thought it was due to pagination.  Maybe it is something else.

It's nice to see that Autohotkey exposes access to the clipboard
> I'm not a fan of using the clipboard.

Same here!

> It's nice to see that Autohotkey exposes access to the clipboard

It has built-in variables Clipboard, which is used if the content is text, and ClipboardAll, which contains everything on the clipboard, such as graphics and formatting. It also has a ClipWait statement that can wait for either anything or just text to appear on the clipboard, and it has an optional seconds-to-wait param that limits the amount of time that it will wait (if omitted, it waits indefinitely).
It's been quite a while since I've used AutoHotKey or Kixtart
AutoHotkey has come a long way in recent years, such as support for objects (which may be used to implement indexed arrays, associative arrays, and even nested arrays), native COM (very powerful), robust GUI controls, and more. The language developers are now working on Version 2, which "aims to improve the usability and convenience of the language and command set by sacrificing backward compatibility." Only time will tell.

I never heard of KiXtart. Just tried to hit its website, but it is down now. Will try again later.
Great dialog, both of you! The technical concepts are over my head, but I love the collaboration and how this thread has taken on a life of its own!

@Joe - When you are done with the refinements, I'd like to run the program against my novel. How would you deliver it to me?

Thanks!
Yes, this has been a fantastic thread — one of the best ever in my six years on EE. I'll send you a download link in a PM, as I'm not ready (yet) to expose it to the entire Internet. I use a service called Cubby (from the LogMeIn folks) — discussed in this recent thread, if you're interested. I should be able to have Version 1.3 ready for you within the next hour or two. Regards, Joe
I hope that Kixtart is still alive.
It is also possible to create an add-in for Word that will do this with any open document.
@Joe, no rush. I'd probably run it on the novel - and just for fun - the embryonic sequel - Thursday. Thanks!
WordCounts Version 1.3 Release Notes

o  Fixed major bug in performance — a huge thanks to aikimark for discovering it.

o  Stopped using the clipboard.

o  Changed the progress bar dialog to update once every half-second.
> no rush

Already done. Will send you the link now.
This can also be solved using Powershell.
> I'd probably run it on the novel

How many words does Word report in the novel?
> It is also possible to create an add-in for Word that will do this with any open document.

Do you mean via VBA?

> This can also be solved using Powershell.

No doubt, since PS has COM support. But it's not in my wheelhouse.
> Do you mean via VBA?
Yes.
Hey everyone. I have not forgotten about this question. Busy with  projects, and was away this weekend. Joe, I do plan to look at your tool; however, it may have to wait until after 10/5, per committing to submit my manuscript to a Beta reader by then.

Thanks.
> Joe, I do plan to look at your tool; however, it may have to wait until after 10/5, per committing to submit my manuscript to a Beta reader by then.

Steve, no rush here, but the entire testing process from start to finish should take less than five minutes. That includes downloading, installing, and running it on your manuscript — unless it's much bigger than War and Peace. :)

I'm working on enhancing WordCounts into a commercial program. During the process, beta testers found an interesting problem, viz., WordCounts was returning the text that included edits from Track Changes. For example, consider a doc with these Track Changes edits:

User generated image
WordCounts v1.0-1.3 get this text:

Hello worldWorld. Hello hello again. And now colour.

What it should really get is this:

Hello World. hello again. And now color.

In other words, it should get the text as if "Accept All Changes" had been done, or as it would appear with "Final" or "No Markup" selected — Word 2007 calls it "Final" (default is "Final Showing Markup"); Word 2010 calls it "Final" (default is "Final: Show Markup"); Word 2013 and 2016 call it "No Markup" (default is "All Markup"). Of course, the original source should NOT be changed (such as via "Accept All Changes") and it is preferable not to create a temp file. Fortunately, there's an easy solution with Word COM — set .ShowRevisions to False and close the file without saving changes.

The beta testers also requested a new feature — the ability to stay in the program to process additional files, rather than exiting and having to run it again.

Both of the above changes are in the new version, WordCounts v1.4. Here's what the new "finished successfully" dialog looks like, allowing the user to stay in the program:

User generated image
Clicking No exits the program, while Yes starts over, going back to the Browse-For-File dialog. Version 1.4 has been tested successfully with Word 2007, 2010, 2013, and 2016 in Windows 7, 8.1, and 10 (not all combinations thereof). All tested Windows are 64-bit and all Office are 32-bit.

I would do the WordCounts Version 1.4 Release Notes, but there are just those two items in it, so no need. Regards, Joe
@Joe -  I downloaded and ran on the sequel to my novel. I PM'd you with some feedback. Thanks!
OK, I'll head over to Messages.
WordCounts Version 1.5 Release Notes

(1) Prior versions treat the apostrophe (single quote) as a "word boundary" character, which breaks it into two words. For example, "don't" becomes two words — "don" and "t"; likewise, "cat's" becomes "cat" and "s"; "you're" becomes "you" and "re"; etc. Version 1.5 fixes this by not breaking words at either a standard apostrophe (') or the so-called "smart" right apostrophe (’), which Word may use depending on the settings in Options>Proofing>AutoCorrect>AutoFormat — Replace "straight quotes" with "smart quotes".

(2) There's a new option for ignoring words that are all numbers. The UI for this is very rough now — just a quick-and-dirty message box up-front:

User generated image
A future version will have a better UI.

(3) When processing is complete, you may press F10 to open the created spreadsheet, i.e., the CSV file, which is usually owned by Excel. The "finished successfully" dialog box reminds you of this feature:

User generated image
Regards, Joe
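
The apostrophe behavior described in item (1) of the Version 1.5 notes can be illustrated with a minimal Python sketch. The patterns and function names here are assumptions chosen for illustration. The actual RegEx used by WordCounts 1.5 is not shown in this thread.

```python
import re

# Earlier behavior: an apostrophe acts as a word boundary.
SPLIT_AT_APOSTROPHE = re.compile(r"\b\w+\b")

# Sketch of the newer behavior: a straight (') or "smart" (’) apostrophe
# is kept when it sits between two runs of word characters.
KEEP_APOSTROPHE = re.compile(r"\w+(?:['’]\w+)*")

def words_old(text):
    return SPLIT_AT_APOSTROPHE.findall(text.lower())

def words_new(text):
    return KEEP_APOSTROPHE.findall(text.lower())

# Hypothetical sample with both apostrophe styles.
sample = "Don't touch the cat's bowl; you’re late."
print(words_old(sample))
print(words_new(sample))
```

With the first pattern, "don't" comes out as "don" and "t". With the second, it stays whole. Because the second pattern requires word characters on both sides of the apostrophe, a word wrapped in single quotes, such as 'Nadine', is still returned without the quote marks.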
Very nice! Haven't tried it yet - busy weekend - but the F10 feature sounds really cool! :)
Thanks.
You're welcome. No rush here — test it whenever convenient for you. Regards, Joe
Mods - I see you reminded me about this question.  Thanks for the heads-up. I have not forgotten about it and will be crediting Joe Winograd in the near future.
WordCounts Version 1.6 Release Notes

(1) Prior versions treat commas in a number as a word boundary character, breaking it into multiple words. For example, "1,234" becomes "1" and "234"; likewise, "123,456,789" becomes "123", "456", and "789". V1.6 keeps it together as a single word.

(2) Prior versions treat the dollar sign and decimal point (period/dot) in a monetary value as word boundary characters, breaking it into multiple words. For example, "$12.34" becomes "12" and "34". V1.6 keeps it together as the single word "$12.34" (with both the dollar sign and the decimal point).

(3) Prior versions treat the "at" sign (@) and period/dot in an email address as word boundary characters, breaking it into multiple words. V1.6 keeps it together as a single word.

(4) Prior versions treat the colon, forward slash, and period/dot in web addresses as word boundary characters, breaking it into multiple words. V1.6 keeps it together as a single word.

(5) Prior versions treat the colon, backward slash, and period/dot in path/file names as word boundary characters, breaking it into multiple words. V1.6 keeps it together as a single word, unless there are spaces in it, which will be treated as word boundary characters. There's simply no way to tell with "C:\temp folder" if the folder name is "C:\temp folder" or if it is "C:\temp" and "folder" is a separate word.

(6) Prior versions treat a hyphen as a word boundary character, breaking it into two words. For example, "broad-shouldered", becomes "broad" and "shouldered". V1.6 keeps it together as a single word. However, if there are two hyphens in a row ("--"), V1.6 treats them as a word boundary, making the two words on either side of the double hyphens separate words. Note: Word's end-of-line hyphenation feature does not result in a word that contains the hyphen. For example, if Word splits "extremely" across two lines as "ex-" and "tremely", it is treated correctly as the word "extremely".

(7) The changes in items (1) and (2) raise the question of what it means to be an "all numbers" word. In addition to a word having just the digits 0-9, some users may want to consider commas, decimal points, and dollar signs as "numeric" features, but other users may not want that. I decided this issue is too complex (and too much away from the program's core purpose), and that a better approach is to put all of the "numeric" words in the spreadsheet and let each user decide what to keep and what to delete. Since numbers sort before letters, all-numeric words appear at the top of the spreadsheet and are easy to mass-delete if that's what a user wants. So V1.6 removes the option for ignoring all-numeric words that had been added in V1.5.
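
Some of the rules in items (1), (2), and (6) can be sketched in Python as follows. This is a partial illustration under stated assumptions, not the WordCounts implementation: the pattern and function name are hypothetical, and email addresses, web addresses, and file paths (items 3 to 5) are not handled.

```python
import re

TOKEN_RE = re.compile(
    r"""
      \$?\d{1,3}(?:,\d{3})+(?:\.\d+)?   # comma-grouped number, optional $ and decimals
    | \$\d+(?:\.\d+)?                   # dollar amount such as $12.34
    | \w+(?:[-'’]\w+)*                  # word with inner hyphens or apostrophes
    """,
    re.VERBOSE,
)

def tokenize(text):
    """Split text into lower-cased words, keeping numbers, money, and hyphenated words whole."""
    # A double hyphen is treated as a boundary, so turn it into a space first.
    text = text.replace("--", " ")
    return [token.lower() for token in TOKEN_RE.findall(text)]

# Hypothetical sample sentence.
sample = "He paid $12.34 for 1,234 broad-shouldered dolls--all 123,456,789 of them."
tokens = tokenize(sample)
print(tokens)
```

In this sketch, "1,234", "$12.34", and "broad-shouldered" each stay a single word, while the double hyphen separates "dolls" from "all". A plain decimal without a dollar sign, such as 12.34, would still be split at the period.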

A comment about going forward with WordCounts. I'm terminating the fine-tuning for issues such as those recently addressed, e.g., email addresses, file paths, numeric entities, web addresses, etc. I have drifted too far from the original idea for the program, which is to act as an assistant in word usage for authors/writers. The core idea is to make it easy to see that "very" or "beautiful" is used far too many times in a manuscript. Also, if a character in your book is named "Phillip", but you accidentally call him "Philip", Word won't flag that, because both spellings are correct, but you'll see the problem when browsing through the spreadsheet (sorted, of course, alphabetically by the Word column). The current release addresses these primary requirements nicely, imo. Regards, Joe
Joe,
WOW! You've put a ton of effort into this program...and obviously enjoyed it! And, I like how one can use the program to find inconsistent spellings of a character's name.

Thanks!
Thanks for the compliment, Steve — I appreciate it! Yes, it's been a lot of fun! And I learned some new stuff along the way — always a good thing.
WordCounts Version 1.7 Release Notes

o  Fixed bug where the word "base" was missing in output file.

WordCounts Version 1.8 Release Notes

o  Fixed "base" bug in a better way.

o  Fixed bug where leading zeroes were being removed from words.

o  Fixed bug introduced in V1.7 where word boundary logic reverted to an earlier version.
Joe and Mods,
I'm still trying out the software. Thanks.
No rush here.
Using the software and enjoying it. I will tag the appropriate comments as Solutions soon. (Joe, that last sentence was not so much for you but for the mods, as I have been told once or twice recently that this question needs attention. Seems like three days w/o a response is the trigger point for them notifying me.)

Thanks.
> Using the software and enjoying it.

I'm really glad to hear that!

> Joe, that last sentence was not so much for you but for the mods

Understood.
WordCounts Version 1.9 Release Notes

o  Improved performance significantly. For example, Frankenstein went from 27 seconds to 11 seconds, Ulysses from 295 seconds to 171, and War and Peace from 140 to 73.

o  Put the Version and Build in the title bar of all dialog boxes and in the summary section of the output CSV file (thanks to Steve for this idea).

o  Fixed recently reported Error 0x80010001 — more accurate to say that it's a possible fix, since I am unable to reproduce the problem (thanks to Steve for reporting this).

o  Switched from an expiring Symantec (VeriSign) code-signing certificate to a new DigiCert certificate (still dual-signed SHA1 and SHA256). This has no impact on functionality, but documenting it here so users are not surprised when seeing it (the Symantec digital signing was documented in the V1.2 Release Notes).

o  Added some error checking.


WordCounts Version 1.10 Release Notes

o  Added several crosscheck totals to detect potential problems (an improved implementation of how the error with the word "base" was discovered). Displays a message box if a crosscheck mismatch occurs, but processing continues.

o  Enhanced all error dialogs with a unique error code for each one so that WordCounts technical support can know exactly what error occurred.

o  Improved the green progress bar to display the percentage done (based on the number of words processed, not on run time). So instead of repeating the green progress bar in multiple passes, it now finishes in one pass, going from 0% to 100%.

o  Removed the "current word" display when hovering over the system tray icon, as this information is already in the progress bar dialog box.

o  Changed the green progress bar update interval from a half-second to a full second.

o  Increased the size of the progress bar window in order to accommodate a very long full file path/name.

o  Compiled with the latest release of AutoHotkey — Version 1.1.24.02.

o  Considering all of the changes to the main display window, here's what it looks like now:

[Image: screenshot of the WordCounts main display window]
Thanks, Joe. Kudos for putting in so much hard work on this program!
You're welcome, Steve. And thanks to you for the idea of such a program...and for the nice words...both appreciated! I really enjoyed developing it...and learned a lot during the effort. Regards, Joe
Joe, I'll try to download 1.10 this weekend. Once I run it on my novel, I'll close the question. Thanks.
WordCounts Version 1.11 Release Notes

o  Added code to distinguish between "begin time", which is before selecting a file, and "begin execution time", which is after selecting a file, and, if the file is password protected, after entering the correct password. The reason for this change is to have the "elapsed time" reflect the actual processing time, not including the time spent waiting for user input (either file selection or password entry).

o  WordCounts was designed to read the contents of a document without having to open Word. Prior versions succeed in that regard if a document is not password protected, but fail in that regard if a document is password protected. In the latter case, Word opens a blank document, and then presents the user with a Password dialog box that says, "Enter password to open file". WordCounts 1.11 fixes this, i.e., Word no longer opens a blank document — Word doesn't run visibly at all and doesn't present its Password dialog. Instead, v1.11 presents its own Password dialog box in the case of a password protected document, and then feeds the password under-the-covers to Word so that Word doesn't have to open. (A big thanks to Steve for flagging this issue — I had not tested with any password protected files and so did not run into this glitch.)

o  Compiled with the latest release of AutoHotkey — Version 1.1.24.03.
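
The "begin time" versus "begin execution time" distinction in the first item above can be sketched in Python as follows. This is only an illustration under assumptions: the function and variable names are invented, and the sleeps stand in for the file-selection dialog and the actual word counting.

```python
import time

def run(select_file, get_password, process):
    """Time only the processing step, not the waits for user input."""
    begin_time = time.perf_counter()            # before any dialogs
    path = select_file()                        # user picks a file
    password = get_password(path)               # may prompt; None if not needed
    begin_execution_time = time.perf_counter()  # user input is finished
    result = process(path, password)
    end_time = time.perf_counter()
    return {
        "result": result,
        "elapsed": end_time - begin_execution_time,
        "total_including_input": end_time - begin_time,
    }

# Hypothetical stand-ins for the real dialogs and the word-counting work.
def fake_select_file():
    time.sleep(0.2)  # the user takes a moment to choose a file
    return "novel.docx"

def fake_get_password(path):
    return None  # this sample file is not password protected

def fake_process(path, password):
    time.sleep(0.05)  # pretend to count words
    return "done"

timing = run(fake_select_file, fake_get_password, fake_process)
print(f"elapsed: {timing['elapsed']:.2f}s")
```

The reported "elapsed" value excludes the 0.2 seconds spent "choosing" the file, so it reflects processing time only.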
John,
I downloaded the latest version. Nice job on adding that dialog box to WordCounts to intercept Word's instance of it.

I'm going to PM you in a minute with more feedback. Thanks!
Hi Steve,
Joe here...not John. :)  I'll check your PM soon. Cheers!
OOPS! Thanks for setting me straight! :)
Joe and Mods,
I have not forgotten about you or this question...was busy during the holidays. I will download the latest version within a week. Thanks.

Happy New Year,
Steve
Steve,
Please be sure to test V1.12 on your files that have both an Open and a Modify password. It should process those files without having to launch Word visibly and without showing any Word dialogs — you should get just one "Password Required" dialog from WordCounts itself. Thanks, Joe
Joe,
Will do. Thanks!
You're welcome, Steve. And thanks to you for testing. I consider V1.12 to be a Release Candidate and am looking forward to your feedback on it. Regards, Joe
Joe, I just PM'd you per the password dialog. Thanks.
Steve,
I replied to your PM. But if you see this first, read my posts #a41934139 and #a41948149 — they have the answers to your issue. Regards, Joe
Oops...missed that one. :) Thanks.
Joe,
I'm ready to close this question. What comment should I tag? Thanks.
Glad I could help, Joe.  You did all the heavy lifting.
> Do you want this one

No, Steve (at least, not that one alone). That won't give credit to the article because of a bug in the EE website — I can't begin to tell you how many points I've lost on that bug! Problem is, it doesn't count the link when it appears embedded in text (such as AutoHotkey). It counts the link only when it is explicit (such as https://www.experts-exchange.com/articles/18346/). The same bug exists for videos. FWIW, I've submitted bug reports for both:

Articles bug
Videos bug

But given that I submitted those bug reports more than two months ago, I don't know when, or even if, EE will ever fix the problem. Anyway, it's OK to select #a41934139 as one of the posts, but in order to credit the article, please select #a41979410 (or this one), too. Thanks!

> Glad I could help, Joe. You did all the heavy lifting.

Nice of you to say, aikimark, but I found your posts really helpful!
I'm the BASF of contributing experts :-)
There ya go — the BASF Award!
Joe,
Per an earlier post:

I said:  I'm ready to close this question.

You said: I presume this means that WordCounts V1.12 worked well for you — right?

Truth be told, my current novel editing phase doesn't require analyzing the word counts. Accordingly, all I can say is that it ran and spit out a spreadsheet. :) Do you need deeper verification?

In any event, if you OK the above, I will close as follows:
Your post: 41934139 - Best sol'n.
Your post: 41979410 - Assisted solution. (In all fairness,  I did not have a chance to read the article.)
Aikimark's posts 41805403 and 41806547: Assisted solutions.

Thanks!
> Do you need deeper verification?

Nope — as long as it created a spreadsheet that looks reasonable.

> I will close as follows

Perfect!
Joe, thank you for your persistence in listening to my feedback and improving the program.

Aikimark - thanks for your "behind the scenes" help.

What a great collaborative effort by you two!

Steve
You're welcome, Steve. And my thanks to you for the initial idea for the program, acting as a beta tester, and ideas for improvements along the way. Thanks, also, to aikimark for his help and ideas. This was a fun, educational project! Regards, Joe
Thanks for the points and kudos from both of you.
For anyone interested in this thread, I have an updated version (2.0) of WordCounts™. The Quick Start Guide for it is attached. Also, there's a new question on the same subject that may generate some interesting discussion:

https://www.experts-exchange.com/questions/29119103/Method-utility-to-list-how-many-times-specific-words-are-used-in-Word-document.html

Regards, Joe
WordCounts_v2.0_Quick_Start_Guide.pdf