JamesCbury
asked on
MS Word batch script to determine if a file has Track Changes on and if there are changes
Hi Experts, I have about 50,000 Word documents sitting on a file server as part of an application that I support. There was an issue where some of the files (no telling how many) were saved with Track Changes on and unaccepted changes in the document (i.e., the red text). I need to write a program that will examine each file and determine A) whether Track Changes is enabled, B) whether there are any unaccepted changes, C) when the file was last modified, and D) who modified it.
I'm thinking that this information is stored in the metadata of the file somewhere, or at least in the file properties. I believe a batch program would be the easiest way to do this, with the output going to a .txt or .csv file. I would actually be more comfortable if there were a VBA program that I could run out of Excel that could do the analysis (I'm pretty good with VBA).
Any suggestions would be much appreciated - I'm in kind of a crunch here.
- Jamey
ASKER
Thanks Phillip - that's good advice. I will make that update to my program for future use.
It may not be part of the official metadata, but Track Changes IS a saved flag. I'm not certain how this would be read in an older .doc Word file, but it's pretty easy in .docx. All of the newer Office formats that end in "x" are actually ZIP files that contain a variety of resources, so it's very easy to discover in the XML.
Basically, you'd write a script that unzipped the file (some 3rd party libraries might let you just decompress a single file into memory, which would be even faster). In the resulting "word" subfolder, you'll have 2 files that have the info you want - document.xml and settings.xml.
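As a rough sketch of the "decompress a single file into memory" idea, Python's standard zipfile module can read individual members without extracting anything to disk. The function name read_docx_parts is just a name I made up for illustration, and I'm assuming the usual word/settings.xml and word/document.xml member paths:

```python
import zipfile

def read_docx_parts(docx_path):
    """Return the raw bytes of word/settings.xml and word/document.xml.

    docx_path may be a file path or an open binary file object.
    A member that is missing from the archive is reported as None.
    """
    parts = {}
    with zipfile.ZipFile(docx_path) as archive:
        for name in ("word/settings.xml", "word/document.xml"):
            try:
                # read() pulls just this member into memory
                parts[name] = archive.read(name)
            except KeyError:
                parts[name] = None
    return parts
```

The archive is opened read-only, so the original document is never touched.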
If Track Changes is enabled, then the settings.xml file will contain this text:
<w:trackRevisions/>
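Here is one way that check might look in Python, parsing the XML rather than searching the text. Two assumptions on my part: the file uses the standard (transitional) WordprocessingML namespace, and the element can carry an optional w:val attribute that switches it off, so the sketch treats "false", "0" and "off" as disabled. The function name track_changes_enabled is hypothetical:

```python
import xml.etree.ElementTree as ET

# Assumed: the transitional WordprocessingML namespace that Word normally writes
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def track_changes_enabled(settings_xml):
    """Return True if settings.xml (bytes or str) turns Track Changes on."""
    root = ET.fromstring(settings_xml)
    flag = root.find(f".//{{{W_NS}}}trackRevisions")
    if flag is None:
        return False
    # A bare <w:trackRevisions/> means "on"; an explicit w:val can turn it off
    val = flag.get(f"{{{W_NS}}}val", "true")
    return val.lower() not in ("false", "0", "off")
```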
If there are revisions, then the document.xml file will contain a tag that starts with <w: and indicates the operation, like "<w:ins" or "<w:del", and it will have a "w:date" attribute and a "w:author" attribute, like this:
<w:del w:id="1" w:author="Fooey Barress" w:date="2016-02-10T20:42:00Z">
You can use XPath syntax to search for such tags using most major XML libraries.
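For example, Python's built-in ElementTree supports enough of XPath to do this. The sketch below only looks for insertions and deletions (w:ins and w:del); moves and formatting changes are recorded with other elements that it does not cover. It also assumes the standard transitional namespace, and find_revisions is a made-up name:

```python
import xml.etree.ElementTree as ET

# Assumed: the transitional WordprocessingML namespace that Word normally writes
W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_URI}

def find_revisions(document_xml):
    """Return a list of (kind, author, date) tuples for tracked insertions and deletions."""
    root = ET.fromstring(document_xml)
    revisions = []
    for kind in ("ins", "del"):
        # ".//w:ins" matches the element at any depth below the root
        for element in root.findall(f".//w:{kind}", NS):
            author = element.get(f"{{{W_URI}}}author")
            date = element.get(f"{{{W_URI}}}date")
            revisions.append((kind, author, date))
    return revisions
```

An empty list means no tracked insertions or deletions were found; the author and date values answer the "who" and "when" for each individual change.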
The older pre-2007 .doc files use a proprietary binary format that would require a separate library to read, assuming the Office SDK doesn't give you the necessary info.
In any event, reading the data like this should give you a good idea without risking any modifications to the file AND it has the added bonus of not requiring an instance of Word running in memory to open/close all that stuff.
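To give an idea of how that could scale to a whole file server, here is a rough Python sketch that walks a folder tree and writes one CSV row per .docx file, using plain byte searches instead of XML parsing to keep it quick. It is a first-pass filter under some assumptions: the files use the usual "w" prefix, a tracking flag that is explicitly set to false would still be reported as present, and the file-system timestamp stands in for "last modified". The names scan_docx and scan_tree and the CSV column names are all my own invention:

```python
import csv
import os
import zipfile
from datetime import datetime, timezone

def scan_docx(path):
    """Return (tracking_flag_present, revision_markup_present) using byte searches."""
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        settings = archive.read("word/settings.xml") if "word/settings.xml" in names else b""
        document = archive.read("word/document.xml") if "word/document.xml" in names else b""
    tracking = b"<w:trackRevisions" in settings
    revisions = any(tag in document for tag in (b"<w:ins ", b"<w:del "))
    return tracking, revisions

def scan_tree(root_dir, csv_path):
    """Walk root_dir and write one CSV row per .docx file found."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "track_changes_flag", "has_revisions", "modified_utc", "error"])
        for folder, _dirs, files in os.walk(root_dir):
            for name in files:
                if not name.lower().endswith(".docx"):
                    continue
                path = os.path.join(folder, name)
                # File-system modification time, reported in UTC
                modified = datetime.fromtimestamp(os.path.getmtime(path), timezone.utc).isoformat()
                try:
                    tracking, revisions = scan_docx(path)
                    writer.writerow([path, tracking, revisions, modified, ""])
                except (zipfile.BadZipFile, OSError) as exc:
                    # Not a real ZIP (corrupt file, lock file, renamed .doc): log it and move on
                    writer.writerow([path, "", "", modified, str(exc)])
```

You would call it with something like scan_tree(r"\\server\share\docs", "report.csv"), where both paths are placeholders. Anything flagged in the CSV can then be checked more carefully with a proper XML parse.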