Mr_Fulano (United States of America) asked:

Iterating over and listing the Built-In Document Properties in MS Word 2010 using C#

Hi, I'm using VS2010 C# and MS Word 2010.

My code below is a Windows Form with one button on it and a function called doIT(), which is where my code resides. Clicking the button runs doIT(), and I view the result in the VS Output window.

The code also requires the following references:
-- Microsoft Office 14.0 Object Library
-- Microsoft Word 14.0 Object Library
-- System.Reflection (only a using directive; the namespace lives in mscorlib, so no extra reference is needed)

I've been working on an application that will read an MS Word document and return all its Built-In Document Properties.

The code at the end of this post works. Although using the "Microsoft.Office.Interop" DLLs is said to be slow, I'm not concerned about the speed of the code, but rather with its accuracy.

What I'd like to do now is find a way to iterate over each of the properties with something like a "foreach" loop, where the code can read the property's "Name" and also its "Value" - rather than having to hard code each and every possible document property one by one.

So, I would like to take a more elegant approach, something like the loop below. However, every time I attempt this approach it fails: I've tried numerous variations, but each one either errors out or produces no results at all.

foreach (PropertyInfo prop in wordProperties)
{
    // Iterate through the document's properties and identify each property's "Name" and its "Value"
    Console.WriteLine(propertyName + " :   " + propertyValue);
}


Below is my code thus far, which I now have working, but where I'd have to hard code each property one by one.

Thank you for your assistance!
Fulano

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Microsoft.Office.Core;
using Microsoft.Office.Interop.Word;
using System.Reflection;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        private void doIT()
        {
            Microsoft.Office.Interop.Word.Application wordObject = new Microsoft.Office.Interop.Word.Application();     //create word app class object  
            
            object file = @"\\Test Folder\TestDOC.docx";                                               //this is the path to file to open
            
            object nullobject = System.Reflection.Missing.Value;

            Microsoft.Office.Interop.Word.Document docs = wordObject.Documents.Open(file, nullobject, nullobject, nullobject, nullobject,
            nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject);    //open the file 

            object wordProperties = docs.BuiltInDocumentProperties;

            Type typeDocBuiltInProps = wordProperties.GetType();
            
            try
            {
                //Author
                Object Authorprop = typeDocBuiltInProps.InvokeMember("Item", BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties, new object[] { "Author" });
                Type typeAuthorprop = Authorprop.GetType();
                string strAuthor = typeAuthorprop.InvokeMember("Value", BindingFlags.Default | BindingFlags.GetProperty, null, Authorprop, new object[] { }).ToString();
                Console.WriteLine("Author: " + strAuthor);

                //Last Author (Last saved by)
                Object LastAuthorprop = typeDocBuiltInProps.InvokeMember("Item", BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties, new object[] { "Last Author" });
                Type typeLastAuthorprop = LastAuthorprop.GetType();
                string strLastAuthor = typeLastAuthorprop.InvokeMember("Value", BindingFlags.Default | BindingFlags.GetProperty, null, LastAuthorprop, new object[] { }).ToString();
                Console.WriteLine("LastAuthor: " + strLastAuthor);

                //Revision Number
                Object RevisionNumberprop = typeDocBuiltInProps.InvokeMember("Item", BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties, new object[] { "Revision Number" });
                Type typeRevisionNumberprop = RevisionNumberprop.GetType();
                string strRevisionNumber = typeRevisionNumberprop.InvokeMember("Value", BindingFlags.Default | BindingFlags.GetProperty, null, RevisionNumberprop, new object[] { }).ToString();
                Console.WriteLine("RevisionNumber: " + strRevisionNumber);
               

            }
            catch (Exception j)
            {
                Console.WriteLine(j.Message);
            }
           

            //docs.Close(WdSaveOptions.wdDoNotSaveChanges, nullobject, nullobject);
            ((Microsoft.Office.Interop.Word._Application)wordObject).Quit(WdSaveOptions.wdDoNotSaveChanges);

        }
        public Form1()
        {
            InitializeComponent();
        }
        private void btnGo_Click(object sender, EventArgs e)
        {
            doIT();
        }
    }
}


Jacques Bourgeois (James Burger) (Canada) replied:

I do not have the time to test in .NET, but a quick inspection and test in the Word VBA window shows that the Document object has a BuiltInDocumentProperties property: a collection that returns all the properties, with their names, values, etc.

You would probably be able to do what you want by looping through docs.BuiltInDocumentProperties.
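A minimal sketch of that loop in C#, assuming C# 4 (VS2010) or later and a reference to Microsoft.CSharp: declaring each item as dynamic lets the runtime resolve Name and Value on the COM object itself, so nothing is ever cast to PropertyInfo or to Microsoft.Office.Core.DocumentProperty. It iterates the BuiltInDocumentProperties collection, not the Document object. DynamicPropertyReader and Lines are hypothetical names, the COMException handling assumes Word may raise an error for a property with no stored value, and the sketch has not been verified against Word 2010.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class DynamicPropertyReader
{
    // Returns one "Name: Value" line per item in a BuiltInDocumentProperties
    // collection. Each item is 'dynamic', so Name/Value are resolved at run time
    // (through IDispatch for a COM object) and no interop interface cast occurs.
    public static List<string> Lines(object properties)
    {
        var lines = new List<string>();
        foreach (dynamic prop in (IEnumerable)properties)
        {
            string text;
            try
            {
                object raw = prop.Value;
                text = raw == null ? "(empty)" : raw.ToString();
            }
            catch (COMException)
            {
                // Assumption: Word throws here for a property with no stored value.
                text = "(no value)";
            }
            lines.Add((string)prop.Name + ": " + text);
        }
        return lines;
    }
}
```

Hypothetical usage from doIT(): `foreach (string line in DynamicPropertyReader.Lines(docs.BuiltInDocumentProperties)) Console.WriteLine(line);`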
Mr_Fulano (asker) replied:

Hi Jacques, I tried your suggestion, which was one of the approaches I had taken originally, but it failed. The error I got when I tested your suggestion is below:

foreach statement cannot operate on variables of type 'Microsoft.Office.Interop.Word.Document' because 'Microsoft.Office.Interop.Word.Document' does not contain a public definition for 'GetEnumerator'      

The issue I'm having, which I can't get around is that it seems that you cannot iterate through an object like Microsoft.Office.Interop.Word.Document.

Any suggestions?

Thank you,
Fulano
Hi Jacques, I also tried the code below. I don't know what I'm doing wrong, because "docs.BuiltInDocumentProperties" is a collection (at least I think it is), so I should be able to iterate through it, but this code throws an exception, as shown below:

Error = Cannot convert type 'System.__ComObject' to 'System.Reflection.PropertyInfo'

int i = 1;
foreach (PropertyInfo property in docs.BuiltInDocumentProperties)
//foreach (PropertyInfo property in typeDocBuiltInProps.GetProperties())
{
    Console.WriteLine(i + "  " + property.ToString());
    i++;
}



Thank you for your help,
Fulano
Dear Mr Fulano,
I think the key lies in the line of code below. I have yet to find a solution, but I'm sharing my thoughts:
Object Authorprop = typeDocBuiltInProps.InvokeMember("Item", BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties, new object[]{"Author"});
Since we are using reflection to invoke the member, perhaps instead of passing "Author" as the last argument we can somehow ask the built-in properties collection to return all of its properties, something like passing new object[]{"*"}. I am just thinking aloud in layman's terms; I still need to find the exact syntax, but this is the direction I would explore.
The other way is to pass the last value dynamically (like the string "Author"), but for that we need to know which values it can take, and I am not sure we know that.
Thanks,
Karrtik
ASKER CERTIFIED SOLUTION by Karrtik Iyer (India); the solution text is available only to Experts Exchange members.

I have a little more time tonight, so I could go on a little digging trip.

First of all, if you look carefully at your error message, the class PropertyInfo that you use is in System.Reflection. That namespace is for inspecting the metadata of .NET types and assemblies at run time: a PropertyInfo describes a property declared on a .NET type, not an item in a COM collection. So you surely do not have the right class there.

If you dig into the Word VBA documentation, you see that BuiltInDocumentProperties contains DocumentProperty objects, so this is what you should use to iterate instead of PropertyInfo.

I first tried to write a method that works in VBA. VBA is closer to Word, so it's easier to spot mistakes in your logic or in the classes/properties/methods that you use. It took me less than 2 minutes and I had the thing in my pocket.

Back to C#, where the fact that we are talking to Word through an interop sometimes makes things a little more complicated. And it's the case here, at least on my installation.

Here is a direct translation of my code into C#:

foreach (DocumentProperty property in docs.BuiltInDocumentProperties)
{
	Console.WriteLine(property.Name + " - " + property.Value);
}



Unfortunately, it does not work. All the other classes from Word are OK, I can explore them in the debugger, but DocumentProperty throws an error.

DocumentProperty is recognized by the editor (once you add a using directive for Microsoft.Office.Core). The whole thing compiles without problems. But I get a runtime error when I hit the declaration of the DocumentProperty object, telling me that this interface is not supported. And indeed, if I check my computer's registry, it is not registered, as it should be for a COM class.

Sorry, but I do not have time to go looking for why my copy of Office (installed on a brand new hard drive 2 weeks ago) does not seem to have registered all its classes.

I hope it works on your side.
Hi Jacques, first and foremost, thank you for all your help. I tried your last solution, which I had tried before, but it gave me the same error as the first time I tried it.

Error = Unable to cast COM object of type 'System.__ComObject' to interface type 'Microsoft.Office.Core.DocumentProperty'

I think we're close with the first Array solution you provided, but it's throwing an exception. I need to tinker with your Array solution a little more, but I think I may be getting close.

I can iterate through the collection and get the fields, but I'm getting stuck on the values part.

Let me tinker a bit more and I'll share with you what I found so far. - You definitely have me going in the right direction. I'm also not sure why the last solution you provided works for you, but not for me. Are you using Word 2010 or another version?
My solution works when I code it in VBA, where the thing is not impaired by the interop and the conversion from COM to .NET.

But it does not work in .NET, either in VB or C#, because the DocumentProperty class is not registered in a way that the interop can work with it. COM classes need to be registered in the Windows registry, and that one is not.

I did a lot of research to make that thing work, but did not find a solution. The reason most often stated for that error message is a conflict between multiple versions of Office. This is not the case for me: my hard disk crashed at the beginning of last month, and I had to reinstall everything from scratch. Word 2007 is the only version that ever ran on that new installation.

Sorry, I cannot be of more help. I need to prepare a custom training session for next week and won't have more time to put into it.
Dear Mr Fulano,
The best way is to store the property names in a string array like the one below, iterate over it in a loop, and invoke the earlier code without hard-coding the property name. Alternatively, you can read this list of properties from a config file, so that the string array is built at run time.
string[] properties = { "Title", "Subject", "Author",
                        "Keywords", "Revision Number",
                        "Creation Date", "Last Save Time" };
foreach (string propname in properties)
{
    // Look up the DocumentProperty object by its display name
    Object eachproperty = typeDocBuiltInProps.InvokeMember("Item", BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties, new object[] { propname });
    Type typeprop = eachproperty.GetType();

    // Get the property value; the target must be the property object (eachproperty), not its Type
    string propvalue = typeprop.InvokeMember("Value", BindingFlags.Default | BindingFlags.GetProperty, null, eachproperty, new object[] { }).ToString();

    Console.WriteLine("The document's " + propname + " is : " + propvalue);
}
Hi Jacques, thank you for all your help.

Fulano
Hi Karrtik, I like your approach of using an array, but the only problem with this solution is that I would need to list each property I wanted to use individually, exactly as it appears in the collection. I wanted to avoid that, because it opens me up to errors and to variations from version to version.

I did find a way to extract all the property names by using the Substring() method, but it fails when it finds an empty (null) property, with the following error: 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll

Below is the code I'm using, which works IF the properties are populated (not null). My quest now is to avoid the code throwing the exception error.

As I mentioned, I used the Substring() method to trim the "wdProperty" prefix off the internal property names (e.g., wdPropertyTitle, wdPropertySubject, wdPropertyAuthor), and I also wrapped the resulting text in quotation marks. (This approach works, but fails on null values.)

I need a way to avoid the null value problem and that will solve it.

foreach (var value in values)
{
    string propname = value.ToString();
    string subPropname = propname.Substring(10).ToString();

    Object eachproperty = typeDocBuiltInProps.InvokeMember("Item", BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties, new object[] { "\"" + subPropname + "\"" });

    Type typeeachproperty = eachproperty.GetType();

    string propvalue = typeeachproperty.InvokeMember("Value", BindingFlags.Default | BindingFlags.GetProperty, null, eachproperty, new object[] { }).ToString();

    Console.WriteLine("The document's propname is : " + propname);
    Console.WriteLine("The document's subPropname is : " + subPropname);
    Console.WriteLine("The document's propvalue is : " + propvalue);
}



Thank you,
Fulano
Hi Karrtik, this part of the code gives you all the property names; this is the part I have working. The other part fails on null values, as explained in my prior post above.

foreach (var value in values)
{
    string propname = value.ToString();
    string subPropname = propname.Substring(10).ToString();

    Console.WriteLine("The document's propname is : " + propname);
    Console.WriteLine("The document's subPropname is : " + subPropname);
}


Dear Mr Fulano, I would put a try/catch block around the one line of code that fails when the property doesn't exist, and catch only the particular exception thrown in that case (not all exceptions), so that the code can skip that property and continue with the rest.
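A sketch of that idea, assuming the late-bound InvokeMember style used earlier in this thread: reflection typically wraps the error Word raises in a TargetInvocationException (the exception Mr_Fulano reported), so the catch below swallows it only when its InnerException is a COMException and rethrows anything else. SafePropertyValue.TryRead is a hypothetical helper; the rethrow inside the catch is used instead of an exception filter so that it also compiles in VS2010 (C# 4).

```csharp
using System.Reflection;
using System.Runtime.InteropServices;

public static class SafePropertyValue
{
    // Reads a DocumentProperty's Value through late binding. Returns false when
    // the value is null or Word reports a COM error for it; any other failure
    // is rethrown so that real bugs are not hidden.
    public static bool TryRead(object property, out string text)
    {
        text = null;
        try
        {
            object value = property.GetType().InvokeMember("Value",
                BindingFlags.Default | BindingFlags.GetProperty, null, property, new object[0]);
            if (value == null)
                return false;              // the property exists but holds no value
            text = value.ToString();
            return true;
        }
        catch (TargetInvocationException ex)
        {
            // Reflection wraps the COM error; swallow only that specific case.
            if (ex.InnerException is COMException)
                return false;
            throw;
        }
    }
}
```

Inside the existing loop this could replace the broad catch, for example: `string propvalue; if (!SafePropertyValue.TryRead(eachproperty, out propvalue)) continue;`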
Thanks,
Karrtik
Karrtik, I have some bad news... the Array code doesn't work at all. At first I thought it would, but the more I tinkered with it, the more I realized it's not reaching into the Word document at all. In fact, if you break the programmatic link to the MS Word document in the code above, the Array will continue to list all the WdBuiltInProperties. So, if it lists the properties without a link to the document, it's not pulling the field names out of the document at all; it's simply listing a predefined set of properties that may or may not be in the subject document.

Back to square one...

The only way to make the code work is if you provide it the field name (e.g., Author, Subject). The trick would be to have the code reach in, pull out all the properties it finds, and then get the values for those properties.

Fulano
Dear Mr Fulano,
I tested your code and found a small problem, which I corrected, and then it worked: there is no need to put the property name in double quotes ("\"" + subPropname + "\""). Here is the code that I got to work. I have also attached a screenshot of my console program and the Word document property window. However, there are still some properties, such as wdPropertyPages and wdPropertyWords, for which you will not get the property key just by trimming the wdProperty part: after trimming, wdPropertyPages becomes Pages, but the actual property name is Page Count, and similarly for word count, etc. Hence I suggest you consider my other alternative of storing the array of property names (which I listed in my answer to your previous question) in a config file and reading that array instead of trimming the enum names. You can keep the code that iterates over the array; only the array changes, to the strings read from the config file.
Here is the code that I made to work:
                foreach (var value in values)
                {
                    string propname = value.ToString();
                    string subPropname = propname.Substring(10).ToString();
                    string propvalue;

                    try
                    {
                        Object eachproperty = typeDocBuiltInProps.InvokeMember("Item",
                            BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties,
                            new object[] {subPropname});

                        Type typeeachproperty = eachproperty.GetType();

                        propvalue =
                            typeeachproperty.InvokeMember("Value", BindingFlags.Default | BindingFlags.GetProperty, null,
                                eachproperty, new object[] {}).ToString();
                    }
                    catch (Exception j)
                    {
                        Console.WriteLine("Unable to find the property for this document: " + propname);
                        continue;
                    }

                    Console.WriteLine("The document's propname is : " + propname);
                    Console.WriteLine("The document's subPropname is : " + subPropname);
                    Console.WriteLine("The document's propvalue is : " + propvalue);

                }


[Screenshot attached: MrFulano_Iterate_WordProperties.png]
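One possible way around the trimming mismatch Karrtik describes, sketched under an assumption: Word's documentation indicates that BuiltInDocumentProperties can also be indexed by number, with the WdBuiltInProperty constants serving as those indices. If that holds on your installation, passing each enum member's numeric value (rather than its trimmed name) to Item avoids guessing display names such as the Page Count case. EnumIndexedProperties is a hypothetical helper, and it still reports only the constants defined in the enum, not whatever the document itself contains.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class EnumIndexedProperties
{
    // For each member of enumType (e.g. WdBuiltInProperty), passes the member's
    // numeric value to Item(index) instead of a trimmed name, so the enum name
    // never has to match the property's display name.
    public static Dictionary<string, string> Read(object properties, Type enumType)
    {
        var result = new Dictionary<string, string>();
        foreach (object member in Enum.GetValues(enumType))
        {
            int index = Convert.ToInt32(member);
            try
            {
                object prop = properties.GetType().InvokeMember("Item",
                    BindingFlags.Default | BindingFlags.GetProperty, null, properties, new object[] { index });
                object value = prop.GetType().InvokeMember("Value",
                    BindingFlags.Default | BindingFlags.GetProperty, null, prop, new object[0]);
                result[Enum.GetName(enumType, member)] = value == null ? "" : value.ToString();
            }
            catch (TargetInvocationException)
            {
                // No property at this index, or no value stored for it: skip it.
            }
        }
        return result;
    }
}
```

Hypothetical usage: `EnumIndexedProperties.Read(wordProperties, typeof(WdBuiltInProperty))`, which would key the results by constant names such as wdPropertyPages.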
Hi Karrtik, you're correct...the quotation marks were an error of mine that I originally posted and later caught in my code, but forgot to repost that I had changed that. Very sorry if that confused you.

Now, moving forward, your catch section's Console.WriteLine() code is a good addition. It makes the application run through all the properties instead of aborting on each exception, and I like that.

That said, the application *should* extract all the properties that are available, and it does not. It misses essential properties that every document has, like "Time Created", "Time Last Saved", "Last Saved By", "Revision", and so on. So the code is not matching the correct property names with the properties within the actual document.

What I think is happening is that the Array gets the property names from typeof(WdBuiltInProperty), not from inside the actual document; it's simply listing them from a lookup table. You can prove that by breaking the link to the MS Word document and running the code again: without a link to the document, you'll still be able to list all the property names.

So, keeping that in mind, when the code runs, it tries to match the property names it got from the array with the actual property names inside the document, and it's a hit-and-miss situation: some match and you get a return value, some don't match and those error out.

I believe that the real solution is to make the code use System.Reflection to look inside the actual document and first pull out all the property names it can find, put those property names in an array, and then use them to get the property values within the document.
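A sketch of that idea using the same InvokeMember calls as the original code, assuming the collection exposes Count and a 1-based Item(index), as Office collections do: the names come from each DocumentProperty object the document actually returns, not from the WdBuiltInProperty enum, so nothing has to be listed in advance. LateBoundProperties and ReadAll are hypothetical names, and the TargetInvocationException handling assumes Word may raise an error when a property has no value.

```csharp
using System.Collections.Generic;
using System.Reflection;

public static class LateBoundProperties
{
    // Late-bound property read: works on a COM object (via IDispatch) and on an
    // ordinary .NET object alike; args supplies indexer arguments, if any.
    public static object Get(object target, string member, params object[] args)
    {
        return target.GetType().InvokeMember(member,
            BindingFlags.Default | BindingFlags.GetProperty, null, target, args);
    }

    // Walks the collection by 1-based index and pairs each property's Name
    // (read from the document's own DocumentProperty object) with its Value.
    public static List<KeyValuePair<string, string>> ReadAll(object properties)
    {
        var result = new List<KeyValuePair<string, string>>();
        int count = (int)Get(properties, "Count");
        for (int i = 1; i <= count; i++)
        {
            object prop = Get(properties, "Item", i);
            string name = (string)Get(prop, "Name");
            string text;
            try
            {
                object value = Get(prop, "Value");
                text = value == null ? "(empty)" : value.ToString();
            }
            catch (TargetInvocationException)
            {
                // Assumption: Word may raise an error for a property with no value.
                text = "(no value)";
            }
            result.Add(new KeyValuePair<string, string>(name, text));
        }
        return result;
    }
}
```

Hypothetical usage inside doIT(): `foreach (var pair in LateBoundProperties.ReadAll(wordProperties)) Console.WriteLine(pair.Key + " : " + pair.Value);`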

If you'd like, I can close out this question and award you the points, and then open another question that's more specific to our current problem. That way you get more points, because you're actually answering additional questions within the same question, although the solution is only working partially.

Let me know if moving this to another question is agreeable with you, or if you want to continue in this one.

Fulano
Dear Mr Fulano, I'm fine moving to another question, since the problem we are targeting is a little different. I shall keep trying to find a generic solution for this problem that doesn't require storing the property names in a config file.
Thanks,
Karrtik
Hi Karrtik, I posted a new question with the following title:

Reading an MS Word 2010 document (docx) and return all of the document's "Built-In Document Properties."

Therein I provided an in-depth explanation of the status of the application thus far. You may want to read the entire question to make sure we're on the same page.

Thank you very much for your help thus far and I look forward to working with you on the next part.

Fulano
This was a solution to the first problem, which led to another issue that we are working on next.

Thank you for your help.