Mr_Fulano asked:

Reading an MS Word 2010 document (docx) and return all of the document's "Built-In Document Properties."

Hi, I'm using VS2010 C# and MS Word 2010.

My code below is a Windows Form with one button on it and a function called doIT(), which is where my code resides. When you click the button, the code launches and you can then view the result in the VS Output Window.  

The code also requires the following references:
-- Microsoft Office 14.0 Object Library
-- Microsoft Word 14.0 Object Library
-- System.Reflection

Description:

I've been working on an application that will read an MS Word document (docx) and return all of the document's "Built-In Document Properties."

The code at the end of this post works, and although using the "Microsoft.Office.Interop" DLLs is said to be slow, I'm not concerned about the speed of the code, but rather with the accuracy and completeness of the results. Thus, I need to get -- all -- the property values, not just some of them - otherwise the application is useless to me.

What I'm doing is iterating over each of the "WdBuiltInProperty" entries with a "foreach" loop and using the property's "Name" to get the property "Value" - rather than having to hard code each and every possible document property one by one. This I believe is a more elegant approach.

However, although the code works (partially), it fails on some of the properties, while working on others. I've tried numerous approaches, but it usually errors out on the property that it cannot fetch.

What I think is actually happening is that the Array in my code is getting the property names from "typeof(WdBuiltInProperty)", but it's not getting them from inside the actual document; it's simply listing them from what I believe is some sort of look-up table. You can prove that by breaking the link to the MS Word document and running the code again (with a few minor modifications): without making a link to the document, you'll still be able to list out all the property names.
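
The point above can be checked without Word at all. The following is only a minimal sketch: `SampleBuiltInProperty` is a hypothetical stand-in for a few members of `WdBuiltInProperty`, so that it compiles without the interop assembly. `Enum.GetNames` reads the names from the enum type itself, and no document is ever opened.

```csharp
using System;

// Hypothetical stand-in for a few WdBuiltInProperty members, so the sketch
// compiles without a reference to the Word interop assembly.
enum SampleBuiltInProperty
{
    wdPropertyTitle = 1,
    wdPropertySubject = 2,
    wdPropertyAuthor = 3
}

static class EnumNamesDemo
{
    // Returns the member names stored in the enum type's metadata.
    public static string[] Names()
    {
        return Enum.GetNames(typeof(SampleBuiltInProperty));
    }

    static void Main()
    {
        // No Word instance and no document are involved here.
        foreach (string name in Names())
        {
            Console.WriteLine(name);
        }
    }
}
```

So the list produced by the enum is a fixed list of candidate properties, not a report of what a particular file contains.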

So, keeping that in mind, when the code runs, it tries to match the property names it got from WdBuiltInProperty, which are now in the array, with the actual "inner document" property names, and that becomes a "hit and miss" type of situation... some match and you get a return value, and some don't match and those error out with exceptions as shown below:
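
To illustrate the mismatch: the sketch below compares the name derived with `Substring(10)` against the property name that Microsoft's documentation for `BuiltInDocumentProperties` lists for the same entry. The table is a hand-typed subset and should be treated as an assumption to verify against the documentation. It does not explain every failure in the output (entries such as Keywords have matching names), so there may well be a second cause.

```csharp
using System;
using System.Collections.Generic;

static class NameMismatchDemo
{
    // Enum member name -> property name as listed in Microsoft's documentation
    // (subset, typed in by hand; verify before relying on it).
    static readonly Dictionary<string, string> Documented =
        new Dictionary<string, string>
        {
            { "wdPropertyTitle", "Title" },
            { "wdPropertyLastAuthor", "Last Author" },
            { "wdPropertyRevision", "Revision Number" },
            { "wdPropertyTimeCreated", "Creation Date" },
            { "wdPropertyTimeLastSaved", "Last Save Time" },
            { "wdPropertyPages", "Number of Pages" }
        };

    // True when stripping the "wdProperty" prefix yields the documented name.
    public static bool DerivedNameMatches(string enumName)
    {
        string derived = enumName.Substring(10); // "wdProperty" is 10 characters
        return string.Equals(derived, Documented[enumName],
                             StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        foreach (string enumName in Documented.Keys)
        {
            Console.WriteLine(enumName + " -> \"" + enumName.Substring(10)
                + "\" vs \"" + Documented[enumName] + "\" : "
                + (DerivedNameMatches(enumName) ? "match" : "no match"));
        }
    }
}
```

Only names like Title survive the prefix stripping unchanged, which is consistent with Title, Subject and Author being returned while the date and revision properties are not.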

-- A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
-- A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll

Based on these exceptions, there may be more than one problem here...

I believe that the real solution is to make the code use System.Reflection to look -- inside the actual document -- and pull out all the property names that it can find, then put those property names into an array and use them to get the property values within the document.

One thing that is also interesting is that some properties, like the created date, the last saved date and the revision number, are not being retrieved. My document definitely has these properties, and I would presume all MS Word documents have them by default, but I'm getting an error that says:

========================================================================
The proerty value for wdPropertyTimeCreated could not be extracted.
===========================================================================
The proerty value for wdPropertyTimeLastSaved could not be extracted.
===========================================================================
etc...


So, my question is...how do I get the property names that are actually in the document.

Thank you for your help,
Fulano


Below is an example of my output results:
===========================================================================
Description Properties:
===========================================================================
1). The document's propname is    : wdPropertyTitle
2). The document's subPropname is : Title
3). The document's propvalue is   : Screenwriting Fundamentals

===========================================================================
1). The document's propname is    : wdPropertySubject
2). The document's subPropname is : Subject
3). The document's propvalue is   : Tutorial Document Collection

===========================================================================
1). The document's propname is    : wdPropertyAuthor
2). The document's subPropname is : Author
3). The document's propvalue is   : John Smith

===========================================================================
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
The proerty value for wdPropertyKeywords could not be extracted.
===========================================================================
The proerty value for wdPropertyComments could not be extracted.
===========================================================================
1). The document's propname is    : wdPropertyTemplate
2). The document's subPropname is : Template
3). The document's propvalue is   : Normal.dotm

===========================================================================
The proerty value for wdPropertyLastAuthor could not be extracted.
===========================================================================
The proerty value for wdPropertyRevision could not be extracted.
===========================================================================
The proerty value for wdPropertyAppName could not be extracted.
===========================================================================
The proerty value for wdPropertyTimeLastPrinted could not be extracted.
===A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
========================================================================
The proerty value for wdPropertyTimeCreated could not be extracted.
===========================================================================
The proerty value for wdPropertyTimeLastSaved could not be extracted.
===========================================================================
The proerty value for wdPropertyVBATotalEdit could not be extracted.
===========================================================================
The proerty value for wdPropertyPages could not be extracted.
===========================================================================
The proerty value for wdPropertyWords could not be extracted.
===========================================================================
The proerty value for wdPropertyCharacters could not be extracted.
===========================================================================
1). The document's propname is    : wdPropertySecurity
2). The document's subPropA first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
name is : Security
3). The document's propvalue is   : 0

===========================================================================
The proerty value for wdPropertyCategory could not be extracted.
===========================================================================
The proerty value for wdPropertyFormat could not be extracted.
===========================================================================
The proerty value for wdPropertyManager could not be extracted.
===========================================================================
1). The document's propname is    : wdPropertyCompany
2). The document's subPropname is : Company
3). The document's propvalue is   :

===========================================================================
The proerty value for wdPropertyBytes could not be extracted.
===========================================================================
The proerty value for wdPropertyLines could not be extracted.
======================================A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
=====================================
The proerty value for wdPropertyParas could not be extracted.
===========================================================================
The proerty value for wdPropertySlides could not be extracted.
===========================================================================
The proerty value for wdPropertyNotes could not be extracted.
===========================================================================
The proerty value for wdPropertyHiddenSlides could not be extracted.
===========================================================================
The proerty value for wdPropertyMMClips could not be extracted.
===========================================================================
The proerty value for wdPropertyHyperlinkBase could not be extracted.
===========================================================================
The proerty value for wdPropertyCharsWSpaces could not be extracted.
===========================================================================
The program '[6516] MetaDatareader_with_Loop.vshost.exe: Managed (v4.0.30319)' has exited with code 0 (0x0).

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Microsoft.Office.Core;
using Microsoft.Office.Interop.Word;
using System.Reflection;
using System.IO;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {

        public Form1()
        {
            InitializeComponent();
        }
        private void btnGo_Click(object sender, EventArgs e)
        {
            doIT();
        }
      

        private void doIT()
        {
            string propname = "";
            string subPropname = "";
            string propvalue = "";

            #region (Object set up code)

            Microsoft.Office.Interop.Word.Application wordObject = new Microsoft.Office.Interop.Word.Application();     //create word app class object  
            
            object file = @"\\Test Folder\TestDOC.docx";                //this is the path to the file to open
            object nullobject = System.Reflection.Missing.Value;

            Microsoft.Office.Interop.Word.Document docs = wordObject.Documents.Open(file, nullobject, nullobject, nullobject, nullobject,
            nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject);  //open the file 

            object wordProperties = docs.BuiltInDocumentProperties;
            Type typeDocBuiltInProps = wordProperties.GetType();

            Console.WriteLine("");
            Console.WriteLine("===========================================================================");
            Console.WriteLine("Description Properties:");
            Console.WriteLine("===========================================================================");

            #endregion (Object set up code)

            
            #region (Array code)


            Array values = Enum.GetValues(typeof(WdBuiltInProperty));

            foreach (var value in values)
            {

                try
                {

                    propname = value.ToString();
                    subPropname = propname.Substring(10).ToString();

                    Object eachproperty = typeDocBuiltInProps.InvokeMember("Item", BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties, new object[] { subPropname });
                    Type typeeachproperty = eachproperty.GetType();



                    propvalue = (string)typeeachproperty.InvokeMember("Value", BindingFlags.Default | BindingFlags.GetProperty, null, eachproperty, new object[] { }).ToString();

                    Console.WriteLine("1). The document's propname is    : " + propname);
                    Console.WriteLine("2). The document's subPropname is : " + subPropname);
                    Console.WriteLine("3). The document's propvalue is   : " + propvalue);
                    Console.WriteLine();

            #endregion (Array code)
                    
                }


                catch (Exception j)
                {
                    Console.WriteLine("The proerty value for " + propname + " could not be extracted.");
                    
                }

                Console.WriteLine("===========================================================================");

            }
        }
        
        
       

    }
}


Miguel Oz

Please try this code: (replace lines 56 to 95 in your code)
DocumentProperties wordProperties = (DocumentProperties)docs.BuiltInDocumentProperties;
var values = Enum.GetValues(typeof(WdBuiltInProperty));

foreach (var value in values)
{

    try
    {
        propname = value.ToString();
        subPropname = propname.Substring(10).ToString();
        string wpValue  = wordProperties[value]].Value.ToString();

        Console.WriteLine("1). The document's propname is    : " + propname);
        Console.WriteLine("2). The document's subPropname is : " + subPropname);
        Console.WriteLine("3). The document's propvalue is   : " + wpValue);
        Console.WriteLine();
    }


    catch (Exception j)
    {
        Console.WriteLine("The property value for " + propname + " could not be extracted.");
    }
}


Notice that:
1) I cast the document properties to DocumentProperties directly, as that allows direct use of the enum values.
2) Please check that the document is a native Word 2010 document (e.g. not a Word 2003 document).
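
One way to check point 2 without starting Word is sketched below. Note that this is a different technique from the interop code in this thread: a .docx file is an Open Packaging Conventions package, so `System.IO.Packaging` (which needs a reference to WindowsBase) can open it and read the core properties stored in the file. The file name is a hypothetical default. Only the core properties are exposed this way; statistics such as page and word counts are not.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // requires a reference to WindowsBase.dll

static class CorePropertyDump
{
    static void Main(string[] args)
    {
        // Hypothetical default; pass the real path as the first argument.
        string path = args.Length > 0 ? args[0] : "TestDOC.docx";

        try
        {
            using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
            {
                PackageProperties p = package.PackageProperties;

                // Unset properties come back as null and print as empty text.
                Console.WriteLine("Title          : " + p.Title);
                Console.WriteLine("Subject        : " + p.Subject);
                Console.WriteLine("Creator        : " + p.Creator);
                Console.WriteLine("Keywords       : " + p.Keywords);
                Console.WriteLine("LastModifiedBy : " + p.LastModifiedBy);
                Console.WriteLine("Revision       : " + p.Revision);
                Console.WriteLine("Created        : " + p.Created);
                Console.WriteLine("Modified       : " + p.Modified);
            }
        }
        catch (FileFormatException)
        {
            // Typically raised when the file is not an OPC package,
            // for example a binary Word 2003 .doc file.
            Console.WriteLine(path + " does not look like a native .docx package.");
        }
    }
}
```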

Mr_Fulano (ASKER)

Hi Miguel, I tested your code, but it errored-out.

To get it to run, I had to make the following modification:

--  I removed an extra right square bracket at the following line:

string wpValue  = allWordProperties[value]].Value.ToString(); -- The square bracket after the word value.

-- I also commented out the following statement in my code:

     //object wordProperties = docs.BuiltInDocumentProperties;   (line 45)
    //Type typeDocBuiltInProps = wordProperties.GetType();          (line 46)


Error message:

A first chance exception of type 'System.InvalidCastException' occurred in Unknown Module.

Unable to cast COM object of type 'System.__ComObject' to interface type 'Microsoft.Office.Core.DocumentProperties'. This operation failed because the QueryInterface call on the COM component for the interface with IID '{2DF8D04D-5BFA-101B-BDE5-00AA0044DE52}' failed due to the following error: No such interface supported (Exception from HRESULT: 0x80004002 (E_NOINTERFACE)).

The line that threw the error was this one below:

DocumentProperties wordProperties = (DocumentProperties)docs.BuiltInDocumentProperties;

Any thoughts?

Thanks for your help,
Fulano

Miguel Oz

Apologies, my code is only applicable to VSTO-based documents.
In the case of your WinForms application, sadly there are two limitations: you have to go through reflection, and not all properties are defined for the Word document. (Meaning reflection will throw an exception if a property is not found or cannot be extracted.)
Please check your Word document's properties to see which ones are defined. (http://www.addintools.com/documents/office/where-file-properties.html)
Note: you can also define your own custom properties, but I have never tried that from WinForms.
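
For what it's worth, here is a rough sketch of asking the collection itself for its entries instead of building names from the enum. It is not the accepted solution, just an illustration in the same late-bound style as the original code. It assumes C# 4 or later (so `ref` can be omitted on COM calls), a reference to the Word interop assembly, and a hypothetical file path. A property whose value cannot be read is reported instead of aborting the loop.

```csharp
using System;
using System.Reflection;
using Word = Microsoft.Office.Interop.Word;

static class BuiltInPropertyDump
{
    // Late-bound property read, in the same style as the original code.
    static object Get(object target, string member, params object[] args)
    {
        return target.GetType().InvokeMember(
            member, BindingFlags.GetProperty, null, target, args);
    }

    // Walks the collection by index and prints each property's own name.
    static void Dump(object properties)
    {
        int count = Convert.ToInt32(Get(properties, "Count"));
        for (int i = 1; i <= count; i++) // Office collections are 1-based
        {
            object property = Get(properties, "Item", i);
            string name = Convert.ToString(Get(property, "Name"));
            string value;
            try
            {
                // Convert.ToString returns an empty string for a null value.
                value = Convert.ToString(Get(property, "Value"));
            }
            catch (TargetInvocationException)
            {
                // The COM error for a property with no readable value arrives wrapped.
                value = "<no value available>";
            }
            Console.WriteLine(name + " : " + value);
        }
    }

    static void Main()
    {
        object path = @"C:\Temp\TestDOC.docx"; // hypothetical path
        object readOnly = true;
        object noSave = false;

        Word.Application app = new Word.Application();
        try
        {
            Word.Document doc = app.Documents.Open(path, ReadOnly: readOnly);
            Dump(doc.BuiltInDocumentProperties);
            doc.Close(SaveChanges: noSave);
        }
        finally
        {
            app.Quit(); // otherwise WINWORD.EXE stays running in the background
        }
    }
}
```

Because the names come from each property's `Name`, nothing depends on the enum spelling, and the `Convert.ToString` call avoids a NullReferenceException when a value is null.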
ASKER CERTIFIED SOLUTION
Fernando Soto
This solution is only available to members.
Excellent solution. Very well designed and very elegant, which is what I was looking for in this post. - Thank you.