Reading an MS Word 2010 document (docx) and returning all of the document's "Built-In Document Properties."

Hi, I'm using VS2010 C# and MS Word 2010.

My code below is a Windows Form with one button on it and a function called doIT(), which is where my code resides. When you click the button, the code launches and you can then view the result in the VS Output Window.  

The code also requires the following references:
-- Microsoft Office 14.0 Object Library
-- Microsoft Word 14.0 Object Library
-- System.Reflection

Description:

I've been working on an application that will read an MS Word document (docx) and return all of the document's "Built-In Document Properties."

The code at the end of this post works, and although using the "Microsoft.Office.Interop" DLLs is said to be slow, I'm not concerned about the speed of the code, but rather with the accuracy and completeness of the results. I need to get -- all -- the property values, not just some of them; otherwise the application is useless to me.

What I'm doing is iterating over each of the "WdBuiltInProperty" entries with a "foreach" loop and using the property's "Name" to get the property "Value" - rather than having to hard code each and every possible document property one by one. This I believe is a more elegant approach.

However, although the code works (partially), it fails on some of the properties, while working on others. I've tried numerous approaches, but it usually errors out on the property that it cannot fetch.

What I think is actually happening is that the Array in my code is getting the property names from "typeof(WdBuiltInProperty)", but it's not getting them from inside the actual document; it's simply listing them from what I believe is some sort of look-up table. You can prove that by breaking the link to the MS Word document and running the code again (with a few minor modifications): even without a link to the document, you'll still be able to list all the property names.

So, keeping that in mind, when the code runs, it tries to match the property names it got from WdBuiltInProperty (which are now in the array) with the actual "inner document" property names, and that becomes a "hit and miss" situation: some match and you get a return value, and some don't match and error out with exceptions, as shown below:

-- A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
-- A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll

Based on these exceptions, there may be more than one problem here...

I believe that the real solution is to make the code use System.Reflection to look -- inside the actual document -- and pull out all the property names that it can find, put those property names into an array, and then use them to get the property values within the document.

One thing that is also interesting is that some properties, like the created date, the last saved date and the revision number, are not being retrieved. My document, and I would presume every MS Word document, has these properties by default. My document definitely does, but I'm getting errors that say:

========================================================================
The proerty value for wdPropertyTimeCreated could not be extracted.
===========================================================================
The proerty value for wdPropertyTimeLastSaved could not be extracted.
===========================================================================
etc...


So, my question is: how do I get the property names that are actually in the document?

Thank you for your help,
Fulano


Below is an example of my output results:
===========================================================================
Description Properties:
===========================================================================
1). The document's propname is    : wdPropertyTitle
2). The document's subPropname is : Title
3). The document's propvalue is   : Screenwriting Fundamentals

===========================================================================
1). The document's propname is    : wdPropertySubject
2). The document's subPropname is : Subject
3). The document's propvalue is   : Tutorial Document Collection

===========================================================================
1). The document's propname is    : wdPropertyAuthor
2). The document's subPropname is : Author
3). The document's propvalue is   : John Smith

===========================================================================
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
The proerty value for wdPropertyKeywords could not be extracted.
===========================================================================
The proerty value for wdPropertyComments could not be extracted.
===========================================================================
1). The document's propname is    : wdPropertyTemplate
2). The document's subPropname is : Template
3). The document's propvalue is   : Normal.dotm

===========================================================================
The proerty value for wdPropertyLastAuthor could not be extracted.
===========================================================================
The proerty value for wdPropertyRevision could not be extracted.
===========================================================================
The proerty value for wdPropertyAppName could not be extracted.
===========================================================================
The proerty value for wdPropertyTimeLastPrinted could not be extracted.
===========================================================================
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
The proerty value for wdPropertyTimeCreated could not be extracted.
===========================================================================
The proerty value for wdPropertyTimeLastSaved could not be extracted.
===========================================================================
The proerty value for wdPropertyVBATotalEdit could not be extracted.
===========================================================================
The proerty value for wdPropertyPages could not be extracted.
===========================================================================
The proerty value for wdPropertyWords could not be extracted.
===========================================================================
The proerty value for wdPropertyCharacters could not be extracted.
===========================================================================
1). The document's propname is    : wdPropertySecurity
2). The document's subPropname is : Security
3). The document's propvalue is   : 0

===========================================================================
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.NullReferenceException' occurred in MetaDatareader_with_Loop.exe
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
The proerty value for wdPropertyCategory could not be extracted.
===========================================================================
The proerty value for wdPropertyFormat could not be extracted.
===========================================================================
The proerty value for wdPropertyManager could not be extracted.
===========================================================================
1). The document's propname is    : wdPropertyCompany
2). The document's subPropname is : Company
3). The document's propvalue is   :

===========================================================================
The proerty value for wdPropertyBytes could not be extracted.
===========================================================================
The proerty value for wdPropertyLines could not be extracted.
===========================================================================
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll
The proerty value for wdPropertyParas could not be extracted.
===========================================================================
The proerty value for wdPropertySlides could not be extracted.
===========================================================================
The proerty value for wdPropertyNotes could not be extracted.
===========================================================================
The proerty value for wdPropertyHiddenSlides could not be extracted.
===========================================================================
The proerty value for wdPropertyMMClips could not be extracted.
===========================================================================
The proerty value for wdPropertyHyperlinkBase could not be extracted.
===========================================================================
The proerty value for wdPropertyCharsWSpaces could not be extracted.
===========================================================================
The program '[6516] MetaDatareader_with_Loop.vshost.exe: Managed (v4.0.30319)' has exited with code 0 (0x0).

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using Microsoft.Office.Core;
using Microsoft.Office.Interop.Word;
using System.Reflection;
using System.IO;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {

        public Form1()
        {
            InitializeComponent();
        }
        private void btnGo_Click(object sender, EventArgs e)
        {
            doIT();
        }
      

        private void doIT()
        {
            string propname = "";
            string subPropname = "";
            string propvalue = "";

            #region (Object set up code)

            Microsoft.Office.Interop.Word.Application wordObject = new Microsoft.Office.Interop.Word.Application();     //create word app class object  
            
            object file = @"\\Test Folder\TestDOC.docx";                //this is the path to file to open
            object nullobject = System.Reflection.Missing.Value;

            Microsoft.Office.Interop.Word.Document docs = wordObject.Documents.Open(file, nullobject, nullobject, nullobject, nullobject,
            nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject, nullobject);  //open the file 

            object wordProperties = docs.BuiltInDocumentProperties;
            Type typeDocBuiltInProps = wordProperties.GetType();

            Console.WriteLine("");
            Console.WriteLine("===========================================================================");
            Console.WriteLine("Description Properties:");
            Console.WriteLine("===========================================================================");

            #endregion (Object set up code)

            
            #region (Array code)


            Array values = Enum.GetValues(typeof(WdBuiltInProperty));

            foreach (var value in values)
            {

                try
                {

                    propname = value.ToString();
                    subPropname = propname.Substring(10).ToString();

                    Object eachproperty = typeDocBuiltInProps.InvokeMember("Item", BindingFlags.Default | BindingFlags.GetProperty, null, wordProperties, new object[] { subPropname });
                    Type typeeachproperty = eachproperty.GetType();



                    propvalue = (string)typeeachproperty.InvokeMember("Value", BindingFlags.Default | BindingFlags.GetProperty, null, eachproperty, new object[] { }).ToString();

                    Console.WriteLine("1). The document's propname is    : " + propname);
                    Console.WriteLine("2). The document's subPropname is : " + subPropname);
                    Console.WriteLine("3). The document's propvalue is   : " + propvalue);
                    Console.WriteLine();

            #endregion (Array code)
                    
                }


                catch (Exception j)
                {
                    Console.WriteLine("The proerty value for " + propname + " could not be extracted.");
                    
                }

                Console.WriteLine("===========================================================================");

            }
        }
        
        
       

    }
}

Miguel Oz (Software Engineer) commented:
Please try this code: (replace lines 56 to 95 in your code)
DocumentProperties wordProperties = (DocumentProperties)docs.BuiltInDocumentProperties;
var values = Enum.GetValues(typeof(WdBuiltInProperty));

foreach (var value in values)
{

    try
    {
        propname = value.ToString();
        subPropname = propname.Substring(10).ToString();
        string wpValue  = wordProperties[value]].Value.ToString();

        Console.WriteLine("1). The document's propname is    : " + propname);
        Console.WriteLine("2). The document's subPropname is : " + subPropname);
        Console.WriteLine("3). The document's propvalue is   : " + wpValue);
        Console.WriteLine();
    }


    catch (Exception j)
    {
        Console.WriteLine("The property value for " + propname + " could not be extracted.");
    }
}

Notice that:
1) I cast the document properties to DocumentProperties directly, as that allows direct use of the enum values.
2) Please check that the document is a native Word 2010 document (e.g. not a Word 2003 document).
Mr_Fulano (Author) commented:
Hi Miguel, I tested your code, but it errored-out.

To get it to run, I had to make the following modifications:

--  I removed an extra right square bracket at the following line:

string wpValue  = allWordProperties[value]].Value.ToString(); -- The square bracket after the word value.

-- I also commented out the following statements in my code:

     //object wordProperties = docs.BuiltInDocumentProperties;   (line 45)
    //Type typeDocBuiltInProps = wordProperties.GetType();          (line 46)


Error message:

A first chance exception of type 'System.InvalidCastException' occurred in Unknown Module.

Unable to cast COM object of type 'System.__ComObject' to interface type 'Microsoft.Office.Core.DocumentProperties'. This operation failed because the QueryInterface call on the COM component for the interface with IID '{2DF8D04D-5BFA-101B-BDE5-00AA0044DE52}' failed due to the following error: No such interface supported (Exception from HRESULT: 0x80004002 (E_NOINTERFACE)).

The line that threw the error was this one below:

DocumentProperties wordProperties = (DocumentProperties)docs.BuiltInDocumentProperties;

Any thoughts?

Thanks for your help,
Fulano
Miguel Oz (Software Engineer) commented:
Apologies, my code is only applicable to VSTO-based documents.
In the case of your WinForms application, sadly, you are limited by a combination of two things: the use of reflection, and the fact that not all properties are defined for the Word document. (Reflection will throw an exception if a property is not found or cannot be extracted.)
Please check your Word document's properties to see which properties are defined. (http://www.addintools.com/documents/office/where-file-properties.html)
Note: you can also define your own custom properties, but I have never tried that from WinForms.
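
For what it's worth, the two exception types in the output suggest two separate causes, though this is an inference from the log and not something confirmed in this thread: the TargetInvocationException lines line up with derived names such as "TimeCreated" that the collection apparently does not recognize, and the NullReferenceException lines line up with calling ToString() on a Value that comes back null. A minimal, untested sketch of one way to sidestep both from a WinForms project is to index the collection by the enum's numeric value through C# 4 "dynamic" and to convert the value with Convert.ToString. The names BuiltInPropertyDump and Dump are made up for illustration, and the sketch assumes the collection accepts the WdBuiltInProperty numeric values as indexes.

```csharp
using System;
using Microsoft.Office.Interop.Word;

static class BuiltInPropertyDump
{
    // Hypothetical helper: assumes 'doc' is a Word Document that is already open.
    public static void Dump(Document doc)
    {
        // 'dynamic' lets the COM collection resolve Item, Name and Value at run time,
        // so no cast to Microsoft.Office.Core.DocumentProperties is needed.
        dynamic props = doc.BuiltInDocumentProperties;

        foreach (WdBuiltInProperty id in Enum.GetValues(typeof(WdBuiltInProperty)))
        {
            try
            {
                // Assumption: index by the enum's numeric value instead of a name
                // derived from the enum identifier.
                dynamic prop = props[(int)id];
                string name = prop.Name;
                object value = prop.Value;

                // Convert.ToString(object) returns an empty string for null
                // instead of throwing a NullReferenceException.
                Console.WriteLine("{0} ({1}) : {2}", id, name, Convert.ToString(value));
            }
            catch (Exception ex)
            {
                // Some properties may still be unavailable for a given document.
                Console.WriteLine("{0} : not available ({1})", id, ex.Message);
            }
        }
    }
}
```

Calling BuiltInPropertyDump.Dump(docs) right after Documents.Open in doIT() would print the name Word reports for each property next to the enum identifier, which makes any mismatch visible.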
Fernando Soto (Retired) commented:
Hi Mr_Fulano;

The code snippet below uses the Open XML SDK, which Microsoft released in the Office 2007 timeframe. To use this code, you need to download the library and install it on your system. You can download it from "Open XML SDK 2.0 for Microsoft Office." When you click on the download link you will be given an option of what to download; select OpenXMLSDKv2.msi. Once it is installed, you can add a reference to the OpenXML library. You may see two of them, because version 2.5 may have been installed as well; select version 2.0. You will also need a reference to WindowsBase.dll. Make sure you add the two namespaces below and the code should work.

In researching this I found this to be the simplest solution, and the added benefit is that you do not have to load the Word document, open it, and make sure it is correctly closed. You also do not need to use reflection to get the values of the properties.

Please note that a couple of the property names you posted do not match directly, but you should be able to tell which is which. Also note that where your code displays "could not be extracted", the property is not in the document, and this code will not display anything for it.
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;


// Define two XML Files to hold the properties from the Property parts.
XDocument coreFileProperties;
XDocument extFileProperties;

// Location of the Word 2007 - 2010 document to get the properties from
string filename = "C:/Working Directory/Generic-Text-Word-2007---Lorem-ipsu.docx";
// Open the Word document using OpenXML 2.0, which is used to read Word 2007 - 2010 files
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filename, false))
{
    // Load the properties from the XML files into the XDocument objects
    CoreFilePropertiesPart corePart = wordDoc.CoreFilePropertiesPart;
    coreFileProperties = XDocument.Load(corePart.GetStream());
    ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
    extFileProperties = XDocument.Load(appPart.GetStream());
}

// Iterate through the properties and get their names and values
foreach (XElement ele in coreFileProperties.Root.Descendants())
{
    Console.WriteLine("{0}  :  {1}",  ele.Name.LocalName, ele.Value);
}

foreach (XElement ele in extFileProperties.Root.Descendants())
{
    Console.WriteLine("{0}  :  {1}", ele.Name.LocalName, ele.Value);
}

Results when using your document.

title  :  Generic Text Data Document
subject  :  A Tutorial Document
creator  :  William King
keywords  :  Keyword TAG
description  :  This is a test MS Word 2007 document.
lastModifiedBy  :  Sally Smith
revision  :  2
created  :  2015-10-08T16:49:00Z
modified  :  2015-10-08T16:58:00Z
category  :  Category Field
contentType  :  Content Field
contentStatus  :  Content Status Field
language  :  Lorem Ipsum
version  :  Version Number Field
Template  :  Normal
TotalTime  :  2
Pages  :  2
Words  :  428
Characters  :  2445
Application  :  Microsoft Office Word
DocSecurity  :  0
Lines  :  20
Paragraphs  :  5
ScaleCrop  :  false
Manager  :  Mr. Manager
Company  :  The Any Company Co.
LinksUpToDate  :  false
CharactersWithSpaces  :  2868
SharedDoc  :  false
HyperlinksChanged  :  false
AppVersion  :  12.0000
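
As a possible variation on the code above, offered as a sketch and not as a replacement: the same package also exposes some of these values through typed members, which avoids walking the XML by hand when only a few known properties are needed. This assumes the PackageProperties member of WordprocessingDocument and the Properties root element of ExtendedFilePropertiesPart from the Open XML SDK; TypedPropertyReader and Read are hypothetical names, and only a handful of properties are shown.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;

static class TypedPropertyReader
{
    // Hypothetical helper: returns a few properties by name.
    // Core properties that are not set map to an empty string.
    public static Dictionary<string, string> Read(string filename)
    {
        var result = new Dictionary<string, string>();

        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filename, false))
        {
            // Core properties (docProps/core.xml) through the package-level API.
            var core = wordDoc.PackageProperties;
            result["Title"] = core.Title ?? "";
            result["Creator"] = core.Creator ?? "";
            result["LastModifiedBy"] = core.LastModifiedBy ?? "";
            result["Revision"] = core.Revision ?? "";
            // Created and Modified are nullable dates; Convert.ToString yields "" for null.
            result["Created"] = Convert.ToString(core.Created);
            result["Modified"] = Convert.ToString(core.Modified);

            // Extended properties (docProps/app.xml); the part itself may be absent.
            ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
            var app = appPart != null ? appPart.Properties : null;
            if (app != null)
            {
                result["Pages"] = app.Pages != null ? app.Pages.Text : "";
                result["Words"] = app.Words != null ? app.Words.Text : "";
                result["Company"] = app.Company != null ? app.Company.Text : "";
            }
        }

        return result;
    }
}
```

A caller could loop over the returned dictionary and write each key and value to the console, much like the two foreach loops above.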

Mr_Fulano (Author) commented:
Excellent solution. Very well designed and very elegant, which is what I was looking for in this post. - Thank you.

Question has a verified solution.
