What does "Content" mean -- as it pertains to MS Office documents?

I would like to know what the technical definition of "content" is as it pertains to MS Office documents. We all know that the word "content," in the vernacular means -- something that is contained within something else - such as the contents of a bag or box. That's not what I'm seeking in my question. What I'm seeking is below:

When one creates an MS Word document, a date known as "content created" is established for that document. However, if one creates an MS Word document that has nothing in it, hence no content, we still get a "content created" date, but obviously, there is no content to speak of. So, what exactly does MS Office (MS Word in this case) consider to be content? How does Microsoft define "content"?

I've looked high and low on the Internet for an official definition of "content" as it pertains to MS Office documents, but have not found one. If you read any Microsoft technical documentation, they constantly refer to a document's content, but never define it specifically. In my example of the empty document, there was nothing saved into the document, but we still got a "content created" date. Therefore, there has to be something within each document, perhaps in its metadata, that Microsoft is referring to as "content" and hence issuing a content created date. My question is, what is that? How does Microsoft define "content"?

Thank you,
Fulano
Mr_FulanoAsked:
arnoldCommented:
Academic curiosity: where do you see that date, in the properties of the file?
A box or a bag can also have empty air as content.

Are you looking to rely on content creation date for some important facet?

Technically speaking, while there might not be any user-entered data, if you open the file with anything other than the MS Office application, you will see the control/structure data that defines the document as the applicable Office file type.
ChopOMaticCommented:
Don't interpret this in any way as official or as having come from MSFT, but in the digital forensic/EDD arena, in the absence of some other definition, "content" tends to refer to user-created content. It's my strong suspicion that MSFT means the same thing, and that they make the assumption that anyone creating a document intends to populate it with some kind of user-created content.

Off the top of my head, I can't think of a single time when I heard someone refer to metadata or internal document file structures as content that I discussed above. To be sure, I've found plenty of critical evidence in those metadata and/or internal document file structures, but to me, the generic term "content" carries an assumption of having been user created.

If I'm writing an expert report, I'd describe things substantively as I have above, and would be comfortable defending those descriptions from the stand.
Mr_FulanoAuthor Commented:
Hi ChopOMatic, (cool name by the way)...in any event, thanks for the reply, but I need more than that. I need facts.

If I may offer you some advice and PLEASE don't take this at a personal level...I would suggest to you that if you're on the witness stand and "assume" something like you did in your post above...it may be a very long day for you up there...

The first thing that an opposing attorney will ask you is where you got your definitions. - You would say that "it was off the top of your head" and also based on an "assumption."

The second thing an opposing attorney will do is ask you again to repeat what you said about user created data...or he'll read it back from the record. You said.... "in the absence of some other definition, "content" tends to refer to user-created content." Then that attorney will pull up MS Word, will save an empty document with a name like Empty_File_1, and then close the document. Then he/she will go to the document properties and show the jury that under "content created" it has a date...but wait -- YOU said it had content when a user put content into the document --- how could that be? Are you wrong, or is Microsoft, one of the biggest software companies on the market, wrong?

Quickly your credibility as a forensic examiner will sink into the abyss in the eyes of a jury that more than likely has no idea what the entire discussion was about anyway, but can be easily persuaded by clever parlor tricks, like what the opposing attorney just pulled...you'd be up there for hours!

I don't assume (ever) when I don't know something. I find out, or at least I try to find a frame of reference (a publication, some sort of text, anything tangible) from which I can articulate it if on the witness stand.

That said, while in casual discussion with colleagues, I would tend to agree with you that "more than likely", Microsoft intended for content to be added to a document before it was saved and closed...but we can't assume that.

If you look inside an MS Word document in the XML there is an XML file at the root of the folder called [Content_Types].xml...is this what Microsoft is referring to? It gets created as soon as a document is saved. Is this the content that is referenced in the content created date? That is what I'm trying to determine.
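
One way to see what [Content_Types].xml actually holds is to read it straight out of the package. The Python sketch below is only an illustration: the function name `read_content_types` is made up for this example, and it assumes the file is a normal .docx (a ZIP package). In the Open Packaging Conventions this part is a map from file extensions and part names to media types, so the output may help in judging whether it has anything to do with the "content created" date.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in Open Packaging Conventions packages.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def read_content_types(docx):
    """Return (defaults, overrides) read from [Content_Types].xml.

    defaults maps a file extension to a media type; overrides maps a
    part name such as /word/document.xml to a media type.
    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
    defaults = {
        node.get("Extension"): node.get("ContentType")
        for node in root.iter(CT_NS + "Default")
    }
    overrides = {
        node.get("PartName"): node.get("ContentType")
        for node in root.iter(CT_NS + "Override")
    }
    return defaults, overrides
```

Calling `read_content_types("Empty_File_1.docx")` on a saved blank document and printing both dictionaries shows every entry the part contains.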

In any event, I appreciate your help...

Fulano
Mr_FulanoAuthor Commented:
Arnold, no...under the document's "Built-In-Properties" section on the Details tab. Right below the Manager property, you'll see a date for "content created." It's the same date that's in the core.xml file within the XML structure for the document, however, there it's in Zulu time, not local time.
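
The core.xml value can also be read without opening Word. The sketch below is a minimal Python illustration, not an official method. It assumes the value is stored in the `dcterms:created` element of docProps/core.xml and is written in the form `2015-03-01T14:05:00Z` (the example date and the function names `content_created` and `to_local` are made up here). It returns the stored UTC ("Zulu") time and converts it to the machine's local time for comparison with the Details tab.

```python
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Dublin Core terms namespace used for created/modified in core.xml.
DCTERMS_NS = "{http://purl.org/dc/terms/}"


def content_created(docx):
    """Return dcterms:created from docProps/core.xml as an aware UTC datetime.

    Returns None when the element is missing or empty.
    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    text = root.findtext(DCTERMS_NS + "created")
    if not text or not text.strip():
        return None
    # Assumption: whole seconds followed by a literal Z, meaning UTC.
    parsed = datetime.strptime(text.strip(), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def to_local(moment):
    """Convert an aware datetime to the local time zone of this machine."""
    return moment.astimezone()
```

For example, `to_local(content_created("Empty_File_1.docx"))` should show the same instant as the stored value, expressed in local time.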

Fulano
arnoldCommented:
Do not currently have access to ms office app on the platform I am on.
Create and save a blank file.
Duplicate the file.
Open the file and add additional content.
Compare the two to see whether the created content date is the same, and then whether the modified includes a modify date of content.

Create a new template. Create a new document using the template and just save it and see if the behavior is sustained.
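
The comparison described above can be scripted so the two files are read the same way each time. This is a rough Python sketch under the assumption that both dates live in docProps/core.xml as `dcterms:created` and `dcterms:modified`; the function names `core_dates` and `compare_dates` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core terms namespace used for created/modified in core.xml.
DCTERMS_NS = "{http://purl.org/dc/terms/}"


def core_dates(docx):
    """Return the raw created/modified strings from docProps/core.xml.

    A value is None when the element is absent.
    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "created": root.findtext(DCTERMS_NS + "created"),
        "modified": root.findtext(DCTERMS_NS + "modified"),
    }


def compare_dates(first, second):
    """Map each field to (value in first, value in second, values match)."""
    a = core_dates(first)
    b = core_dates(second)
    return {field: (a[field], b[field], a[field] == b[field]) for field in a}
```

Running `compare_dates("blank.docx", "blank_with_text.docx")` (both file names are placeholders) shows whether adding content changed the created date, the modified date, or both.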
arnoldCommented:
A search on the question points to an MS forum, but there does not seem to be a definitive official MS reference.

http://answers.microsoft.com/en-us/office/forum/office_2010-word/microsoft-word-query-about-what-content-created/76279e9d-db5d-4d9f-a9ed-c1e73d59b77f

The internal document reference is likely used for collaborative/sharepoint type of a setup.

I think your criticism of ChopOMatic is a bit harsh, as one has to have a context to the question, i.e., what is being reviewed, etc.
The question is vague.
BillDLCommented:
>>> ... if one creates an MS Word document that has nothing in it, hence no content, we still get a "content created" date, but obviously, there is no content to speak of.  In my example of the empty document, there was nothing saved into the document ... <<<

At this moment I don't have access to a computer with anything more recent than Office 2003 (i.e. binary *.doc by default and xml-based *.docx only through compatibility add-on), but here's something to test out.

Right-Click in any folder > New > Word Document and create a new blank *.docx file.  Right-Click > Properties, and my guess is that you should see no pages, paragraphs, lines, characters, or other "content" in the file's properties.  Open and Save it without typing anything at all and close it.  Check the properties again and my guess is that one page, one paragraph, and one line should now be showing in the file's properties.  If I have anticipated the results correctly with Word 2007 and more recent, then I would suggest that this is "content".  Whether or not it is "content to speak of" is probably irrelevant.
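
The same test can be checked by reading the counts directly from the file. The Python sketch below is illustrative only: it assumes the page, paragraph, line, word, and character counts are stored in docProps/app.xml, and the function name `document_statistics` is invented for this example. If the file is not a ZIP package at all, the function returns None rather than guessing.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the extended (application) properties part, docProps/app.xml.
EP_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"
FIELDS = ("Pages", "Paragraphs", "Lines", "Words", "Characters")


def document_statistics(docx):
    """Return the counts stored in docProps/app.xml as a dict of ints.

    A field is None when it is absent or not a plain number. The result
    is None when the file is not a ZIP package. `docx` may be a path or
    a binary file-like object.
    """
    if not zipfile.is_zipfile(docx):
        return None
    with zipfile.ZipFile(docx) as package:
        if "docProps/app.xml" not in package.namelist():
            return {name: None for name in FIELDS}
        root = ET.fromstring(package.read("docProps/app.xml"))
    stats = {}
    for name in FIELDS:
        text = (root.findtext(EP_NS + name) or "").strip()
        stats[name] = int(text) if text.isdigit() else None
    return stats
```

Calling it once on the freshly created file and again after the open-and-save step gives a before/after record of the counts.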
ChopOMaticCommented:
I appreciate the thoughts, Fulano, and your points are well taken. Certainly the scenario you present with an aggressive attorney is a possibility. And if I were about to testify on this exact point of "content," I too would scour the available body of knowledge to arm myself with all the verifiable tedium available, just in case I needed to spar with such an attorney.

All that said, my experience over the past dozen years doing this is that judges and juries are much more responsive to (correct) common-sense language and explanations than they are to techno-tedium.

And now I'll stop the yacking since this is drawing your thread off topic. :)

Again, thanks for the thoughts!

Chop
Mr_FulanoAuthor Commented:
Arnold, thank you for the link...it's interesting that I'm not the only one that wants to know what content created actually means.

As for my comments towards ChopOMatic...not harsh in the least, because I prefaced my comment by asking him not to take it at a personal level. What I did in my comments was to address his statement about writing an expert report. Usually, expert reports lead to trials and trials lead to sitting in the witness stand...so, I would rather ChopOMatic realize the danger of assuming on an expert report or during testimony, than allow some two-bit attorney to tear him up on the witness stand.

I'm not being critical of him directly, but of his comment about defending an assumption without experimentation to draw a conclusion.

If it seemed harsh, that was not my intent. My intent was to keep him from being eaten up on the witness stand.  

Fulano
Mr_FulanoAuthor Commented:
ChopOMatic, I'm glad you didn't take it personally. It WAS NOT intended in that manner. I've seen too many good investigators get ripped to shreds by some attorney that has half the brain cells, because the investigator made a common sense assumption. I didn't want to see that happen to someone here on EE.

Thanks for your help, I DO appreciate you taking the time to assist.

Fulano
rindiCommented:
A new document without any text inside still isn't empty. It still has content that is invisible to you, like formatting content that holds the page size, default font and all things like that. Compare it to a coke bottle with no coke inside. That bottle isn't empty, it contains air...
Mr_FulanoAuthor Commented:
Hi Rindi, I agree 100% with your analysis. That is what I believe is happening, because I understand the mechanics of how the document works internally, much like you and the others here do, I think we all agree with you, but what I'm trying to do is find a source that verifies or validates our interpretation of what is actually happening "behind the scenes."

I've searched high and low for something that can define "content created" (in this context). I'd  even settle for simply defining "content" in the technical sense - not the vernacular as I've stated prior. However, I suspect that Microsoft defined this term many, many years ago and has simply grown to use it rather loosely over time.

I think that, if I could find a technical definition for what you just described, I'd be fine - that would work for me.

BTW, I have a legitimate need to have the definition, I'm not just fixated on defining a term for no reason. I need it for a specific project I'm working on, where I have to support why I'm doing what I'm doing. I need to be able to explain what the content is, and the fact that that time stamp (i.e. the content created time stamp) means that at that specific time, that document was - created, for lack of a better term.

Thanks,
Fulano
rindiCommented:
Maybe the correct wording would be "Document Creation time".
Mr_FulanoAuthor Commented:
Yes, if I had been at Microsoft when Office was being rolled out, that's what I would have suggested. However, it makes you wonder why they didn't choose the obvious? Leading one to wonder what else does it mean?
rindiCommented:
It probably isn't really that important to them. It is typical, for example there are also many error messages you get within Windows which really don't make any sense when you look at the actual cause of the error.
arnoldCommented:
Document creation date is part of the file system level record.
The item asked about is, I think, more in the context of the metadata for use within collaborative, sharepoint .......type of references.
Mr_FulanoAuthor Commented:
I agree with Arnold in the sense that the document creation date is part of the file system record. I don't know about the sharepoint reference part, because that is not my forte, but there has to be a reason they called it "content created." -  At least I would think so. Otherwise it makes no sense to change the creation date name so much.
_TAD_Commented:
A word document (docx) is not merely a single document.  A .docx file consists of dozens of files.

Just for giggles do the following:

1) Create a new docx document
2) Using windows explorer rename the document to a .zip file
3) Open the .zip (formerly docx) in any zip file editor/inspector

You will see document.xml (your document payload) as well as many other files like
endnotes.xml, footnotes.xml, footer.xml, header.xml, stylesheets, themes, embedded objects, etc.


So back to your question about content... when these files are created or updated they will never be completely empty irrespective of the actual content of your document.  I submit that your date is related to the information stored in these internal documents somewhere.
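
The renaming step can be skipped, since a .docx can be opened as a ZIP archive directly. The short Python sketch below lists every part in the package with its uncompressed size and picks out any part that is zero bytes long. The function names `list_parts` and `empty_parts` are made up for illustration.

```python
import zipfile


def list_parts(docx):
    """Return (part name, uncompressed size in bytes) for each part.

    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]


def empty_parts(docx):
    """Return the names of parts whose uncompressed size is zero."""
    return [name for name, size in list_parts(docx) if size == 0]
```

Printing `list_parts("Empty_File_1.docx")` for a blank saved document shows how much is stored in the package even when nothing was typed, and `empty_parts` reports whether any part is completely empty.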
Mr_FulanoAuthor Commented:
Hi TAD, I suspect you're right to some degree, but what I was looking for was something from Microsoft that stated what you just described. I don't think I'm ever going to find that....

Thanks for your input and yes...I've done the ZIP trick - pretty cool!

Fulano
arnoldCommented:
The only one who can definitively answer your inquiry is MS.
The content date in your question will  be used to challenge the authenticity of documents provided (electronic and hard copy).
The documents will be printed out and they will only see the user created content formatted by the Application.

I.e. a party is compelled to produce documents, it is possible that they could either provide it in electronic form or printed out hard copies to fulfill the directive unless electronic production is required.


Electronic document production avails the recipient the same opportunity to alter.......
Mr_FulanoAuthor Commented:
WOW...these are ALL outstanding answers to a deep question, which may not have a published answer.

I'm going to split the points up evenly to give everyone a bit for their contribution. Thank you. We had a VERY good and interesting conversation.

Fulano
Mr_FulanoAuthor Commented:
ALL excellent comments. It's hard to choose one over another.
BillDLCommented:
Thank you Mr_Fulano