Why Is It What It Is?

Wednesday, June 8, 2011 Posted by Corey Harrell
….. Or more specifically why is Microsoft Office metadata what it is?

Microsoft Office documents contain metadata that may be relevant to a digital forensic examination. The metadata may show when a file was created, modified, or printed, and what user accounts were used to perform those actions. Others have already researched the metadata in Microsoft Office 2003 and 2007 documents and have provided programs to parse it. A few of the write-ups are Kristinn Gudjonsson’s Office 2007 metadata post and Lance Mueller’s Office Metadata EnScript and Updated Office 2007 Metadata EnScript posts. I was interested in understanding how different actions taken against a Microsoft Office document affect its metadata.

To help show the relevance of Office documents’ metadata, I included the metadata from one of my test Word 2007 documents. Usually I manually examine the information in metadata on an as-needed basis, but I thought for this post it would be cleaner to show the information in report format. The text below is the output from Kristinn’s read_open_xml_win.pl script run against a test Word document. The File Metadata section shows when the file was created, modified, and printed, and what user accounts were used to perform those actions. Did you notice the last print date/time (2011-05-27T19:09:00Z) occurred one minute before the file was even created (2011-05-27T19:10:00Z)? I’ll touch on this later in the post, which is why I pointed it out.

Document name: E:\office metadata testing\word 2007\Xp-2007-1sp.docx
Date:
--------------------------------------------------------------------------
Application Metadata
--------------------------------------------------------------------------
Template = Normal.dotm
TotalTime = 2
Pages = 1
Words = 1
Characters = 9
Application = Microsoft Office Word
DocSecurity = 0
Lines = 1
Paragraphs = 1
ScaleCrop = false
Company = Test-lab
LinksUpToDate = false
CharactersWithSpaces = 9
SharedDoc = false
HyperlinksChanged = false
AppVersion = 12.0000
--------------------------------------------------------------------------
File Metadata
--------------------------------------------------------------------------
title =
subject =
creator = test-2007
keywords =
description =
lastModifiedBy = test-2007
revision = 2
lastPrinted = 2011-05-27T19:09:00Z
created = 2011-05-27T19:10:00Z
modified = 2011-05-27T19:10:00Z
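
For readers who want to pull the same File Metadata fields without the Perl script, here is a minimal Python sketch. It is not Kristinn's script; it only assumes that an Office 2007 document is a ZIP package whose core properties live in docProps/core.xml, and the function and dictionary names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Report label -> qualified element name inside core.xml.
FIELDS = {
    "creator": "dc:creator",
    "lastModifiedBy": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "lastPrinted": "cp:lastPrinted",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_metadata(path):
    """Return selected core properties from an Office 2007 document.

    `path` may be a file name or a binary file-like object.
    Fields that are absent from the document are returned as None.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        result[label] = element.text if element is not None else None
    return result
```

Calling read_core_metadata on a .docx or .xlsx file returns a dictionary keyed by the same labels used in the report above. The Application Metadata section comes from a different part of the package (docProps/app.xml) and is not read by this sketch.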

Usernames in Microsoft Office Metadata

Before looking into how different actions against a Microsoft Office document affect its metadata, I think it is useful to know more about the usernames reflected in the creator and lastModifiedBy attributes. The usernames are not populated with the name of the user account that performed the action; instead, they come from a value in the Windows registry containing the name to use.

When Office 2003 or 2007 is installed, there is a prompt asking for a user name and company. Those fields are prepopulated with the information entered when the operating system was installed, which is located in the registry key HKLM\Software\Microsoft\Windows NT\CurrentVersion (thanks Greg Kelly for this info). The prompt gives the user an opportunity either to change the user name or company or to leave the fields with the information entered during OS installation.

There is a registry key containing the user name and company that were used when Microsoft Office was installed. To locate the registry key, the Office program’s GUID must first be determined, and this Microsoft article explains how to locate the GUID. The GUIDs of the Microsoft Office programs I tested were 9040110900063D11C8EF10054038389C for Microsoft Office Professional Edition 2003 and 00002109110000000000000000F01FEC for Microsoft Office Professional Plus 2007. The registry key containing the information about the Office installation is: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18\Products\{GUID}\InstallProperties. The regowner and regcompany values in the registry key contain the user name and company entered during the Office installation.
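
The 32-character values above look like the packed form of a Windows Installer product code, as I understand that convention: the first three groups of the GUID are reversed, and the characters of each remaining byte are swapped. The sketch below is a hedged illustration of that conversion, not a quote from the Microsoft article. The product codes in the usage note were worked out by reversing the two values listed above, and the function names are hypothetical.

```python
def pack_guid(product_code):
    """Convert a product code like {90120000-0011-...} to its packed form."""
    hex_digits = product_code.strip("{}").replace("-", "").upper()
    if len(hex_digits) != 32:
        raise ValueError("expected a GUID with 32 hex digits")
    # The first three groups (8, 4, and 4 digits) are reversed character by character.
    parts = [
        hex_digits[0:8][::-1],
        hex_digits[8:12][::-1],
        hex_digits[12:16][::-1],
    ]
    # In the remaining 16 digits, the two characters of each byte are swapped.
    tail = hex_digits[16:]
    parts.extend(tail[i + 1] + tail[i] for i in range(0, 16, 2))
    return "".join(parts)


def install_properties_key(product_code):
    """Build the HKLM-relative path to the InstallProperties key."""
    return "\\".join([
        r"SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData",
        "S-1-5-18",
        "Products",
        pack_guid(product_code),
        "InstallProperties",
    ])
```

Under this assumption, pack_guid("{90120000-0011-0000-0000-0000000FF1CE}") gives the Office 2007 value listed above and pack_guid("{90110409-6000-11D3-8CFE-0150048383C9}") gives the Office 2003 value. Product codes differ between editions and languages, so confirm the code for the installation being examined.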

The usernames and company information reflected in Microsoft Office documents’ metadata are pulled from the UserInfo registry key in the NTUSER.DAT hive of the user account performing the actions. The two values in the registry key containing the data are named UserName and Company. The location of the registry key varies depending on the version of Microsoft Office, but the paths below show where the key is located for Office 2007 and 2003.

Microsoft Office 2007: HKCU\Software\Microsoft\Office\Common\UserInfo
Microsoft Office 2003: HKCU\Software\Microsoft\Office\11.0\Common\UserInfo
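
On a live test system, the two values can be read with Python's standard winreg module. The sketch below is only an illustration under a few assumptions: it reads the hive of the currently logged-on user rather than an offline NTUSER.DAT file, the helper names are made up, and it assumes that any value stored as binary data holds UTF-16LE text, which should be verified against the hive being examined.

```python
import sys

# Key paths from the post, relative to the user's hive (HKEY_CURRENT_USER).
USERINFO_KEYS = {
    "2007": r"Software\Microsoft\Office\Common\UserInfo",
    "2003": r"Software\Microsoft\Office\11.0\Common\UserInfo",
}


def decode_value(raw):
    """Return registry data as text.

    Assumption: binary data is UTF-16LE text with optional trailing nulls.
    """
    if isinstance(raw, bytes):
        return raw.decode("utf-16-le", errors="replace").rstrip("\x00")
    return raw


def read_userinfo(office_version):
    """Read UserName and Company for the logged-on user (live Windows only)."""
    if sys.platform != "win32":
        raise OSError("winreg is only available on Windows")
    import winreg  # imported here so the module still loads on other platforms

    values = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, USERINFO_KEYS[office_version]) as key:
        for name in ("UserName", "Company"):
            try:
                data, _value_type = winreg.QueryValueEx(key, name)
                values[name] = decode_value(data)
            except FileNotFoundError:
                values[name] = None  # the value does not exist in this key
    return values
```

For example, read_userinfo("2007") returns a dictionary with the UserName and Company values. For evidence files, an offline registry parser pointed at the exported NTUSER.DAT hive is the better fit.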

Now the question I found myself asking is: how are the UserName and Company values initially populated in the UserInfo key? I previously explained the user name and company prompt during Office installation because the information entered there is used to populate the UserInfo registry key of the user account that installed Microsoft Office. For the user accounts on the system that use Microsoft Office but didn’t install it, the values are populated a little differently. The first time the user launches an Office application, a dialog box appears asking for the user name and initials. The dialog box is prepopulated with the name of the currently logged-on user. The information entered in the dialog box becomes the UserName value in the user's UserInfo key, while the Company value comes from the information entered when Office was installed.

The metadata shown above now has a little more meaning. The username test-2007 is not the name of the user account that created and modified the document but is the name listed in the UserInfo registry key. The name in the UserInfo registry key can be changed at any point, but any change will update the last write time on the registry key. This means the last write time of the user account’s UserInfo key should be taken into consideration when examining metadata. If the registry key’s last write time is before the dates/times in the metadata (created, modified, or printed), then the metadata reflects what is currently in the user account’s NTUSER.DAT hive. On the other hand, if the registry key’s last write time is after the metadata timestamps, then what is currently in the user account’s NTUSER.DAT hive may not be what was there when the action was taken against the Office document (did I just hear someone whisper “check the restore points or volume shadow copies for registry files”?).
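
The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the key's last write time is assumed to have been converted to a UTC datetime already, and treating an equal timestamp as "needs a closer look" is my own cautious choice because Word timestamps only have minute resolution.

```python
from datetime import datetime, timezone


def parse_metadata_time(value):
    """Parse a metadata timestamp such as 2011-05-27T19:10:00Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def check_userinfo_against_metadata(key_last_write, metadata_times):
    """Compare the UserInfo key last write time with each metadata timestamp.

    key_last_write: timezone-aware datetime of the key's last write.
    metadata_times: dict such as {"created": "2011-05-27T19:10:00Z"}.
    """
    findings = {}
    for label, value in metadata_times.items():
        action_time = parse_metadata_time(value)
        if key_last_write < action_time:
            # The key was not written after the action, so its current
            # contents are what was in place when the action happened.
            findings[label] = "matches current hive"
        else:
            # The key was written at or after the action; the name in the
            # metadata may come from an earlier version of the key.
            findings[label] = "check older registry copies"
    return findings
```

With a key last written at 19:00 on 2011-05-27 and the timestamps from the report above, every field comes back as "matches current hive". A key last written a day later would flag all three fields for a look at restore points or volume shadow copies.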

How Actions Change Microsoft Office Metadata

The testing I conducted consisted of creating one document then performing different actions against copies of the document to see how the metadata changed. I only ran the tests against documents created by the following programs: Word 2007, Word 2003, Excel 2007, and Excel 2003. If I test other Office Programs in the future then I’ll update the post to reflect it. The observed changes in the documents’ metadata were consistent across all of the different versions of Office but there were some minor differences between the different file types. The Excel metadata differed from Word in the following ways: there was no revision number, some timestamps contained seconds, and the Save As function didn’t change the documents’ creation date.

I’m providing charts of how the metadata was affected by the different actions taken against the documents. Each chart has information in parentheses to show what the metadata values were for one set of documents (the timestamps don’t include the date since it was the same for all of the documents).

Here’s the Microsoft Office Word Metadata Changes chart and Microsoft Office Excel Metadata Changes chart.

The charts show how different actions against a Microsoft Office document affect its metadata. There are quite a few takeaways but I’m only going to highlight a few.

* The metadata create date/time reflects when the Office program was opened as opposed to the first time the document was saved
* Copying an Office document doesn’t change the metadata
* The metadata print date/time only changes when the document is saved after it is printed
* The Save As function sets the Word metadata create and modification date/times to the same value, while in Excel metadata only the modification date/time changes

Now let’s go back to the metadata I posted above. Do you remember that the last print date (2011-05-27T19:09:00Z) occurred one minute before the file was even created (2011-05-27T19:10:00Z)? There was one action taken against a Microsoft Word document that produced this pattern in the metadata. The action was printing a document then using the Save As function to create a new document. The metadata shown above is from the newly created document.
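
A check for this pattern is easy to script. The sketch below is a hypothetical helper, not part of any existing tool. It assumes the metadata has already been extracted into a dictionary keyed by the field names shown in the report and that the timestamps use the same format as in the report.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def printed_before_created(metadata):
    """Return True when lastPrinted is earlier than created.

    In the Word tests described in this post, this pattern resulted from
    printing a document and then using Save As to create a new document.
    """
    last_printed = metadata.get("lastPrinted")
    created = metadata.get("created")
    if not last_printed or not created:
        return False  # nothing to compare
    printed_time = datetime.strptime(last_printed, TIME_FORMAT)
    created_time = datetime.strptime(created, TIME_FORMAT)
    return printed_time < created_time
```

For the test document above, printed_before_created({"lastPrinted": "2011-05-27T19:09:00Z", "created": "2011-05-27T19:10:00Z"}) returns True. Because Save As didn’t change the creation date in my Excel tests, this flag is mainly meaningful for Word documents.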

Hopefully, sharing my test results can help others who are pondering the question “why is Microsoft Office metadata what it is?”
  1. Thanks. This is very good information. While Microsoft has released internals related documentation for both its pre and post 2007 Office data formats, the behavior of the applications is critical for the forensics community to figure out.

    Thanks.
