Classifying files based on location and content using the File Classification Infrastructure (FCI) in Windows Server 2008 R2

We just introduced the world to the File Classification Infrastructure in the post "Windows Server 2008 R2 File Classification Infrastructure – Managing data based on business value". Now let us show you how to set up your file server to classify files.

Setup

The first step is to install the File Classification Infrastructure (FCI). This is done by installing the File Server Resource Manager (FSRM) role service in the File Services role, since FCI is exposed through FSRM. This does mean that FCI is available on every SKU of Windows Server 2008 R2 and comes at no additional cost.

Configuring properties to track

We think of classification as a business-driven process and expect the corporation to determine which file properties it needs to drive its policies. When you first install FCI, you’ll notice that no properties are predefined. Defining a property is easy though:

  • Open FSRM (it’s under the Administrative Tools)
  • Navigate to Classification Management –> Classification Properties
  • Click on “Create Property”

FCI-2-1

In this screen shot I’ve defined two properties:

  • PII – a boolean (yes/no) that indicates whether a file contains personal information
  • Secrecy – an ordered list of values (High, Medium, and Low)

People tend to define a large number of properties that they think might be useful to know about files. However, we strongly encourage you to classify files only for properties that actually drive a management policy of some sort. Classifying files for too many properties slows down the classification process and means we have to store more information on your disks.

Properties really just require two pieces of information: the name and the type. Some property types require more information (an Ordered List property requires a list of valid values, etc). Optionally you can supply a description for the property. The following types are supported by FCI in Windows Server 2008 R2:

  • Yes/No – a boolean
  • Date-time
  • Number – integer
  • Multiple Choice List – a list of values where multiple values can be selected
  • Ordered List – a list of values that have an implicit ordering (for example high/medium/low or first/second/last)
  • String
  • Multi-String – allows you to set several unique strings to a property

This is a strict subset of the types available on SharePoint.
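
To make the "name plus type" idea concrete, here is a minimal Python sketch that models the two properties from the screen shot. It is only an illustration of the schema described above: the class, enum, and field names are hypothetical and are not part of the FCI API, and the check that a Multiple Choice List needs a list of valid values is an assumption (the text only states this for Ordered List).

```python
from dataclasses import dataclass, field
from enum import Enum


class PropertyType(Enum):
    # The seven property types listed above.
    YES_NO = "Yes/No"
    DATE_TIME = "Date-time"
    NUMBER = "Number"
    MULTIPLE_CHOICE_LIST = "Multiple Choice List"
    ORDERED_LIST = "Ordered List"
    STRING = "String"
    MULTI_STRING = "Multi-String"


# Types that need a list of valid values in addition to a name and a type.
# Including MULTIPLE_CHOICE_LIST here is an assumption of this sketch.
LIST_TYPES = {PropertyType.MULTIPLE_CHOICE_LIST, PropertyType.ORDERED_LIST}


@dataclass
class PropertyDefinition:
    name: str
    kind: PropertyType
    description: str = ""  # optional
    possible_values: list = field(default_factory=list)

    def __post_init__(self):
        # Reject list-style properties that were defined without any values.
        if self.kind in LIST_TYPES and not self.possible_values:
            raise ValueError(f"{self.name}: {self.kind.value} needs a list of valid values")


# The two properties defined in the screen shot.
pii = PropertyDefinition("PII", PropertyType.YES_NO, "File contains personal information")
secrecy = PropertyDefinition(
    "Secrecy", PropertyType.ORDERED_LIST, possible_values=["High", "Medium", "Low"]
)
```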

Creating classification properties sets up a schema of the properties that should be tracked on this file server. No files have been modified at this point. However, everything is now in place for an admin to write a script that sets properties on files (using the FCI API http://msdn.microsoft.com/en-us/library/dd392349(VS.85).aspx), for a LOB application to set properties on files (the API is remotely scriptable COM, so scripts, native applications, and managed applications can all use it), or for Office files to be classified manually (a future blog post will cover this in more detail). To classify the files that already exist, however, automatic classification is necessary. For this we use Classification Rules.

Automatically classifying files

To create a classification rule, navigate to the “Classification Rules” node and click on the “Create a New Classification Rule” action.

FCI-2-2

Each rule has a name and a scope. The name allows us to figure out which rule set a property value on a file. The scope is necessary since you may have different logic to classify your engineering files than your finance files, and so on. Each classification rule uses a classification mechanism to decide which value to assign to a specific property.

FCI-2-3

This rule uses the Folder Classifier, which assigns the specified value to the classification property for all files within the rule’s scope. Using this rule, everything that appears in D:\engineering will be marked as Medium Secrecy.
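
The behavior of such a location-based rule can be sketched in a few lines of Python. This is an illustration of the idea only, not how the Folder Classifier is implemented; the function names are hypothetical, and the sketch assumes that a rule's scope covers the folder and everything beneath it.

```python
from pathlib import PureWindowsPath


def in_scope(file_path: str, scope: str) -> bool:
    """Return True if file_path is the scope folder or lies beneath it.

    PureWindowsPath compares paths case-insensitively, which matches how
    Windows treats D:\\engineering and d:\\Engineering as the same folder.
    """
    file_p = PureWindowsPath(file_path)
    scope_p = PureWindowsPath(scope)
    return file_p == scope_p or scope_p in file_p.parents


def folder_classify(file_path: str, scope: str, prop: str, value: str) -> dict:
    """Assign value to prop for any file inside the scope; otherwise assign nothing."""
    return {prop: value} if in_scope(file_path, scope) else {}


# The rule from the screen shot: everything under D:\engineering is Medium Secrecy.
result = folder_classify(r"D:\engineering\specs\design.docx", r"D:\engineering", "Secrecy", "Medium")
```

Note that a sibling folder such as D:\engineering2 is not in scope here, because the comparison is done on path components rather than on string prefixes.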

Another rule may use the Content Classifier to search the contents of files.

FCI-2-4

The Content Classifier searches for text or patterns using the same mechanism as the search indexer and if it finds them assigns the specified value to the classification property. For this to work, we have to tell the Content Classifier what to search for. This is done by clicking on the Advanced button and selecting the Additional Classification Parameters tab on the resulting dialog.

FCI-2-5

Here you can supply a series of parameters by specifying their name and the parameter value. Parameter names that you can use are:

  • RegularExpression – a standard .NET regular expression
  • String – a simple string
  • StringCaseSensitive – a string that is case sensitive

You can supply multiple parameters of the same type or mix the parameter types. The rule assigns the property value only when all of these parameters are found in a file. If you need to set a property value when a file contains the word “Confidential” or the word “Private”, you will need to set up two different rules. The Content Classifier is intelligent enough to scan each file only once, even if multiple rules are defined.
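
The "all parameters must be found" behavior, and the need for two rules to express "or", can be sketched as follows. This is a simplified Python illustration, not the Content Classifier itself: the function name is hypothetical, the text is assumed to be already extracted from the file, the sketch assumes that the String parameter is matched without regard to case, and Python's `re` module stands in for .NET regular expressions, which differ in some syntax details.

```python
import re


def content_matches(text: str, parameters: list) -> bool:
    """Return True only if every (name, value) parameter is found in text."""
    for name, value in parameters:
        if name == "RegularExpression":
            found = re.search(value, text) is not None
        elif name == "String":
            # Assumed to be case-insensitive, in contrast to StringCaseSensitive.
            found = value.lower() in text.lower()
        elif name == "StringCaseSensitive":
            found = value in text
        else:
            raise ValueError(f"unknown parameter name: {name}")
        if not found:
            return False
    return True


text = "CONFIDENTIAL: employee SSN 123-45-6789"

# One rule with two parameters: both must be present (logical AND).
rule_pii = [("String", "confidential"), ("RegularExpression", r"\d{3}-\d{2}-\d{4}")]

# "Confidential" OR "Private" takes two separate rules with the same outcome.
rule_confidential = [("String", "Confidential")]
rule_private = [("String", "Private")]
either = content_matches(text, rule_confidential) or content_matches(text, rule_private)
```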

If these classification mechanisms are not enough for your needs, you can build your own classification plugin (see the Windows 7 SDK) or purchase one from a 3rd party ISV. Once such a plugin is installed it shows up in the drop down list of the classification mechanisms.

FCI-2-6

Multiple classification rules may attempt to assign values to the same property on a file. Consider the following:

  • A rule attempts to set the Secrecy property to Medium since the file is located in D:\engineering
  • Another rule attempts to set the Secrecy property to High since the file contains the word Confidential

In such cases, FCI attempts to aggregate the property values. This can be done for the following property types:

  • Yes/No properties – Yes wins over No
  • Multiple Choice List – Combines the sets of values
  • Ordered List properties – The highest entry in the list wins
  • Multi-String – Combines the sets of strings into one set of unique strings
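
These aggregation rules can be sketched in Python as follows. The function is a hypothetical illustration, not FCI's implementation, and it assumes that an Ordered List is defined from its highest entry to its lowest (High, Medium, Low), so that the entry closest to the front of the list wins.

```python
def aggregate(prop_type: str, values: list, ordered_values: list = None):
    """Combine the values that several rules want to assign to one property."""
    if prop_type == "Yes/No":
        # Yes wins over No.
        return any(values)
    if prop_type in ("Multiple Choice List", "Multi-String"):
        # Combine the sets of values into one set of unique values.
        combined = set()
        for value_set in values:
            combined |= set(value_set)
        return combined
    if prop_type == "Ordered List":
        # Assumption: ordered_values runs from highest to lowest,
        # so the smallest index is the highest entry.
        return min(values, key=ordered_values.index)
    raise ValueError(f"{prop_type} values cannot be aggregated")


# The example above: one rule says Medium, another says High.
secrecy = aggregate("Ordered List", ["Medium", "High"], ["High", "Medium", "Low"])
```

Under that assumption, the file in D:\engineering that also contains the word Confidential ends up with Secrecy set to High.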

In addition, rules by default only classify files that have not yet been classified for the same property. To change this click on the Advanced button.

FCI-2-7

By checking the “Re-evaluate existing property values” box the rule will attempt to reclassify files if the file or the classification rules have changed. Once the box is checked, the rule must be set to either overwrite any existing property values or to aggregate the new value with the existing value.
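
As a rough sketch, the decision a rule makes for a file that may already carry a value looks like this in Python. The function and parameter names are hypothetical, and the sketch deliberately ignores scheduling and change detection; it only illustrates the default behavior versus the overwrite and aggregate options.

```python
def apply_rule(existing, new_value, reevaluate=False, mode="overwrite", combine=None):
    """Return the value the property should hold after the rule runs.

    existing is None when the file has not been classified for this property.
    combine is a two-argument function used only when mode is "aggregate".
    """
    if existing is None:
        # Not yet classified: the rule simply assigns its value.
        return new_value
    if not reevaluate:
        # Default: leave files that already have a value alone.
        return existing
    if mode == "overwrite":
        return new_value
    if mode == "aggregate":
        return combine(existing, new_value)
    raise ValueError(f"unknown mode: {mode}")


# A Yes/No property where Yes wins over No when values are aggregated.
merged = apply_rule(False, True, reevaluate=True, mode="aggregate", combine=lambda a, b: a or b)
```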

The classification rules are applied on a scheduled basis to the files on the server since this may take some time. The administrator can specify the time during which classification can take place. During that time period FCI will find any files that have to be classified or re-classified and process as many of them as possible. Once a file has been classified, the properties stay with the file while it is moved around on NTFS file systems. For Office files, the properties stay attached no matter what is done to the file (they are stored in the file). Of course there is a mechanism to remove properties from files as well, if the administrator needs to.

Now that we have specified the property schema we are interested in for this server and set up a series of rules to automatically classify files, we are ready to start managing our files based on their value to the business instead of just where we store them.

Post by Matthias Wollnik

