Windows Server 2008 R2 File Classification Infrastructure – Managing data based on business value


During the initial planning phases of R2, we had the opportunity to interview several IT professionals about the pain points of managing data on file servers. We wanted to understand how we could improve file server data management and what success would look like for these IT professionals.


Very early in these discussions it became apparent that there were three key problems we should try to help solve:


·         High growth of data results in higher spending on storage and data management 

·         New regulations impose a need to better handle sensitive information such as personal information and financial documents

·         Leakage of both business critical and personal information is a big problem


We then looked at the solutions available in the market. There are indeed great solutions for security (e.g., Data Leakage Prevention) and data management (e.g., backup, archival, HSM), but these solutions do not interoperate, and they mostly work based on where a file is located (its folder) rather than on the file's business value to the organization.


A folder-based approach makes this a harder problem for the human beings who need to figure out where to store their data based on complex company policies.  “Does high business impact data with personally identifiable information go here?  Or there?”  Not to mention the challenges around dealing with documents that don’t end up in the right folder.


What we heard from our customers is that they would like to gain insight into their data so that they can manage it more effectively, reduce cost, and mitigate risk.


This realization has led us down the path of creating the File Classification Infrastructure that enables organizations to classify their files (assign properties to files) and then use Windows mechanisms as well as partner solutions to apply actions to files based on the file classification.




The File Classification Infrastructure includes the ability to define classification properties, automatically classify files based on location and content, apply file management tasks such as file expiration and custom commands based on classification and produce reports that show the distribution of a classification property on the file server.


In addition to the functionality delivered in Windows we also aimed at building an extensible infrastructure in order to help provide integration points for different partner offerings by enabling classification solutions to plug into Windows to classify files and persisting the file classification so that data management products can query the file classification to apply appropriate policy/action. For example, if a data leakage prevention product classifies files as containing personal information, then a backup product can back them up to an encrypted store instead of the regular store.
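
The data leakage prevention and backup example above can be sketched in a few lines of Python. This is only a conceptual model and does not use the FCI APIs: the dictionary standing in for the persisted classification, the "PersonalInformation" property name, the store names, and both functions are hypothetical.

```python
from typing import Dict

# Hypothetical persisted classification: path -> {property name: value}.
ClassificationStore = Dict[str, Dict[str, str]]


def classify_as_personal(store: ClassificationStore, path: str) -> None:
    """What a (hypothetical) data leakage prevention product might record."""
    store.setdefault(path, {})["PersonalInformation"] = "yes"


def choose_backup_target(store: ClassificationStore, path: str) -> str:
    """What a (hypothetical) backup product might decide by querying the classification."""
    properties = store.get(path, {})
    if properties.get("PersonalInformation") == "yes":
        return "encrypted-store"
    return "regular-store"


store: ClassificationStore = {}
classify_as_personal(store, r"D:\hr\salaries.xls")

print(choose_backup_target(store, r"D:\hr\salaries.xls"))   # prints encrypted-store
print(choose_backup_target(store, r"D:\public\menu.doc"))   # prints regular-store
```

The point of the sketch is the decoupling: the product that classifies and the product that acts never call each other, they only share the persisted classification.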


Using this paradigm, IT organizations can now define policy that spans the organization and can better translate business requirements into IT actions. For example, the organization might have a policy to expire files that are 10 years old and are not critical to the business. This policy can be translated into the new file management tasks to expire files across file servers. Furthermore, when new data directories are added, there is no need to change the file management tasks, because the action is driven by the business criticality of the files regardless of their location.
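
As a rough illustration of the expiration policy above, here is a minimal Python sketch. It is not the file management task implementation: the `FileRecord` type, the `business_critical` flag, and `should_expire` are hypothetical names, and file age is assumed to be measured from the last modification time using 365-day years.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical stand-in for a classified file; not an FCI type."""
    path: str
    last_modified: datetime
    business_critical: bool  # assumed classification property


def should_expire(record: FileRecord, now: datetime, max_age_years: int = 10) -> bool:
    """Return True when the file is older than the limit and not business critical.

    The decision depends only on age and classification, never on the path.
    """
    age_limit = timedelta(days=365 * max_age_years)  # approximation: ignores leap days
    is_old = now - record.last_modified > age_limit
    return is_old and not record.business_critical


now = datetime(2009, 5, 1)
files = [
    FileRecord(r"D:\projects\old_notes.doc", datetime(1998, 1, 15), False),
    FileRecord(r"D:\finance\ledger_1997.xls", datetime(1997, 6, 30), True),
    FileRecord(r"E:\new_share\draft.doc", datetime(2008, 11, 2), False),
]
expired = [f.path for f in files if should_expire(f, now)]
print(expired)
```

Only the first file is selected: the ledger is old but business critical, and the draft is too recent. Because the path plays no part in the decision, a newly added share such as `E:\new_share` is covered without changing the rule.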


I would like to briefly touch on classification. Many people I talk to raise their eyebrows when I start discussing this subject, and I tend to agree with them: it is hard to determine which organization-wide properties to assign to files, and it is also hard to actually classify the files.


The process that seems to work for determining the organization's properties is a discussion that includes both the business and IT people, in which they decide how they would like to manage their data and which classification properties should be assigned to files in order to manage them easily. What I found is that this usually amounts to just a few properties, such as a mix of the following:


·         Personal information (yes/no)

·         Business criticality

·         Confidentiality

·         Project

·         Retention period


Now that you have defined which properties should be assigned to files, the next challenge is actually classifying the files. There is no magic formula here, but the File Classification Infrastructure gets you a long way: automatic classification rules classify the large number of files residing on your file servers, an extensibility mechanism allows plug-ins, and, last but not least, manual classification of Office files is recognized. The classification methods that we observed across the IT organizations we worked with include:


·         Manual classification

·         Line Of Business application classification (e.g.: When an HR application saves a file to the file server, it can also set the “Personal Information” property to “yes”)

·         Automatic classification based on:

          ·         Location of files

          ·         File owner

          ·         File content

          ·         Other (e.g.: file size, file extension …)


All of these methods can be used to classify files, and the File Classification Infrastructure's extensibility supports multiple classification mechanisms that run in tandem to determine a file's classification.
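
To make "in tandem" concrete, here is a small Python sketch in which a location rule and a content rule both contribute to a yes/no "Personal Information" property. Everything in it is hypothetical and independent of the FCI APIs, including the rule for combining results: the sketch assumes that a "yes" from any classifier wins, which is one plausible choice and not a statement of how FCI aggregates values.

```python
import re
from typing import Callable, List, Optional

# A classifier looks at a file's path and text and returns "yes", "no",
# or None when it has no opinion. All names here are hypothetical.
Classifier = Callable[[str, str], Optional[str]]


def by_location(path: str, text: str) -> Optional[str]:
    # Location rule: anything under an "hr" folder is assumed to hold personal data.
    folders = [p.lower() for p in re.split(r"[\\/]", path)][:-1]  # drop the file name
    return "yes" if "hr" in folders else None


def by_content(path: str, text: str) -> Optional[str]:
    # Content rule: a pattern shaped like a US Social Security number.
    return "yes" if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text) else None


def classify(path: str, text: str, classifiers: List[Classifier]) -> str:
    # Assumed aggregation: "yes" from any classifier wins, otherwise "no".
    votes = [c(path, text) for c in classifiers]
    return "yes" if "yes" in votes else "no"


rules = [by_location, by_content]
print(classify(r"D:\hr\review.doc", "annual review", rules))      # prints yes
print(classify(r"D:\sales\lead.txt", "SSN 123-45-6789", rules))   # prints yes
print(classify(r"D:\sales\plan.txt", "Q3 targets", rules))        # prints no
```

The second file shows why running mechanisms together matters: it sits in the "wrong" folder, so the location rule misses it, but the content rule still catches it.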


With classification in place, data management scenarios light up and become easier to accomplish. Some scenarios can be automated using the Windows Server 2008 R2 inbox functionality with no additional code; others can be enabled by PowerShell scripts written by IT or by partner solutions that leverage the File Classification Infrastructure APIs.




These additional blog posts provide deep dives into how to leverage the File Classification Infrastructure (FCI) in your IT environment and how to develop solutions that plug into and enhance FCI:


·         Classifying files based on location and content using the File Classification Infrastructure (FCI) in Windows Server 2008 R2

·         Dealing with stale data on File Servers

·         Customizing File Management Tasks


If you are interested in learning more, please join us at our TechEd session, hands-on lab, and file services booth:





ID: WSV329

Title: Windows Server 2008 R2 File Classification Infrastructure: Managing Cost and Mitigating Risk on File Servers

Date/Time: 5/14/2009 4:30PM-5:45PM

Location: Petree Hall D

Hands-on Lab:


Track: Windows Server

Title: How to Reduce Cost and Risk on File Servers Using the New File Classification Infrastructure

Location: T9311


Post by Nir Ben Zvi

  • I wanted to call your attention to four new blog posts this morning from Nir Ben Zvi (Senior Program

  • One of the responsibilities I had last year was the delivery of a data classifications program within

  • As a records management professional I love the concept, but was disheartened to see "Retention Period" as one of the attributes being collected in the above example.  "Retention Code" is a far more valuable property and a RM industry standard - retention periods change all the time as various regulations or business requirements change, but if you can find all the objects relating to a particular CODE and match that to a retention period for that code, you have a powerful way to reduce risk to your organization and disposition records.

  • @Dennis Oberhofer: Retention Period is just one example.  A storage administrator can choose to define any kind of property he/she wants, including Retention Code.

  • Awesome news!  I am so glad to hear that file classification features and hooks will finally be available natively in the operating system!  We here at InDorse Technologies look forward to extending our context-oriented document assurance solutions to take advantage of the native features (FCI) while extending the FCI infrastructure to manage categorically and content across non-Windows 2008 systems (Un*x, Mainframes, SANs, NASs, SoA, etc) using the InDorse Core platform!  Moreover, we're excited to extend the InDorse tagging capabilities to FCI.  This to me is music to our ears.  THANK YOU, Microsoft Storage Team!  I look forward to the collaboration.

  • Here's a neat new feature of R2 that you might have missed: Microsoft has updated the feature capabilities of an old friend, the File Server Resource Manager, with the added ability to manage file "classifications". These classifications arrive as an

  • FCI certainly is interesting and long overdue... I hope that the SharePoint team has been in the Groove (hehe) with the Server 2008 team to have SharePoint leverage the new file classification infrastructure!

  • (This is a follow up to the blog entry that presented the File Classification Infrastructure in Windows

  • The Content Classifier in the File Classification Infrastructure extracts text from files using the IFilter

  • Classifying files based on their content is something we have covered before for the File Classification

  • in the article u wrote: "In addition to the functionality delivered in Windows we also aimed at building an extensible infrastructure in order to help provide integration points for different partner offerings by enabling classification solutions to plug into Windows to classify files and persisting the file classification so that data management products can query the file classification to apply appropriate policy/action"

    Is there any sample code/documentation regarding this?

  • @DB for documentation check out the updated documentation on MSDN:

    We will soon update this with samples

  • This has been needed for many years - but is a solution born out of the past. Business value will no longer be defined in traditional records management ways, and RM policies will begin to shift to recognize Web 2.0 and Web 3.0 elements as being a much better indicator of true business value. Attributes relating to Tagging, user Voting/relevance, comments and other social mechanisms will better identify which records need to really be retained.

    However, anything that gets the end user and business population to better engage in the process of disposing of unnecessary files is a step forward, and I'm looking forward to mechanisms that will aid users in getting rid of junk.