The Storage Team Blog about file services and storage features in Windows Server, Windows XP, Windows Vista and Windows 7.
by Anupadmaja Raghavan
The File Classification Infrastructure in Windows Server 2008 R2 is great for classifying files based on file content and/or location. You can define rules that discover sensitive files (such as those containing Social Security numbers) based on the presence of certain text patterns in your files. See http://blogs.technet.com/filecab/archive/2009/05/11/classifying-files-based-on-location-and-content-using-the-file-classification-infrastructure-fci-in-windows-server-2008-r2.aspx.
We also heard from IT administrators that they would like to classify files using other methods, such as file types, custom parsing techniques for certain file formats, or information from AD. FCI (the File Classification Infrastructure) provides an extensible model that you can use to define your own classifiers. Today you can write classifiers in languages such as C++ and C#, as illustrated in the SDK (http://go.microsoft.com/fwlink/?LinkID=150217&clcid=0x409, http://www.microsoft.com/downloads/details.aspx?FamilyID=6db1f17f-5f1e-4e54-a331-c32285cdde0c&displaylang=en).
You might wonder at this point: I am an IT administrator. Is it possible to write classification rules without writing a complicated C# or C++ DLL, for example by writing a simple PowerShell script? The answer is yes.
The FCI team has developed a custom classifier module, the PowerShell classifier module, which is capable of using PowerShell scripts to perform custom classification of files.
The PowerShell classifier module consists of two components: the PowerShell host classifier and the PowerShell script that the administrator supplies.
The PowerShell host classifier has been developed by the FCI team for the convenience of administrators. It takes care of plugging into FCI and PowerShell. The administrator only has to provide the PowerShell script that performs the custom classification.
Here is how it works. The PowerShell host classifier of the PowerShell classifier module is registered with FCI as a classifier module. A classification rule is then set up in FCI that defines the parameters to send to the PowerShell host classifier when classification runs. During classification, FCI passes every file it scans, together with the parameters from the rule, to the PowerShell host classifier. One of the mandatory parameters that the PowerShell host classifier takes is the name of the PowerShell script file to use. For each file it receives from FCI, the PowerShell host classifier calls into the PowerShell script, which determines the classification for that file, and sends the result back to FCI. Finally, FCI reads the classification result and proceeds with the remaining actions to complete the classification for that file.
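To make the flow concrete, here is a small conceptual model of it in Python. It is only a sketch and does not use the FCI or PowerShell APIs: the class names, the "ScriptFileName", "Extension" and "Value" parameter names, and the ByExtension.ps1 script name are all hypothetical, and a plain Python function stands in for the PowerShell script.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a PowerShell script: it receives a file path and
# the rule parameters, and returns a property value (or None for "no match").
Script = Callable[[str, Dict[str, str]], Optional[str]]


@dataclass
class ClassificationRule:
    """Models a rule: the property to set and the parameters sent to the host."""
    property_name: str
    parameters: Dict[str, str]  # must contain the (hypothetical) "ScriptFileName"


class ScriptHostClassifier:
    """Models the host classifier: looks up the script and calls it per file."""

    def __init__(self, scripts: Dict[str, Script]) -> None:
        self.scripts = scripts

    def classify(self, path: str, rule: ClassificationRule) -> Optional[str]:
        # The script name is a mandatory parameter; a missing key raises KeyError.
        script = self.scripts[rule.parameters["ScriptFileName"]]
        return script(path, rule.parameters)


def run_classification(files: List[str], rule: ClassificationRule,
                       host: ScriptHostClassifier) -> Dict[str, Dict[str, str]]:
    """Models the infrastructure: pass each file to the host, read back results."""
    results: Dict[str, Dict[str, str]] = {}
    for path in files:
        value = host.classify(path, rule)
        if value is not None:
            results[path] = {rule.property_name: value}
    return results


def classify_by_extension(path: str, params: Dict[str, str]) -> Optional[str]:
    """Example 'script': classify a file by its extension (a custom method)."""
    if path.lower().endswith(params["Extension"].lower()):
        return params["Value"]
    return None


host = ScriptHostClassifier({"ByExtension.ps1": classify_by_extension})
rule = ClassificationRule(
    property_name="FileKind",
    parameters={"ScriptFileName": "ByExtension.ps1",
                "Extension": ".xlsx", "Value": "Spreadsheet"},
)
print(run_classification(["budget.XLSX", "notes.txt"], rule, host))
```

The point of the split is that the host and the loop never change; only the script and the rule parameters do, which is what lets an administrator add a new classification method without writing a DLL.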
The steps involved in using the PowerShell classifier module are as follows:
Once the PowerShell host classifier of the PowerShell classifier module is installed and registered, the administrator has two main actions to perform:
The PowerShell classifier module provides a simple way to do custom classification using PowerShell scripts. More details on the topics discussed in this post and other capabilities of the PowerShell classifier module are available in the SDK. Check back with us for a post on “How to do Content Classification of Files using Windows PowerShell scripts”.
> You can write today classifiers in languages such
> as C++ and C# as illustrated in the SDK
I found sample code in C# which uses the FSRM API to classify files. But there seems to be no sample code for C++. I am facing very basic difficulties, such as what files to include, what libs to link with and what DLL actually implements the interface. The MSDN documentation of IFsrmClassificationManager (http://msdn.microsoft.com/en-us/library/dd392349(VS.85).aspx) does not talk about it.
Would you please talk more on these aspects?
Thanks,
-Nilesh.