File types the Content Classifier can search on a new Windows Server 2008 R2 install

The Content Classifier in the File Classification Infrastructure (FCI) extracts text from files using IFilters, the same mechanism the Windows Search indexer uses. Here is the list of file types that have an IFilter installed on a fresh Windows Server 2008 R2 install with no other software added:

Filter name                Extensions
HTML Filter                .ascx .asp .aspx .css .hhc .hta .htm .html .htt .htw .htx .odc .shtm .shtml .sor .srf .stm
Microsoft Office Filter    .doc .dot .pot .pps .ppt .xlb .xlc .xls .xlt
MIME Filter                .mht .mhtml .p7m
Plain text Filter          .a .ans .asc .asm .asx .bas .bat .bcp .c .cc .cls .cmd .cpp .cs .csa .csv .cxx .dbs .def
                           .dic .dos .dsp .dsw .ext .faq .fky .h .hpp .hxx .i .ibq .ics .idl .idq .inc .inf .ini
                           .inl .inx .jav .java .js .kci .lgn .lst .m3u .mak .mk .odh .odl .pl .prc .rc .rct
                           .reg .rgs .rul .s .scc .sol .sql .tab .tdl .tlh .tli .trg .txt .udf .udt .usr .vbs
                           .viw .vspscc .vsscc .vssscc .wri .wtx
RTF Filter                 .rtf
Wordpad Filter             .docx .odt
XML Filter                 .csproj .user .vbproj .vcproj .xml .xsd .xsl .xslt
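
If you want a quick way to check whether a file on a share gets text extraction out of the box, the table above can be turned into a lookup. This is a minimal sketch in Python: the names DEFAULT_IFILTERS and default_ifilter_for are hypothetical, and the data is transcribed by hand from the table rather than read from the server, so it only reflects a clean install.

```python
from pathlib import PureWindowsPath
from typing import Dict, Optional

# Hypothetical data, transcribed by hand from the table of IFilters present on a
# fresh Windows Server 2008 R2 install. It is NOT read from the server itself.
DEFAULT_IFILTERS: Dict[str, str] = {
    "HTML Filter": (
        ".ascx .asp .aspx .css .hhc .hta .htm .html .htt .htw .htx .odc"
        " .shtm .shtml .sor .srf .stm"
    ),
    "Microsoft Office Filter": ".doc .dot .pot .pps .ppt .xlb .xlc .xls .xlt",
    "MIME Filter": ".mht .mhtml .p7m",
    "Plain text Filter": (
        ".a .ans .asc .asm .asx .bas .bat .bcp .c .cc .cls .cmd .cpp .cs .csa"
        " .csv .cxx .dbs .def .dic .dos .dsp .dsw .ext .faq .fky .h .hpp .hxx"
        " .i .ibq .ics .idl .idq .inc .inf .ini .inl .inx .jav .java .js .kci"
        " .lgn .lst .m3u .mak .mk .odh .odl .pl .prc .rc .rct .reg .rgs .rul"
        " .s .scc .sol .sql .tab .tdl .tlh .tli .trg .txt .udf .udt .usr .vbs"
        " .viw .vspscc .vsscc .vssscc .wri .wtx"
    ),
    "RTF Filter": ".rtf",
    "Wordpad Filter": ".docx .odt",
    "XML Filter": ".csproj .user .vbproj .vcproj .xml .xsd .xsl .xslt",
}

# Reverse index: extension -> name of the filter that handles it.
EXTENSION_TO_FILTER: Dict[str, str] = {
    ext: name
    for name, extensions in DEFAULT_IFILTERS.items()
    for ext in extensions.split()
}


def default_ifilter_for(path: str) -> Optional[str]:
    """Return the default IFilter for path's extension, or None if there is none.

    Extensions are compared case-insensitively, as Windows does.
    """
    suffix = PureWindowsPath(path).suffix.lower()
    return EXTENSION_TO_FILTER.get(suffix)


if __name__ == "__main__":
    print(default_ifilter_for(r"C:\Shares\Finance\Budget.XLS"))   # Microsoft Office Filter
    print(default_ifilter_for(r"C:\Shares\Finance\Budget.xlsx"))  # None
```

A None result, as for .xlsx here, means the Content Classifier has nothing to read from that file until an additional IFilter is installed.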

Additionally, a few Microsoft IFilters can easily be added at no extra cost:

Windows TIFF IFilter
  Extensions: .tif
  How to add: Server Manager -> Add Features -> Windows TIFF IFilter

Microsoft Filter Pack for Office 2007
  Extensions: .docx, .docm, .pptx, .pptm, .xlsx, .xlsm, .xlsb, .vdx, .vsd, .vss, .vst, .vsx, .vtx, .one, .zip
  Download: http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en

Microsoft Office 2010 Filter Packs
  Extensions: .doc, .ppt, .xls, .xlsm, .xlsb, .docx, .docm, .pptx, .pptm, .xlsx, .zip, .one, .vsd, .vsx, .vss, .vst, .vdx, .vtx, .pub, .odt, .ods, .odp
  Download: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=5cd4dcd7-d3e6-4970-875e-aba93459fbee
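
The two Office filter pack extension lists overlap heavily, so a set difference is an easy way to see what the 2010 pack adds. This sketch copies both lists from the entries above (the variable and function names are illustrative only); it compares extension coverage and says nothing about how much text each pack actually extracts.

```python
from typing import List, Set, Tuple

# Extension lists copied by hand from the two Office filter pack entries.
FILTER_PACK_2007: Set[str] = set(
    ".docx .docm .pptx .pptm .xlsx .xlsm .xlsb"
    " .vdx .vsd .vss .vst .vsx .vtx .one .zip".split()
)
FILTER_PACK_2010: Set[str] = set(
    ".doc .ppt .xls .xlsm .xlsb .docx .docm .pptx .pptm .xlsx .zip .one"
    " .vsd .vsx .vss .vst .vdx .vtx .pub .odt .ods .odp".split()
)


def compare_packs(old: Set[str], new: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (extensions only in new, extensions only in old), each sorted."""
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    added, dropped = compare_packs(FILTER_PACK_2007, FILTER_PACK_2010)
    print("Added by the 2010 pack:", " ".join(added))
    # Added by the 2010 pack: .doc .odp .ods .odt .ppt .pub .xls
    print("Only in the 2007 pack:", " ".join(dropped) or "(none)")
    # Only in the 2007 pack: (none)
```

Note that .doc, .ppt and .xls are already handled by the built-in Microsoft Office Filter, so on a fresh server the genuinely new coverage from the 2010 pack over the 2007 pack is .pub, .odt, .ods and .odp.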

Other free and commercial IFilters also exist. You can start at http://ifilter.org/ to find more IFilters.

However, just because an IFilter exists does not mean it extracts text from every file it is registered for. Some of these filters return no text for the Content Classifier to scan. For the complete list of file types the default IFilters can extract text from, and which data is extracted from each, see the official list of included filters at http://www.live.com/docs/toolbarts.aspx?t=MSNTbar_CONC_SearchableFileTypes.htm

To find out which IFilters are installed on one of your servers, use the FiltReg tool (http://msdn.microsoft.com/en-us/library/ms692537(VS.85).aspx).
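
FiltReg is the supported way to list filters. If you only want to script a quick check for a single extension, the sketch below follows the documented filter-handler registration path with Python's standard winreg module: HKEY_CLASSES_ROOT\.ext\PersistentHandler gives a persistent handler CLSID, and CLSID\{handler}\PersistentAddinsRegistered\{IID_IFilter} gives the IFilter CLSID. Assumptions: the function names are hypothetical, it does not fall back to the file type's ProgID/CLSID registration (so it may miss filters that FiltReg reports), and a 32-bit Python on a 64-bit server sees the 32-bit (WOW64) registry view.

```python
"""Sketch: find the IFilter CLSID registered for a file extension (Windows only)."""
from typing import Optional

try:
    import winreg  # part of the standard library, but only on Windows
except ImportError:  # lets the pure helpers below be imported elsewhere
    winreg = None

# IID of the IFilter interface; filter handlers are registered under this subkey.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00608C7D1F4C}"


def normalize_extension(extension: str) -> str:
    """Return the extension as '.ext' in lower case (registry keys ignore case)."""
    ext = extension.lower()
    return ext if ext.startswith(".") else "." + ext


def filter_key_path(persistent_handler_clsid: str) -> str:
    """HKCR-relative key whose default value is the IFilter CLSID."""
    return "\\".join(
        ["CLSID", persistent_handler_clsid, "PersistentAddinsRegistered", IID_IFILTER]
    )


def _read_default(subkey: str) -> Optional[str]:
    """Read the (Default) value of HKEY_CLASSES_ROOT\\<subkey>, or None if missing."""
    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, subkey) as key:
            value, _value_type = winreg.QueryValueEx(key, "")
            return value
    except OSError:  # key or default value does not exist
        return None


def ifilter_clsid_for(extension: str) -> Optional[str]:
    """Follow .ext -> PersistentHandler -> IFilter CLSID; None if nothing is found."""
    if winreg is None:
        raise RuntimeError("the Windows registry is only available on Windows")
    handler = _read_default(normalize_extension(extension) + "\\PersistentHandler")
    if handler is None:
        return None
    return _read_default(filter_key_path(handler))


if __name__ == "__main__":
    if winreg is None:
        print("Run this on the Windows server you want to inspect.")
    else:
        for ext in (".txt", ".docx", ".xlsx", ".tif"):
            clsid = ifilter_clsid_for(ext)
            print(ext, clsid or "no IFilter found via PersistentHandler")
```

The CLSID it prints can be looked up under HKEY_CLASSES_ROOT\CLSID to find the DLL that implements the filter; comparing the result before and after installing a filter pack is a simple way to confirm the pack registered itself.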

The official IFilter blog has much more information, as well as links to more IFilters that extend the reach of the Content Classifier (and the Search indexer) to other file types.

Update: Fixed formatting issues with the table

Update 2: Added entry for Office 2010 filter pack
