Search Service Application: Flip “Include” FileTypes list to “Exclude”

It’s been a while since I have posted anything, so hello again! :) I have been meaning to post this for some time, but time seems to be my enemy.

A while back I worked with a customer who was using a standard “Search Service Application” in SharePoint 2010 to crawl a non-SharePoint, Apache-based wiki site running “MediaWiki.” MediaWiki lets you create a new page with pretty much any name you like.

Example:

https://www.contoso.com/mediawiki/wiki.test.page

When their content source crawled this site, the new page was neither crawled nor indexed, so it never showed up in search results. The crawler was treating the text after the last dot in the URL (“.page” here) as a file extension.

By default, the SSA (Search Service Application) Administration page has a link called “File Types”, and that page controls which file extensions are and are not crawled.

 

[Screenshot: the “File Types” link on the SSA Administration page]

 

By default, in a SharePoint SSA (NOT FAST Search Server 2010 for SharePoint), this “File Types” list is an “Include” list, so anything NOT in the list will not get crawled. Herein lies the problem for my customer. Because end users could create a page with any name, the “extensions” could be just about anything. Trying to keep every possible extension in this list would be nearly impossible and would take a lot of man-hours to maintain!

This is where my customer turned to Microsoft support to look for a solution or a potential workaround. The solution we provided was to use PowerShell to manually “flip” this “Include” list to an “Exclude” list, so that the SSA crawls every extension except those in the “File Types” list.
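To make the difference between the two modes concrete, here is a small sketch in plain PowerShell. `Test-WouldCrawl` is a hypothetical helper written only for this illustration (it is not part of SharePoint, and it is not how the crawler is implemented), and the sample file types are made up. It also assumes that URLs with no extension are always crawled.

```powershell
# Hypothetical helper: illustrates Include vs. Exclude list semantics only.
function Test-WouldCrawl {
    param(
        [string]$Url,
        [string[]]$FileTypes,
        [bool]$IsIncludeList
    )

    # Take the text after the last dot of the URL path as the "extension"
    $path = ([System.Uri]$Url).AbsolutePath
    $ext  = [System.IO.Path]::GetExtension($path).TrimStart('.').ToLower()

    # Assumption for this sketch: URLs without an extension are always crawled
    if ($ext -eq '') { return $true }

    $listed = $FileTypes -contains $ext
    if ($IsIncludeList) { return $listed } else { return (-not $listed) }
}

$types = @('html', 'aspx', 'docx')   # made-up sample list
$url   = 'https://www.contoso.com/mediawiki/wiki.test.page'

Test-WouldCrawl -Url $url -FileTypes $types -IsIncludeList $true    # False: "page" is not in the Include list
Test-WouldCrawl -Url $url -FileTypes $types -IsIncludeList $false   # True: "page" is not in the Exclude list
```

With an Include list, every new “extension” a wiki author invents has to be added by hand; with an Exclude list, only the types you explicitly do not want need to be listed.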

Here are the steps to accomplish this task:

Flip the List to an Exclude List

  • Open the SharePoint 2010 Management Shell with elevated rights (PowerShell ISE works too, as long as the SharePoint snap-in is loaded)
  • Execute the following:

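# Find the Search Service Application by its class ID
$sa = Get-SPServiceApplication | where { $_.ApplicationClassId -eq "52547a3d-66ed-468e-b00a-8c4a3ec7d404" }

# Passing 0 turns the "File Types" list into an Exclude list
$sa.SetIsExtensionIncludeList($sa.GetVersion(), 0)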

  • This flips the SSA's “File Types” list to an Exclude list
  • Restart the OSearch14 service from the services.msc console or from the command line:

net stop OSearch14

net start OSearch14
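If you are still in the PowerShell session from the previous step, the restart can also be done without leaving it. This is a minimal sketch that assumes an elevated session and the same OSearch14 service name used above:

```powershell
# Restart the SharePoint Server Search 14 service
Restart-Service -Name OSearch14

# Check the Status column to confirm the service came back up
Get-Service -Name OSearch14
```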

Keep in mind that this list is now an “Exclude” list, so anything in it will no longer be crawled.

We then utilized additional PowerShell to clean out all the existing “File Types” and allowed the customer to start with a clean slate:

Remove the existing file types

 

  • Again using PowerShell Management Shell or PowerShell ISE
  • Execute the following (make sure you replace “SSA” with the name of your Search Service Application):

$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "SSA"
$content = New-Object Microsoft.Office.Server.Search.Administration.Content($ssa)
$extList = $content.ExtensionList

# Copy the extensions into a separate list so nothing is deleted
# while ExtensionList itself is being enumerated
$list = New-Object System.Collections.ArrayList
foreach ($ext in $extList)
{
    [void]$list.Add($ext)
}

for ($i = 0; $i -lt $list.Count; $i++)
{
    $ext = $list[$i]
    $ext.FileExtension   # print the extension that is about to be removed
    $ext.Delete()
}
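To confirm the list really is empty afterwards, you can query it with the standard crawl-extension cmdlet. A minimal sketch, again assuming your Search Service Application is named “SSA”:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "SSA"

# Count the file types still registered; 0 is what you want to see after the cleanup
@(Get-SPEnterpriseSearchCrawlExtension -SearchApplication $ssa).Count
```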

Below is a list of the extensions that FAST includes out of the box. Feel free to add these in as needed. I just wanted to give you a list of the extensions that MS uses as “excludes”:

[Screenshots: OOBExclude1, OOBExclude2 (FAST out-of-the-box exclude lists)]
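Adding entries back into the (now Exclude) list can be scripted as well. The sketch below assumes an SSA named “SSA”; the three extensions are placeholders chosen for illustration only, not the FAST out-of-the-box list, so substitute whichever extensions you actually want to keep out of the index:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "SSA"

# Placeholder extensions for illustration; replace with the ones you want excluded
foreach ($name in @("exe", "dll", "tmp"))
{
    New-SPEnterpriseSearchCrawlExtension -SearchApplication $ssa -Name $name
}
```

You can also add them one at a time through the “File Types” page in the SSA Administration site.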

I have tested this within our lab, using a clean slate, and it worked. Please let me know if you have any questions. Thanks!

I believe this should transfer over to SP 2013 also…

' THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF

' ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO

' THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A

' PARTICULAR PURPOSE.

' It is highly recommended that you FULLY understand what this code is doing
' and use this code at your own risk.

' Copyright (C) Microsoft Corporation, 2013