I have not seen much guidance on monitoring and tuning content feeding for SharePoint 2013. This post covers isolating the bottleneck in your feeding chain; in a future post I will cover how to address these bottlenecks, though in most cases you will simply need to add more instances of a component or more hardware resources. If you are familiar with the content feed monitoring and tuning process for FAST Search for SharePoint 2010, the process for SP2013 is essentially the same. This article is a starting point for monitoring the performance of content feeding in SharePoint 2013. The focus is on the SharePoint 2013 counters, as guidance for system-level counters is already available elsewhere.
Note: some of the thresholds are a rough estimate and may be replaced by more formal product documentation or field experience in the future.
The main components in the feeding chain are:
Crawl Component
Content Processing Component
Index Component
Here are some useful counters to monitor for each component.
Search Gatherer Projects - SharePointServerSearch\Transactions Waiting
Search Gatherer Projects - SharePointServerSearch\Transactions In Progress
Search Gatherer Projects - SharePointServerSearch\Transactions Completed
Search Gatherer - SharePointServerSearch\Threads Accessing Network
Search Gatherer - SharePointServerSearch\Threads Filtering
Search Gatherer - SharePointServerSearch\Threads Idle
Search Submission Service\# Pending Items
Search Flow Statistics\# Items Queued For Processing
Search Flow Statistics\Input Queue Empty Time
Search Flow Statistics\Input Queue Full Time
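These counters can be collected from perfmon or scripted (for example via PowerShell's Get-Counter). As a minimal sketch of turning a batch of sampled values into a quick watch-list check, assuming the counter paths above and using my own illustrative threshold figures (the `WATCH` table and `flag_counters` helper are mine, not part of the product):

```python
# Rough watch-list: counter -> (threshold, meaning when exceeded).
# Threshold values are illustrative estimates, not product guidance.
WATCH = {
    r"Search Gatherer Projects\Transactions Waiting":
        (2000, "content processing falling behind"),
    r"Search Submission Service\# Pending Items":
        (300, "items backing up ahead of the Content Processing Component"),
    r"Search Flow Statistics\# Items Queued For Processing":
        (300, "CPC input queue backing up"),
}

def flag_counters(sample):
    """Given a dict of {counter path: sampled value}, return a list of
    (counter, value, note) tuples for every counter over its threshold."""
    flags = []
    for name, (limit, note) in WATCH.items():
        value = sample.get(name, 0)
        if value > limit:
            flags.append((name, value, note))
    return flags
```

Feeding in a sample where only `# Pending Items` is elevated would flag just that counter, which is the kind of quick triage the rest of this post walks through by hand.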
When we begin content tuning, we need to isolate the bottleneck in the feeding chain. I find it useful to approach this as a two-step process. First we want to determine whether the bottleneck occurs prior to the search subsystem (upstream) or after hitting the search subsystem (downstream). You could also look at these two pieces as content acquisition and content processing.
Content Acquisition consists of:
Content Source
Crawl Component
Crawl Database
Content Processing consists of:
Content Processing Component
Index Component
A good first counter to look at is Search Gatherer Projects - SharePointServerSearch\Transactions Waiting for all of the Crawl Components. If this counter is low (less than a few thousand), content processing is keeping up with content acquisition. If the counter is high and/or consistently rising, then we are pushing more data than Content Processing can keep up with. This is also visible in the Crawl Health report by viewing the queue length.
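That first decision can be sketched as a tiny triage function. This is just the rule of thumb above expressed in code; the function name and the 2000 threshold (a stand-in for "a few thousand") are my own illustrative choices:

```python
def feeding_bottleneck_side(transactions_waiting, threshold=2000):
    """First-pass triage on the aggregate 'Transactions Waiting' value,
    summed across all Crawl Components. The 2000 default is an
    illustrative stand-in for 'a few thousand'."""
    if transactions_waiting < threshold:
        # Downstream is keeping up; investigate content acquisition.
        return "content acquisition"
    # Deep or rising queue: downstream cannot keep up.
    return "content processing"
```

Note the caveat in the comments below: on large web crawls where document download is slow, this counter can be high even though the CPC has spare capacity, so treat this as a starting point rather than a verdict.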
In the case where Content Processing is slow, there are two possibilities: the Content Processing Components are the bottleneck, or the Index Components are the bottleneck. In either case, both Search Submission Service\# Pending Items and Search Flow Statistics\# Items Queued For Processing will be high (greater than a few hundred).
If Content Processing Components are the bottleneck
Processor utilization will be high on servers running Content Processing Components
Search Flow Statistics\Input Queue Full Time will be low (less than about 1000)
If Index Components are the bottleneck
There will be high disk I/O and/or latency on servers running Index Components
Search Flow Statistics\Input Queue Full Time will be high (greater than about 1000)
When Content Acquisition is slow, there are three possible bottlenecks: the Content Source, the Crawl Database, or the Crawl Component.
If Crawl Database is the bottleneck
Search Gatherer Projects - SharePointServerSearch\Transactions Completed will be high (greater than a few hundred)
There will be high disk queue length / disk latency on Crawl DB
If Crawl Component is the bottleneck (not common)
There will be high processor utilization on Crawl Component servers
There will be no disk latency issue on Crawl DB
If Content Source is the bottleneck
Search Gatherer - SharePointServerSearch\Threads Accessing Network will be close to Search Gatherer - SharePointServerSearch\Threads Filtering
There will be low processor utilization on Crawl Component servers
There will be no disk latency issues on Crawl DB
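The three upstream cases can likewise be mapped to a simple decision sketch. The inputs are simplified readings an admin takes off perfmon; the function, its parameter names, and the "close to" tolerance of 2 threads are my own illustrative simplifications of the symptom list above:

```python
def classify_upstream(crawl_db_disk_latency_high, crawl_cpu_high,
                      threads_network, threads_filtering):
    """Map the upstream symptoms to a likely bottleneck:
    Crawl DB disk pressure, Crawl Component CPU, or a slow
    Content Source (network threads ~ filtering threads)."""
    if crawl_db_disk_latency_high:
        # High disk queue length / latency on the Crawl DB.
        return "Crawl Database bottleneck"
    if crawl_cpu_high:
        # High CPU on crawl servers, no Crawl DB disk issue (uncommon).
        return "Crawl Component bottleneck"
    if abs(threads_network - threads_filtering) <= 2:
        # Threads Accessing Network close to Threads Filtering,
        # with low crawl CPU and a healthy Crawl DB.
        return "Content Source bottleneck"
    return "inconclusive"
```

Usage is just a matter of plugging in the observed readings, e.g. `classify_upstream(False, False, 8, 7)` points at the content source.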
Mikkel mentioned that Search Gatherer Projects - SharePointServerSearch\Transactions Waiting does not always indicate that the CPC cannot keep up with the flow. In the case of a large web crawl, or where the document download is the bottleneck, this counter can be high while the CPC still has capacity.
Good stuff Peter! I'm noticing very strange numbers coming back from Search Flow Statistics - # Items Queued For Processing - Content Processing Component on one of my servers (I have two servers with the crawl, content processing, and analytics processing roles). It keeps suggesting that there are 4,294,967,295 items queued, while the 2nd server tops out at 133.
Both are at the same patch level (Dec 2013) and are spec'ed identically.
FYI I'm also seeing an imbalance in the number of items assigned to the two crawlers...
Can you post a sample PowerShell please?