On the last session, back in July last year we saw an issue that on the first perspective looks like an ISA Server issue, however it end up to be another component. The scenario that I’m going to describe now is really something that could be interpreted as an issue with ISA Server, however only was analysis was it possible to prove that ISA Server was actually the victim.
2. Scenario – Server’s Performance Gradually Become Less Responsive Until it Stops Accepting Incoming Connections
That was the description that customer gave us, and additionally he said that if he stops the Microsoft Firewall Service the server starts accepting new connections. During the time of the issue the following event was also appearing in the Application Log:
Event Type: Error
Event Source: Microsoft Firewall
Event Category: None
Event ID: 21280
Time: 11:23:25 PM
The size of the free non-paged pool fell below the system-defined minimum. ISA Server will reject new connections unless they are initiated by the ISA Server computer. In addition, the timeout for idle TCP connections was reduced. This alert may indicate an attack on the ISA Server computer. Use the log viewer to examine ISA Server traffic. This event may also be raised when the ISA Server computer does not have the resources needed to handle legitimate traffic. If this is the case, you may need to add more memory to the system.
As the event says, this could be potentially an issue with ISA Server, but usually this event is caused by something outside of ISA. we needed to narrow it down to understand why we were receiving this error.
2.1. Collecting Data
Since we were dealing with a server’s performance degradation in general, we were still unsure that ISA Server was the real root cause for this problem. We had to collect data from the operating system to analyze the performance in general. Here what the action plan was:
Preparing the Server
1) Configure the system to get a Kernel memory dump when the server became unresponsive. To do this configure the server according to KB244139.
2) Configure the system to get PoolMon. Use KB177415.
3) Configure PerfMon. Use the objects suggested by the article “Monitoring and Troubleshooting Performance” on Microsoft TechNet.
When the issue happens
1) Use the keyboard combination to get the dump as mentioned on KB244139. This will generate a blue screen (which is expected).
2) Right after the server comes up run the Setup/Perf MPSReports.
3) Get the PoolMon result.
4) Get the Perfmon result.
Even without having the Microsoft Internal symbols you will still be able to review some key data using the public symbols. Download the debugging tools and configure the symbols according to the Debugging Tools and Symbols: Getting Started Page in Windows Hardware Developer Central.
Reviewing the dump one of the main points that came up was the excessive use of Non Paged Pool, there the result:
0: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 917316 ( 3669264 Kb)
Page File: \??\C:\pagefile.sys
Current: 2095104 Kb Free Space: 2085964 Kb
Minimum: 2095104 Kb Maximum: 4190208 Kb
Page File: \??\E:\pagefile.sys
Current: 4193280 Kb Free Space: 4187508 Kb
Minimum: 4193280 Kb Maximum: 4193280 Kb
Available Pages: 452869 ( 1811476 Kb)
ResAvail Pages: 835185 ( 3340740 Kb)
Locked IO Pages: 685 ( 2740 Kb)
Free System PTEs: 185982 ( 743928 Kb)
Free NP PTEs: 2036 ( 8144 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 213 ( 852 Kb)
Modified PF Pages: 213 ( 852 Kb)
NonPagedPool Usage: 63071 ( 252284 Kb)
NonPagedPool Max: 65280 ( 261120 Kb)
********** Excessive NonPaged Pool Usage *****
PagedPool 0 Usage: 24312 ( 97248 Kb)
PagedPool 1 Usage: 846 ( 3384 Kb)
PagedPool 2 Usage: 824 ( 3296 Kb)
PagedPool 3 Usage: 866 ( 3464 Kb)
PagedPool 4 Usage: 859 ( 3436 Kb)
PagedPool Usage: 27707 ( 110828 Kb)
PagedPool Maximum: 90112 ( 360448 Kb)
Shared Commit: 4379 ( 17516 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 5009 ( 20036 Kb)
PagedPool Commit: 27771 ( 111084 Kb)
Driver Commit: 1899 ( 7596 Kb)
Committed pages: 486017 ( 1944068 Kb)
Commit limit: 2450007 ( 9800028 Kb)
In addition to checking the pool usage in the kernel dump (using !poolused) you can also use the PoolMon tool to see which drivers might be leaking. On this particular case the tags were:
Pool Tag: Vmem
Mapped Driver: Unknown Driver
Pool Tag: Ddk
Mapped Driver: <unknown>
Those tags do not belong to Microsoft products and you can find that by using the article “How to find pool tags that are used by third-party drivers” at Microsoft Help and Support.
To analyze the Performance Monitor data you can follow the recommendations of the article “Analyzing performance data” at Microsoft TechNet and review the Operating System side. In this particular case, the ISA Server process (wspsrv.exe) was not showing any kind of leaking behavior or excessive use of memory or CPU.
4. The Culprit
With those pieces of the puzzle in hands we could match such behavior with the following two KB articles:
This particular product was installed on the server and the problem was starting to make sense at this point.
It makes sense that after the customer stopped the Microsoft Firewall service the server started working normally. The fact is that if the Operating System itself runs out of the Non Paged Pool Resources it will trigger the LowNonPagedPoolCondition which will cause ISA Server to stop to accept new connections. This doesn’t cause issues only in ISA Server, this also happens with other products, such as IIS (per KB933844).
The ISA Server was just a victim in this situation and the workaround was to uninstall the antivirus product for immediate relief from this problem in this customer’s environment (as per KB923125). TrendMicro has released a hotfix that addresses this problem.
Security Support Engineer – ISA Server Team – Microsoft Texas
I too had this problem with Trend Micro OfficeScan, not only on our ISA server, but Exchange and SQL too. Trend has released two patches since then, and these solve many problems, including this one. (Patch 1.1 and Patch 2. You only need the latest patch, Patch 2, they're cumulative)
Nice workflow Yuri...surprising how many times "ISA problems" are not really ISA problems! :-)
Not really the point, but this is also a good reason not to put file level anti-virus on your ISA Servers unless you have a really good reason to do so...
Thanks Jason !!!
That's absolutely right. There is also a good post from Dr. Tom Shinder about have AV or not on ISA box:
SECURITY Security Tip of the Month: Initial Considerations for Secure Deployment http://go.microsoft.com/?linkid=8514049