1. Introduction
As I mentioned in my post Get Used to IAG 2007, IAG 2007 is a powerful tool that makes internal resources available to Internet users in a secure manner. It provides SSL-based application access and protection with endpoint security management, granular access control and deep content inspection. Here is the classic picture that shows the core idea of IAG:
Figure 1 – IAG allowing access from anywhere to internal resources.
In this post I’m going to walk step by step through creating a Portal and adding the OWA 2007 application to it.
2. Creating the Application Portal
You can publish OWA 2007 through IAG 2007 via the Application Portal or individually, just like ISA Server does. For this step by step we are going to use the Application Portal. Let’s start the configuration:
1) Launch the IAG Configuration Console (Start / Programs / Whale Communication IAG / Configuration).
2) Enter your password to open the console and click OK.
3) Right-click HTTPS Connections and click the option New Trunk.
4) Select the option Portal Trunk (remember that we are going to publish the Application Portal and then the OWA within the portal) and click Next.
5) On the Step 2 window, fill in the options as shown below and then click Next:
Figure 2 – Step 2 of Creating the Portal Trunk Wizard.
Note: The IP address that I’m using in this example is not a public Internet address. Usually, when you have IAG on the edge of your network, you will use a publicly routable IP address.
6) On the Authentication window we are going to select which server will be used to authenticate the user. Click Add, select the option AD and click Select. Your window should look like this:
Figure 3 – Configuring the Authentication Servers.
7) Click Next on this window.
8) On the certificate screen, select which certificate you are going to use for the portal. In this case I’m using a wildcard certificate, as shown on the screen below:
Figure 4 – Selecting the certificate that will be used for the Portal.
Note: It is important to emphasize that in this example we already have the certificate created and installed on the local computer. If you don’t have the certificate installed on the local computer you will not see the certificates listed there.
9) Click Next on this window to continue. You will then see the Endpoint Policies selection, as shown below:
Figure 5 – Endpoint Policy Selection.
This window allows you to choose two types of endpoint policies:
· Session Access Policy: Allows you to configure the compliance requirements for accessing this site.
· Privileged Endpoint Policy: Allows you to control the conditions for the session (e.g., session timeout).
10) Leave the default options and click Finish on this window.
Now we have the portal created and the console will look like this:
Figure 6 – Application Portal and the options that were configured during the Wizard.
After that you need to activate your configuration. To do that, select the option Activate on the File menu, click the Activate button and then click OK.
On the client workstation I added iag.contoso.com to the HOSTS file, mapping it to the external IP of my IAG 2007 Server (192.168.0.1). After typing the URL https://iag.contoso.com the following screen will appear:
Figure 7 – Portal logon screen.
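The HOSTS mapping used for this test can be sketched in a few lines of Python. This is only an illustration: the helper names hosts_entry and is_lab_only are hypothetical, and the check simply confirms that 192.168.0.1 is a private address, which is why it works in a lab but would not be reachable from the Internet.

```python
import ipaddress


def hosts_entry(ip: str, hostname: str) -> str:
    """Format one HOSTS-file line that maps hostname to ip."""
    ipaddress.ip_address(ip)  # raises ValueError if the address is malformed
    return f"{ip}\t{hostname}"


def is_lab_only(ip: str) -> bool:
    """True when the address is private, i.e. not routable on the Internet."""
    return ipaddress.ip_address(ip).is_private


print(hosts_entry("192.168.0.1", "iag.contoso.com"))
print(is_lab_only("192.168.0.1"))
```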
Since there is no application published through this Portal yet, the screen that appears after the user logs on will be similar to this one:
Figure 8 – First page after logon.
As you can see there are a lot of things to explore, and even if you don’t change much, the default configuration already takes care of a lot of things related to security. For example, if the session stays inactive for 300 seconds, the window below will appear when the last 60 seconds start counting down:
Figure 9 – Session Timeout warning pop up window.
In the next two parts I will cover the following topics:
· Adding OWA 2007 into the Application Portal
· Redirecting HTTP to HTTPS traffic
BPOS is growing at a fast pace, and as IT admins start to use this service they need to adjust their firewall to properly allow traffic from on-premises clients to the cloud. Microsoft Online Services did a good job documenting what needs to be in place from the firewall perspective to allow this traffic to flow correctly. Here are the main articles for this type of deployment:
KB2410859 Firewall prevents users from using Microsoft Online Services Directory Synchronization, rich clients, or the Microsoft Online Services Identity Federation Management tool in Office 365
KB2409256 You cannot connect to Lync Online, or certain features do not work, because an on-premises firewall blocks the connection
Both articles use ISA Server as an example, and they also mention that with ISA you may need to use Firewall Client to make this deployment work. If you use Firewall Client, nothing else needs to be done on the client workstation. However, if you don’t want to install Firewall Client, you will need to edit the file Program Files\Microsoft Online Services\Sign In\SignIn.exe.config and add the entry below:
<system.net>
<defaultProxy useDefaultCredentials="true">
<proxy usesystemdefault="True" />
</defaultProxy>
</system.net>
Source: http://technet.microsoft.com/en-us/library/ee832722.aspx
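If you need to apply this change on several workstations, the edit can be scripted. The sketch below is a minimal example, assuming the config file has a <configuration> root element like other .NET application config files; the function name add_default_proxy is hypothetical, and the sample input is made up for illustration. It adds (or updates) the same system.net entry shown above and is safe to run more than once.

```python
import xml.etree.ElementTree as ET


def _child(parent, tag):
    """Return the first direct child with this tag, creating it if missing."""
    for element in parent:
        if element.tag == tag:
            return element
    return ET.SubElement(parent, tag)


def add_default_proxy(config_xml: str) -> str:
    """Return config_xml with the defaultProxy entry from the article added."""
    root = ET.fromstring(config_xml)
    system_net = _child(root, "system.net")
    default_proxy = _child(system_net, "defaultProxy")
    default_proxy.set("useDefaultCredentials", "true")
    proxy = _child(default_proxy, "proxy")
    proxy.set("usesystemdefault", "True")
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal config content, used only to show the transformation.
sample = "<configuration><appSettings /></configuration>"
print(add_default_proxy(sample))
```

Back up the original SignIn.exe.config before writing the result to disk.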
2. Scenario
Consider a scenario where you have everything in place: the rules are correctly configured on ISA Server as per KB2410859 and Firewall Client is installed on the workstation. However, the issue persists, and in the ISA log you see access denied due to an anonymous request. When you look closely at the detailed logging (Monitoring/Logging/Lower Pane), you see that no rule appears there, which means that the request is being processed at a lower level (kernel mode).
3. Solution
The problem here was caused because the option below was enabled:
When you enable this option you might have issues with a variety of applications (not only BPOS), because it completely disables anonymous access for Web Proxy requests on the network. This option forces the user’s credentials to be requested even before the firewall policy starts to be evaluated. This is the reason why you receive the warning below when you enable it:
As you can see in this warning window, this option can cause compatibility issues with applications such as Windows Update (and, as I found out, with BPOS too). To avoid compatibility problems, disable this option and make sure to control user access via Firewall Policy. There are many other scenarios where we recommend disabling this option; see this article for more information. After disabling this option the user was able to log in:
Have a good migration to the Cloud!!
Last week I presented a session at the MVP Summit in Redmond about troubleshooting TMG performance issues. During that presentation I told the MVPs that I would write a cheat sheet with some WinDBG commands that can be used while troubleshooting TMG performance issues. I thought about this type of document and concluded that the content can have a base framework, but it should be expanded and enhanced by the community. Having said that, I decided to publish this article in two places:
Enjoy it !!
The Microsoft Windows Server 2008 Event Viewer is a whole new program inside the Operating System; the changes made to it are significant and it is rich in new features. There are so many things that you can now do with Event Viewer that it is worth taking some time to play with it. The new Event Viewer in Windows Server 2008 also brings new security capabilities for auditing and more in-depth explanations of the events. In this area my recommendation is that you read the article Auditing and Compliance in Windows Server 2008 from TechNet Magazine.
I’m also pointing this out because recently I worked again on an ISA case where the infamous 5783 was happening, and again the challenge was to get the data while the issue was happening. During the call I was explaining that the new eventmon can help a lot with that, since we can attach an action to the event, as you can see below:
Naturally, the “wow” came from seeing a feature that we had been requesting for many years, and the “what” came with the statement: so are you saying that TMG still has this problem?
Let me clarify this once more: there is no bug involved when ISA Server loses the secure channel with the DC, and there is no option to turn this error on or off. This problem can happen under many circumstances, as I explained and demonstrated on my blog. The fact is that if those circumstances are still in place, the 5783 can potentially happen in TMG. The old MaxConcurrentAPI registry key is still there in Windows Server 2008 and can be used to tune authentication performance, as you can see in the “Increase the Number of NPS Concurrent Authentications” article.
So what is our hope to stop dealing with this problem once and for all? Well, the main hope is that companies start to use a web browser that supports Kerberos authentication, such as Internet Explorer 7 or higher. This can dramatically decrease the authentication pressure on ISA and on the DC, making this problem go away.
While many Microsoft application servers such as Exchange take full advantage of the /3GB switch in boot.ini, ISA Server can be seriously affected if you use this switch on it. Before we discuss what this switch does and how it can affect ISA Server, I strongly recommend that you read this great post from the Windows Performance Team. After reading that, we will be on the same page to talk about the side effects on ISA Server.
2. Why is this bad for ISA?
When you add /3GB to boot.ini, you are telling the Operating System to split the 4 GB virtual address space into 1 GB for the kernel and 3 GB for user mode, instead of the default 2 GB for each. The problem is that the ISA Server firewall engine (fweng) is a driver that runs in kernel mode, and limiting it to only 1 GB of kernel address space can be very dangerous. Besides, when you add this switch you also reduce the memory available for:
• Nonpaged Pool
• Paged Pool
• System Page Table Entries (PTEs)
Considering that ISA Server is the default gateway for many of your networks, during a period of high network activity the usage of nonpaged pool memory may cause the server to stop responding because no more memory is available. The reason is that when you use /3GB the maximum size of the nonpaged pool drops from 256 MB to 128 MB.
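The arithmetic behind this can be written down as a small sketch. The function name address_space_split is hypothetical; the 2 GB / 2 GB default is the standard 32-bit Windows split, and the nonpaged pool ceilings are the 256 MB and 128 MB figures quoted above.

```python
MB_PER_GB = 1024


def address_space_split(use_3gb: bool) -> dict:
    """Describe the 32-bit virtual address space split with and without /3GB."""
    total_mb = 4 * MB_PER_GB                        # 32-bit address space
    user_mb = (3 if use_3gb else 2) * MB_PER_GB     # user-mode share
    return {
        "user_mb": user_mb,
        "kernel_mb": total_mb - user_mb,            # what is left for kernel mode
        "max_nonpaged_pool_mb": 128 if use_3gb else 256,  # figures from the text
    }


print(address_space_split(use_3gb=False))
print(address_space_split(use_3gb=True))
```

With /3GB the kernel share, where fweng lives, is cut in half, and so is the nonpaged pool ceiling.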
3. ISABPA is your friend
Many people are still not aware of how powerful ISA BPA can be when it comes to identifying what is wrong and telling you about it. When you run a Health Check on your ISA Server using ISABPA and the machine has /3GB enabled, the following alert will show up in the report:
Figure 1 – ISABPA alerts about /3GB.
Don’t know how to download ISABPA? Easy: just type www.isabpa.com and you will be redirected to the download page for ISABPA.
4. Conclusion
Many myths surround the /3GB switch, and many system admins still think that it is beneficial for all servers regardless of their role, which is untrue. Each server/application has its own needs, and you need to carefully analyze and understand these needs before adding such a switch to your boot.ini.
Recently I worked on a case where ISA Server was used not only as an edge firewall but also as the main concentrator of “ALL” network traffic. While this may sound nice in terms of central management and security awareness of the traffic, it also imposes other risks on network availability. The topology used was similar to the one below:
Figure 1 – Topology used in this case.
As you can see above, all traffic between a client and any other server needs to traverse ISA Server. Let’s see the takeaways from this case and the lessons learned.
2. Hitting the Connection Limit
The first problem that we faced in this case was the connection limit being hit all the time for the domain controllers segment. The number of TCP requests per minute going to that segment was huge. By default, ISA Server limits the number of TCP connect requests per client to 600 per minute, and some DCs were answering far more than that. Result: ISA Server concluded that it was under a flood attack and triggered the “TCP connect requests per minute, per IP address” limit. A related problem was the limit on concurrent TCP connections per IP address, which is 160 by default. We were slamming the DCs in this area too, and therefore triggering the flood mitigation again.
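To make the two limits concrete, here is a small simulation. It is only an illustration of per-IP rate and concurrency limits using the default values quoted above (600 and 160); it is not how ISA Server implements flood mitigation internally, and the class name FloodLimiter and the address 10.0.0.10 are hypothetical.

```python
from collections import defaultdict, deque


class FloodLimiter:
    """Toy per-IP limiter: connect requests per minute and concurrent connections."""

    def __init__(self, max_per_minute=600, max_concurrent=160):
        self.max_per_minute = max_per_minute
        self.max_concurrent = max_concurrent
        self._recent = defaultdict(deque)  # ip -> timestamps of accepted connects
        self._open = defaultdict(int)      # ip -> currently open connections

    def connect(self, ip, now):
        window = self._recent[ip]
        # Forget connects that are older than the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.max_per_minute:
            return "blocked: connect requests per minute"
        if self._open[ip] >= self.max_concurrent:
            return "blocked: concurrent connections"
        window.append(now)
        self._open[ip] += 1
        return "allowed"

    def close(self, ip):
        if self._open[ip] > 0:
            self._open[ip] -= 1


# A busy DC opening 700 short-lived connections in about 35 seconds.
limiter = FloodLimiter()
results = []
for i in range(700):
    outcome = limiter.connect("10.0.0.10", now=i * 0.05)
    if outcome == "allowed":
        limiter.close("10.0.0.10")
    results.append(outcome)
print(results.count("allowed"), "allowed,", 700 - results.count("allowed"), "blocked")
```

A legitimate but very chatty server looks exactly like an attacker to this kind of counter, which is what happened with the DCs.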
3. Action and Reaction
ISA Server was hitting those limits, and since it thought it was dealing with an attack, it had to take action. To contain the supposed attack it blocked communication with the domain controller. Well, in this case the domain controller was also the PDC Emulator (ouch!). This means that every operation that needs to be handled by the PDC Emulator was compromised; for example, if a user needs to change a password, the PDC Emulator is used for that. Since the users were not able to reach the DC, they started to have logon problems (slow logon or sometimes no logon at all).
On top of that, users who were already logged in started to have problems browsing the Internet. Since ISA Server started to have problems with the DCs, it started to reset the secure channel, which caused the 5783 event and therefore the infamous authentication window prompting the user. Result: chaos in the environment and the sky was falling.
4. Alleviating the Problem
After intense research to understand the whole picture, it was possible to conclude what was going on and take action to alleviate the problem. The immediate remedy was to bump up all those connection limits and add the DCs to the exception list. The result: it helped, but it didn’t fix the problem.
The fact is that the scenario was like an active volcano that could blow up again very soon. Because of that, it was much better to rearrange those DCs and let them communicate directly with the clients.
5. Tuning TCP on ISA Server
While the network redesign helped to reduce the amount of traffic, we were still being hit by 5783 at least once a month, again causing the infamous authentication prompt for the end user. Analyzing the traffic and data passing through this ISA Server, we could see tons of connections in the TIME_WAIT state on the ISA Server computer. To tweak the TCP timeouts we adjusted the following parameters:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"TcpTimedWaitDelay"=dword:00000028
"StrictTimeWaitSeqCheck"=dword:00000001
When changing these parameters, the following important points need to be considered:
· Changing these values requires a reboot. Plan to do that outside your production hours.
· TcpTimedWaitDelay is 2 minutes by default, even if the value is not present in the registry.
· You must set the StrictTimeWaitSeqCheck to 0x1 or the TcpTimedWaitDelay value will have no effect.
Note: for more information on TcpTimedWaitDelay review the article about it on Microsoft TechNet.
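The registry values are written in hexadecimal, so it helps to decode them. The sketch below converts the dword and gives a rough steady-state estimate of how many sockets sit in TIME_WAIT (close rate multiplied by the delay). The function names are hypothetical and the rate of 100 closes per second is a made-up figure for illustration, not a measurement from this case.

```python
def reg_dword_to_int(value: str) -> int:
    """Convert a .reg style value such as 'dword:00000028' to an integer."""
    prefix = "dword:"
    if not value.lower().startswith(prefix):
        raise ValueError(f"not a dword value: {value!r}")
    return int(value[len(prefix):], 16)


def time_wait_sockets(closes_per_second: float, delay_seconds: int) -> float:
    """Rough steady-state number of sockets in TIME_WAIT (rate x delay)."""
    return closes_per_second * delay_seconds


delay = reg_dword_to_int("dword:00000028")
print(delay, "seconds")                      # value applied in this case
print(time_wait_sockets(100, delay))         # hypothetical 100 closes/s
print(time_wait_sockets(100, 120))           # same rate with a 2 minute delay
```

At the same close rate, a shorter delay means proportionally fewer sockets held in TIME_WAIT.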
6. Conclusion
We all want to protect our assets, and sometimes we overlook the consequences of tightening communication too much. There are many ways to be secure and available at the same time, and Microsoft has a good architecture proposal for that called Domain Isolation. If you have never read about it, it is really worth doing so. Making your ISA Server concentrate all the traffic might look interesting from your perspective; however, it can cause headaches that are completely avoidable.
With Domain Isolation you can still have your ISA Server protecting your edge while you have Windows Firewall capabilities on the DCs and, beyond that, protection on the network layer using IPSec. That is a good example of securing your assets with a multilayer approach.
Another document that can help you to design a secure Active Directory deployment is called Best Practice Guide for Securing Active Directory Installations and Day-to-Day Operations: Part I, from Microsoft TechNet. There you will see the following statement:
“…Placing the domain controller behind a standalone router helps prevent Internet users from directly accessing the domain controller. To help minimize security risks, place the domain controller on the same network segment as the client computers…”
This is something that the Directory Services Team stresses heavily whenever we collaborate with them. Since we have other mechanisms to protect the communication between clients and DCs (domain isolation is the key), why add a firewall in between them? It will probably cause more issues than benefits, and it will not guarantee the same level of protection that Domain Isolation proposes.
Consider a scenario where the TMG administrator is publishing servers that are behind TMG, and after enabling NLB on the External interface users are not able to access those resources. If the administrator uses the DIP (Dedicated IP) to publish the resource, it works. The basic diagram is shown in the figure below:
The traffic from inside (VLAN E) to outside resource (for example to VLAN A) using the internal VIP (10.10.10.1) as default gateway was working, however the incoming traffic (from VLAN A for instance) to the external VIP (192.168.4.10) was failing.
Troubleshooting
There is no clever troubleshooting here; always start from the basics in scenarios like this. Which basics? Enable live logging and see whether the traffic is arriving at TMG in the first place. In this case it was not, but to be on the safe side we installed netmon on all three nodes to see if the traffic was hitting the box at all. It was not: no traffic was arriving at TMG.
Resolution
After reviewing the core switch, we were able to see that the NLB MAC address configured there was wrong, hence the traffic was never arriving at TMG. This was a Cisco Catalyst 6500 Series switch, and we followed Cisco’s recommendation below to address this issue:
Cisco Catalyst 6500 Series Switches - Catalyst Switches for Microsoft Network Load Balancing Configuration Example http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml
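When checking the switch it helps to know which MAC address to expect. As far as I know, Windows NLB derives the default cluster MAC from the cluster IP address: 02-BF followed by the four IP octets in unicast mode, 03-BF followed by the four octets in multicast mode, and 01-00-5E-7F followed by the last two octets in IGMP multicast mode. The sketch below applies that rule to the external VIP of this scenario; the function name nlb_cluster_mac is hypothetical, and the case notes do not say which mode was in use, so treat the output as a reference value to compare against ipconfig /all or the NLB manager.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str = "unicast") -> str:
    """Default Windows NLB cluster MAC derived from the cluster IPv4 address."""
    octets = ipaddress.IPv4Address(cluster_ip).packed
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)


for mode in ("unicast", "multicast", "igmp"):
    print(mode, nlb_cluster_mac("192.168.4.10", mode))
```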
Additional Info
TMG NLB leverages Windows NLB, and all the recommendations that we have for Windows NLB apply to TMG NLB. In other words, if Windows NLB has restrictions, TMG will obey those restrictions. Here are some important points to remember when implementing NLB:
Switch is operating in Layer-3 mode
NLB is not supported when the hosts are homed to a switch operating at Layer-3. Instead, create a VLAN for all the nodes in the NLB cluster, and configure that VLAN to operate in Layer-2 mode.
An unusual number of TCP connections to the cluster are being reset.
Possible Causes:
See more “gotchas” at http://download.microsoft.com/download/3/2/3/32386822-8fc5-4cf1-b81d-4ee136cca2c5/NLB_Troubleshooting_Guide.htm
…and always remember the 5 commandments when troubleshooting NLB on TMG:
1. Never assume that the problem is on TMG in the first place.
2. Never think that your network infrastructure is free of issues just because it worked fine for years without problems.
3. Do not take it personally if someone says: you’ve got a problem on your switch.
4. Always have a hub available for testing purposes; sometimes having a dumb device to validate NLB functionality can save much more time than dealing with smart devices for hours.
5. NLB Unicast causes switch flooding; don’t be surprised by flooding after enabling NLB on TMG, and don’t blame TMG, since this is how Windows NLB works.
Consider a scenario where you are publishing a third party web server, in this case an Apache Server that uses HTTPS through ISA Server 2006. Randomly the site doesn’t work, clients are unable to access it and when this happens the publishing rule test button shows the result below (error 0x80090326):
Notice that this error talks about a server certificate error, so clearly it is something during the SSL process.
Reviewing the Data
Using ISA Data Packager in repro mode (web proxy / web publishing template), it was possible to collect simultaneous netmon traces from both NICs (internal and external). In those traces it was possible to see that the SSL handshake on the external interface, using the certificate that was bound to the Web Listener on ISA, was working just fine. Reviewing the SSL handshake on the internal interface, while ISA was negotiating with the published server (Apache), we saw a failure. Here is the moment of the failure:
ISA APACHE TCP TCP:Flags=......S., SrcPort=24433, DstPort=443, PayloadLen=0, Seq=3108278462, Ack=0, Win=65535 ( ) = 65535
APACHE ISA TCP TCP:Flags=...A..S., SrcPort=443, DstPort=24433, PayloadLen=0, Seq=2120534540, Ack=3108278463, Win=5840 ( Scale factor not supported ) = 5840
ISA APACHE TCP TCP: Flags=...A...., SrcPort=24433, DstPort=443, PayloadLen=0, Seq=3108278463, Ack=2120534541, Win=65535 (scale factor 0x0) = 65535
After finishing the TCP Handshake they start the SSL handshake and this is done by ISA sending the SSL Client Hello as shown below:
ISA APACHE TLS TLS:TLS Rec Layer-1 HandShake
TLSSSLData: Transport Layer Security (TLS) Payload Data
- TLS: TLS Rec Layer-1 HandShake
- TlsRecordLayer: TLS Rec Layer-1 HandShake
ContentType: HandShake
- Version: TLS 1.0
Major: 3 (0x3)
Minor: 1 (0x1)
Length: 88 (0x58)
- SSLHandshake: SSL HandShake ClientHello(0x01)
HandShakeType: ClientHello(0x01)
Length: 84 (0x54)
- ClientHello: TLS 1.0
+ Version: TLS 1.0
+ RandomBytes:
SessionIDLength: 16 (0x10)
SessionID: Binary Large Object (16 Bytes)
CipherSuitesLength: 22
+ TLSCipherSuites: TLS_RSA_WITH_RC4_128_MD5 { 0x00,0x04 }
+ TLSCipherSuites: TLS_RSA_WITH_RC4_128_SHA { 0x00,0x05 }
+ TLSCipherSuites: TLS_RSA_WITH_3DES_EDE_CBC_SHA { 0x00,0x0A }
+ TLSCipherSuites: TLS_RSA_WITH_DES_CBC_SHA { 0x00,0x09 }
+ TLSCipherSuites: TLS_NTRU_NSS_WITH_AES_256_CBC_SHA { 0x00, 0x64 }
+ TLSCipherSuites: TLS_NTRU_NSS_WITH_3DES_EDE_CBC_SHA { 0x00, 0x62 }
+ TLSCipherSuites: TLS_RSA_EXPORT_WITH_RC4_40_MD5 { 0x00,0x03 }
+ TLSCipherSuites: TLS_RSA_EXPORT_WITH_RC2_CBC_40_MD5 { 0x00,0x06 }
+ TLSCipherSuites: TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA { 0x00,0x13 }
+ TLSCipherSuites: TLS_DHE_DSS_WITH_DES_CBC_SHA { 0x00,0x12 }
+ TLSCipherSuites: TLS_NTRU_NSS_WITH_AES_128_CBC_SHA { 0x00, 0x63 }
CompressionMethodsLength: 1 (0x1)
CompressionMethods: 0 (0x0)
ExtensionsLength: 5 (0x5)
Right after that Apache sends an SSL Encrypted Alert, as shown below:
APACHE ISA TLS TLS:TLS Rec Layer-1 Encrypted Alert
- TLS: TLS Rec Layer-1 Encrypted Alert
- TlsRecordLayer: TLS Rec Layer-1 Encrypted Alert
ContentType: Encrypted Alert
Length: 2 (0x2)
EncryptedData: Binary Large Object (2 Bytes)
15 03 01 00 02 02 2F
....../
Looking at the last two bytes of the hex value under EncryptedData (02 2F), we can find the meaning of the alert:
o 02 is the alert level, fatal(2)
o 2F in hex = 47 in decimal
o 47 maps to the illegal_parameter(47) error according to the TLS RFC (http://www.ietf.org/rfc/rfc2246.txt?number=2246)
Note: thanks to sudeepg for this nice approach reading SSL Encrypt Alert.
The Apache server is saying that the TLS Client Hello sent by ISA has an illegal parameter for this SSL negotiation.
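The same reading of the seven bytes can be automated. The sketch below decodes a TLS alert record according to the record and alert layouts in RFC 2246; the function name parse_alert_record is hypothetical and the description table is only a subset of the RFC list. It works on this capture because the alert is sent before any keys are negotiated, so the two payload bytes are still in clear text even though netmon labels the record as encrypted.

```python
ALERT_LEVELS = {1: "warning", 2: "fatal"}

# Subset of the AlertDescription values defined in RFC 2246.
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    47: "illegal_parameter",
    48: "unknown_ca",
    70: "protocol_version",
}


def parse_alert_record(hex_bytes: str) -> dict:
    """Decode a 7-byte unencrypted TLS alert record given as a hex string."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 7 or raw[0] != 0x15:          # 0x15 = content type alert(21)
        raise ValueError("not a 7-byte TLS alert record")
    if int.from_bytes(raw[3:5], "big") != 2:     # alert payload is 2 bytes
        raise ValueError("unexpected alert length")
    level, description = raw[5], raw[6]
    return {
        "version": (raw[1], raw[2]),             # (3, 1) means TLS 1.0
        "level": ALERT_LEVELS.get(level, f"unknown({level})"),
        "description": ALERT_DESCRIPTIONS.get(description, f"unknown({description})"),
        "code": description,
    }


print(parse_alert_record("15 03 01 00 02 02 2F"))
```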
Conclusion
This problem can happen because of MS10-049, which was installed on ISA Server. As a temporary workaround this update was removed, and the issue was resolved in this particular scenario. However, ultimately you should not remove this update; you should talk to the third-party web server admin, discuss CVE-2009-3555 and find out how their product addresses it. If you are publishing an IIS Server you might have this issue too; if you do, read the post below to see how to fix it:
http://blogs.msdn.com/b/jpsanders/archive/2010/09/08/understanding-problems-with-ms10-049-kb-980436-and-ietf-rfc5746.aspx
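One detail in the Client Hello lines up with this conclusion. RFC 5746, the fix for CVE-2009-3555, defines a renegotiation_info extension that on an initial handshake is exactly five bytes long, and the trace shows ExtensionsLength: 5. The trace does not show the extension bytes themselves, so the sketch below is an inference and not proof; it only builds the empty extension as RFC 5746 describes it and confirms the sizes. The function name is hypothetical.

```python
def renegotiation_info_extension(renegotiated_connection: bytes = b"") -> bytes:
    """Encode the RFC 5746 renegotiation_info extension (type 0xFF01)."""
    # Extension body: one length byte followed by the renegotiated_connection data.
    body = bytes([len(renegotiated_connection)]) + renegotiated_connection
    return (0xFF01).to_bytes(2, "big") + len(body).to_bytes(2, "big") + body


ext = renegotiation_info_extension()
print(" ".join(f"{b:02x}" for b in ext), "->", len(ext), "bytes")

# Each cipher suite is a 2-byte value, so the 11 suites listed in the trace
# account for the CipherSuitesLength of 22.
print(11 * 2)
```

An older server that does not expect any extension in a TLS 1.0 Client Hello may reject it, which would be consistent with the illegal_parameter alert seen here.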
Introduction
First let’s understand what a silent exit means:
When a silent exit occurs, the JIT debugger is never invoked because the process itself asked to be terminated. For example, two Win32 Application Programming Interface (API) functions that perform this action are TerminateProcess and ExitProcess.
From: http://support.microsoft.com/kb/329629
Note: Although this article is for Exchange these functions are Windows (Win32) related.
What about graceful shutdown, what is that? That’s simple: a service received an expected command to gracefully stop.
The Scenario
The scenario of this article is based on a real case where the customer had to manually start the Firewall Service every day; it was “apparently” quitting every night. The problem with a silent exit is that the debugger will not catch it; therefore there will be no dump file to analyze. Even knowing that, we tried to get a dump, and of course the result was a first chance exception dump with no second chance. Therefore we got useless data.
Moving Forward
After more research we found out that the Telephony Service was set to Disabled, and ISA Server Control depends on Remote Access Connection Manager, which in turn depends on the Telephony Service:
Figure 1 – ISA Server Control Dependencies.
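The dependency chain can be walked with a few lines of code to see which services go down when Telephony is disabled. This is a minimal sketch: the two dependencies stated above come from the case, while the link from Microsoft Firewall to ISA Server Control is my assumption, inferred from the order in which the services were stopped in the System Log. The function names are hypothetical.

```python
DEPENDS_ON = {
    # Assumption: inferred from the stop order in the event log, not stated in the text.
    "Microsoft Firewall": ["Microsoft ISA Server Control"],
    # Stated in the article.
    "Microsoft ISA Server Control": ["Remote Access Connection Manager"],
    "Remote Access Connection Manager": ["Telephony"],
    "Telephony": [],
}


def requires(service: str, target: str, depends_on: dict) -> bool:
    """True if service depends on target, directly or through other services."""
    return any(
        dep == target or requires(dep, target, depends_on)
        for dep in depends_on.get(service, [])
    )


def affected_by(disabled: str, depends_on: dict) -> list:
    """Services that cannot keep running once 'disabled' is unavailable."""
    return [s for s in depends_on if requires(s, disabled, depends_on)]


print(affected_by("Telephony", DEPENDS_ON))
```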
Looking at the System Log, the following sequence of events was showing up:
Event Type: Information
Event Source: Service Control Manager
Event Category: None
Event ID: 7040
Date: 2/19/2009
Time: 10:09:05 PM
User: NT AUTHORITY\SYSTEM
Computer: ISASRVSTD
Description:
The start type of the Telephony service was changed from demand start to disabled.
Event ID: 7035
Time: 10:09:06 PM
The Microsoft Firewall service was successfully sent a stop control.
Event ID: 7036
Time: 10:09:16 PM
User: N/A
The Microsoft Firewall service entered the stopped state.
Time: 10:09:17 PM
The Microsoft ISA Server Control service was successfully sent a stop control.
The Microsoft ISA Server Control service entered the stopped state.
Time: 10:09:18 PM
The Remote Access Connection Manager service was successfully sent a stop control.
In the application log we got the proof that this was not a silent exit; it was actually a graceful shutdown:
Event Source: Microsoft ISA Server Control
Event ID: 14181
The ISA Server Control service was stopped gracefully.
Event Source: Microsoft Firewall
Event ID: 14182
The Firewall service was stopped gracefully.
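The reasoning applied to these events can be summarized as a small check: if the Service Control Manager sent a stop control to the service (event 7035), or ISA logged a graceful stop (events 14181/14182), it was a graceful shutdown and not a silent exit. The sketch below encodes only that rule; the event list repeats the entries shown above, and the function name classify_stop is hypothetical.

```python
EVENTS = [
    ("System", 7040, "The start type of the Telephony service was changed from demand start to disabled."),
    ("System", 7035, "The Microsoft Firewall service was successfully sent a stop control."),
    ("System", 7036, "The Microsoft Firewall service entered the stopped state."),
    ("System", 7035, "The Microsoft ISA Server Control service was successfully sent a stop control."),
    ("System", 7036, "The Microsoft ISA Server Control service entered the stopped state."),
    ("Application", 14181, "The ISA Server Control service was stopped gracefully."),
    ("Application", 14182, "The Firewall service was stopped gracefully."),
]

GRACEFUL_STOP_IDS = {14181, 14182}


def classify_stop(events, service: str) -> str:
    """Classify a service stop from (log, event_id, description) tuples."""
    stop_requested = any(
        event_id == 7035 and service in text for _log, event_id, text in events
    )
    graceful_logged = any(
        event_id in GRACEFUL_STOP_IDS for _log, event_id, _text in events
    )
    if stop_requested or graceful_logged:
        return "graceful shutdown"
    return "possible silent exit"


print(classify_stop(EVENTS, "Microsoft Firewall"))
```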
Now What?
If those services are stopping every night and the administrator needs to start them manually, this leads to the conclusion that something (a process) is stopping them. For a domain-joined ISA the first thing you should check is Group Policy. A simple way to check whether ISA Server is receiving any policy, without impacting production, is to run RSOP.MSC. The result for this case is shown in Figure 2:
Figure 2 – RSOP.MSC result.
Bingo!!! Now everything makes sense. What was happening here was that ISA Server was inside an OU with a policy that was disabling those services. To fix that we created a new OU, moved ISA Server to it and blocked inheritance on this OU.
Sometimes IT administrators, with the best of intentions, disable services that are considered unnecessary from a Windows perspective (as a hardening attempt). However, for ISA Server this needs to be done carefully, since it can stop the Firewall Service, which will cause downtime in your Internet access. Before doing this, review the article below, which has a list of the services that ISA Server depends on:
http://technet.microsoft.com/en-us/library/cc302488.aspx
Hi Folks, I just want to drop a quick note here about KB http://support.microsoft.com/kb/2433623/ that brings the list of the updates that are part of the new Software Update 2 for Forefront Threat Management Gateway (TMG) 2010 Service Pack 1.
Enjoy it!!
The more I deal with performance issues on ISA Server (or TMG), the more I realize that there are not really a lot of new things to explore in this area. I say that because I can easily map the top five core causes of ISA/TMG ceasing to respond to requests and showing the “server hanging” symptom, which are:
· DNS – a wrong DNS configuration or a lack of response from the DNS Server can definitely cause issues on ISA. Please see the following related articles:
http://blogs.technet.com/b/isablog/archive/2009/08/27/side-effects-of-incorrect-dns-configuration-on-isa-server-10060-connection-timeout-scenario.aspx
http://blogs.technet.com/b/isablog/archive/2009/01/12/isa-server-2006-stops-answering-requests.aspx
· Authentication – if the DC doesn’t answer, ISA can’t authenticate and, as a result, new authentication requests start to accumulate. The infamous 5783/5719 scenario is a good example of that. Please see the following related articles:
http://blogs.technet.com/b/yuridiogenes/archive/2008/06/05/isa-server-losing-secure-channel-with-the-dc-the-5783-nightmare.aspx
· Logging – if we can’t log, we will eventually stop responding. ISA goes into lockdown mode; TMG starts to write LLQ files to disk, which can fill the disk until the server runs out of space and ISA/TMG stops responding. Please see the following related articles:
http://blogs.technet.com/b/yuridiogenes/archive/2008/08/06/intermittent-performance-problem-while-accessing-internet-through-isa-server-2006.aspx
· Disk – this is one key element, because if we have a disk bottleneck, everything else will fall apart. Please see the following related articles:
http://blogs.technet.com/b/isablog/archive/2010/05/10/how-disk-bottleneck-can-affect-tmg-performance.aspx
· Antivirus – well, yeah… this is true. There are many elements here that can go wrong. For example, some antivirus products also introduce firewall modules that conflict with the ISA/TMG firewall kernel engine, which is something that I have already explained. Please see the following related articles:
http://blogs.technet.com/b/isablog/archive/2008/03/11/isolating-problems-that-seems-to-be-related-to-the-isa-server-part-iii.aspx
http://blogs.technet.com/b/yuridiogenes/archive/2009/07/18/isa-server-stop-answering-requests-and-firewall-service-hangs.aspx
As you can see, this is a long list, and the scenario that I’m about to describe in this post is a combination of all the elements above. I like to call it the perfect storm. What’s the symptom? The usual: ISA Server stops responding to requests, and the ISA admin has to restart the Firewall service to fix it.
2. Data Gathering
In this type of scenario the most common action plan is to gather a perfmon log, a dump of the wspsrv.exe process, and ISA Data Packager output in repro mode. Here are the core steps:
a. Install ISABPA from www.isabpa.com
b. Configure Performance Monitoring with the following objects:
> ISA Server Firewall Packet Engine/*
> ISA Server Firewall Service/*
> ISA Server Web Proxy/*
> Memory/*
> Processor/*
> Network Interface/*
> Process/*
> Physical Disk/*
> Threads/*
Note: set the maximum log file size to 200 MB and the sample interval to 15 seconds, and configure Perfmon to stop when the log is full and create a new file (Schedule tab).
c. Install the DebugDiag (download from the link below):
http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=28bd5941-c458-46f1-b24d-f60151d875a3
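If you prefer to script the Perfmon settings from step b instead of clicking through the UI, the sketch below builds a `logman create counter` command line with the same counter objects, the 15-second interval and the 200 MB size limit. Treat it as a starting point under assumptions: the collector name and output path are made up, the split between objects with and without instances is my guess, and "PhysicalDisk" and "Thread" are the object names as Performance Monitor spells them. The "create a new file when full" behaviour from the note is not covered here.

```python
# Counter objects from step b. The with/without-instance split is an assumption.
OBJECTS_WITHOUT_INSTANCES = [
    "ISA Server Firewall Packet Engine",
    "ISA Server Firewall Service",
    "ISA Server Web Proxy",
    "Memory",
]
OBJECTS_WITH_INSTANCES = [
    "Processor",
    "Network Interface",
    "Process",
    "PhysicalDisk",
    "Thread",
]

def counter_paths():
    """Build wildcard counter paths such as \\Memory\\* and \\Process(*)\\*."""
    paths = ["\\" + name + "\\*" for name in OBJECTS_WITHOUT_INSTANCES]
    paths += ["\\" + name + "(*)\\*" for name in OBJECTS_WITH_INSTANCES]
    return paths

def logman_command(name="ISA_Perf", output="C:\\PerfLogs\\ISA_Perf"):
    """Return the argument list for 'logman create counter' (hypothetical name and path)."""
    command = ["logman", "create", "counter", name, "-c"]
    command += counter_paths()
    command += ["-si", "15", "-max", "200", "-f", "bin", "-o", output]
    return command
```

Print the list with `" ".join(...)` (quoting the counter paths) to review the command before running it on the server, then start the collector with `logman start ISA_Perf`.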
When the issue happens, the following steps need to be done in order to gather the correct data:
a. Go to Start / Programs / Microsoft ISA Server / ISA Tools / ISA Data Packager
b. On the option "Collect data using one of the following repro scenarios", select "Web Proxy and Web Publishing" and click Next;
c. Click in Modify Options;
d. In addition to the options that are already selected please select also:
- ISA BPA
- ISA Info
e. Click in Start data collection
f. The Data Packager will start to run. When the option "Press spacebar to start the capture" appears, press the spacebar and repro the issue by trying to connect from the client workstation.
g. After you finish testing, press the spacebar again in the ISA Data Packager console.
h. When ISA Data Packager finishes collecting data, open DebugDiag.
i. On the first Debugdiag screen (Select Rule Type) click cancel.
j. Go to Processes tab and look for the wspsrv.exe process.
l. While this window is open, go back to the workstation and try to connect again.
m. While the workstation is trying to connect, go back to debugdiag window, right click on wspsrv.exe process and choose Create Full User Dump.
n. Stop Perfmon counter.
3. Data Analysis
With the correct data in hand, you can start looking for obvious issues in the ISA BPA report. If there is nothing there that is relevant to this type of issue, move on to analyzing the perfmon log or the dump. In this particular case I’m going to review the dump first.
First step was to check if there was any thread locked in Critical Section:
0:000> !cs -l
-----------------------------------------
DebugInfo = 0x000c0fe8
Critical section = 0x011c9024 (+0x11C9024)
LOCKED
LockCount = 0x0
WaiterWoken = No
OwningThread = 0x0000113c
RecursionCount = 0x1
LockSemaphore = 0x0
SpinCount = 0x00000000
Notice that this critical section is locked and we have the ID of the owning thread (0x113c); let’s see which thread that is:
45 Id: 1c38.113c Suspend: 1 Teb: 7ffa0000 Unfrozen
ChildEBP RetAddr Args to Child
36d8f354 7c827d29 77e61d1e 00000d28 00000000 ntdll!KiFastSystemCallRet
36d8f358 77e61d1e 00000d28 00000000 36d8f39c ntdll!ZwWaitForSingleObject+0xc
36d8f3c8 77e61c8d 00000d28 00004e20 00000000 kernel32!WaitForSingleObjectEx+0xac
36d8f3dc 74cd2e3f 00000d28 00004e20 00004e20 kernel32!WaitForSingleObject+0x12
36d8f408 6d56ddde 002fb5b0 0138d2a0 00000009 DBmsLPCn!ConnectionRead+0xaf
36d8f428 6d5687fc 0138f2c0 0138d2a0 00000009 dbnetlib!WrapperRead+0x2c
36d8f480 4e2597ce 0138f2c0 0138d2a0 0138d2a0 dbnetlib!ConnectionRead+0x519
36d8f4b4 4e25982d 0138f2c0 0138d2a0 00000009 sqloledb!CDataSource::ConnectionRead+0x35
36d8f500 4e252358 0138d06e 00000001 00000000 sqloledb!CDBConnection::GetBytes+0x269
36d8f54c 4e2555c4 011da180 00000088 0000001e sqloledb!CDBConnection::ProcessTDSStream+0x157
36d8f608 4e255691 011c5680 0000003d 011fb198 sqloledb!CStmt::ExecDirect+0x786
36d8f620 4e254d32 011c5680 0000003d 00000000 sqloledb!CStmt::SQLExecDirect+0x28
36d8f650 4e25517d 00000000 4e25321c 0000003d sqloledb!CCommand::ExecuteHelper+0x157
36d8f6d4 4e254c4b 011d4888 00000000 4bbea778 sqloledb!CCommand::Execute+0x76b
36d8f70c 4bbea64d 011e8550 00000000 4bbea778 sqloledb!CImpICommandText::Execute+0xdd
36d8f74c 4bc0c79b 011c8d78 011fb22c 011f8738 msado15!CConnection::Execute+0x9d
36d8f91c 4bbea4a7 011cfed8 00000000 011d16c8 msado15!_ExecuteAsync+0x19f
36d8f930 4bbea385 011cfed8 ffffffff 00000000 msado15!ExecuteAsync+0x23
36d8fa18 4bbea258 00000000 7c828200 00000000 msado15!CQuery::Execute+0xa5e
36d8fa84 4bc21717 011d16c8 00000000 7c828200 msado15!CCommand::_Execute+0x153
The sqloledb!CStmt::SQLExecDirect frame in this stack shows that the machine is submitting a SQL statement using the SQLExecDirect function. Now let’s see what SQL command is being executed:
0:000> du 011c5680
011c5680 "SELECT RTRIM(filename) FROM ISAL"
011c56c0 "OG_20101107_FWS_000..sysfiles"
Log names containing FWS represent the Firewall log; in this case ISA was querying the SQL database for this log. Now where is SQL located? According to the ISAInfo collected by ISA Data Packager, the log was located on the D: drive, which was actually on the same physical disk as C:, only in a different partition. Now it is time to review perfmon and see if we can match this with something going on from the disk perspective.
Here is a sample from the time when the issue was happening:
The black line represents the Average Disk Queue Length, which goes from zero to 26 (the maximum should be 2 per spindle, and in this case we have just 1 spindle) and stays there from 1:36 PM to 1:39 PM. During the same time, ISA Server Firewall Packet Engine\Backlogged Packets goes from zero to 113 (it should never be higher than 10). The logic here is the following:
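As a small aid for reading these database names, here is a sketch that splits a name such as ISALOG_20101107_FWS_000 into its date, log type and sequence number. The FWS mapping comes from the text above; the WEB entry (Web Proxy log) and the function name are my additions for illustration.

```python
import re
from datetime import date

# FWS is from the post; WEB is added here on the assumption that Web Proxy
# logs follow the same naming pattern.
LOG_TYPES = {"FWS": "Firewall", "WEB": "Web Proxy"}

def parse_isa_log_name(name):
    """Split an ISALOG_<yyyymmdd>_<type>_<nnn> name into its parts."""
    match = re.fullmatch(r"ISALOG_(\d{4})(\d{2})(\d{2})_([A-Z]+)_(\d+)", name)
    if not match:
        raise ValueError(f"not an ISA log name: {name!r}")
    year, month, day, kind, sequence = match.groups()
    return {
        "date": date(int(year), int(month), int(day)),
        "type": LOG_TYPES.get(kind, kind),
        "sequence": int(sequence),
    }
```

For the name in the dump this gives the Firewall log for 7 November 2010, sequence 0.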
1. ISA is trying to query a firewall log located on the SQL (MSDE in this case) database. ISA is waiting on SQL.
2. SQL is performing a reading operation for a piece of information located in disk. SQL is waiting on Disk.
3. The disk is a bottleneck and is queuing up requests.
4. Since ISA can’t proceed (it is waiting on SQL, which is waiting on the disk), ISA starts to accumulate requests (the backlog starts to grow). ISA stops answering new requests.
5. Clients can’t browse.
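The correlation described in the steps above can be expressed as a simple check over perfmon samples: flag the moments when the disk queue exceeds 2 per spindle and the backlog exceeds 10 at the same time. The sketch below is a minimal illustration; the `Sample` class and the sample values are made up, except for the peaks of 26 and 113 taken from this case.

```python
from dataclasses import dataclass

MAX_QUEUE_PER_SPINDLE = 2     # Average Disk Queue Length guideline from the text
MAX_BACKLOGGED_PACKETS = 10   # Backlogged Packets guideline from the text

@dataclass
class Sample:
    time: str
    avg_disk_queue_length: float
    backlogged_packets: int

def bottleneck_samples(samples, spindles=1):
    """Return samples where the disk queue and the ISA backlog are both over the limit."""
    queue_limit = MAX_QUEUE_PER_SPINDLE * spindles
    return [
        s for s in samples
        if s.avg_disk_queue_length > queue_limit
        and s.backlogged_packets > MAX_BACKLOGGED_PACKETS
    ]

# Illustrative samples; only the 26 and 113 peaks come from this case.
samples = [
    Sample("13:35", 0.0, 0),
    Sample("13:36", 26.0, 113),
    Sample("13:39", 26.0, 113),
]
suspects = bottleneck_samples(samples, spindles=1)
```

If the two counters cross their limits in the same samples, as they do here, the disk is a good suspect for the backlog.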
You might be thinking: this is only for 3 minutes, I can live with that. Really? I doubt your helpdesk will avoid a flood of calls if nobody can browse the Internet for 3 minutes.
I hope this post gives you the big picture of what goes on behind an ISA (or TMG) performance issue in scenarios where ISA/TMG stops responding. There are many more elements that need to be investigated than ISA/TMG itself.
It is very interesting to me that many people haven’t yet fully realized the benefits of ISABPA. Certainly lots of admins already use this tool, but do you really use its full capability? This post will describe the most common scenarios for using ISABPA and how to take full advantage of it. In this first part of the post I’m going to discuss how ISA BPA can help you proactively mitigate possible issues.
2. Proactive Health Check
When you deploy ISA Server you should first of all plan, plan and plan. I have worked on many cases where ISA was installed the way you install Microsoft Office, using NNF technology (Next, Next and Finish), no kidding. We all know that it is easy to install, but you need to collect information prior to deployment. Here are some typical questions that can influence how you size ISA for your environment:
· In what type of scenario do you plan to install ISA:
o Web Proxy?
o Firewall?
o VPN Access?
o Secure Publishing Server?
o All of them?
· What applications are you planning to publish through ISA?
o Exchange OWA?
o Outlook Anywhere?
o Sharepoint?
This is definitely not the complete list; it is just an example of some questions that you should ask your customer (or yourself) when planning an ISA Server installation. After gathering all the data, go ahead and use the ISA Server Capacity Planner to see if you have the correct hardware for ISA.
OK, but where does ISA BPA come in? I didn’t want to lose the opportunity to stress how important the planning phase is; this is the reason why I started with that. Running ISABPA with the Health Check option is a post-installation task.
Figure 1 – Starting a new scan.
The following screen shows ISABPA performing the scanning operation:
Figure 2 – Scanning in Progress
When this process finishes, you can click to view the report and you will see (depending on the number of warnings or errors you have) a screen similar to Figure 3:
Figure 3 – ISA Report
This is an example of a pristine installation of ISA Server 2006 on top of Windows Server 2003 SP2 with some basic rules configured on it. Notice how many warnings I have and how many improvements I can make on this configuration. If you want more details about each one of those suggestions, just click on it and you will see what the recommendation is as shown in Figure 4:
Figure 4 – Details about the warning message.
If you want to see a hierarchical view plus more details about this configuration, you can click Tree Reports and you will have a view like the one below:
Figure 5 – Tree Reports View.
3. Conclusion
In this first part of the article I explained some advantages of using ISA BPA proactively; in the next article I will show you how the ISABPA tools can assist you during a troubleshooting scenario.
This post is about a support call where the customer was complaining that ISA Server stopped answering requests every five days on average. Of course this is too broad a statement, and we had to narrow it down with more and more questions. The questioning started:
· When you say it stops answering requests, are you talking about Web Proxy requests?
o Answer: Internet doesn’t work.
· Can you access the server using Remote Desktop?
o Answer: no.
· Can you logon locally in the server?
o Answer: yes.
· Is it slow, or does it not answer at all?
o Answer: it is extremely slow.
· What do you do as a workaround?
o Answer: I have to restart the machine.
· Do you gracefully shut down/restart?
o Answer: no, it is so slow that I can’t click in Start/Shutdown.
That being said, let’s start gathering data.
2. Gathering Data
Since the issue was affecting the server itself and the server was entering a hang state, it was necessary to get a full memory dump. To prepare the machine for a full memory dump we followed KB244139 and also reviewed the main points of KB130536.
3. Analyzing
With the full memory dump in hand we can start to look for the possible root cause of this issue. The first thing in this case is to check what locks are held on resources by threads. To do that we need to run the !locks command:
0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks....
Resource @ nt!CmpRegistryLock (0x808ad480) Shared 2 owning threads
Contention Count = 14
Threads: 89a30ca8-01<*> 899ef708-01<*>
Resource @ Ntfs!NtfsData (0xf7053590) Shared 2 owning threads
Contention Count = 1
Threads: 8a398020-01<*> 8a398b40-01<*>
KD: Scanning for held locks.
* ----- hiding other resources since there were hundreds of them ----- *
Resource @ 0x8a045534 Shared 2 owning threads
Contention Count = 1893
NumberOfSharedWaiters = 54
NumberOfExclusiveWaiters = 1
Threads: 89a30ca8-01<*> 8a127020-01<*> 89883ba8-01 89f5eb50-01
8985f020-01 897e3600-01 89ac0ba8-01 89fb9940-01
89834db0-01 8978c380-01 897ed020-01 897559d8-01
88b40bd0-01 898a7590-01 89a3b700-01 8a398b40-01
897c59a8-01 89ffbd48-01 898d4658-01 8983b4c0-01
897c8020-01 89a84168-01 8986d020-01 8a179ba8-01
899fe7a0-01 8a109db0-01 89a19be0-01 89a26c80-01
897c9020-01 88b3e740-01 88b1adb0-01 8a0ca428-01
89727b18-01 8a1d55a0-01 8a0d7db0-01 897ff7c8-01
8a0c0ba0-01 8a0c3b28-01 88aec960-01 88b17020-01
8971f020-01 88b14db0-01 88ae3890-01 88ae0aa0-01
88b0b430-01 88ada840-01 88b052d0-01 88b034d8-01
88ace738-01 88ac9020-01 8a106a20-01 88ac4b60-01
88abb350-01 88ab4db0-01 88aaeca8-01 8a14eb18-01
Threads Waiting On Exclusive Access:
898a2db0
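With hundreds of resources in the output, picking out the most contended one by eye is tedious. The sketch below scans saved `!locks` text and returns the resource with the highest Contention Count. It is a minimal illustration that assumes the output is laid out as shown above; the function name is made up.

```python
import re

def most_contended(locks_output):
    """Return (resource, contention_count) for the most contended resource, or None."""
    best = None
    resource = None
    for line in locks_output.splitlines():
        header = re.match(r"\s*Resource @ (.+?)\s+(?:Shared|Exclusively)", line)
        if header:
            resource = header.group(1)
            continue
        count = re.match(r"\s*Contention Count = (\d+)", line)
        if count and resource is not None:
            value = int(count.group(1))
            if best is None or value > best[1]:
                best = (resource, value)
    return best

# Lines copied from the output above.
locks_text = """Resource @ nt!CmpRegistryLock (0x808ad480) Shared 2 owning threads
Contention Count = 14
Resource @ Ntfs!NtfsData (0xf7053590) Shared 2 owning threads
Contention Count = 1
Resource @ 0x8a045534 Shared 2 owning threads
Contention Count = 1893
"""
worst = most_contended(locks_text)
```

For this dump, `worst` is the resource at 0x8a045534 with a contention count of 1893.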
The last resource (0x8a045534) is the one with the highest contention count (1893). Reviewing the threads 89a30ca8 and 8a127020 we have:
0: kd> !thread 8a127020
THREAD 8a127020 Cid 12d4.172c Teb: 00000000 Win32Thread: 00000000 WAIT: (WrResource) KernelMode Non-Alertable
8a25d110 SynchronizationEvent
8a127098 NotificationTimer
IRP List:
8a1279f0: (0006,01fc) Flags: 00000884 Mdl: 00000000
Not impersonating
DeviceMap e1433fd0
Owning Process 8972f2e0 Image: wepmcoll.exe
Wait Start TickCount 2658502 Ticks: 84 (0:00:00:01.312)
Context Switch Count 2096 LargeStack
UserTime 00:00:00.0015
KernelTime 00:00:00.0015
Win32 Start Address 0x00401846
Start Address 0x77e6b5ff
Stack Init b85ed000 Current b85ec054 Base b85ed000 Limit b85e9000 Call 0
Priority 14 BasePriority 8 PriorityDecrement 6
b85ec06c 8083e6a2 8a127098 8a127020 8a1270c8 nt!KiSwapContext+0x26
b85ec098 8083f164 8a127020 899bf900 00000000 nt!KiSwapThread+0x284
b85ec0e0 80818613 8a25d110 0000001b 00000000 nt!KeWaitForSingleObject+0x346
b85ec11c 80841266 00000000 e16d8008 88b2dc20 nt!ExpWaitForResource+0xd5
b85ec13c f7038438 899bf900 88b2dc01 b85ec170 nt!ExAcquireResourceExclusiveLite+0x8d
b85ec14c f706a3dc 88b2dc20 e16d8008 88b2dc01 Ntfs!NtfsAcquireResourceExclusive+0x20
b85ec170 f706c59b 88b2dc01 e16d8008 00000000 Ntfs!NtfsAcquireExclusiveFcb+0x42
b85ec19c f70553d9 88b2dc20 e16d8008 00000000 Ntfs!NtfsAcquireFcbWithPaging+0x7f
b85ec1d0 f706b903 88b2dc20 8a045260 e145f948 Ntfs!NtfsFindPrefixHashEntry+0x35c
b85ec320 f706c1e5 88b2dc20 8a1279f0 b85ec360 Ntfs!NtfsCommonCreate+0xaff
b85ec424 8083f9d0 8a045020 8a1279f0 8a1279f0 Ntfs!NtfsFsdCreate+0x17d
b85ec438 f71ef51e b85ec4dc 8a127ba4 8a2bef38 nt!IofCallDriver+0x45
b85ec468 8083f9d0 8a3ab530 8a1279f0 8a127bc8 fltmgr!FltpCreate+0x1aa
b85ec47c ba5f76f1 8a127ba4 8a127bc8 b85ec4dc nt!IofCallDriver+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
b85ec4a4 ba5fe740 8a3ab530 00000000 b85ec4dc SYMEVENT+0x76f1
b85ec4c0 ba5f7769 b85ec4dc 8082b0b9 ba5f782a SYMEVENT+0xe740
b85ec4fc 8083f9d0 8a1146d0 8a1279f0 8a1279f0 SYMEVENT+0x7769
b85ec510 f71e1b43 00000000 8a1279f0 8a127bc8 nt!IofCallDriver+0x45
b85ec534 f71ef5af b85ec554 89d43620 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x20b
b85ec570 8083f9d0 89d43620 8a1279f0 8a1279f0 fltmgr!FltpCreate+0x23b
b85ec584 8092e269 b85ec72c 8a34e910 00000000 nt!IofCallDriver+0x45
b85ec66c 80936caa 8a34e928 00000000 88b23cb0 nt!IopParseDevice+0xa35
b85ec6ec 80936aa5 00000000 b85ec72c 00000240 nt!ObpLookupObjectName+0x5a9
b85ec740 80936f27 00000000 00000000 83da0600 nt!ObOpenObjectByName+0xea
b85ec7bc 80936ff8 b85ec8ec 00100080 b85ec89c nt!IopCreateFile+0x447
b85ec818 ba2dec67 b85ec8ec 00100080 b85ec89c nt!IoCreateFile+0xa3
b85ec8bc ba2def24 b85ecdf0 b85e9000 00000007 SPBBCDrv+0x2c67
b85ec8f0 ba2df451 b85ec914 b85ec924 b85ec91c SPBBCDrv+0x2f24
b85ec93c ba2eb6d6 e14e5378 e305bb40 00000000 SPBBCDrv+0x3451
b85ec978 ba2fa603 b85ecaec 00000005 0000000b SPBBCDrv+0xf6d6
b85ec9a8 ba2fa2e8 b85eca74 e352f678 00000000 SPBBCDrv+0x1e603
b85ec9e4 ba2eb335 00000001 b85eca74 e365b6c0 SPBBCDrv+0x1e2e8
b85eca44 ba2ec67e b85ecbb8 005ecb8c 00000001 SPBBCDrv+0xf335
b85ecb08 8083f893 0000005f b85ecb3c e1920e14 SPBBCDrv+0x1067e
ba32df50 ba2f9fe4 ba2e7852 ba2e7870 ba2e7852 nt!ExReleaseResourceLite+0x8c
ba32df54 ba2e7852 ba2e7870 ba2e7852 ba2fa006 SPBBCDrv+0x1dfe4
ba32df58 ba2e7870 ba2e7852 ba2fa006 ba2e7852 SPBBCDrv+0xb852
ba32df5c ba2e7852 ba2fa006 ba2e7852 ba2f9f72 SPBBCDrv+0xb870
ba32df60 ba2fa006 ba2e7852 ba2f9f72 ba2fa05e SPBBCDrv+0xb852
ba32df64 ba2e7852 ba2f9f72 ba2fa05e ba2e787e SPBBCDrv+0x1e006
This thread has an I/O request packet (IRP), let’s see what we have in this IRP using the !irp command:
0: kd> !irp 8a1279f0
Irp is active with 11 stacks 9 is current (= 0x8a127b80)
No Mdl: No System Buffer: Thread 8a127020: Irp stack trace.
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
>[ 0, 0] 1 e0 8a045020 897c27a0 ba5f7640-b85ec490 Success Error Cancel Pending
\FileSystem\Ntfs SYMEVENT
Args: b85ec5b0 01000021 00070000 00000000
[ 0, 0] 1 e0 8a1146d0 897c27a0 f71e0fac-88af0480 Success Error Cancel
\Driver\SymEvent fltmgr!FltpGeneralCompletion
[ 0, 0] 1 0 89d43620 897c27a0 00000000-00000000
\FileSystem\FltMgr
The IRP is pending during a file access (\FileSystem\Ntfs) with the filter driver SYMEVENT on the stack. Let’s see which NTFS object this driver was accessing at the time the issue was happening:
0: kd> !object 897c27a0
Object: 897c27a0 Type: (8a3be560) File
ObjectHeader: 897c2788
HandleCount: 0 PointerCount: 1
Directory Object: 00000000 Name: \urlcache\dir1.cdat
Great, looks like the Antivirus is scanning the ISA Server cache file. Let’s see the version of the 3rd party drivers that were in the stack:
0: kd> lmvm SYMEVENT
start end module name
ba5f0000 ba612000 SYMEVENT (no symbols)
Loaded symbol image file: SYMEVENT.SYS
Image path: \??\C:\Program Files\Symantec\SYMEVENT.SYS
Image name: SYMEVENT.SYS
Timestamp: Mon Jan 16 14:53:50 2006 (43CC07DE)
CheckSum: 0001A8D7
ImageSize: 00022000
Translations: 0000.04b0 0000.04e0 0409.04b0 0409.04e0
0: kd> lmvm SPBBCDrv
ba2dc000 ba33e000 SPBBCDrv (no symbols)
Loaded symbol image file: SPBBCDrv.sys
Image path: \??\C:\Program Files\Common Files\Symantec Shared\SPBBC\SPBBCDrv.sys
Image name: SPBBCDrv.sys
Timestamp: Mon Feb 06 14:34:12 2006 (43E7B2C4)
CheckSum: 0006C4AB
ImageSize: 00062000
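The value in parentheses on each Timestamp line is the driver's build time as seconds since 1970 in hexadecimal, which makes it easy to check how old a filter driver is. A small sketch (the function name is made up for this example):

```python
import re
from datetime import datetime, timezone

def build_time_utc(timestamp_line):
    """Decode the hexadecimal value in an 'lmvm' Timestamp line to a UTC datetime."""
    match = re.search(r"\(([0-9A-Fa-f]{8})\)", timestamp_line)
    if not match:
        raise ValueError("no hexadecimal timestamp found")
    return datetime.fromtimestamp(int(match.group(1), 16), tz=timezone.utc)

# Timestamp lines copied from the two lmvm outputs above.
symevent_built = build_time_utc("Timestamp: Mon Jan 16 14:53:50 2006 (43CC07DE)")
spbbcdrv_built = build_time_utc("Timestamp: Mon Feb 06 14:34:12 2006 (43E7B2C4)")
```

Both drivers date from early 2006; the decoded values are in UTC, while the text form printed by the debugger is in the local time of the debugging machine.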
This was one of many examples that I see on a daily basis when I’m working with customers that have antivirus installed on ISA Server. Fortunately we published on the Tales from the Edge site the official statement about having antivirus installed on ISA Server. So, if you really need to have antivirus on ISA Server, follow the guidelines of the article Considerations when using antivirus software on ISA Server. This particular issue was fixed after updating the antivirus filter driver and excluding the ISA Server folders from the real-time scan.
One question that I always receive when working on a Firewall service crash is: why is it crashing? When the answer is “due to a third-party application”, the next question is: how is that? I thought each process ran independently and one couldn’t crash the other, right? That’s correct; however, you need to remember how things work in the ISA core architecture. Let’s step back and review the following diagram:
Figure 1 – ISA Server architecture (from ISA Server 2006 Firewall Core Document)
Notice that the Firewall service (wspsrv.exe) runs in user mode while the Firewall Engine (fweng.sys) runs in kernel mode. While it is true that each process has its own address space, security token, etc., it is also true that each process is composed of threads, where each thread can be executing a different set of instructions and interacting with different components. ISA Server 2006 allows third parties to build their own Web Filters (ISA Server supports ISAPI filter development), and such a filter will interfere in some way with how the Web Proxy Filter acts by default.
2. Digging in
If you use Process Explorer (or ProcMon) to open the properties of the wspsrv.exe process, you will see that there are many threads in execution, as shown in Figure 2:
Figure 2 – Threads running in the context of wspsrv.exe process.
If you select one of those threads and click Stack, you will see the stack content and the modules in use. A stack is a region of memory that is used to temporarily store data; items are added and removed on a last-in, first-out basis. When you choose a thread and click Stack, you can see what is executing on that thread at that moment. With this foundation, let’s take a look at the following diagram to understand how the wspsrv.exe process can be affected by a third-party filter:
Figure 3 – Firewall service process and the threads that belongs to it.
As you can see in this diagram, there are several threads within the wspsrv.exe process, and I’m using the stacks of two of them as examples. The first stack, from thread 1988, has only Microsoft modules. For the purpose of this example, let’s focus on the stack that belongs to thread 1920, which has a third-party module (MyWebFilter.dll) loaded into it.
If this module for some reason executes an operation that causes an unhandled exception, it might compromise the whole thread and possibly crash the process. If you do not have a debugger attached to the process, you will not get a dump for wspsrv.exe; the only thing that will happen is that the Firewall service will crash (the process exits) and an event is registered in the Event Viewer saying that the Firewall service crashed. If you want to catch this type of crash you need a debugger attached to the process; to do that you can use an article that I wrote some time back, check it out here.
3. Access Violation
For the purpose of this example, let's assume that this fake third-party filter module did cause the Firewall service to crash, and since I had DebugDiag attached to the wspsrv.exe process I was able to catch the second-chance exception. Here is the result for this crash, as partial output from the !analyze -v command:
0:040> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
FAULTING_IP:
ntdll!KiUserExceptionDispatcher+e
7c82857e 0ac0 or al,al
EXCEPTION_RECORD: 102cf8cc -- (.exr 0x102cf8cc)
ExceptionAddress: 10161a50 (MyWebFilter.dll+0x00001a50)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 10192068
Attempt to read from address 10192068
DEFAULT_BUCKET_ID: STATUS_STACKOVERFLOW
PROCESS_NAME: wspsrv.exe
ERROR_CODE: (NTSTATUS) 0xc00000fd - A new guard page for the stack cannot be created.
READ_ADDRESS: 1016caac
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
IP_MODULE_UNLOADED:
MyFilter+1a50
10161a50 ?? ???
CONTEXT: 102cf8e8 -- (.cxr 0x102cf8e8)
eax=102cfe44 ebx=00000000 ecx=10192048 edx=f9b10046 esi=10192048 edi=102cfe38
eip=10161a50 esp=102cfbb4 ebp=102cfbdc iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
MyWebFilter.dll+0x1a50:
Resetting default scope
RECURRING_STACK: From frames 0x7 to 0xa
LAST_CONTROL_TRANSFER: from 102cfe38 to 10161a50
IP_ON_STACK:
+102cfe38
102cfe38 5c pop esp
FRAME_ONE_INVALID: 1
STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
102cfbb0 102cfe38 00000006 00000000 10192048 MyWebFilter.dll+0x1a50
102cfc2c 776bf813 77796898 00327277 00000000 0x102cfe38
102cfc30 77796898 00327277 00000000 00000000 ole32!COleStaticMutexSem::Release+0x1a
102cfe6c 7c83ac6c 00000000 0efd5400 0efd54a8 ole32!gComLock+0x18
102cfec8 7c83ca92 62251dc0 00000000 0efd5400 ntdll!RtlpWaitOrTimerCallout+0x74
102cfeec 7c83a857 0efd54a8 7c88b080 0ef88528 ntdll!RtlpAsyncWaitCallbackCompletion+0x37
102cff44 7c83aa3b 7c83ca5b 0efd54a8 00000000 ntdll!RtlpWorkerCallout+0x71
102cff64 7c83aab2 00000000 0efd54a8 0ef88528 ntdll!RtlpExecuteWorkerRequest+0x4f
102cff78 7c839f90 7c83a9fa 00000000 0efd54a8 ntdll!RtlpApcCallout+0x11
102cffb8 77e6482f 00000000 00000000 00000000 ntdll!RtlpWorkerThread+0x61
102cffec 00000000 7c839f2b 00000000 00000000 kernel32!BaseThreadStart+0x34
Let’s see our registers:
0:040> r
eax=00000000 ebx=00000000 ecx=1016caac edx=7c828786 esi=00000000 edi=00000000
eip=1016caac esp=102911b0 ebp=102911d0 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
MyWebFilter.dll+0xcaac:
1016caac ?? ???
Now let’s look at the EIP register (which points to the instruction the processor was executing):
0:040> r eip
eip=1016caac
Let’s dump it:
0:040> dd eip
1016caac ???????? ???????? ???????? ????????
1016cabc ???????? ???????? ???????? ????????
1016cacc ???????? ???????? ???????? ????????
1016cadc ???????? ???????? ???????? ????????
1016caec ???????? ???????? ???????? ????????
1016cafc ???????? ???????? ???????? ????????
1016cb0c ???????? ???????? ???????? ????????
1016cb1c ???????? ???????? ???????? ????????
Well, it doesn’t look good, since it is pointing to a bunch of question marks (either invalid or inaccessible memory). Let’s see what memory region EIP was pointing to:
0:040> !address eip
10160000 : 10160000 - 00040000
Type 00000000
Protect 00000001 PAGE_NOACCESS
State 00010000 MEM_FREE
Usage RegionUsageFree
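What does PAGE_NOACCESS mean? It disables all access to the page, so any attempt to read from, write to, or execute it results in an access violation. MSDN describes the related PAGE_GUARD modifier as follows, noting that it cannot be used with PAGE_NOACCESS: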
“Pages in the region become guard pages. Any attempt to read from or write to a guard page causes the operating system to raise the STATUS_GUARD_PAGE exception and turn off the guard page status. Guard pages thus act as a one-shot access alarm. The PAGE_GUARD flag is a page protection modifier. An application uses it with one of the other page protection flags, with one exception: it cannot be used with PAGE_NOACCESS. When an access attempt leads the operating system to turn off guard page status, the underlying page protection takes over. If a guard page exception occurs during a system service, the service typically returns a failure status indicator.”
From: http://msdn.microsoft.com/en-us/library/aa450977.aspx
One strong hypothesis here (since we don’t have the code for the third party application to debug) is that this module tried to access an invalid memory address and therefore corrupted the stack causing the access violation. This was enough to cause the whole process (wspsrv.exe) to crash.
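The first triage step on output like this is always the same: which process, which exception code, and which module owns the faulting address. The sketch below pulls those three facts out of saved `!analyze -v` text. It is only an illustration; the function name is made up, and the small code table covers just the two NTSTATUS values that appear in this case.

```python
import re

# Only the two codes seen in this case; extend as needed.
NTSTATUS_NAMES = {
    0xC0000005: "Access violation",
    0xC00000FD: "Stack overflow",
}

def summarize_exception(analyze_output):
    """Extract process, exception code and faulting module from '!analyze -v' text."""
    code = re.search(r"ExceptionCode:\s*([0-9a-fA-F]{8})", analyze_output)
    where = re.search(
        r"ExceptionAddress:\s*([0-9a-fA-F]+)\s*\((.+?)\+0x([0-9a-fA-F]+)\)",
        analyze_output,
    )
    process = re.search(r"PROCESS_NAME:\s*(\S+)", analyze_output)
    code_value = int(code.group(1), 16) if code else None
    return {
        "process": process.group(1) if process else None,
        "code": code_value,
        "meaning": NTSTATUS_NAMES.get(code_value, "unknown"),
        "module": where.group(2) if where else None,
        "offset": int(where.group(3), 16) if where else None,
    }

# Lines copied from the output above.
analyze_text = """ExceptionAddress: 10161a50 (MyWebFilter.dll+0x00001a50)
ExceptionCode: c0000005 (Access violation)
PROCESS_NAME: wspsrv.exe
"""
summary = summarize_exception(analyze_text)
```

For this crash the summary names wspsrv.exe as the process and MyWebFilter.dll at offset 0x1a50 as the faulting module, which is exactly the information you need when you open a case with the filter's vendor.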
FWEngmon can be used in many circumstances and here are some great examples on how to use this tool:
http://blogs.technet.com/isablog/archive/2008/03/12/bi-directional-affinity-in-isa-server.aspx
http://blogs.technet.com/isablog/archive/2008/06/24/server-publishing-with-isa-server-2004-2006-and-route-relationship-between-networks.aspx
http://blogs.technet.com/isablog/archive/2007/06/25/rpc-over-http-logging-wildness.aspx
With Forefront TMG 2010 this tool is gone, but no worries: it is actually much better now, since it is part of the netsh command. Here is the output of the command that shows the active sessions:
C:\>netsh tmg show connections
Active Sessions:
Source / Destination /
ID Protocol Source Proxy Dest. Proxy 2-way Timeout
-- -------- ----------- ------------ ----- -------
15583 TCP(6) 10.20.20.1:41099 10.20.20.10:445 Yes Yes
4518 TCP(6) 10.20.20.1:41130 10.20.20.10:135 Yes Yes
10.20.20.1:34635
4516 TCP(6) 10.20.20.1:41131 10.20.20.10:135 Yes Yes
10.20.20.1:41130
4522 TCP(6) 10.20.20.1:41132 10.20.20.10:49158 Yes Yes
4520 TCP(6) 10.20.20.1:41133 10.20.20.10:49158 Yes Yes
10.20.20.1:41132
4525 TCP(6) 10.20.20.1:41135 10.20.20.10:135 Yes Yes
4523 TCP(6) 10.20.20.1:41136 10.20.20.10:135 Yes Yes
10.20.20.1:41135
4529 TCP(6) 10.20.20.1:41137 10.20.20.10:49155 Yes Yes
4527 TCP(6) 10.20.20.1:41138 10.20.20.10:49155 Yes Yes
10.20.20.1:41137
15602 UDP(17) 10.20.20.1:49014 10.20.20.10:389 Yes Yes
15603 UDP(17) 10.20.20.1:49015 10.20.20.10:389 Yes Yes
15605 UDP(17) 10.20.20.1:49016 10.20.20.10:389 Yes Yes
15606 UDP(17) 10.20.20.1:49017 10.20.20.10:389 Yes Yes
15601 TCP(6) 192.168.1.154:41129 192.168.1.45:445 Yes Yes
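Because this is plain text, the output is easy to post-process. The sketch below parses rows like the ones above into dictionaries and attaches the indented proxy address lines to the row they follow. It is a minimal illustration that assumes the layout shown here; since the column alignment is lost in this listing, the continuation addresses are kept in a generic `proxy` list rather than assigned to the source or destination column.

```python
import re

ROW = re.compile(
    r"^\s*(\d+)\s+(\w+)\((\d+)\)\s+(\S+)\s+(\S+)\s+(Yes|No)\s+(Yes|No)\s*$"
)
ADDRESS = re.compile(r"\s*\d+(?:\.\d+){3}:\d+\s*")

def parse_connections(text):
    """Parse 'netsh tmg show connections' rows into a list of dictionaries."""
    connections = []
    for line in text.splitlines():
        row = ROW.match(line)
        if row:
            conn_id, protocol, number, source, destination, two_way, timeout = row.groups()
            connections.append({
                "id": int(conn_id),
                "protocol": protocol,
                "protocol_number": int(number),
                "source": source,
                "destination": destination,
                "two_way": two_way == "Yes",
                "timeout": timeout == "Yes",
                "proxy": [],
            })
        elif connections and ADDRESS.fullmatch(line):
            # Continuation line: a proxy address belonging to the previous row.
            connections[-1]["proxy"].append(line.strip())
    return connections

# Rows copied from the output above.
sample_output = """15583 TCP(6) 10.20.20.1:41099 10.20.20.10:445 Yes Yes
4518 TCP(6) 10.20.20.1:41130 10.20.20.10:135 Yes Yes
10.20.20.1:34635
15602 UDP(17) 10.20.20.1:49014 10.20.20.10:389 Yes Yes
"""
parsed = parse_connections(sample_output)
```

From here it is a short step to count connections per destination port or per source address when you are chasing a noisy client.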
There are many more options available; just use /? and you will see:
C:\>netsh tmg show /?
The following commands are available:
Commands in this context:
show all - Shows all available information.
show allowedrange - Shows current allowed IP ranges.
show connections - Shows connection element information.
show creations - Shows creation element information.
show global - Shows driver configuration information.
show holdpackets - Shows information about the hold packets in driver.
show nlbhookrules - Shows NLB hook rule and NLB server assigned ranges information.
show usermodepackets - Shows information about the hold packets currently being handled in user mode.
Now go ahead and start playing with this new built in toy.
Have you heard of that? This was a case about a user whose account was locked out in AD every time he connected through Outlook Anywhere. As usual, we had to go through painful troubleshooting steps to reach this conclusion, starting from the Troubleshooting Account Lockout approach and narrowing it down to the issue happening only when using Outlook Anywhere through ISA Server. Here is the explanation of the issue:
An Outlook Anywhere client continually uses the wrong credentials every time that it tries to authenticate itself on an Exchange server after you install ISA Server 2006 Service Pack 1
http://support.microsoft.com/kb/956192/en-us
Want to fix that? Make sure to update your ISA Server 2006 SP1 with the post-SP1 update (July 2008 package).
The problem that this post discusses was a random issue where, at certain times of the day, ISA Server stopped answering requests, and when the firewall administrator tried to restart the Firewall service, the service didn’t start. The only event that we had prior to the issue was the one below:
Event Type: Error
Event Source: Microsoft ISA Server Web Proxy
Event ID: 14172
Date: 13/3/2009
Time: 18:37:43
Computer: ISASRV
The cache was not properly initialized. caching will be disabled (internal code 503.287.4.0.2167.887). Identify the specific reason for the failure from previous relevant event logs. Fix the problem, and then restart the Firewall service to enable caching.
Doing a quick assessment, I could see that the antivirus was scanning all folders, including the ISA folders (not good at all). As a troubleshooting step I disabled the AV, but the issue persisted. Using ProcMon I could see that when the ISA Storage process (ISAStg.exe) was trying to read a value in the registry, the AV filter driver was still present in kernel mode and intercepting the request. Here is the sequence:
ISASTG process:
34408 2:23:05.8643957 PM isastg.exe 3904 RegEnumValue HKLM\SOFTWARE\Microsoft\Fpc\Storage\Array-Root\Arrays\{0A8D8F99-6862-47B9-9388-12890728AF1A}\Servers\{B622A644-418A-40E1-988F-C1182B246652}\Proxy-Cache-Directories\Proxy-Cache-Directory1 SUCCESS Index: 3, Name: msFPCDirectoryName, Type: REG_SZ, Length: 34, Data: D:\urlcache\Dir1
The stack for this operation shows the AV filter driver (klif.sys):
0 ntoskrnl.exe ntoskrnl.exe + 0x17859f 0x8097859f C:\WINDOWS\system32\ntoskrnl.exe
1 ntoskrnl.exe ntoskrnl.exe + 0x146c3c 0x80946c3c C:\WINDOWS\system32\ntoskrnl.exe
2 klif.sys klif.sys + 0xfa1c 0xf685fa1c C:\WINDOWS\system32\drivers\klif.sys
3 ADVAPI32.dll ADVAPI32.dll + 0x12530 0x77f62530 C:\WINDOWS\system32\ADVAPI32.dll
4 isastg.exe isastg.exe + 0x8352 0x408352 D:\Program Files\Microsoft ISA Server\isastg.exe
5 isastg.exe isastg.exe + 0x9054 0x409054 D:\Program Files\Microsoft ISA Server\isastg.exe
6 RPCRT4.dll RPCRT4.dll + 0x30193 0x77c80193 C:\WINDOWS\system32\RPCRT4.dll
7 RPCRT4.dll RPCRT4.dll + 0x933e1 0x77ce33e1 C:\WINDOWS\system32\RPCRT4.dll
8 RPCRT4.dll RPCRT4.dll + 0x935c4 0x77ce35c4 C:\WINDOWS\system32\RPCRT4.dll
9 RPCRT4.dll RPCRT4.dll + 0x2ff7a 0x77c7ff7a C:\WINDOWS\system32\RPCRT4.dll
10 RPCRT4.dll RPCRT4.dll + 0x3042d 0x77c8042d C:\WINDOWS\system32\RPCRT4.dll
11 RPCRT4.dll RPCRT4.dll + 0x30353 0x77c80353 C:\WINDOWS\system32\RPCRT4.dll
12 RPCRT4.dll RPCRT4.dll + 0x311dc 0x77c811dc C:\WINDOWS\system32\RPCRT4.dll
13 RPCRT4.dll RPCRT4.dll + 0x312f0 0x77c812f0 C:\WINDOWS\system32\RPCRT4.dll
14 RPCRT4.dll RPCRT4.dll + 0x38678 0x77c88678 C:\WINDOWS\system32\RPCRT4.dll
15 RPCRT4.dll RPCRT4.dll + 0x38792 0x77c88792 C:\WINDOWS\system32\RPCRT4.dll
16 RPCRT4.dll RPCRT4.dll + 0x3872d 0x77c8872d C:\WINDOWS\system32\RPCRT4.dll
17 RPCRT4.dll RPCRT4.dll + 0x2b110 0x77c7b110 C:\WINDOWS\system32\RPCRT4.dll
18 kernel32.dll kernel32.dll + 0x24829 0x77e64829 C:\WINDOWS\system32\kernel32.dll
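To spot a kernel driver in a stack like the one above without reading every frame, a small script can help. This is only a minimal sketch: it assumes the stack was copied out of ProcMon as plain text in the same column layout shown here, and the function name `drivers_in_stack` is made up for illustration. It simply lists every `.sys` module found, which you then compare against the drivers you expect on the server.

```python
import re

# One stack frame: index, module, "module + offset", address, full path.
FRAME = re.compile(
    r"^(?P<index>\d+)\s+(?P<module>\S+)\s+\S+ \+ 0x[0-9a-fA-F]+\s+"
    r"0x[0-9a-fA-F]+\s+(?P<path>.+)$"
)


def drivers_in_stack(stack_text):
    """Return the kernel drivers (.sys modules) that appear in the stack."""
    found = []
    for line in stack_text.splitlines():
        match = FRAME.match(line.strip())
        if not match:
            continue  # skip blank lines or anything that is not a frame
        module = match.group("module").lower()
        if module.endswith(".sys") and module not in found:
            found.append(module)
    return found


# A few frames taken from the stack above.
SAMPLE = r"""
0 ntoskrnl.exe ntoskrnl.exe + 0x17859f 0x8097859f C:\WINDOWS\system32\ntoskrnl.exe
2 klif.sys klif.sys + 0xfa1c 0xf685fa1c C:\WINDOWS\system32\drivers\klif.sys
3 ADVAPI32.dll ADVAPI32.dll + 0x12530 0x77f62530 C:\WINDOWS\system32\ADVAPI32.dll
4 isastg.exe isastg.exe + 0x8352 0x408352 D:\Program Files\Microsoft ISA Server\isastg.exe
"""

print(drivers_in_stack(SAMPLE))
```

Any driver in the result that does not belong to Windows or ISA Server is a candidate for the kind of interception described here.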
Later on we fail to create the file:
34838 2:23:05.9429702 PM mspadmin.exe 612 CreateFile D:\urlcache SUCCESS Desired Access: Read Attributes, Read Control, Write DAC, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, Impersonating: S-1-5-21-2611182321-852623426-2620623114-500, OpenResult: Opened
34839 2:23:05.9430612 PM mspadmin.exe 612 QueryBasicInformationFile D:\urlcache SUCCESS CreationTime: 2/13/2009 1:51:15 PM, LastAccessTime: 2/13/2009 2:23:04 PM, LastWriteTime: 2/13/2009 1:51:15 PM, ChangeTime: 2/13/2009 1:51:15 PM, FileAttributes: D
34840 2:23:05.9431081 PM mspadmin.exe 612 QuerySecurityFile D:\urlcache BUFFER OVERFLOW Information: Owner, Group, DACL, 0x80000000
We uninstalled the AV and the issue didn’t happen anymore. Since this environment had a requirement to have AV installed on every single Windows machine, we implemented the correct folder exclusions following the article “Considerations when using antivirus software on ISA Server” and the environment stabilized.
The interesting side of this story is that this article was published exactly one year ago; one year later we still have firewall administrators not following its recommendations and therefore having unexpected downtime.
Recently I received a question from a TMG admin who couldn’t install DebugDiag on Windows Server 2008, since it is not supported there, and therefore didn’t know how to capture a user mode dump of the wspsrv.exe process on TMG 2010. The good news is that with Windows Server 2008 getting a manual dump of a process is even easier, since it doesn’t need any additional tool; this capability is built into the system. Just open Task Manager, go to the Processes tab, highlight the wspsrv.exe process, right-click it and choose Create Dump File.
Easy, isn’t it?
Having a dump of the wspsrv.exe process using this approach can be useful for the following scenarios:
The previous post was about performance and some of the symptoms that can confuse us during the problem definition. This post is about a crash in a user mode process, in this case the ISA Server process wspsrv.exe.
Usually, when this process crashes we have the following event in the application log:
Event ID: 14057
Date: DATE
Time: Time
Computer: MyISA
The Firewall service stopped because an application filter module C:\Program Files\Microsoft ISA Server\Module generated an exception code YYYYYYY in address XXXXXX when function ZZZZZZZ was called. To resolve this error, remove recently installed application filters and restart the service.
Where:
· Module – module that generated the exception.
· YYYYYYY – Exception code, for example an access violation (C0000005).
· XXXXXX – memory address where the exception occurred.
· ZZZZZZZ – function that was called during the exception.
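When many of these events need to be reviewed, the four fields listed above can be pulled out of the message text with a regular expression. The following is a minimal sketch built only from the message wording quoted in this post; the function name `parse_event_14057` and every value in the sample message (module path, address, function name) are made up for illustration.

```python
import re

# Pattern built from the event 14057 text quoted above.
EVENT_14057 = re.compile(
    r"application filter module (?P<module>.+?) generated an exception code "
    r"(?P<code>\S+) in address (?P<address>\S+) "
    r"when function (?P<function>\S+) was called"
)


def parse_event_14057(message):
    """Return module, code, address and function, or None if there is no match."""
    match = EVENT_14057.search(message)
    return match.groupdict() if match else None


# Hypothetical message: the module, address and function are invented values.
sample = (
    "The Firewall service stopped because an application filter module "
    r"C:\Program Files\Microsoft ISA Server\example.dll generated an exception code "
    "C0000005 in address 0x12345678 when function ExampleFunction was called."
)

fields = parse_event_14057(sample)
print(fields["module"])
print(fields["code"], fields["address"], fields["function"])
```

Grouping the parsed events by module and exception code is a quick way to see whether the same filter is failing every time.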
When the Firewall service crashes, ISA Server stops working and the following event appears in the application log:
Event ID: 14079
Time: TIME
Due to an unexpected error, the service fwsrv stopped responding to all requests. Stop the service or the corresponding process if it does not respond, and then start it again
3. Gathering Data
The big problem with this kind of event is that by the time you realize the issue occurred, it is already too late: the root cause is gone and you have lost the chance to grab the data.
Since we are dealing with a crash, we need to attach a debugger to the process that is crashing. The program that we are going to use to do this is called DebugDiag. It was created by the IIS team to troubleshoot inetinfo.exe crashes and leaks, and later became a robust tool to capture and analyze user mode hangs and crashes for IIS or other processes. The first thing to do is download DebugDiag from the link below:
http://www.microsoft.com/downloads/details.aspx?FamilyID=28bd5941-c458-46f1-b24d-f60151d875a3&DisplayLang=en
After installing the tool on the ISA Server, follow the steps below to configure it:
1. Click Start / Programs / Debug Diagnostic Tool 1.1 / DebugDiag 1.1 (x86). The following window will appear:
Figure 1 – Creating a new crash rule.
2. Leave the Crash option selected and click Next. The following window will appear:
Figure 2 – Target object.
3. Select the option “A specific process” and click Next. The following window will appear:
Figure 3 – Selecting the process.
4. Select the ISA Server process that is crashing. For the purpose of this example I’m going to select the process wspsrv.exe. Click Next and the window below will appear:
Figure 4 – Advanced configuration.
5. On this window we need to set a couple of options:
· Action type for unconfigured first chance exception: Full Userdump.
· Action limit for unconfigured first chance exception: 0 (unlimited).
6. Click on the exception button and select the options as shown below, in the same order:
Figure 5 – Final steps.
7. Notice that we selected the “Access Violation” exception as an example, because the exception code in event 14057 in this case was C0000005. If you are not sure which exception is happening, you don’t need to select anything here and DebugDiag will capture the dump regardless of the exception. Click Next to continue.
8. Type the name of the rule, click Next and then click Finish to activate the rule.
4. Now what?
Now we just need to wait for the next occurrence and the debugger will catch the crash. The next post will show what the crash dump looks like and what to do after you have it.
I was reading this month’s Windows IT Pro Magazine (September 2009) and found a nice article written by an Escalation Engineer here at Microsoft Texas (Michael Morales) where he describes how to use ProcDump to catch high CPU utilization. This is an amazing tool that can also help ISA administrators, mainly in scenarios where we just can’t get the right data (in most cases, dumps) because the issue is random and when it happens nobody is available to execute a command (for example, launching DebugDiag and choosing the option to manually dump the process).
For an ISA Server high CPU utilization scenario, a simple example is to dump the Firewall service process twice when the CPU for wspsrv.exe is at or above 90 percent for 5 seconds and store the dumps in the c:\dumps folder:
c:\procdump.exe -c 90 -s 5 -n 2 wspsrv.exe c:\dumps
Isn’t that cool?
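If you keep this command in a script so it is ready before the next occurrence, a small helper can build the argument list for you. This is just a sketch: the helper name `procdump_args` is made up, the location of procdump.exe is the one used in the example above, and only the switches already shown in this post (-c, -s and -n) are used.

```python
def procdump_args(process, cpu_percent, seconds, dumps, folder,
                  exe=r"c:\procdump.exe"):
    """Build the argument list for the ProcDump call shown above."""
    return [
        exe,
        "-c", str(cpu_percent),  # CPU threshold in percent
        "-s", str(seconds),      # consecutive seconds at or above the threshold
        "-n", str(dumps),        # number of dumps to write
        process,
        folder,
    ]


args = procdump_args("wspsrv.exe", 90, 5, 2, r"c:\dumps")
print(" ".join(args))
# On the ISA Server itself the list could be passed to subprocess.run(args).
```

Keeping the threshold values as parameters makes it easy to adjust them for another process or a different CPU threshold.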
Make sure to read the article from Michael Morales to fully understand how this tool works:
http://windowsitpro.com/article/articleid/102479/got-high-cpu-usage-problems-procdump-em.html
The eagerly awaited ISA Server 2006 SP1 is on the way, and Jim Harrison has blogged on the ISA Team Blog about the new features of this release. The ISA Server community is very excited about this announcement, as expressed by Tom Shinder on his blog.
I just delivered a webcast to the IT community in Brazil through Support Academy (an initiative from the Microsoft Latam team) about those new features and here are some of the demos (I removed the narration since it was in Portuguese :) ):
Demo 1 – Configuration Change Tracking
· This demo shows the functionality of this new feature and how to easily identify which change was made to the Firewall Policy.
Demo 2 – Web Publishing Rule Test Button
· Very cool feature that can be used proactively to see if the publishing rule is working (prior to putting it in production) or reactively while troubleshooting an issue.
Demo 3 – Traffic Simulator
· Tired of waiting for the user to be able to show you the error that he is receiving when accessing a web site? Now you can run your own simulation with this tool.
Demo 4 – Diagnostic Logging Query
· With this integration you will be able to understand exactly what is going on when ISA is processing your request.
Start planning your summer migrations for ISA Server 2006 SP1.
Recently the number of times that I have received this question has increased (not sure why), but the fact of the matter is that you can do something to resolve this problem (or at least identify its source) if you have the right tools. When your ISP informs you that you are on an email blacklist, you can review the results by going to a site that provides real-time blacklist results, such as http://www.mxtoolbox.com/blacklists.aspx.
2. Step 1 – Identifying
The next thing to do is identify the source of the problem: why did your company get blacklisted? If you have ISA Server at the edge of your network, you can watch the live logging with a filter for the SMTP protocol coming from the internal network, as shown below:
If you have only one SMTP server internally, then the only IP address that you should see sending SMTP traffic is your SMTP server. If you start to see workstation IP addresses in the list, you need to investigate further. Make sure to also get a netmon trace on ISA (internal and external adapters) at the same time, or simply get an ISA Data Packager in repro mode using the Web Proxy and Web Publishing template.
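The same check can be run offline against exported log data. The sketch below is only an illustration: it assumes the log entries were already loaded into dictionaries with a client IP and a destination port (the field names `client_ip` and `dest_port`, the addresses and the function name are all made up, not the ISA log schema), and it counts SMTP connections from any host that is not an approved SMTP server.

```python
from collections import Counter

# Assumption: the only host allowed to send SMTP out (example address).
SMTP_SERVERS = {"10.0.0.25"}


def unexpected_smtp_senders(records, allowed=SMTP_SERVERS):
    """Count SMTP (port 25) connections per client that is not an approved server."""
    counts = Counter()
    for record in records:
        if record["dest_port"] == 25 and record["client_ip"] not in allowed:
            counts[record["client_ip"]] += 1
    return counts


# Hypothetical records, as if exported from the firewall log.
log = [
    {"client_ip": "10.0.0.25", "dest_port": 25},  # the SMTP server itself
    {"client_ip": "10.0.0.87", "dest_port": 25},  # a workstation sending SMTP
    {"client_ip": "10.0.0.87", "dest_port": 25},
    {"client_ip": "10.0.0.87", "dest_port": 80},  # normal web traffic, ignored
]

for ip, hits in unexpected_smtp_senders(log).most_common():
    print(ip, hits)
```

The hosts at the top of this list are the workstations to investigate first.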
3. Step 2 – Containing
To contain the amount of SMTP traffic leaving your network, you need to make sure that the only host that is able to send SMTP traffic out (to the Internet) is your SMTP server.
This is the reason why I get disappointed when I open a Firewall Policy and see a rule that allows All Outbound. If your environment doesn’t have such a rule, SMTP traffic that is not coming from the SMTP server will not be able to leave your internal network in the first place.
Notice that I’m using step two for containment because in step one you are logging the real traffic, which means that you will have real data to analyze later.
4. Step 3 – Remediating
After implementing this containment, you can start working on the workstation and verify why it is sending traffic out to the Internet. At this point, you can:
· Get a netmon sample of the traffic, so you can see which process is sending out SMTP traffic.
· Unplug this workstation from the network
· Use an Antivirus (like Forefront Client Security or Microsoft Security Essentials) to scan the local workstation and see if it is infected.
After cleaning the system (the last time that I worked on a case similar to this one, the piece of malware found on the workstation was Backdoor:Win32/Oderoor.gen!A), one other thing that you can do is follow the procedures from the article Capturing a Trace at Boot Up and get a sample capture while the machine is starting up. With that you can see the traffic profile and make sure that there is no malicious attempt to go out right at the beginning of the system boot.
This is only a simple example of an incident response in case your SMTP server is being blacklisted due to a piece of malware running inside your network. For a full description of Microsoft’s incident response recommendations, read the article below:
Responding to IT Security Incidents
http://technet.microsoft.com/en-us/library/cc700825.aspx
Consider the following scenario:
In this scenario, when the Web server receives an HTTP request, it answers TMG with a redirect whose new location uses https in the header, as shown below:
- GET Request sent from TMG to the internal Server:
Http: Request, GET /default.aspx
Command: GET
+ URI: /default.aspx
ProtocolVersion: HTTP/1.1
Via: 1.1 TMG
Host: contoso.com
Accept: */*
Accept-Language: en-us
Connection: Keep-Alive
Accept-Encoding: peerdist
HeaderEnd: CRLF
- Web Server reply with the new location:
Http: Response, HTTP/1.1, Status: Moved temporarily, URL: /default.aspx
ProtocolVersion: HTTP/1.1
StatusCode: 302, Moved temporarily
Reason: Found
Cache-Control: private
Location: https://contoso.com/default.aspx
Server: Microsoft-IIS/7.5
XAspNetVersion: 2.0.50727
XPoweredBy: ASP.NET
ContentLength: 149
HeaderEnd: CRLF
Problem: TMG receives the response with the new location and, instead of sending this new location to the client workstation, it sends http://contoso.com/default.aspx (removing the “s”). The client receives this 302 and sends the request again, causing an endless loop.
Resolution: in order to fix this problem, use the resolution (method 2) from KB http://support.microsoft.com/kb/924373. Although the KB doesn’t have Forefront TMG 2010 listed, the same approach applies to TMG 2010 (yes, we will update the KB).
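The loop described in the problem statement can be reproduced in miniature, which also gives a simple way to detect it from the client side. This is a simplified sketch and not how TMG is implemented: `web_server` and `through_proxy` are made-up stand-ins for the published server and for the Location rewrite described above, and `follow_redirects` just follows 302 answers until a URL repeats.

```python
def follow_redirects(start_url, fetch, max_hops=10):
    """Follow 302 responses and report whether a URL repeats (a loop)."""
    seen = []
    url = start_url
    for _ in range(max_hops):
        if url in seen:
            return {"loop": True, "hops": seen}
        seen.append(url)
        status, location = fetch(url)
        if status != 302:
            return {"loop": False, "hops": seen}
        url = location
    # No repeated URL within max_hops requests.
    return {"loop": False, "hops": seen}


def web_server(url):
    """Stand-in for the published server: redirects HTTP requests to HTTPS."""
    if url.startswith("http://"):
        return 302, "https://" + url[len("http://"):]
    return 200, None


def through_proxy(url):
    """Stand-in for the behavior described above: the server always receives
    HTTP, and the Location header is handed to the client as http://."""
    status, location = web_server("http://" + url.split("://", 1)[1])
    if status == 302:
        location = "http://" + location.split("://", 1)[1]
    return status, location


direct = follow_redirects("http://contoso.com/default.aspx", web_server)
proxied = follow_redirects("http://contoso.com/default.aspx", through_proxy)
print(direct["loop"], proxied["loop"])
```

Talking to the server directly ends after one redirect, while the rewritten Location sends the client back to the URL it started from.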
Just when we think that we have covered all scenarios to mitigate possible issues with the change password feature through ISA Server 2006, something new happens. This quick post is about a scenario where only users that belonged to a specific OU were unable to change their password through ISA Server 2006. In this scenario the users were located in the OU called Adm/Fin, as shown in Figure 1:
Figure 1 – ISA Server 2006 web publishing rule with a deny action.
2. Troubleshooting
The articles below were used to initially troubleshoot this issue:
1. The "change password" feature does not work as expected after you install ISA Server 2006 Service Pack 1 http://support.microsoft.com/kb/957859
2. Configuring and Troubleshooting the Password Change Feature in ISA Server 2006 http://technet.microsoft.com/en-us/library/cc514301.aspx
3. Troubleshooting Forms Base Authentication using Secure LDAP Authentication on ISA Server 2006 http://technet.microsoft.com/en-us/library/dd316279.aspx
4. Unable to Change Password through ISA Server 2006 http://blogs.technet.com/isablog/archive/2009/04/28/unable-to-change-password-through-isa-server-2006.aspx
After all the efforts to fix the issue using the articles above, one little piece of information was gathered from isalog.bin (which is part of ISA Data Packager, as explained in one of my articles). The information found in the log says that ISA failed to change the password because of the error 80005000, which means E_ADS_BAD_PATHNAME.
This error was interesting because the user could log on just fine, which means that the path was correct; besides, the same user was able to change the password from a Windows workstation logged on internally to the domain.
After collaborating with the DS team, we found the following statement in an article about LDAP:
If the name of an organizational unit contains a forward slash character (/), the system requires an escape character in the form of a backslash (\) to distinguish between forward slashes that separate elements of the canonical name and the forward slash that is part of the organizational unit name.
Source: http://technet.microsoft.com/en-us/library/cc977992.aspx
The problem was the name of the OU, which contains a slash character: LDAP parses the slash as a separator, and this makes the query fail. After renaming the OU to remove the slash, the user was able to change the password.
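The escaping rule quoted from the LDAP article can be illustrated with a few lines of code. This is only a sketch of that rule, not of what ISA Server does internally: the function name `canonical_name`, the domain and the user name are made up, and only the OU name Adm/Fin comes from this case.

```python
def canonical_name(domain, containers, leaf):
    """Join name elements with '/', escaping any '/' inside an element as '\\/'."""
    def escape(element):
        return element.replace("/", "\\/")

    return "/".join([domain] + [escape(c) for c in containers] + [escape(leaf)])


# Hypothetical domain and user; "Adm/Fin" is the OU name from this case.
print(canonical_name("contoso.com", ["Adm/Fin"], "jdoe"))
print(canonical_name("contoso.com", ["Finance"], "jdoe"))
```

Without the backslash, the first name would read as two nested containers (Adm and Fin) instead of one OU, which is the kind of bad path the error points to.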
Now that TMG Beta 3 is released you can enjoy the best of both worlds for VPN access. In the past I was asked about SSTP on ISA Server 2006, since Windows Server 2008 was capable of it. The sad answer was that ISA Server 2006 didn’t have this feature built in. But now you can use TMG and select SSTP the same way as any other protocol, as shown in Figure 1:
Figure 1 – SSTP available in TMG Console.
When configuring SSTP on TMG you will need to carefully plan:
· Web Listener that will be used by SSTP.
· Certificate that is going to be bound to the Web Listener.
Besides that you will need Windows Vista with SP1 on the client workstation to test this new feature.
Troubleshooting Client Access
Since I’m working remotely on some of these days, I was able to reproduce some of the nice errors that I didn’t get when I was in my home lab. Today, for example, I got the following error when I was trying to connect from my laptop:
Figure 2 – First error due to the cert name.
That was pretty self-explanatory, but just to confirm the name that I used to issue the certificate I took a netmon trace and got the subject name:
SSL: Server Hello. Certificate. Server Hello Done.
Seq=1878717387 - 1878718743, Ack=2650000305, Win=256 (scale factor 0x8) = 65536
- Ssl: Server Hello. Certificate. Server Hello Done.
- TlsRecordLayer:
Length: 1351 (0x547)
- SSLHandshake: SSL HandShake TLS 1.0 Server Hello Done(0x0E)
HandShakeType: ServerHello(0x02)
Length: 70 (0x46)
+ ServerHello: 0x1
HandShakeType: Certificate(0x0B)
Length: 1269 (0x4F5)
- Cert: 0x1
CertOffset: 1266 (0x4F2)
- Certificates:
CertificateLength: 1263 (0x4EF)
- X509Cert: Issuer: contoso-DC01-CA,contoso,com, Subject: vpn.contoso.com,IT,Contoso,Dallas,Texas
+ SequenceHeader:
- TbsCertificate: Issuer: contoso-DC01-CA,contoso,com, Subject: vpn.contoso.com,IT,Contoso,Dallas,Texas
+ Tag0:
+ Version: v3 (2)
+ SerialNumber: 0x6168a464000000000002
+ Signature: Sha1WithRSAEncryption (1.2.840.113549.1.1.5)
+ Issuer: contoso-DC01-CA,contoso,com
+ Validity: From: 06/15/09 21:03:46 UTC To: 06/15/10 21:13:46 UTC
+ Subject: vpn.contoso.com,IT,Contoso,Dallas,Texas
+ SubjectPublicKeyInfo: RsaEncryption (1.2.840.113549.1.1.1)
+ Tag3:
+ Extensions:
+ SignatureAlgorithm: Sha1WithRSAEncryption (1.2.840.113549.1.1.5)
+ Signature:
HandShakeType: Server Hello Done(0x0E)
Length: 0 (0x0)
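The name check behind this first error can be sketched in a few lines. This is a simplified illustration and not the validation code used by the SSTP client: the function name `name_matches` is made up, only the subject name vpn.contoso.com comes from the trace above, and the other host names are hypothetical.

```python
def name_matches(hostname, cert_name):
    """Case-insensitive match of a host name against a certificate name,
    allowing a single leading wildcard label such as *.contoso.com."""
    host = hostname.lower().rstrip(".")
    name = cert_name.lower().rstrip(".")
    if name.startswith("*."):
        # The wildcard covers exactly one label.
        return "." in host and host.split(".", 1)[1] == name[2:]
    return host == name


# Subject name from the trace above; the second host name is hypothetical.
print(name_matches("vpn.contoso.com", "vpn.contoso.com"))
print(name_matches("tmg.contoso.com", "vpn.contoso.com"))
```

The name typed in the VPN connection has to pass this kind of comparison against the certificate, which is why connecting by any other name or by IP address fails.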
As a quick fix I edited my hosts file and created a manual entry there. But then right after that I got:
Figure 3 – Now it is the CRL.
Looking at the properties of the certificate, it was possible to see that the CRL was pointing to my internal CA:
Figure 4 – The CRL for my internal CA.
To resolve this I created a web publishing rule to publish my CRL and after that all worked fine.
Additional Resources
While testing those settings I got some great links from the RRAS team (RRAS is the component that TMG uses for VPN capability). Check out the links below:
http://blogs.technet.com/rrasblog/archive/2007/09/26/how-to-debug-sstp-specific-connection-failures.aspx
http://blogs.technet.com/rrasblog/archive/2007/01/17/sstp-faq-part-2-client-specific.aspx
http://blogs.technet.com/rrasblog/archive/2007/01/25/sstp-faq-part-3-server-specific.aspx
You might be wondering: how did you get access to those things if you were unable to establish the VPN connection? The answer is: through my backup PPTP connection :)