Why is it a bad request? If you haven’t seen this before, I see quiet often in some scenarios involving ISA Server and guess what: if I remove ISA Server from the picture it works just fine. Well, we all know that when you are wide open to Internet there is nobody inspecting your traffic anyway so you shall pass. Is the same thing as trying to get into your house without the key for the door, you can’t get in, but guess what, if you remove the door you can get in. Oh..so it got be the door the root cause of the problem, right? Better remove the door or look for your key?
In some scenarios when you receive the “400 Bad Request” is because is ISA Server is acting according to RFC 2616 and then sending the HTTP response to the client like this:
- Http: Response, HTTP/1.1, Status Code = 400,
ProtocolVersion: HTTP/1.1
StatusCode: 400, Bad request
Reason: Bad Request ( The data is invalid. )
Connection: Keep-Alive
Pragma: no-cache
Cache-Control: no-cache
- ContentType: text/html
MediaType: text/html
ContentLength: 1852
HeaderEnd: CRLF
- payload: HttpContentType = text/html
Here it is a piece of the HTTP header with a type of request that can be interpreted by ISA Server as invalid:
HTTP HTTP:Request
...
Content-type: multipart/mixed
If the content type is multipart the HEX value might look like this:
49 53 41 2A 30 30 2A 20 20 20 20 20 20 20 20 20 20 2A 30 30 2A 20 20 20 20 20 20 20 20 20 20 2A 31 32 2A 36 31 34 32 37 38 36 35 35 31 20 20 20 20 20 2A 30 38 2A 36 31 31 31 33 35 35 30 30 31 20 20 20 20 20 2A 30 36 30 31 30 35 2A 30 39 35 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
The value 20 in HEX means space (see http://en.wikipedia.org/wiki/ASCII for more info), which makes perfect sense in this case since the next HTTP packet just continued with a bunch of more spaces at the end, which supposedly will extend to a third packet if ISA haven’t drop that:
20 20 20 20 20 20 20 20 20 20 20 20 30 32 31 30 30 30 31 7E 47 45 2A 31 2A 31 37 32 31 7E 49 45 41 2A 31 2A 30 30 30 30 30 31 37 32 31 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 7E 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
Going back to the RFC 2616 (section 4.2) you will see that this is not a good practice and ISA Server does what is supposed to be done, drop the packet. The best way to fix that is to contact the web site administrator (or application developer) and ask him to fix that. However since in almost of all outbound access you don’t have control of the site that your internal client is trying to access, ISA Server has a workaround to accept that. To workaround on this problem (since is not an ISA issue) you need should install ISA Server 2006 SP1 and after that make the registry changes suggested in the article below:
941293 Error message when you access a Web site through ISA Server 2006 or Microsoft Forefront Threat Management Gateway, Medium Business Edition: "HTTP 400 - Bad Request"
http://support.microsoft.com/default.aspx?scid=kb;EN-US;941293
1. Introduction
I wrote some time back about an issue on ISA Server related to VPN and on that post I emphasized the following:
“…ISA Server VPN component uses Windows RRAS component and that when the ISA is isolated from the main problem, the troubleshooting line needs to follow RRAS tools and techniques.”
This is what boils down to on the most case that I work where ISA Server is being used as VPN Server. Troubleshooting on the ISA Server side most of the time is minimum, most of the time on the ISA Server side is just a matter of configuration, or should I say misconfiguration. The big and deep troubleshooting involve other elements around:
· RRAS
· Networking in general
· Connectivity with the DC (or RADIUS Server)
This week I got a couple of VPN cases that were really interesting and I would like to share with you the troubleshooting steps and solution for those.
2. Scenario 1 – VPN Client was unable to authenticate
In this scenario, all VPN settings were correct on ISA and on client; on ISA Server logging we just see error 0x80074E20, which means “A connection was gracefully closed in an orderly shutdown process with a three-way FIN-initiated handshake” (see error codes for more information). Basic connectivity with the DC was also good…but, basic means only a ping with a reply. No other device in between internal NIC of ISA and the DC, full communication was allowed.
Investigating further it was possible to see the following:
· Event Viewer
Event Type: Error
Event Source: Microsoft Firewall
Event Category: None
Event ID: 21137
Date: 9/24/2009
Time: 5:51:18 PM
User: N/A
Computer: ISACONTN1
Description:
The connectivity verifier "Domain Controller" reported an error when trying to connect to DCCONT.contoso.msft.
Reason: The request has timed out.
Event Source: NETLOGON
Event ID: 5719
Time: 5:54:27 PM
This computer was not able to set up a secure session with a domain controller in domain CONTOSO due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.
ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.
· RRAS Tracing (need to run “netsh ras set tracing * enable” in order to have those files under %systemroot%\tracing)
Content of the file IASSAM.LOG:
[924] 09-24 17:53:39:429: NT-SAM Names handler received request with user identity CONTOSO\administrator.
[924] 09-24 17:53:39:429: Username is already an NT4 account name.
[924] 09-24 17:53:39:429: SAM-Account-Name is "CONTOSO\administrator".
[924] 09-24 17:53:39:429: NT-SAM Authentication handler received request for CONTOSO\administrator.
[924] 09-24 17:53:39:429: Processing MS-CHAP v2 authentication.
[924] 09-24 17:54:27:198: LogonUser succeeded.
[924] 09-24 17:54:27:198: NT-SAM User Authorization handler received request for CONTOSO\administrator.
[924] 09-24 17:54:27:198: Using native-mode dial-in parameters.
[924] 09-24 17:54:27:198: Sending LDAP search to DCCONT.contoso.msft.
[924] 09-24 17:54:27:198: LDAP ERROR in ldap_search_ext_sW. Code = 81
[924] 09-24 17:54:27:218: Extended error string: (null)
[924] 09-24 17:54:27:218: Retrying LDAP search.
[924] 09-24 17:54:27:618: Failed to connect to the cached DC, try DC locator ...
[828] 09-24 17:54:31:284: NT-SAM Names handler received request with user identity CONTOSO\administrator.
[828] 09-24 17:54:31:284: Username is already an NT4 account name.
[828] 09-24 17:54:31:284: SAM-Account-Name is "CONTOSO\administrator".
[828] 09-24 17:54:31:284: NT-SAM Authentication handler received request for CONTOSO\administrator.
[828] 09-24 17:54:31:284: Processing MS-CHAP v2 authentication.
[828] 09-24 17:54:31:284: LogonUser succeeded.
[828] 09-24 17:54:31:284: NT-SAM User Authorization handler received request for CONTOSO\administrator.
[924] 09-24 17:54:50:922: Failed to connect to the DC discovered by DC locator, try DC enumerator ...
[828] 09-24 17:55:02:979: Using native-mode dial-in parameters.
[828] 09-24 17:55:02:979: Could not open an LDAP connection to domain CONTOSO.
[924] 09-24 17:55:02:979: Could not open an LDAP connection to domain CONTOSO.
[828] 09-24 17:55:03:089: NTDomain::getConnection failed: This operation returned because the timeout period expired.
[828] 09-24 17:55:03:089: Retrying LDAP search.
[828] 09-24 17:55:03:089: Could not open an LDAP connection to domain CONTOSO.
[924] 09-24 17:55:03:089: NTDomain::getConnection failed: This operation returned because the timeout period expired.
By this time you can be thinking: how we have basic connectivity and it doesn’t reach the DC? That was the key question. The next step was a simple test: try to change the groups that had permission to access VPN and user a local group, from ISA itself. In order to do that I created a local group called VPNUsers, added the local administrator on it (for testing purpose) and try to add the CONTOSO\Domain Users when….got the following error: The RPC server is unavailable. Ok, now it makes sense and to start troubleshooting this I used:
Troubleshooting RPC Endpoint Mapper errors using the Windows Server 2003 Support Tools from the product CD
It turns out that the issue was related to the binding order of the network adapter. On ISA Server we had the following binding order:
Figure 1
It is strongly recommended that the internal network (in this case LocalCorp) is on the top. This can avoid many issues such as this one. Here are some references on that:
· Delay in NetBIOS connections from a multi-homed computer
· How to properly configure network binding order on a BackOffice Small Business Server system
· DNS: Valid network interfaces should precede invalid interfaces in the binding order
After changing the binding order it worked like a charm!
3. Scenario 2 – Unable to Authenticate using RSA VPN Client
First and foremost, even before start touching on ISA Server, it is very important to review the article below from RSA about how to configure ISA Server 2006 to use RSA for VPN authentication purpose:
http://www.rsa.com/rsasecured/guides/imp_pdfs/Microsoft_ISA2006_AM7.1_VPN.pdf
By following the above article you can pretty much configure all you need in order to make the VPN work with RSA. However in this case, although everything was done, it was not authenticating. The key here it was that in this case the authentication was done using RADIUS.
By looking the Netmon trace while authentication was happening (internal NIC of ISA) we could see that ISA (10.20.20.1) keep sending requests to the RADIUS Server (10.20.20.20) without answer:
Figure 2
According to the system administrator this RADIUS is being used for other purposes and it only fails with ISA. Going to the RADIUS Server it was possible to see the following event on Event Viewer:
Figure 3
That was simple, right? What happens is that if the RADIUS Server doesn’t have the RADIUS Client configured in there, it will fail to answer the request. To fix that we just need to add ISA Server as RADIUS Client and make sure that the secret between them is the same. Here are some references:
Event ID 13 ��� RADIUS Client Configuration
Error 930; The authentication server did not respond to authentication requests in a timely fashion
4. Conclusion
These are more examples about how troubleshoot VPN on ISA Server 2006. Try to not focus so much on ISA itself because many times ISA is not the culprit.
The Update 1 Rollup 3 for Microsoft TMG 2010 is now available for you. This rollup address the following issues:
KB article
Title
2501646
FIX: "A security package specific error occurred" error when you run a recurring report on a Forefront TMG 2010 server that is managed by an EMS and that is in a workgroup
2502685
FIX: "0xc0360007 (STATUS_IPSEC_CLEAR_TEXT_DROP)" error when you try to access the internal IP address of a Forefront TMG 2010 server through an IPsec site-to-site network
2472894
"HTTP/1.1 502 - Error 11 Bad format" error when you access SSL websites that use SAN certificates in Forefront TMG Server 2010 if a non-English version of a Windows operating system is installed
2501650
FIX: "Page Cannot Be Displayed" error when you try to access a website that requires a client certificate authentication on a Forefront TMG client in Forefront TMG 2010 if HTTPS Inspection is enabled
2501776
FIX: "502 Proxy Error. An attempt was made to load a program with an incorrect format. (11)" error when you try to use a HTTPS URL through Forefront TMG 2010 if HTTPS inspection is enabled
2498831
How to configure the "HTTPS inspection caching in a forward proxy scenario" and "HTTPS inspection inclusion list" features in Forefront TMG 2010
2498837
An enterprise node is incorrectly added in Forefront TMG MMC after you install Forefront TMG 2010 SP1 Update 1
2445386
"Sign in as a Different User" does not work on a SharePoint website that is published by Forefront TMG 2010
2498835
PPTP or L2TP/IPsec connection is not reestablished between Forefront TMG 2010 servers
2501777
FIX: "502 Proxy Error. An unknown error occurred while processing the certificate. (-2146893017)" error when you try to access a website over HTTPS in Forefront TMG 2010 if HTTPS inspection is enabled
2497959
Forefront TMG Firewall service may stop when users run desktop sharing software over HTTPS that is proxied by Forefront TMG 2010
2500737
"0xc0040446" or "0xc004041d" error if the primary IP address or DNS address uses 128.0.0.0/16, 191.255.0.0/16, or 223.255.255.0/24 in Forefront TMG 2010
2497858
SCOM logs many "Forefront TMG Server - Cache: Current Cache Fetches Average Ms Per Request error" error alerts from TMG Management Pack through Forefront TMG 2010
2501755
Mspadmin.exe may crash if you do not use SQL Server Express to log traffic in Forefront TMG 2010
2502686
Forefront TMG Firewall service might crash when WP_TRAFFIC tracing is enabled in Forefront TMG 2010
2501782
"0xc004039E" error when you use the "Allow user override" setting for a HTTP deny rule in an enterprise policy in Forefront TMG 2010
2501780
FIX: Forefront TMG Job Scheduler service (Isasched) stops responding on an array member server that is not a report server in Forefront TMG 2010
As you can see there are a lot of fixes in this rollup, I particularly worked in many issues involving 2501650 and 2502686 while the hotfixes were not even ready. Due the nature of those issues I strong recommend you to download this update and plan the installation on your Forefront TMG. To install this update, you must have TMG 2010 SP1 and Update 1 already installed.
Got get it at http://support.microsoft.com/kb/2498770.
Introduction
It could be just another case where we grab some data, look at it and resolve the issue, but it was a long road until we get to the bottom of this. The scenario was quiet simple: clients that were behind TMG couldn’t access HTTPS sites. All other sites using HTTP were working just fine and the most phantasmagoric thing that we noticed was that on TMG Logging we just have entries for 8080, no HTTPS request at all as show below:
On the client, the only thing that was showing up was:
Troubleshooting
All the usual suspects during a standard troubleshooting phase were eliminated (HTTPS Inspection, Malware Inspection, NIS, URL Filtering, HTTP Filter, Network, etc), it was time to thing: what else is left? What about the client? Let’s take a netmon on the client and see what’s going, right? Yep…let’s do it. For my surprise, after starting netmon and repro the issue, we did not even saw the HTTPS Request leaving the workstation; we just saw requests on port 8080 and then the clients sent a FIN:
Digging Deeper
In order to investigate further what was going on I used the tool TCPView, which gives more details information about network connectivity and process that are using for a specific connection. So I tried to access the Bank of America site (which uses HTTPS by default) and here it is the result:
Notice two things:
Note1: I’m not saying the driver name because this behavior was observed in many others third party AV filter drivers.
Ok, now we have some footprints that we can use to investigate further. The process XXX.exe belongs to a third party antivirus. As a temporary test we disabled the AV solution using MSConfig and restarted the workstation. It didn’t help since there were some other kernel drivers loaded in the system. As a temporary measure we uninstalled the AV, tested and worked. At that point we handled over the issue to the AV Vendor.
Note2: Only disable AV solution on the workstation for troubleshooting purpose, validate your tests and enabled it back. It is strongly recommended to have antvirus solution on your system and address potential issues with the AV vendor.
Just another reminder that not even everything that you think is caused by TMG, it actually is….this is just another example of TMG (as it was on ISA) be a victim of another problem.
This week Microsoft released a major update of Forefront TMG 2010 and many TMG Admins are very excited with the new features that were announced in the Forefront TMG team blog, such as the support for Kerberos authentication in an array scenario, the improved error pages and the new site activity report. These are already three reasons to apply SP2 on your TMG, but instead of only adding two other features I’m going to give you five more reasons to apply this update. Here are those:
1. Forefront TMG 2010 SP2 makes TMG startup operation ten times faster.
2. Do you remember KB2498831? No need to run that script anymore, with TMG 2010 SP2 a new option was added in the screen to allow you to do that as shown below:
3. Performance improved for cloud migration.
4. Improvement in the E-Mail Protection feature
5. Account lockout enhancements for FBA.
That’s it…go grab TMG 2010 SP2 and remember: in order to apply TMG 2010 SP2 you need TMG 2010 SP1 + Update 1.
Without a doubt one of the most powerful features that IAG 2007 has is the capability to allow full VPN access using a pure SSL connection. This feature is not available on ISA Server and the only other product that Microsoft has to enable this technology is the SSL VPN that comes with Windows Server 2008. If you want to learn more about the implementation of SSL VPN on Windows Server 2008, read the following article that my friend Tom Shinder recently published on ISAServer.org. The article is called Publishing a Windows Server 2008 SSL VPN Server Using ISA 2006 Firewalls.
The part 1 of my post will show how to configure IAG Network Connector on the server side. The part 2 will show how to establish the connection from outside and some troubleshooting tips.
2. Configuring Network Connector Server
The follow steps assume that you already have a trunk created to be used as your Portal entry point. The whole idea is to create a link on the Portal so the client can use to fully access the internal network.
1. Open the IAG Console, highlight the Portal , click in Admin and then click in Network Connector Server.
Figure 1 – Network Connector Server
Note: for the purpose of this demonstration, the external IP address of the IAG Server is a non-valid Internet IP Address (192.168.0.70).
2. After select this checkbox, the options on this window will be available. Click in the option Use the Following Connection and select the internal network adapter. If you leave the option Only if Network Configuration is Missing then the client that connects from outside will use the same DNS configuration as used on the internal NIC of the IAG Server. If you want to use a different DNS, WINS or Default Gateway, make sure to select the option Always, Overriding Existing Network Configuration of this Server.
3. Click in the IP Provisioning and select the pool type. Here is the definition for each one of the available options:
Option
Description
Corporate IP Address
Select this option to use the same IP range is the internal network of the IAG Server. Make sure to exclude the range from your internal DHCP Server. If the question is: Can I use my current DHCP Server to assign IP address to the remote clients, like RRAS and ISA Server do? The answer is: you can’t. This version of the IAG doesn’t support that.
Private IP Address
Use this option if you want to use another range (such as 10.20.20.0/24) for the remote clients.
4. For the purpose of this demo, we will use the option Corporate IP Address. Click in the Add button and add the range below:
Figure 2 – Adding the Corporate IP address.
Note: IAG will use the first IP address from this pool to the Whale Network Connector adapter.
5. Click OK and click in the IP Provisioning Tab. Here is the definition for each one of the available options:
Split Tunneling
Use this option if you want to allow the VPN user to access the Internet and the corporate LAN at the same time. The corporate access goes through the IAG and the Internet access goes through the user’s Internet access (user’s ISP). This is a good option on the performance standpoint, because the user will not use the company’s link to access the internet.
Non-Split Tunneling
Use this if you want to control the Internet access on the remote client workstation. This option will allow you to restrict the remote user to access Internet, but will add an extra load on the company’s Internet link since all Internet traffic will pass through it. Optionally you can select the checkbox Disable Local Area Network Access to disallow the client to access the local company’s LAN.
No Internet Access
This option will disallow the remote client to access Internet.
IP Spoofing Policy
When enabled, this option will analyze the IP address to verify if this is a spoofed IP.
Protocol Blockers
Allow an extra control on the TCP, UDP and ICMP packages. Be careful because this option is not granular, which means that if you select TCP then you will be blocking all TCP traffic through the VPN connection.
6. For the purpose of this demo, we will use the default options.
7. Click in Additional Networks tab and you will have the option to add other networks that the remote client can use it. Usually this option will be used when the company’s network has more the one IP subnet and you want to grant access to some or all subnets located within your internal network. In this case we will not enable this option.
8. Click in Advanced and the following options will be available:
Figure 3 – Network Connector advanced options.
9. By default the Network Connector’s Listener uses the TCP port 6003, but here you can change it. You also can change the log level from 1 to 5 (where 5 is the most detailed one). You also can change the log location and tune the server’s performance using the Server Resource options. Click in OK to finish the configuration.
10. Press CTRL + G to activate the configuration.
3. Adding the Network Connector to the Portal
Now that we have the feature already enabled, the last step on the server side is to add the Network Connector link into the IAG Portal. Follow the steps below to add that:
1. On the Application option click in Add.
2. Select the Network Connector option as show below:
Figure 4 – Adding the Network Connector link.
3. Click in Next.
4. Type the application’s name and leave the other default options selected. Click in Next to continue.
5. Leave the Server and Port options with the default option selected. Click in Next to continue.
6. Add a name for the application; this name is the one that will appear on the Portal’s web site. Click in Finish.
Now that the full network connector configuration is done, the window will show that IAG is trying to start the service:
Figure 5 –Network Connector service.
If you type ipconfig on the command prompt you will notice that the Whale Network Connector is now using the first IP from the pool that we configured earlier:
Figure 6 –Automatic IP.
On this post we saw how to configure the Network Connector Server and how to add the Network Connector entry on the IAG Portal. Next session will cover what needs to be done on the client side and also some troubleshooting tips.
Quick post just to bring awareness about this new KB that explains how to manually remove Conficker. Follow the steps from:
http://support.microsoft.com/kb/962007
Although the VPN template screen (see figure below) doesn’t seems to have any news on this area, the new TMG Data Packager introduces new logs that can assist you when troubleshooting VPN site to site issues.
The Oakley log file that TMG Data Packager creates contains the IKEEXT.ETL (IKE Tracing) and the WFP.TMF (file that will be used to parse the ETL file) files. In order to parse this file you will need to download the tools TRACEFMT.exe and TRACEPRT.dll from the Windows XP SP2 Support Tools. After installing those tools you can extract the content of the TMG CAB file to a folder and run the command below to parse it:
C:\Program Files\Support Tools>tracefmt.exe Y:\temp\VPN\TmgPackage\IkeExt\ikeext.etl -tmf Y:\temp\VPN\TmgPackage\IkeExt\wfp.tmf -o Y:\temp\IKEOutput.txt
Setting log file to: Y:\temp\VPN\TmgPackage\IkeExt\ikeext.etl
Getting guids from Y:\temp\VPN\TmgPackage\IkeExt\wfp.tmf
Event traces dumped to Y:\temp\VPN\TmgPackage\IkeExt\IKEOutput.txt
Event Summary dumped to Y:\temp\VPN\TmgPackage\IkeExt\IKEOutput.txt.sum
Exit Status: 38
After converting it you can read the IKEOutput.txt file, there you will find the log in the following format:
Package is received and processed according to IPSec Parameters that should match between both endpoints:
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 0|192.168.0.10|Received packet
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 0|192.168.0.10|Local Address: 192.168.0.7.500 Protocol 0
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 0|192.168.0.10|Peer Address: 192.168.0.10.500 Protocol 0
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|iCookie 98b22fe79d9d675f rCookie 1610c0b30c6bbe60
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|Exchange type: IKE Quick Mode Length 300 NextPayload HASH Flags 1 Messid 0x3d6edc77
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|mmSa: 0x000000000265B9F0
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|Create QMSA: qmSA 000000000265ED60 messId 3d6edc77
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|Processing QM. MM 000000000265B9F0 QM 000000000265ED60
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|Process Payload HASH, SA 000000000265B9F0 QM 000000000265ED60
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|Process Payload ID, SA 000000000265B9F0 QM 000000000265ED60
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|Process Payload SA, SA 000000000265B9F0 QM 000000000265ED60
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|QM propNum 1, transformNum 0, peerSpi 3151228040
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|QM transNum 1
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|PROTO: ESP Algo 3
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|IPSEC_LIFE_TYPE: 1
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|IPSEC_LIFE_DUR: 3600
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|IPSEC_ENCAPSULATION_MODE: 1
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|IPSEC_HMAC_ALG: 2
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|IPSEC_GROUP_DESC: 2
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|IsRecvPolicyTunnelPolicy: TRUE
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|Looking up QM policy for IKE
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|QM localAddr : 10.10.10.0.0 Mask 255.255.255.0 Protocol 0
[0]00F8.0B50::01/01/1601-05:01:53.375 [ikeext] 1|192.168.0.10|QM peerAddr : 10.40.40.0.0 Mask 255.255.255.0 Protocol 0
Policy identification and verification
[0]00F8.0B50::01/01/1601-05:01:53.387 [ikeext] 1|192.168.0.10|Policy
GUID: {a167bf6c-78ff-4b3d-b619-1ea03d29664a}
LUID: 0x8000000000000003
Name: ISA VPN S2S tunnel to network STSTMG
Description: (null)
Flags: 0x00000000
Provider: <unspecified>
Provider data:
Verification of the Quick Mode parameters
Type: IKE Quick Mode Tunnel
Proposals: 1
-- 0 --
Lifetime:
Seconds: 3600
Kilobytes: 100000
Packets: 2147483647
PFS group: 2
SA transforms: 1
Type: ESP-Auth & Cipher
Auth transform:
Type: SHA1
Config: HMAC-SHA1-96
Crypto module: <unspecified>
Cipher transform:
Type: 3DES
Config: CBC-3DES
Flags: 0x00000080
Dont negotiate 'byte' lifetime
Local tunnelEndpoint: 192.168.0.7
Remote tunnelEndpoint: 192.168.0.10
Normal idle timeout (seconds): 300
Idle timeout in case of failover (seconds): 60
.
. log continues..
The log can be pretty extensive and it is very important to know what you are looking for (which error are you chasing), mainly when the scenario is related to TMG site to site VPN with third party vendors. Sometimes the IPSec parameters doesn’t match and this is the most common cause for failures during the IPSec negotiation. This logging can be pretty handy in those scenarios since it gives verbose information about what it is happening behind the scene.
I had this article almost ready way back when I was on Forefront TMG Team but never had time to finish. This is about an issue where wspsrv.exe process was consuming high CPU in random moments of the day and the only workaround to make this process to use less CPU was to restart Firewall Service. Maybe the behavior sounds familiar, but the final resolution was never documented here in this blog.
2. Gathering Data
Using Process Monitor was possible to see that there were lots of ETW Trace threads running as shown below, which was kind of interesting to me:
To move forward in this investigation the usual perfmon and dump of the wsprv.exe process were collected while the issue was happening.
3. Analyzing the Data
Using the same approach that I documented in the Troubleshooting Forefront TMG 2010 Performance issues Cheat Sheet it was possible to notice a pattern in the threads that were stuck in Critical Section, all of them had a similar stack as shown below:
At that point it was clear to me that the component involved in such behavior was NIS, because is NIS that uses GAPA Engine (read NIS white paper for more information). As a test we disabled NIS and restarted Firewall Service and as a result of this action the issue stopped occurring.
Of course this was not the solution, as we don’t want to permanently disable this feature, but at least confirmed that NIS was the component causing the issue. We enabled NIS again and the issue came back. Another set of dumps and Process Monitor analysis lead the investigation to confirm that verbose tracing was enabled causing NIS to impact wspsrv.exe process by consuming more CPU. The traces are:
The value possible values are: 0, 1, 2, 3 and 4 corresponding to Error, Warning, Info, Function and Noise, respectively. In this case it was 4, which indeed caused a lot of noise. The resolution was to change back to zero and restart firewall service. It is important to clarify that is not always that this behavior will happen when the lower level trace is high, in order words, don’t think you can always repro this issue by just increasing this value. The issue was a combination of factors, in this particular scenario the server was very busy and by having the lower level trace so high the CPU utilization was increasing. The overall recommendation is to increase this value only for troubleshooting purpose and decrease after collecting data.
Whatta busy week here with this problem. As you might be aware, Exchange 2010 SP1 breaks the integration that TMG has with Exchange Edge when using E-Mail Protection feature, the issue causes the TMG Managed Control Service to fail to start. This issue is documented on our team blog http://blogs.technet.com/b/isablog/archive/2010/09/01/problems-when-installing-exchange-2010-service-pack-1-on-a-tmg-configured-for-mail-protection.aspx and on Exchange Team blog at http://msexchangeteam.com/archive/2010/09/01/456094.aspx. We are working on a hotfix for this issue and more news will be posted on the TMG Team Blog as soon as we test the fix in our lab and we confirm that works.
One thing that it is important to keep in mind is the following: don’t make changes directly on Exchange or FPE installed on the TMG Server when this issue happens. What I mean with that? Here it is the issue flow so you can better understand:
The reason behind that is because if you make changes, there is a possibility that once we fix the original issue (with the hotfix that will come up soon), TMG Managed Control Service will still failing to start with the error message below:
…and on Event Viewer you might see the error below:
Log Name: Application
Source: Microsoft Forefront TMG Control
Event ID: 31308
Task Category: None
Level: Error
Keywords: Classic
The Forefront TMG Managed Control service failed to initialize. Error information: Value does not fall within the expected range.
This error has nothing to do with the issue introduced by Exchange 2010 SP1, this is another problem that can be caused by changes that are made directly on those products and when TMG Managed Control Service tries to synchronize the config, it fails. It is important to mention that this failure doesn’t always happen, the normal behavior is to TMG detect the changes and undo it. When TMG is able to do that you will see the following alert:
Solution? Undo the manual changes that were done via those products' console and make sure to always use TMG to manage the configuration among those products (at least this is the behavior now TMG RTM and SP1). In this particular case that TMG Managed Control Service is down, you can’t make changes via TMG console because it will not sync with Exchange Edge and FEP, therefore you should hold any change to be done only after the hotfix for this issue (Exchange 2010 SP1 problem) is released and you had applied on the system.
Great news folks! Netmon Team just released Netmon 3.3 for public download. This release brings new enhancements from previous version, such as:
· Ability to capture on WWAN and Tunnel interfaces on Win7
· Right-click add to alias.
· Right click go to definition
· Auto-scroll
· API Extension…and many others.
Don’t wait go grab your new tool at:
http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f
Also, remember that there are many netmon experts available at http://nmexperts.codeplex.com
Have fun!
Throughout the years working with ISA and TMG I notice that one of the most challenging configuration for many Admins is to correctly setup the network settings on ISA/TMG. Although we have some great content out there about the subject, such as the An Inside Look into TMG Firewall Networks by Deb Shinder and the great series of 3 articles written by Tom Shinder: Overview of ISA and TMG Networking and ISA Networking Case Study (Part 1), Overview of ISA and TMG Networking and ISA Networking Case Study (Part 2) and Overview of ISA and TMG Networking and ISA Networking Case Study (Part 3). Those are “most read” articles if you are planning your network configuration on TMG, add also on top of that the article Planning Forefront TMG network topology. But, in case you inherited an environment with Forefront TMG and you are experiencing weird problems, than it is time to step back and review your configuration. This post will highlight a very common configuration mistake that can cause network route issues.
Problem: clients on remote networks were randomly unable to access Internet.
Scenario: In this case the TMG Admin has the following topology that he needs to configure:
When TMG Admin opens up TMG Management Console this is what he see:
When he look at this he thinks it is okay. But you know what, IT IS NOT!! Here are the reasons:
For the second bullet, I want to call out the original source of this information, which says:
Forefront TMG does not support defining separate network objects that represent remote subnets
Issue: Forefront TMG does not support defining separate network objects that represent remote subnets.
Cause: When you define IP address ranges for a network, Forefront TMG checks all network adapters. When Forefront TMG finds an adapter with an IP address in the network range, it associates the network with that adapter. When a network includes remote subnets accessible by Forefront TMG through routers, the IP address of the remote subnets should be included in the network definition. If you define a separate network object for a remote subnet (instead of including it in the network definition), Forefront TMG tries to locate an adapter with an IP address of the network object, and fails. Forefront TMG assumes that the adapter is not available (disconnected or disabled), and sets network status to disconnected.
Solution: For best practice when defining your network configuration in Forefront TMG, take note of the following:
From: http://technet.microsoft.com/en-us/library/ee796231.aspx#NetworkAndRoutingIssues
This is a VERY common mistake and I’ve seen this over and over. The main argument that I also hear is: it always worked like that, why this is a problem now? Well, mainly because it is not supported, which means that Microsoft can’t guarantee that your setup will be functional with this setting.
Resolution: The correct way to setup this particular environment is:
Plan your network setting, if you can’t plan, make sure to review the TMG Alerts, usually TMG is screaming out loud saying that there is something wrong in this area.
Many companies are starting to budget for the second half of calendar year 2010, some companies actually start the fiscal year 2011 now in July, regardless of which scenario are you in the fact of the matter is that during this time of the year many companies are getting ready to overcome currently challenges, re-evaluate the current technologies in place and analyze migration options across the core platforms. In recently conversations that I had with ISA Server admins I notice that there are some common scenarios where ISA Server capabilities are limiting the company to achieve their business goal. The good thing is that TMG can indeed help on that.
On this post I will enumerate the top 5 scenarios where TMG can overcome ISA limitations in order to achieve the desired goal.
Scenario 1 - Consolidation and Dynamic Control
“Currently I have ISA Server 2006 that works pretty well on my company, but as our business is growing and the IT people in my team are shirking I would like centralize many things on the perimeter in order to facilitate the administration. For inbound scenario my main problem is E-Mail, it will be cool if I could have one single server to manage firewall policies as well as Exchange spam filtering capabilities. For outbound scenario I would like to have more control over the sites that my users can access in a dynamic manner, I’m can’t keep up with all suspicious site and add them to my Block List, which is a URL Set that I created. There is anything on ISA that can help me with that?”
Solution
No, ISA can’t help too much here. However, with TMG you can integrate Forefront Protection for Exchange, Exchange Edge and TMG itself in a single box. The combination of those products will allows you to implement the E-Mail Protection feature and manage all the policies in single location, which is TMG console. For your outbound challenge, TMG can indeed help here. You can use URL Filtering capability that uses MRS in order to dynamically categorize URLs that your users will access. Yes, this is the end of your endless attempt to keep up will all sites on the URL Set.
Scenario 2 – Protection against Malware
“Recently we got hit by a malware, it was pretty bad but we were able to contain the damage and cure all affected machines. After that we started a post mortem analysis to understand how this happened since all corp net workstations have anti virus, sadly we found the breach. Our guest network was not enforcing that guest computers have antivirus and I remember why we disabled that enforcement there, it was a political decision. The problem is that I have no idea if the user brought the malware or got this malware while browsing Internet through our proxy. We now need to be able to have a type of protection on the edge that can block attempts to download malicious content and help to protect unmanaged workstations that have no antivrus. Not sure if ISA can do that, please advise.”
No, ISA cannot do that. This is actually a very strong point on TMG Malware Inspection feature. With this feature enabled you can keep up the latest signature, regardless of the client workstation state (managed or unmanaged). If TMG detects an attempt to download a file that is infected, TMG will try to clean this file and if it can’t clean it will block the access to it (according to administrator's choice). The user name, file name, threat and URL will be stored on the TMG logging and you can quickly identify who attempted to download the infected file and the site that the user was trying to download it from. Yeah, I know, it’s awesome.
Scenario 3 – Keep up with the updates
“My currently ISA deployment it’s is using the 3-Leg template, I have some servers on the DMZ. Those servers are highly utilized and I’m having hard time to keep up them updated based on the monthly patch Tuesday Microsoft update cycle. The whole change request process to install new updates on the server plus the request to restart the server can take up to two weeks, in order words: my servers that live on the DMZ can be out of date for up to two weeks. In a recently internal auditing process the auditors saw that breach and we need to come up with a solution where we can mitigate that without reduce the two weeks gap that we have to apply security patches. Can ISA help us on that?”
ISA will not be able to help you to achieve this goal but TMG will. With TMG Network Inspection System you can mitigate known Microsoft vulnerabilities from being exploited via a traffic that cross TMG networks. NIS will grab the updates from Microsoft Update Service and will inspect all traffic that cross TMG, since your servers are on DMZ, NIS will evaluate traffic that are going to the DMZ (or coming from the DMZ) and verify if that traffic matches with any NIS signature, if it does and the action is set up to block, TMG will block the traffic and trigger an alert so you can easily identify a potential exploitation attempt. Now this is cool, isn’t it?
Scenario 4 – Controlling Remote Users
“We just migrate all of our domain to Windows Server 2008 and we are now implementing NAP. Since our VPN solution is based on ISA Server 2006 I would like to integrate NAP with ISA 2006, can I do that? Also, we want to allow user to connect to our VPN via SSTP. Does ISA supports SSTP?”
ISA does not integrate with NAP neither offer built in SSTP capabilities, good thing is that TMG does both. With TMG you will take advantage of Windows Server 2008 x64 bits platform which is much more robust and will be able to natively integrate with NAP via TMG Console. On top of that, TMG will also be able to help you to enable users to connect via VPN using SSTP protocol since this feature comes built in with the product. “Two birds with with single stone”, this is what I’m talking about.
Scenario 5 – We can’t stop
“Our company is growing in a fast pace, which is great, but we are becoming more and more dependent on the Internet. Recently we had an outage on our Internet connectivity with our ISP because our border router broke and we had to replace it. This replacement took two hours, it was a chaos in our company without Internet connectivity. Since this day my manager is under pressure to implement a backup plan so we have fault tolerance Internet connectivity in case the main connectivity with our ISP goes down again. I want to use ISA 2006 for that, but I’m not sure how. Any clue on how to do that?”
ISA Server 2006 doesn’t offer a built in ISP Redundancy capability that can assist you on that, but TMG does. With the new ISP Redundancy capability on TMG you can have two paths to the Internet that can be used as fail over mechanism or load balancing mechanism. This will allow you to achieve your goal and be up and running with Internet connectivity in a matter of seconds if your main ISP goes down. You’re welcome.
These are only 5 of many other scenarios that TMG can assist you to overcome the challenges that your company might be facing to keep up the business running in a secure manner. If you have ISA Server 2006, 2004 or even the almost dead ISA 2000 (extend support finishes April 2011) you should be planning your TMG migration and I will remember you again: chapter 6 of the TMG Book is your friend for that.
There is not more frightening then knowing that you can’t backup your server and when I say knowing is because many times administrators don’t backup because they forgot or because they think someone else is doing. But when you know that the backup cannot be performed, than you know that Murphy’s Law might get you.
This post is about a scenario where Firewall administrator was trying to backup ISA Server configuration and it was receiving the following error:
Figure 1 – Error when trying to perform a backup.
Understanding the Error Message
Before panic it is important to take a deep breath and understand what this error message is trying to say to you. If you BING for the error code 0xc0040357 (simple like that http://www.bing.com/search?q=0xc0040357 ), you will find the KB 922222, which is a good start. But for now, let’s leave this on hold and understand the error description. It says:
“The Web Listener referenced by HTTP Compression
HTTP-Compression-Configuration does not exist.
The error occurred on object ‘HTTP-Compression-Configuration’ of class
‘HTTP Compression’ in the scope of array ISACONTN1
I’m highlighting the keywords of this error message, with that we know that the problem is related to:
· A Web Listener that does not exist anymore however it is present in the HTTP Compression option.
To confirm that this understanding is correct, next step is to try to open the HTTP Compression Preferences under General and see if we are able to see the properties. When we do that we get the error below (which confirms that there problem resides in there):
Figure 2 – Error when opening HTTP Compression options.
2. Investigating Further
This scenario happened in an ISA Server 2006 Enterprise Edition that apparently had no problem, everything was working fine, nodes were in sync and the only operation that was failing was the backup. Since this is an Enterprise Edition we have ADAM and having ADAM we know that the values are primarily stored there. According to the error message the error occurred in the object ‘HTTP-Compression-Configuration’, therefore we should take a look on this object.
In order to do that we can use the same approach explained in KB922222 (the one that I mentioned before). Since that article is for ISA 2004, there is a slightly difference when trying to connect to CSS using ADAM on ISA 2006. Instead of using the Distinguished Name CN=FpcConfiguration, you will use CN=FPC2 as shown below:
Figure 3 – Connecting to CSS using ADAMADSIEdit.
After connecting to ADAM, browse to the following location:
Figure 4 – Looking for the HTTP-Compression-Configuration object.
Notice that under the HTTP-Compression-Configuration object we have another object called WebListenerUsed and in this case it has a GUID for a WebListener. This GUID is not the real name of the WebListener, this is actually the CN for this object. To see the name of the real web listener in which this object is referring to, you need to right click on this object and choose properties. Look for the attribute called msFPCName as shown below:
Figure 5 – Checking the msFPCName for this attribute.
If this is a valid listener, we should see this GUID under CN=RuleElements,CN=WebListeners as shown below:
Figure 6 – Valid Web Listeners.
Notice that in this case I do not have this value in there, which means that this is the reason why we are receiving this error. In other words: there is an object present under HTTP-Compression-Configuration object that has an attribute that points to an invalid object.
3. Now What?
Since this is an invalid object we need to remove it from there. However, before do that it is important to emphasize that before any intervention directly on ADAM or Registry make sure that you have a backup of your system. In this case since we cannot backup the whole array at least we should backup the Firewall Policy (which works fine since doesn’t look for that object). Also, before delete this object, you can dump it using LDIFDE, so you can have a backup of the attributes for this object (in case you need). To export this object uses the command below:
C:\>ldifde -t 2171 -f backup.ldf -s isacontn1 -d "CN={58231C84-C3B7-4BF7-9A18-1943A657D410},CN=WebListenerUsed,CN=HTTP-Compression-Configuration,CN=WebProxy,CN=ArrayPolicy,CN={878CC789-AF34-48A1-849B-89A806E2CB88},CN=Arrays,CN=Array-Root,CN=FPC2"
Connecting to "isacontn1"
Logging in as current user using SSPI
Exporting directory to file backup.ldf
Searching for entries...
Writing out entries.
1 entries exported
The command has completed successfully
Notes:
· -t allows you to specify which port you are going to use to connect to ADAM. In this case port 2171,
· -d allows you to specify the Distinguished Name (DN) of the object that you want to dump. In this case you need to open the properties of the object and search for the DN.
The output (backup.ldf) of this command for this case is:
dn: CN={58231C84-C3B7-4BF7-9A18-1943A657D410},CN=WebListenerUsed,CN=HTTP-Compression-Configuration,CN=WebProxy,CN=ArrayPolicy,CN={878CC789-AF34-48A1-849B-89A806E2CB88},CN=Arrays,CN=Array-Root,CN=FPC2
changetype: add
objectClass: top
objectClass: msFPCRef
cn: {58231C84-C3B7-4BF7-9A18-1943A657D410}
distinguishedName:
CN={58231C84-C3B7-4BF7-9A18-1943A657D410},CN=WebListenerUsed,CN=HTTP-Compressi
on-Configuration,CN=WebProxy,CN=ArrayPolicy,CN={878CC789-AF34-48A1-849B-89A806
E2CB88},CN=Arrays,CN=Array-Root,CN=FPC2
instanceType: 4
whenCreated: 20090917112759.0Z
whenChanged: 20090917112759.0Z
uSNCreated: 321498
uSNChanged: 321498
name: {58231C84-C3B7-4BF7-9A18-1943A657D410}
objectGUID:: CkNXAblXDkWSUJaZY5Bexw==
objectCategory:
CN=msFPC-Ref,CN=Schema,CN=Configuration,CN={F2298771-D6AA-42E1-B32D-4C0DCFD325
4D}
msFPCRefClass: msFPCWebListener
msFPCName: {DC6A3B0D-9E21-454D-BF68-00E9A79C4E3E}
Now that we have a backup of the Firewall Rules and a dump of the object that we are deleting, let’s get rid of this invalid object. For this particular scenario the object is the one below:
Figure 7 – Object that needs to be eliminated in this case.
After highlight this value, press delete or right click on the object and choose delete.
Note: For ISA Server 2006 Standard Edition, you have to delete this value from the registry.
4. Validating the Procedure
After remove the bad entry you should make sure that the array configuration is in sync, you can force a change, such as disabling a rule or changing the name of the rule. This is just to force a new synchronization. Now after doing that you should be able to open the HTTP Compression properties, as shown below:
Figure 8 – The bogus web listener was listed in there before, now is clear.
5. Conclusion
Fixing corrupted objects on ADAM is not always straight forward like this, sometimes you can’t really determine easily which object is corrupted because there are many objects to evaluate, compare values, etc. In scenarios where you can’t determine, don’t try to “guess” which object is corrupt and delete it without 100%, better to get an ISA Data Packager in repro mode using the Administration template and open a case with MS CSS for further analyzes.
Last week we (ISA Server Team in Texas) faced an interesting issue where remote Outlook Clients using RPC over HTTPs were not able to communicate with the internal Exchange Server. Pretty challenge case since on the ISA Server side there was nothing really obvious missing, netmon also didn’t help that much, but the old netstat tool was “The MAN” to alert us about the issue. The problem ended up to be caused by Port Exhaustion on ISA Server 2006 and netstat helped us to identify that. The approach used was the same as explained by this great post from DS Team about Port Exhaustion.
It is important to bring here the scalability problem when the ISA is not correctly sized, mainly when you are publishing Outlook Anywhere. To really know the impact that Outlook Anywhere (AKA RPC over HTTPs) can cause read the article Outlook Anywhere Scalability with Outlook 2007, Outlook 2003, and Exchange 2007. After reading this article, make sure to correct size your ISA Server 2006 using the ISA Server 2006 Capacity Planning Simulator.
For tuning purpose you also can use the TcpTimedWaitDelay registry key to faster release TCP socket connection, read the article Avoiding TCP/IP Port Exhaustion for more details. Although this article is for BizTalk, the context of the problem is the same since it is something related to the Windows OS level where the application (in this case ISA) is affected.
Troubleshooting high CPU utilization can be challenging, even when you know the process that is causing such behavior, which in this case is the wspsrv.exe. In order to better understand what’s going on in scenarios like this is recommended to get perfmon snapshot, including the time that the process is working normally, until the issue happens. It is also recommended to get a user mode dump of the wspsrv.exe process; to do that read this article here. This post is based on a scenario where wspsrv.exe process was consuming lots of CPU resource during business hours. The issue did not happen out of the business hours.
2. Troubleshooting
In this case perfmon was very decisive because it was possible to see the normal behavior (wspsrv.exe using around 20% average) and then after starting business hours the CPU utilization by wspsrv.exe process grows consistently. However, having lots of perfmon counters without knowing exactly what to start looking at is not very useful, it was the user mode dump of the wspsrv.exe process that triggered what to look for. For this particular scenario we had around 1000 threads running within wspsrv.exe process and around 200 threads had the same pattern as shown below:
0:000> ~50 kv
*** ERROR: Symbol file could not be found. Defaulted to export symbols for mpengine.dll -
Child-SP RetAddr : Args to Child : Call Site
00000000`1ea0e520 00000000`70433461 : 00000000`889f7b60 00000000`889f7b50 00000000`889f7bb0 00000000`889f78d8 : 0x1e774914
00000000`1ea0e570 00000000`70432f68 : 00000000`889f7828 00000000`889f7828 00000000`889f7bb0 00000000`7046ba90 : mpengine+0x13461
00000000`1ea0e630 00000000`704cec70 : 00000000`889b0040 00000000`889f7bb0 00000000`00000012 00000000`889f78d8 : mpengine+0x12f68
00000000`1ea0e6e0 00000000`704cdb4b : 00000000`000000fa 00000000`00000001 00000000`00000009 00000000`00000000 : mpengine+0xaec70
00000000`1ea0e8e0 00000000`7043ddd0 : 00000000`16d285f0 00000000`16d285f0 00000000`00000000 00000000`16d285f0 : mpengine+0xadb4b
00000000`1ea0ea00 00000000`7043e447 : 00000000`16d28110 00000000`16d285f0 00000000`00000000 00000000`00000000 : mpengine+0x1ddd0
00000000`1ea0ebf0 00000000`7043e1bf : 00000001`804fbf93 00000000`00000000 00000000`09180492 00000000`00000001 : mpengine+0x1e447
00000000`1ea0ec90 00000000`704adfb6 : 00000000`16d28100 00000000`16c66f01 00000000`00000000 00000000`00000000 : mpengine+0x1e1bf
00000000`1ea0ed20 00000000`7057b1b1 : 00000000`183c0e00 00000000`7068aa80 00000000`00000000 00000000`16d28110 : mpengine+0x8dfb6
00000000`1ea0edb0 00000000`7057b5e3 : 00000000`21825080 00000000`7068aa80 00000000`00000001 00000000`183c0e00 : mpengine+0x15b1b1
00000000`1ea0ee10 00000000`7057b7ba : 00000000`21825080 00000000`1ea0ef90 00000000`21825080 00000000`21825080 : mpengine+0x15b5e3
00000000`1ea0eed0 00000000`7057b6ab : 00000000`00000000 00000000`0000800c 00000000`00000001 00000000`21825080 : mpengine+0x15b7ba
00000000`1ea0ef20 00000000`70686365 : 00000000`00000000 00000000`00000020 00000000`5d69f640 00000000`5d69f640 : mpengine+0x15b6ab
00000000`1ea0ef50 00000000`7057da92 : 00000000`00000020 00000000`00004032 00000000`21825080 000007fe`f1a029e4 : mpengine!GetSigFiles+0x5b035
00000000`1ea0f080 00000000`7057f4d6 : 00000000`569def80 00000000`00000001 00000000`1d730548 00000000`76ecccd4 : mpengine+0x15da92
00000000`1ea0f230 00000000`7057f605 : 00000000`569def80 00000000`1ea0f440 00000000`00004032 00000000`76fb3660 : mpengine+0x15f4d6
*** ERROR: Symbol file could not be found. Defaulted to export symbols for EmpScan.dll -
00000000`1ea0f260 000007fe`f1a0d2d9 : 00000000`00000000 00000000`712c32c3 00000000`0195da28 00000000`76e901f2 : mpengine!rsignal+0x105
00000000`1ea0f390 000007fe`f1a0c668 : 00000000`569decd0 00000000`00004032 00000000`1ea0f440 00000000`00000020 : EmpScan!SetProxyStatsObject+0x9289
00000000`1ea0f400 000007fe`f1a104ff : 00000000`569decd0 00000000`6c8eeb68 00000000`6c8eeb70 000007fe`f19f8bc8 : EmpScan!SetProxyStatsObject+0x8618
00000000`1ea0f550 000007fe`f1a0904f : 00000000`6c8eeb70 00000000`0385b5f0 00000000`60e43070 00000000`000003b8 : EmpScan!SetProxyStatsObject+0xc4af
EMPScan and MPEngine are part of Malware Inspection as explained in this KB. This raises a flag that the good amount of CPU utilization by wspsr.exe process was caused by those 200 threads that were using functions that belongs to Malware Inspection. Having those 200 threads were already suspicious, but it could be that server was just too busy during that time performing malware inspection. In order to move I verified if there was any thread locked in Critical Section. To do that the following command was used !cs -l and the result was:
-----------------------------------------
DebugInfo = 0x0000000000343980
Critical section = 0x000000000032b500 (+0x32B500)
LOCKED
LockCount = 0x0
WaiterWoken = No
OwningThread = 0x0000000000000b28
RecursionCount = 0x1
LockSemaphore = 0x0
SpinCount = 0x0000000000000000
DebugInfo = 0x00000000324861f0
Critical section = 0x00000000648f12d0 (+0x648F12D0)
OwningThread = 0x0000000000000b1c
SpinCount = 0x0000000000000fa0
DebugInfo = 0x00000000328a9200
Critical section = 0x000000005f1b8240 (+0x5F1B8240)
OwningThread = 0x00000000000011d0
….and many more…
Now that we know the threads that are locked in Critical Section, we just need to get more details about those threads by using the command !cs –o <owning thread>, in this case the relevant thread was:
0:000> !cs -o 0x000000005f1d9460
Critical section = 0x000000005f1d9460 (+0x5F1D9460)
DebugInfo = 0x00000000565fd4f0
OwningThread = 0x000000000000194c
RecursionCount = 0x5
LockSemaphore = 0x43E80
OwningThread DbgId = ~889s
OwningThread Stack =
00000000`78fde658 000007fe`f0c0a6c9 : 00000000`566fa638 00000000`78fde6c4 00000000`566fa638 00000000`00000000 : EmpFilter!DllUnregisterServer+0x1c6b7
00000000`78fde660 000007fe`f0c22a52 : 00000000`566fa630 00000000`31eb9120 000007fe`f0bc9198 00000000`4a74d9d0 : EmpFilter!DllUnregisterServer+0x20431
00000000`78fde7b0 000007fe`f0c15f53 : 00000000`31eb9100 00000000`566fa6fc 00000000`4a74d9d0 00000000`00000000 : EmpFilter!DllUnregisterServer+0x387ba
00000000`78fde830 000007fe`f0c0c11b : 00000000`31eb9100 00000000`566fa6fc 00000000`4a74d9d0 00000000`4a74d9d0 : EmpFilter!DllUnregisterServer+0x2bcbb
00000000`78fde8a0 000007fe`f0c172ee : 00000000`566fa630 00000000`00000000 00000000`566fa648 00000000`78fde960 : EmpFilter!DllUnregisterServer+0x21e83
00000000`78fde920 000007fe`f0c0d7d4 : 00000000`566fa638 00000000`00000000 00000000`566fa740 00000000`00001355 : EmpFilter!DllUnregisterServer+0x2d056
00000000`78fde9a0 000007fe`f0c12c67 : 00000000`566fa630 00000000`618a3ba0 00000000`00001355 00000000`4a74d6b0 : EmpFilter!DllUnregisterServer+0x2353c
00000000`78fdeb20 000007fe`f0c1afad : 00000000`566fa630 00000000`618a3ba0 00000000`00001355 00000000`00000000 : EmpFilter!DllUnregisterServer+0x289cf
00000000`78fdebd0 000007fe`f0be40ee : 00000000`6cfd8640 00000000`30f6eaf0 000007fe`f0bca490 000007fe`f0bd133f : EmpFilter!DllUnregisterServer+0x30d15
00000000`78fdec60 000007fe`f0c3270b : 00000000`1af7a7b0 00000000`56cdf680 00000000`30f6eaf0 00000000`78fded64 : EmpFilter+0x240ee
00000000`78fded40 00000000`72ebf77b : 000007fe`f0c5eb18 00000000`56cdf680 00000000`00000400 00000000`30f6eaf0 : EmpFilter!DllUnregisterServer+0x48473
00000000`78fdeda0 00000000`72ed316d : 00000000`56cdf620 00000000`618a3ba0 00000000`00001355 00000000`00001355 : W3Filter!DllInstall+0x2419f
00000000`78fdeec0 00000000`72eabe84 : 00000000`562b4480 00000000`618a3ba0 00000000`00001355 00000000`78fdefe0 : W3Filter!DllInstall+0x37b91
00000000`78fdef30 00000000`72efcacd : 00000000`562b44b8 00000000`1ad135e0 00000000`618a3ba0 00000000`00001355 : W3Filter!DllInstall+0x108a8
00000000`78fdefd0 00000000`72f031bc : 00000000`27109010 00000000`00000001 00000000`000003e5 00000000`00000000 : W3Filter!DllInstall+0x614f1
00000000`78fdf040 00000000`72f038cb : 00000000`27109010 00000000`00000000 00000000`00000000 00000000`72e5d9d0 : W3Filter!DllInstall+0x67be0
00000000`78fdf140 00000000`72f04420 : 00000000`27109010 00000000`27109030 00000000`00000000 00000000`565da0e0 : W3Filter!DllInstall+0x682ef
00000000`78fdf180 00000000`72ede14f : 00000000`27109030 00000000`00001355 00000000`00000000 00000000`00000000 : W3Filter!DllInstall+0x68e44
00000000`78fdf1c0 00000001`3fcc016e : 00000000`4a5c5e80 00000000`00000001 00000000`00000000 00000000`565da0e0 : W3Filter!DllInstall+0x42b73
00000000`78fdf4a0 00000001`3fcbf557 : 00000000`3235bd50 00000000`00001355 00000000`00000000 00000000`00000001 : wspsrv!IsChainingRequired+0x582d2
3. Resolution
The resolution for this problem is documented on this article here at TMG Team Blog.
4. Another Approach to Troubleshoot Issues of this Nature
Another approach to troubleshoot high CPU utilization is by leveraging the perfmon capabilities to identify the threads that are using more resources within the process. For example, in this scenario the perfmon was showing the following result:
Here you have two key counters within thread object
· %Processor Time: specify how much processor this thread is using.
· ID Thread: represents the ID of the thread that you can use to correlate that on the dump for this process.
Using the recommendations from KB133385 we have:
1. Monitor each thread's %Processor Time and identify the instance number in Performance Monitor for the thread that takes the most processor time.
2. Find the ID Thread item that has the same instance number as the thread you identified in step 1 and select this ID Thread with the mouse. Note that its Last and Average status indicators have the same value. This value is the decimal thread ID.
3. Convert the decimal thread ID to a hexadecimal (hex) number to get the thread ID for desired component.
With the HEX value of the thread ID in hands you can go back to the dump file and look which thread is that and what this thread is doing.
Note: there is a really good step by step on Marcelo’s blog. Also, make sure to read the Chapter 13 of the Windows NT Resource Kit (old buddy that still rocking).
More and more the companies are allowing the employees to access the internal resources remotely. The good thing is that they are also concern about security. For remote access, the VPN still one of the most used solutions for small, medium and large companies. This scenario was based on a case that I worked when I was on the Platform team and it was about a problem while the user was trying to access an internal resource while connected through the VPN.
The user was traveling to participate on a congress and he was using the hotel’s network to connect to the VPN Server. The VPN connection successfully succeeds but the user was not able to access the file server located on the company’s network. Result: user was frustrated because he needs to have access to the file server right away.
Figure 1 – Network Topology used in this case.
2. How the Routing Decision Works
On this case the problem was related to routing table, however I will get back to that on the next topic. Before explain the resolution for the case let’s review the essentials of routing decision. In this case the goal is to answer the question: how the host knows that needs to send a package to a local host or to the router?
When we add an IP address to the host the system will automatically create a routing table. If the default gateway address is not specified the system will create only routes to the local network. Let’s get an example of a host that has the IP address equals to 192.168.0.250. If the user only adds this IP on the TCP/IP properties and do not add the default gateway, the routing table will appear like that:
Network Destination
Netmask
Gateway
Interface
Metric
127.0.0.0
255.0.0.0
127.0.0.1
1
192.168.0.0
255.255.255.0
192.168.0.250
20
255.255.255.255
192.168.0.255
224.0.0.0
240.0.0.0
Here the description of the main IP’s that is in use on this table:
· 127.0.0.0/8 – This is reserved for loopback purpose. As you can see, this network is using the interface 127.0.0.1 which is the host itself. This address is defined on RFC 3330.
· 192.168.0.0/24 – This is the route for the network that the host belongs to. The interface that will be used to handle this package is the host’s local interface (192.168.0.250).
· 192.168.0.250 – This is the route for the points to the host itself (loopback address).
· 192.168.0.255 – This is the route for the local broadcast. The interface that will be used for that is the host’s local interface (192.168.0.250).
· 224.0.0.0 – This is a reserved block for multicast purpose. The interface that will be used for that is the host’s local interface (192.168.0.250). This address is also defined on RFC 3330.
· 255.255.255.255 – Used for broadcasting purpose.
On the routing table above you could see that there is no default route and this is because we don’t have a default gateway specified on the host. This will limit the communication from this host to only other hosts that are on the local segment. For example: if this host uses the command ping 10.10.10.1 the result will be the error Destination Unreachable. Since we don’t have route for that and we don’t have no one to ask for help we can’t deliver the package.
Besides that, also notice that the last column on this table defines the Metric. The Metric is used to prioritize the path that the host will use to send the package. In case we have two entries on the routing table for the same destination, the system will choose the route with lower metric.
Now let’s see what will be different if we just add the default gateway to the TCP/IP stack of this host:
0.0.0.0
192.168.0.3
The first route in this table is pretty much saying: If you don’t know to where to send this package, let this gateway (192.168.0.3) handle it. The 0.0.0.0 on the destination network means all networks and the 0.0.0.0 on the netmask means all netmasks.
2.1. ANDing
The process that the host will use to determine if the destination IP belongs to the local network or to a remote network is called ANDing. Let’s see one example:
The source host (192.168.1.200/24) is trying to ping the destination (192.168.2.200/24). At this point how the source host determines if the destination belongs to the local network or not?
Step 1 – Convert the decimal number in binary:
Decimal Notation
Binary Notation
Source
192.168.1.200
11000000.10101000.00000001.11001000
11111111.11111111.11111111.00000000
Destination
192.168.2.200
11000000.10101000.00000010.11001000
Step 2 – Use the Logical AND to find the result between the IP address and the netmask
0
AND
=
This way we have the following result on the table:
Source IP
Source Network
11000000.10101000.00000001.00000000
192.168.1.0
Destination IP
Destination Network
11000000.10101000.00000010.00000000
192.168.2.0
As you can see source network and destination network doesn’t match, so the source host knows that the destination host is located on a remote network.
Now that we know how the routing decision works, let’s get back to the issue that this customer was facing.
3. Root Cause
When the remote client connected to the VPN the PPTP connection was receiving the IP 192.168.0.133/24 and the routing table was showing up like this:
Active Routes:
192.168.0.133
192.168.0.199
192.168.0.198
21
10.10.10.219
50
Default Gateway: 192.168.0.133
The entry in red is the default gateway for the network 192.168.0.0/24, as you can see this route primary goes to the notebook’s wireless interface (192.168.0.198).
Now if you use the ANDing technique you will see that the host will determine that packages where the destination is the network 192.168.0.0/24 will be consider local (notice that the metric is lower) and therefore will not be send to the default gateway.
4. Solution
The packages to the network 192.168.0.0/24 were stuck locally since the metric is lower and therefore has more priority. The solution for that is to add another route entry changing the metric and forwarding the package to the default gateway (which is the IP received from the VPN Server), as show below:
ADD 192.168.0.0 MASK 255.255.255.0 192.168.0.133 METRIC 1
Now, the routing table will show like that:
The host receives the IP 192.168.0.133 from the VPN Server and this route will force the host to use this interface primarily to send packages to the 192.168.0.0/24 network.
But, the solution couldn’t be considered affordable if we didn’t have a automated way to implement it and fortunately we do. To do that we will use the CMAK (Connection Manager Administration Kit) to pass a TXT file as parameter that will contain the route add command as show below:
Figure 2 – CMAK window.
This TXT file will contain the following command line:
ADD 192.168.0.0 MASK 255.255.255.0 default METRIC default IF default
The syntax used on this command is different from the regular Route Add command. In this case the keyword DEFAULT before the metric means that we are assuming the IP received from the VPN Server. The keyword DEFAULT after the metric means the value that will be used for the metric; in this case it is 1. For more information on this command see the article Including Routing Table Updates on Microsoft TechNet.
This could be a really complex scenario if we didn’t know really how the routing process works. It is really important for any network administrator to be really sharp on this subject to mitigate possible issues and also to quickly identify possible problems.
Many administrators are planning the migration to Forefront TMG 2010, during this planning phase it comes out many questions regarding support and on the learning plan for TMG. What should I read about it before deploy TMG? I didn’t find any MOC training, how can I learn more about TMG?
Here it is a list of resources that can assist you to learn more about Forefront TMG:
Book
Online Resources
Webcast
Videos from the Edge
Labs/VMs
Training
For Microsoft Premier Customers
For General Public
I hope that this help you to get ready to deploy and maintain Forefront TMG 2010.
This article is based on a support incident that I worked back on the Platforms day. It was originally published in Portuguese and since it is really interesting I decided to also publish here in my blog. The scenario is the one below:
In this scenario the branches connected to the headquarter using a Frame Relay link with a dedicated router on each side. The Windows Server 2003 RRAS Server was used in this case because the customer wants to create a dedicated connection between those servers for management purpose. For each branch the customer created a user account in the domain controller located on the headquarter. Each account will represent the location where the server is (for example: Dallas, Austin, Madri etc).
The DOD connection on the branch office is using a dynamic IP address and the DHCP Server is located in the Headquarter. The RRAS Server in the branch office will obtain the IP address from the headquarter, but customer wants to control which IP will be used for each location. Then you might ask: if he wants to control the IP, why doesn’t he use a static IP address? The answer for that is: he wants to use static IP per location, but using dynamic IP address assignment. Customer wants to take advantage of one feature that exists on AD that allows you to assign an IP address for one specific account.
The reason behind that is: customer wants to be able to know what IP each location will have, however if he needs to change the IP for some reason he doesn’t need to go to the location and change it there manually. All the control will be done in the DC located in the headquarter.
During the implementation of these servers customer notices the following problem:
- When the administrator, physically located in the headquarter tried to RDP (Remote Desktop) to one server in the branch office using the IP that was assign to that location, the remote desktop was establishing the connection to another server from another location. Customer than realize that the static address assignment was failing, although the DOD connection in the branch office was correctly connected. Customer also notices that this behavior happened for some branches that were temporarily losing connection with the Headquarter.
The question than was: why, even using the static address assignment in the user’s properties, the server is getting a different IP while connecting using DOD with that user account?
2. Preparing the Environment to Obtain Data
To prepare the environment to obtain the correct information when the issue happens the following actions were done:
· Enable all RRAS logs using the command: netsh ras set tracing * enable
· Enable Netmon in both interfaces (RRAS on the Branch and in the Headquarter)
After capture the data while the issue was happening it was possible to determine what was happening. But before show the logs, it is important to clarify some parameters of this configuration. Here is the connection’s properties for RRAS Server that starts the connection:
Figure 2 – Properties of the DOD connection on the Branch (Madri).
As you can see above, the redial interval is 3 seconds, which means that if the branch office cannot establish the connection it will try again in 3 seconds.
For this example, we are going to use the properties of the Madri’s account and check if the IP is correctly configured there:
Figure 3 – Madri’s account in the DC located in the Headquarter.
As you can see, the Madri’s RRAS server establish the connection with the head quarter’s RRAS server it will receive the IP 192.168.5.100. Knowing that, we star t to review the data for an issue that happened on that location, around 7:12. At that time, this location lost connectivity with the Headquarter and then try to re-establish the connection in 3 seconds.
Reviewing the file RASIPHLP.LOG located in the head quarter’s RRAS server it was possible to see exactly the moment that the server tried to re-connect:
[2384] 02-20 07:15:09:506: RasSrvrAcquireAddress(hPort: 0x82, IP address: 0x6405a8c0, UserName: Madri, PortName: VPN4-126)
[2384] 02-20 07:15:09:506: rasSrvrGetAddressForServerAdapter
[2384] 02-20 07:15:09:506: Address 0x6405a8c0 is already in use
[2384] 02-20 07:15:09:506: RasDhcpAcquireAddress
[2384] 02-20 07:15:09:516: RasSrvrActivateIp(IpAddr = 0x6505a8c0, dwUsage = 0)
[2384] 02-20 07:15:09:516: RasTcpSetProxyArp(IP Addr: 0x6505a8c0, fAddAddress: TRUE)
[2384] 02-20 07:15:09:516: rasSrvrGetAddressForServerAdapter
The IP address that shows up above (0x6405a8c0) is exactly the one that we have for this server (192.168.5.100). To find out that use the following logic:
· Hexadecimal address that appears in the log = 0x6405a8c0
· Hexadecimal address inverted and separated by two digits = c0 a8 05 64
· Converting the hexadecimal address in decimal = 192.168.5.100
So, this piece of log pretty much says:
- At 7:15:09, the IP address 192.168.5.100 is already in use and therefore cannot be assigned. To fix that the address 0x6505a8c0 (192.168.5.151) was assigned.
Now let’s see why the head quarter’s RRAS server believes that this address is in use.
4. Understanding the Behavior
The conclusion here is that the Branch’s RRAS Server is redialing faster than the headquarter’s RRAS server takes to understand that the connection is down and release the IP. The default configuration for the RRAS service can takes between 2 and 3 minutes (for more information on those timers review the KB262990) to calculate that and release the connection.
4.1. But why the connection is dropped?
After identify the reason why this behavior was happening, customer wants to know why those branches were having that temporally disruption many times during the day.
Using a sample capture from one of the locations we could figure that out. On the netmon capture, around 8:20, it was possible to see that both servers (RRAS) were correctly exchange control packages:
8:19 192.168.3.2 192.168.3.1 PPTP PPTP: Control Message , Echo Request
8:19 192.168.3.1 192.168.3.2 PPTP PPTP: Control Message , Echo Reply
At exactly 8:20 we have the following event on the RASMAN.LOG:
[1992] 02-20 08:20:43:284: d:\nt\net\rras\ras\rasman\rasman\util.c 1997: Disconnected Port 252, reason 1. rc=0x0
Right after that we have the following event on the RASTAPI.LOG:
[1992] 02-20 08:20:43:284: PortTestSignalState: DisconnectReason = 2
On the file RASMANI.LOG we have the reason why the connection was dropped, in this case reason 1 (HARDWAREFAILURE) and in the RASTAPI.LOG the reason was 2 (LINKDROPPED). With that we can conclude that something on the link layer happened that leads to the disconnection. It could be a problem on the ISP, on the channel where this connection passes through, the active component that is in front of the RRAS Server, in another words: something in link layer area. On the netmon trace we can see that we stay more than one second without connectivity and then, at 8:21 we re-connect:
- We initiate the ARP Request:
8:21 192.168.3.1 192.168.3.1 ARP ARP: Request, 192.168.3.1 asks for 192.168.3.1
8:21 192.168.3.1 192.168.3.2 ARP ARP: Request, 192.168.3.1 asks for 192.168.3.2
8:21 192.168.3.2 192.168.3.1 ARP ARP: Response, 192.168.3.2 at 00-03-FF-0F-2B-29
- Then we have the TCP 3 Way Handshake for PPTP (TCP 1723):
8:21 192.168.3.1 192.168.3.2 TCP TCP: Flags=.S......, SrcPort=1149, DstPort=1723, Len=0, Seq=3777429460, Ack=0, Win=65535 (scale factor 0) = 65535
8:21 192.168.3.2 192.168.3.1 TCP TCP: Flags=.S..A..., SrcPort=1723, DstPort=1149, Len=0, Seq=1936732946, Ack=3777429461, Win=16384 (scale factor 0) = 16384
8:21 192.168.3.2 192.168.3.1 TCP TCP: Flags=....A..., SrcPort=1723, DstPort=1149, Len=0, Seq=1936733135, Ack=3777429809, Win=65187 (scale factor 0) = 65187
- Finally we have the PPTP connection being established:
8:21 192.168.3.1 192.168.3.2 PPTP PPTP: Control Message , Start Control Connection Request
8:21 192.168.3.2 192.168.3.1 PPTP PPTP: Control Message , Start Control Connection Reply
8:21 192.168.3.1 192.168.3.2 PPTP PPTP: Control Message , Outgoing Call Request
8:21 192.168.3.2 192.168.3.1 PPTP PPTP: Control Message , Outgoing Call Reply
8:21 192.168.3.1 192.168.3.2 PPTP PPTP: Control Message , Set Link Info
As I mentioned before, the KB262990 explains the timers involved here. It is possible to adjust the time that the RRAS will check if the link is idle, this is called InactivityIdleSeconds and can be changed on the registry key below:
HKEY_LOCAL_MACHINE\SYSTEM\CCS\Control\Class\{4d36E972-....}\instanceid
The default value is 60 seconds, if you reduce this value to 1 the RRAS service can take between 30 to 45 seconds to detect that the PPP link is down.
There is another parameter called DefaultIdleTimeout that you also can use to control the idle time. Here the registry location:
HKLM\SYSTEM\CurrentControlSet\Services\RasMan\PPP
This is a global parameter that will affect all PPP connections, so you need to be careful when you change that.
Recently I was engaged on this TechNet Forum thread and I thought that it was an interesting question and concern. Let me explain the scenario on which I was able to repro the behavior on my own lab:
Contoso has a security policy where only users that belong to the Internet Users group can access Internet. To enforce that they created a rule on ISA Server on which only members of the Internet Group can access Internet. Contoso uses Internet Explorer 6 on all workstations of their network. The process today is that once the user receives authorization to access Internet, the AD Admin adds the user to Internet Users group and the user just need to open a new browser session and he will have access. Recently Contoso upgraded 50% of their workstations to use Internet Explorer 8, they notice the following behavior since this change: if an user that doesn’t belong to Internet Users group is already logged on the workstation and the administrator add this user to the Internet User’s group, the user receives an error when try to browse Internet, the error says that ISA denied the request. If the user logs off from the windows and logon again it works fine.
Why this is happening?
What it is happening on this case is simply the way that Internet Explorer 8 works for the authentication part of the conversation. Since the version 7, Internet Explorer was capable to use Kerberos for proxy authentication (while IE6 and lower only use NTLM and Basic). ISA Server will negotiate the authentication with the browser and will authenticate according to the method that is supported on both sides. The difference in this case is the following:
With IE7 (and higher)
1. Client sends the GET request to www.microsoft.com (for example). This request goes as anonymous.
2. ISA will send the 407 asking for authentication.
3. If client has already a Kerberos cached ticket it will not go to the DC to get a new one, if there is no ticket then it will go to the DC to get an updated ticket to send to ISA.
4. Client will send another GET Request now with the credentials and the ticket.
5. ISA will verify the request and allow (or deny according to the rule).
Note: in this case since ISA doesn't go to the DC, it relies on user's token to access the resource. Since the current ticket doesn’t include the update on user’s group membership, ISA will verify that the user doesn't below to the group and will deny the request.
With IE6
3. Client sends another GET request with the credentials (NTLM).
4. ISA goes to the DC to authenticate the user.
Note: since ISA goes to the DC it will get an updated version of the user's group membership and will verify that the user now belongs to this group.
5. ISA will allow the user to pass through.
Under the Hood
Prior to get to this conclusion I decided to do some tests to validate the theory and the tests worked pretty fine. Here it is the traffic from the client workstation with the user “Yuri” logged in, right after the inclusion of the user to Internet User’s group:
1. Client sends the GET Request
10.20.20.201 10.20.20.1 HTTP HTTP:Request, GET http://www.microsoft.com/
2. ISA asks for authentication:
10.20.20.1 10.20.20.201 HTTP HTTP:Response, HTTP/1.1, Status Code = 407, URL: http://www.microsoft.com/
Connection: Authentication
StatusCode: 407, Proxy authentication required
Reason: Proxy Authentication Required ( The ISA Server requires authorization to fulfill the request. Access to the Web Proxy filter is denied. )
Via: 1.1 ISACONTN1
+ ProxyAuthenticate: Negotiate
Proxy-Authenticate:
+ ProxyAuthenticate: Kerberos
+ ProxyAuthenticate: NTLM
Connection:
ProxyConnection: Keep-Alive
+ ContentType: text/html
ContentLength: 4113
+ payload: HttpContentType = text/html
Note: Notice that ISA is negotiating the authentication method.
3. Client sends another GET request now with the credential.
10.20.20.201 10.20.20.1 HTTP HTTP:Request, GET http://www.microsoft.com/ , Using SPNEGO Authorization
Command: GET
+ URI: http://www.microsoft.com/
Accept: */*
Accept-Language: en-us
UserAgent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)
Accept-Encoding: gzip, deflate
Cookie: MUID=42623805587D4F6EB894604864D99E47; WT_FPC=id=173.74.31.197-401582496.30090674:lv=1279361835221:ss=1279361835221; MC1=GUID=ba325a1581e1aa4dbcd0988320053feb&HASH=155a&LV=20107&V=3; A=I&I=AxUFAAAAAABDBgAAgb+LoGxG8UwjOO2L0QId7Q!!
- ProxyAuthorization: Negotiate
- Authorization: Negotiate YIIE5QYGKwYBBQUCoIIE2TCCBNWgJDAiBgkqhkiC9xIBAgIGCSqGSIb3EgECAgYKKwYBBAGCNwICCqKCBKsEggSnYIIEowYJKoZIhvcSAQICAQBuggSSMIIEjqADAgEFoQMCAQ6iBwMFACAAAACjggO7YYIDtzCCA7OgAwIBBaEOGwxDT05UT1NPLk1TRlSiKTAnoAMCAQKhIDAeGwRIVFRQGxZpc2Fjb250bjEuY
WhiteSpace:
- NegotiateAuthorization:
Scheme: Negotiate
- GssapiKrb5: 0x1
Kerberos:
Note: Notice that client is using Kerberos.
4. This ticket doesn’t contain an updated information about user’s group membership, hence ISA Server sends a deny.
10.20.20.1 10.20.20.201 HTTP HTTP:Response, HTTP/1.1, Status Code = 502, URL: http://www.microsoft.com/
StatusCode: 502, Bad gateway
Reason: Proxy Error ( The ISA Server denied the specified Uniform Resource Locator (URL). )
Connection: close
ProxyConnection: close
ContentLength: 4059
5. At this point, I ran the command klist on the workstation and here are the tickets that I had at that point:
C:\Program Files\Windows Resource Kits\Tools>klist tickets
Cached Tickets: (3)
Server: krbtgt/CONTOSO.MSFT@CONTOSO.MSFT
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
End Time: 7/17/2010 18:34:46
Renew Time: 7/24/2010 8:34:46
Server: HTTP/isacontn1.contoso.msft@CONTOSO.MSFT
Server: host/client1.contoso.msft@CONTOSO.MSFT
6. I thought to myself: well, if I wipe all the tickets I will force the user to go to the DC and get an updated ticket, this way if I try again to access Internet, I will use the updated ticket. To do that I ran the command: klist purge. I deleted all tickets and the end result was:
C:\Program Files\Windows Resource Kits\Tools>klist purge
Cached Tickets: (0)
Now is time for the truth, tried to access Internet again and then it WORKED. Here it is the netmon trace for this second attempt with my Kerberos ticket cache clean:
3. Client goes to the DC to get ticket for the user “Yuri”:
10.20.20.201 10.20.20.20 KerberosV5 KerberosV5:AS Request Cname: yuri Realm: CONTOSO.MSFT Sname: krbtgt/CONTOSO.MSFT
- Kerberos: AS Request Cname: yuri Realm: CONTOSO.MSFT Sname: krbtgt/CONTOSO.MSFT
- AsReq: Kerberos AS Request
+ ApplicationTag:
+ KdcReq: KRB_AS_REQ (10)
4. The DC responds:
10.20.20.20 10.20.20.201 KerberosV5 KerberosV5:AS Response Ticket[Realm: CONTOSO.MSFT, Sname: krbtgt/CONTOSO.MSFT]
- Kerberos: AS Response Ticket[Realm: CONTOSO.MSFT, Sname: krbtgt/CONTOSO.MSFT]
- AsRep: Kerberos AS Response
+ KdcRep: KRB_AS_REP (11)
5. Client goes again to the DC to get a ticket to the HTTP Service (ISA SPN):
10.20.20.201 10.20.20.20 KerberosV5 KerberosV5:TGS Request Realm: CONTOSO.MSFT Sname: HTTP/isacontn1.contoso.msft
- Kerberos: TGS Request Realm: CONTOSO.MSFT Sname: HTTP/isacontn1.contoso.msft
- TgsReq: Kerberos TGS Request
+ KdcReq: KRB_TGS_REQ (12)
6. The DC responds:
10.20.20.20 10.20.20.201 KerberosV5 KerberosV5:TGS Response Cname: Yuri
- Kerberos: TGS Response Cname: Yuri
- TgsRep: Kerberos TGS Response
- KdcRep: KRB_TGS_REP (13)
+ SequenceHeader:
+ Tag0:
+ PvNo: 5
+ Tag1:
+ MsgType: KRB_TGS_REP (13)
+ Tag3:
+ Crealm: CONTOSO.MSFT
+ Tag4:
+ Cname: Yuri
+ Tag5:
+ Ticket: Realm: CONTOSO.MSFT, Sname: HTTP/isacontn1.contoso.msft
+ Tag6:
+ EncPart:
7. Client sends the GET request to ISA now with the new ticket:
Host: www.microsoft.com
- Authorization: Negotiate YIIFDQYGKwYBBQUCoIIFATCCBP2gJDAiBgkqhkiC9xIBAgIGCSqGSIb3EgECAgYKKwYBBAGCNwICCqKCBNMEggTPYIIEywYJKoZIhvcSAQICAQBuggS6MIIEtqADAgEFoQMCAQ6iBwMFACAAAACjggPjYYID3zCCA9ugAwIBBaEOGwxDT05UT1NPLk1TRlSiKTAnoAMCAQKhIDAeGwRIVFRQGxZpc2Fjb250bjEuY
8. ISA allows the traffic:
10.20.20.1 10.20.20.201 HTTP HTTP:Response, HTTP/1.1, Status Code = 302, URL: http://www.microsoft.com/
Conclusion
As you could see on this post this is not really an issue, neither on ISA nor on Internet Explorer, this is actually a behavioral change that Internet Explorer 7 and higher had. This is a good improvement, because Kerberos is much lighter and doesn’t put too much pressure on the DC. One classic problem with NTLM is that the authentication request against the DC is huge, read the article Improving Web Proxy Client Authentication Performance on ISA Server 2006 that I wrote couple of years ago and you will see the advantage that IE7 has over IE6.
But, if it is a requirement for your business that changes to group membership immediately reflect on user’s Internet browsing experience than you can use ISA’s IP address on the IE setting rather than the name, this way you are forcing to use NTLM rather than Kerberos. Just keep in mind that when too much pressure is added on the DC you can face the infamous “random prompt for authentication” on ISA due the 5783 and 5719 events.
The first part of this article explained how to configure the IAG Server to allow full network access using the Network Connector. Now it comes the time to connect the external computers to the network. This second part will focus on what needs to be done on the client side to allow that to happen.
2. Evaluating your Needs
As I mention in all my blogs about IAG, the granularity for endpoint compliance is really a differential on IAG. At the same time that we can create tight and restrictive policies, we also can create more relaxed policies. One thing that you will notice when you try to access the SSL VPN for the first time on a computer that is not compliance with the policy is the window below:
Figure 1 – Warning about computer’s compliance.
This is due the tight default policy that we have for the Network Connector. The figure below shows the default policy and what we need to be compliant to be able to access the remote resource:
Figure 2 – Default Network Connector Policy.
For compliance configuration on the client side, the recommendation is to follow the instructions IAG User Guide, chapters 5 and 7. This document can be downloaded from Microsoft Download Center.
3. Client Connectivity
Once the endpoint compliance has been met, the client can finally get access to the resources. When this happen the following items will be notice in the client side:
- The Network Connector balloon will show up on the right corner of the.
- If you double click on the icon that appears on the right corner of the task bar you will launch the portal activity that will show the Network Connector started.
- If you open the command prompt and type ipconfig you will see that the client received the IP configured on the IAG server.
At this point the client is already a node that belongs to the internal network and can access the resources where it has permissions to.
4. Troubleshooting Network Connector
The Network Connector troubleshooter is composed of two sides: client and server. You can enable logs on the client side as well as on the server side. To enable client logging add the following keys:
HKEY_LOCAL_MACHINE\SOFTWARE\WhaleCom\Client\NetworkConnector
"log"=dword:00000004
"log\sniff"=dword:00000001
To disable it, just change the log value to 00000000. It is important to emphasize that this is used only for troubleshooting purpose due the high intensive log creation. The file will be created at %programfiles%\Whale Communications\Client Components\3.1.0\whliocsv.log.
Here an example of a successfully connection:
Part 1 – Service starting up
19.05.08 04:42:30.254 ::_tWinMain enter
19.05.08 04:42:30.254 Create service sync event (0)
19.05.08 04:42:30.284 CNcDialer::CreateSession enter
19.05.08 04:42:30.284 CNcOS::CNcOS enter
19.05.08 04:42:30.284 mj = 5
19.05.08 04:42:30.284 mi = 2
19.05.08 04:42:30.284 build = 3790
19.05.08 04:42:30.284 platform = 2
19.05.08 04:42:30.284 cs ver = Service Pack 2
19.05.08 04:42:30.284 eType = 200302
19.05.08 04:42:30.284 CNcOS::CNcOS leave {}
19.05.08 04:42:30.284 CNcAdditionalRoutes::Cleanup enter
19.05.08 04:42:30.284 CNcAdditionalRoutes::Cleanup leave {}
19.05.08 04:42:30.284 CNcDialer::CreateSession leave {0}
19.05.08 04:42:30.294 CNcSession::Start enter
19.05.08 04:42:30.294 Validating IP address - 0XED00007F
19.05.08 04:42:30.294 Validating IP address - 0X4600A8C0
19.05.08 04:42:30.294 CNcSession::_Init enter
19.05.08 04:42:30.294 Create event {0}
19.05.08 04:42:30.294 ::WU_LoadCommonConfiguration enter
19.05.08 04:42:30.294 cfg.device_threads = 1 {0x2}
19.05.08 04:42:30.294 cfg.device_buffers_size = 0 {0x2}
19.05.08 04:42:30.294 cfg.device_buffers_size = 4 {0x2}
19.05.08 04:42:30.294 cfg.tunnel_threads = 1 {0x2}
19.05.08 04:42:30.294 cfg.tunnel_buffers_size = 64 {0x2}
19.05.08 04:42:30.294 cfg.timeouts.net = 20000 {0x2}
19.05.08 04:42:30.294 cfg.timeouts.reg = 5000 {0x2}
19.05.08 04:42:30.294 cfg.timeouts.dev = 20000 {0x2}
19.05.08 04:42:30.294 cfg.timeouts.route = 15000 {0x2}
19.05.08 04:42:30.294 cfg.timeouts.maintenance = 20000 {0x2}
19.05.08 04:42:30.294 ::WU_LoadCommonConfiguration leave {}
19.05.08 04:42:30.294 CWioSession::Init enter
19.05.08 04:42:30.294 check status {0}
19.05.08 04:42:30.294 check sanity {0}
19.05.08 04:42:30.294 input flags: 0
19.05.08 04:42:30.294 input tun cfg: 64 KB x 1 tunnel threads (per cpu)
19.05.08 04:42:30.294 input device cfg: 4 KB x 1 device threads
19.05.08 04:42:30.294 input timeouts: 20000 (dev) / 20000 (net) / 5000 (reg) / 15000 (route) / 20000 (maintenance)
19.05.08 04:42:30.294 check input {0}
19.05.08 04:42:30.294 CWioSession::_OpenDmEvent enter
19.05.08 04:42:30.294 CWioSession::_OpenDmEvent leave {0}
19.05.08 04:42:30.294 CWioSection::Open enter
19.05.08 04:42:30.294 pfExists = 0
19.05.08 04:42:30.294 CWioSection::Open leave {0}
Part 2 – Starting Network Connector and open the session with the server
19.05.08 04:42:30.304 CWioWinsock::Load enter
19.05.08 04:42:30.304 CWioWinsock::Load leave {0}
19.05.08 04:42:30.304 CWioNIC::Open enter
19.05.08 04:42:30.304 type = Virtual
19.05.08 04:42:30.304 get list size {0x6f: 1280 bytes}
19.05.08 04:42:30.304 allocate list {0}
19.05.08 04:42:30.304 enum (list) {0, 1280 bytes}
19.05.08 04:42:30.304 set name {{B6EB736A-A20B-4C52-A13F-0D10575C90A1}}
19.05.08 04:42:30.304 set index {0x2}
19.05.08 04:42:30.304 set mac {00-ff-08-01-19-47-}
19.05.08 04:42:30.304 open key System\ControlSet001\Services\TcpIp\Parameters\Interfaces\{B6EB736A-A20B-4C52-A13F-0D10575C90A1} {0}
19.05.08 04:42:30.304 control set = 1
19.05.08 04:42:30.304 open WINS key {0 for 6bf340}
19.05.08 04:42:30.304 open CPL key {0 for System\ControlSet001\Control\Network\{4D36E972-E325-11CE-BFC1-08002BE10318}\{B6EB736A-A20B-4C52-A13F-0D10575C90A1}\Connection}
19.05.08 04:42:30.304 open NET key {0 for System\ControlSet001\Control\class\{4D36E972-E325-11CE-BFC1-08002BE10318}}
19.05.08 04:42:30.304 enum NET loop starts:
19.05.08 04:42:30.304 -> open \0000 {0}
19.05.08 04:42:30.304 -> open \0001 {0}
19.05.08 04:42:30.304 -> open \0002 {0}
19.05.08 04:42:30.304 -> open \0003 {0}
19.05.08 04:42:30.304 -> open \0004 {0}
19.05.08 04:42:30.304 -> open \0005 {0}
19.05.08 04:42:30.304 -> open \0006 {0}
19.05.08 04:42:30.304 -> open \0007 {0}
19.05.08 04:42:30.304 -> open \0008 {0}
19.05.08 04:42:30.304 -> open \0009 {0}
19.05.08 04:42:30.304 -> -> matched id (9 iterations)
19.05.08 04:42:30.304 CWioNIC::QueryReg enter
19.05.08 04:42:30.304 name = EnableDhcp
19.05.08 04:42:30.304 DWORD data read {1}
19.05.08 04:42:30.304 CWioNIC::QueryReg leave {0}
19.05.08 04:42:30.304 -> -> setting member flags (enabled = 1 ; dhcp = 1}
19.05.08 04:42:30.304 <- enum NET loop exits
19.05.08 04:42:30.304 CWioNIC::Open leave {0}
The registry key emphasize in red on the log above is the Whale Network Connector NIC Driver that is created. Although this NIC doesn’t appear on the Control Panel/Network Connections, it will be available on the Device Manager. You just need to enable the Show Hidden Devices, as show in the figure below:
Figure 3 – Whale Network Connector NDIS Driver.
Continue the log reading, we have the sessions:
· open virtual device
· verify virtual device properties
· allocate thread pools
· open device control
· device thread starts
· check status
· check input
· init srv peer
· connect
Those sessions will show minor details since they are smaller process in the logging. The next verbose session will be following one:
Part 3 – Building routing table
19.05.08 04:42:30.835 input ip = 10.1.1.151
19.05.08 04:42:30.835 input class = 10.0.0.0 / 255.0.0.0
19.05.08 04:42:30.835 input policies (servers) = 0x1 (arps); 0xff (dhcps, with static fallback)
19.05.08 04:42:30.835 input policies (other) = 0x3 (nsplit); 1 (anti-spoof)
19.05.08 04:42:30.835 CNcRouteTable::Load enter
19.05.08 04:42:30.835 CNcRouteTable::_AllocArrays enter
19.05.08 04:42:30.835 ::WU_RouteLoadTable enter
19.05.08 04:42:30.845 get size {0x7a, 1524 bytes}
19.05.08 04:42:30.845 alloc {0}
19.05.08 04:42:30.845 get array {0}
19.05.08 04:42:30.845 log entries {7}
19.05.08 04:42:30.845 ..01 00000001> 127.0.0.0/255.0.0.0 <> 127.0.0.1 (1)
19.05.08 04:42:30.845 ..02 00010004> 192.168.0.0/255.255.255.0 <> 192.168.0.80 (20)
19.05.08 04:42:30.845 ..03 00000001> 192.168.0.80/255.255.255.255 <> 127.0.0.1 (20)
19.05.08 04:42:30.845 ..04 00010004> 192.168.0.255/255.255.255.255 <> 192.168.0.80 (20)
19.05.08 04:42:30.845 ..05 00010004> 224.0.0.0/240.0.0.0 <> 192.168.0.80 (20)
19.05.08 04:42:30.845 ..06 00000002> 255.255.255.255/255.255.255.255 <> 192.168.0.80 (1)
19.05.08 04:42:30.845 ..07 00010004> 255.255.255.255/255.255.255.255 <> 192.168.0.80 (1)
19.05.08 04:42:30.845 ::WU_RouteLoadTable leave {0}
19.05.08 04:42:30.845 set base len {7}
19.05.08 04:42:30.845 ..loop: add ip interface {192.168.0.80 / 255.255.255.0}
19.05.08 04:42:30.845 ..check entry 127.0.0.0 / 255.0.0.0 -> 127.0.0.1 | 0x1
19.05.08 04:42:30.845 -> IPlbk
19.05.08 04:42:30.845 ..check entry 192.168.0.0 / 255.255.255.0 -> 192.168.0.80 | 0x10004
19.05.08 04:42:30.845 -> LAN
19.05.08 04:42:30.845 ..check entry 192.168.0.80 / 255.255.255.255 -> 127.0.0.1 | 0x1
19.05.08 04:42:30.845 -> IP
19.05.08 04:42:30.845 ..check entry 192.168.0.255 / 255.255.255.255 -> 192.168.0.80 | 0x10004
19.05.08 04:42:30.845 -> BC1
19.05.08 04:42:30.845 ..check entry 224.0.0.0 / 240.0.0.0 -> 192.168.0.80 | 0x10004
19.05.08 04:42:30.845 -> MC
19.05.08 04:42:30.845 ..check entry 255.255.255.255 / 255.255.255.255 -> 192.168.0.80 | 0x2
19.05.08 04:42:30.845 -> BC
19.05.08 04:42:30.845 ..check entry 255.255.255.255 / 255.255.255.255 -> 192.168.0.80 | 0x10004
19.05.08 04:42:30.845 CNcRouteTable::_AllocArrays leave {0}
19.05.08 04:42:30.845 CNcRouteTable::Load leave {0}
19.05.08 04:42:30.845 CWioClient::_CheckRoutingConflicts enter
19.05.08 04:42:30.845 check route: 10.0.0.0 / 255.0.0.0 {type = virtual interface}
19.05.08 04:42:30.855 CWioClient::_CheckRoutingConflicts leave {0}
Part 4 – Verifying RRAS status for potential collision
19.05.08 04:42:30.855 CWioService::QueryStatus enter
19.05.08 04:42:30.855 CWioService::_OpenService enter
19.05.08 04:42:30.855 open SCM {0}
19.05.08 04:42:30.855 CWioService::_OpenService leave {0}
19.05.08 04:42:30.855 control interrogate {0x426}
19.05.08 04:42:30.855 close handle {0}
19.05.08 04:42:30.855 close SCM handle {0}
19.05.08 04:42:30.855 CWioService::QueryStatus leave {0x426}
19.05.08 04:42:30.855 CWioSession::_ConnectVirtualDevice enter
19.05.08 04:42:30.855 IP / Mask = 10.1.1.151 / 255.0.0.0
19.05.08 04:42:30.855 DNS (Primary) = 10.1.1.6
19.05.08 04:42:30.855 DNS (2nd) = 0.0.0.0
19.05.08 04:42:30.855 Wins (Primary) = 0.0.0.0
19.05.08 04:42:30.855 Wins (2nd) = 0.0.0.0
19.05.08 04:42:30.855 GW = 0.0.0.0
19.05.08 04:42:30.855 DHCP (V) = 10.1.1.150
19.05.08 04:42:30.855 Alloc Type = dhcps = 0xff, sfallback = 1
19.05.08 04:42:30.855 CWioNIC::ReleaseDhcpAddress enter
19.05.08 04:42:30.855 CWioNIC::ReleaseDhcpAddress leave {0x2}
19.05.08 04:42:30.855 control interrogate {0}
19.05.08 04:42:30.855 CWioService::QueryStatus leave {0}
Now that we know how the good log looks like, let’s see where we can find errors in the log:
Fictitious Scenario – After launch the Network Connector, I see the balloon on the right corner, however after some seconds I receive an error message saying: Whale Network Connector session failed. Server not responding (0x80004005).
In this case you can enable the tracing on the client side and track it up the error message. Open the Log and search for 0x80004005. After that, keep going up and see where we have the first failed during the connection.
The error was appearing on “device thread exits” session:
19.05.08 05:11:46.359 wait {0}
19.05.08 05:11:46.359 close {0}
19.05.08 05:11:46.359 CWioThread::Stop leave {0}
19.05.08 05:11:46.359 CWioNIC::Close enter
19.05.08 05:11:46.359 CWioNIC::Close leave {}
19.05.08 05:11:46.359 CWioIoBuffersPool::Term enter
19.05.08 05:11:46.359 CWioIoBuffersPool::Term leave {}
19.05.08 05:11:46.359 CWioSession::Term leave {0}
19.05.08 05:11:46.359 CNcSession::Start leave {0x80004005}
19.05.08 05:11:46.359 NcDialer::_StopService enter
19.05.08 05:11:46.359 OpenSCManager succeeded.
19.05.08 05:11:46.359 OpenService succeeded.
As we know, during the connection the client first starts Network Connector on its own side, and then tries to connect to the server. Therefore the initial parts of the logs will not tell too much about the issue. But, if you keep looking up in the log you will see the following on the “wait for reply” session:
19.05.08 05:11:45.357 CWioSession::_SetSessionError enter
19.05.08 05:11:45.357 input msg = 'Server not responding'
19.05.08 05:11:45.357 input code = 0xf0
19.05.08 05:11:45.357 inidicated = 0
19.05.08 05:11:45.357 CWioSession::_SetSessionError leave {0}
19.05.08 05:11:45.357 input msg = 'generic error'
19.05.08 05:11:45.357 inidicated = 1
19.05.08 05:11:45.357 CWioSession::_SetSessionError leave {0xb7}
19.05.08 05:11:45.357 CWioClient::StopSession enter
19.05.08 05:11:45.357 sync = 0
19.05.08 05:11:45.357 CWioVaProxy::Set enter
19.05.08 05:11:45.357 CWioVaProxy::Set leave {0x40}
19.05.08 05:11:45.357 CWioPeer::Term enter
19.05.08 05:11:45.357 state = 2
19.05.08 05:11:45.357 reason = 2
19.05.08 05:11:45.357 CWioPeer::Term leave {0}
19.05.08 05:11:45.858 CWioPeer::Term enter
19.05.08 05:11:45.858 state = 1
19.05.08 05:11:45.858 reason = 2
19.05.08 05:11:45.858 CWioPeer::Term leave {0x6}
19.05.08 05:11:45.858 CWioClient::_FullTunnelingUNSET enter
19.05.08 05:11:45.858 nothing to undo..
19.05.08 05:11:45.858 CNcRouteTable::Export enter
19.05.08 05:11:45.858 data flag = 0
19.05.08 05:11:45.858 CNcRouteTable::Export leave {0}
19.05.08 05:11:45.858 CWioClient::_FullTunnelingUNSET leave {0}
19.05.08 05:11:45.858 CWioClient::StopSession leave {0}
19.05.08 05:11:45.858 CWioClient::StartSession leave {0xf0}
Well, looks like server is not responding. But, why it is not responding? Keep going up in the log and look into the “init srv peer” session:
19.05.08 05:11:44.356 CWioPeer0::Connect enter
19.05.08 05:11:44.356 check input: <0> / 127.0.0.233:6004 {0}
19.05.08 05:11:44.356 CWioDTunnel::Connect enter
19.05.08 05:11:44.356 h = 532
19.05.08 05:11:44.356 CWioDTunnel::Connect leave {0}
19.05.08 05:11:44.356 CWioPeer0::Connect leave {0}
Here we know that the client was trying to connect to the Network Connector on the server side using the port 6004. However, by default the port is 6003, therefore there is a mismatch in the port. You can ask now: how come there is a mismatch if it is the server who provides this port to the client? That’s true, but there are two places where this port is defined, on the Network Connector Server / Advanced Tab and on the Network Connector Application Publishing / Server Settings / Port. In this case, the application publishing is telling the client to use this port, so we can pretty much conclude that the mismatch is on the server side and most likely in the publishing rule.
The above troubleshooting scenario was really simplistic, but the main goal was to get you familiarized with the error messages and how to track it the root cause for a problem. If you still don’t have try IAG yet, go ahead and download the demos and play with it.
As announced last week at TechED North America, here it is the book cover for the upcoming book that me and Tom Shinder are writing for Syngress:
We are having an amazing time writing this book and we are looking forward to release it next year.
Sounds familiar? Interesting though is that in this case it was not an ISA issue (really??? J) and was actually fixed using the solution below from IE Team:
http://blogs.msdn.com/askie/archive/2009/07/17/slow-performance-in-internet-explorer-8-after-installing-the-skype-v4-1-application.aspx
One more issue that is not caused by ISA Server…and counting.
You might not have noticed but this month (last January 10th) ISA Server 2006 Mainstream Support ended as shown in the table below:
Source: http://support.microsoft.com/lifecycle/?p1=11928
The question that you might have is: what about this extended support that goes until January 2017? Extended support means the following:
The Extended Support phase follows Mainstream Support for Business and Developer products. At the supported service pack level, Extended Support includes:
Source: http://support.microsoft.com/lifecycle/
Better to start planning your migration as part of your new year’s resolution.
Nowadays information is much easier to find than it was in the past, you can find information about pretty much everything by just searching on the web (try http://www.letmebingthatforyou.com/?q=dns%20atacck for example). For this reason it is even more important to be diligent while protecting your company’s data by avoiding data leakage and data enumeration.
One of the pre-attack phases is exactly the enumeration of data for the target system (or company), during this phase the attacker will try to gather as much information as possible about the target, going from the basic stuff that it can find about the target on the company’s web site to a more detail data enumeration using tools and techniques. One of the steps for data enumeration is query the target DNS system in order to obtain more information about hosts and IPs. This process can start by simply using the nslookup command. Assuming a successfully query, the attacker can try to transfer all the information for that domain by attempting a zone transfer operation. This post will explain how to configure your DNS Publishing rule on TMG to avoid zone transfer.
2. What you want to avoid?
The scenario that I described above is an unfortunate common scenario, many times because the system administrator forgot to securely configure the DNS Server on the internal network to not allow zone transfer for all systems. Here an example of a DNS that is wide open for zone transfer:
When this configuration is place the following result will happen if the client tries to transfer a zone using nslookup:
Microsoft Windows [Version 6.0.6002] Copyright (c) 2006 Microsoft Corporation. All rights reserved.
C:\>nslookup Default Server: mysrv Address: 192.168.1.10 > server 192.168.1.154 Default Server: TMGFW Address: 192.168.1.154
> set q=all
> contoso.com
Server: TMGFW Address: 192.168.1.154 contoso.com internet address = 10.20.20.10 contoso.com nameserver = dc01.contoso.com contoso.com primary name server = dc01.contoso.com responsible mail addr = hostmaster.contoso.com serial = 494 refresh = 900 (15 mins) retry = 600 (10 mins) expire = 86400 (1 day) default TTL = 3600 (1 hour) dc01.contoso.com internet address = 10.20.20.10
> ls -d contoso.com
[TMGFW.home] contoso.com. SOA dc01.contoso.com hostmaster.contoso.com. (494 900 600 86400 3600) contoso.com. A 10.20.20.10 contoso.com. NS dc01.contoso.com _msdcs NS dc01.contoso.com _gc._tcp.Default-First-Site-Name._sites SRV priority=0, weight=100, port=3268, dc01.contoso.com _kerberos._tcp.Default-First-Site-Name._sites SRV priority=0, weight=100, port=88, dc01.contoso.com _ldap._tcp.Default-First-Site-Name._sites SRV priority=0, weight=100, port=389, dc01.contoso.com _gc._tcp SRV priority=0, weight=100, port=3268, dc01.contoso.com _kerberos._tcp SRV priority=0, weight=100, port=88, dc01.contoso.com _kpasswd._tcp SRV priority=0, weight=100, port=464, dc01.contoso.com _ldap._tcp SRV priority=0, weight=100, port=389, dc01.contoso.com _kerberos._udp SRV priority=0, weight=100, port=88, dc01.contoso.com _kpasswd._udp SRV priority=0, weight=100, port=464, dc01.contoso.com casrv A 10.20.20.9 dc01 A 10.20.20.10 DomainDnsZones A 10.20.20.10 _ldap._tcp.Default-First-Site-Name._sites.DomainDnsZones SRV priority=0, weight=100, port=389, dc01.contoso.com _ldap._tcp.DomainDnsZones SRV priority=0, weight=100, port=389, dc01.contoso.com EXSRV A 10.20.20.11 farm CNAME exsrv.contoso.com ForestDnsZones A 10.20.20.10 _ldap._tcp.Default-First-Site-Name._sites.ForestDnsZones SRV priority=0, weight=100, port=389, dc01.contoso.com _ldap._tcp.ForestDnsZones SRV priority=0, weight=100, port=389, dc01.contoso.com TMGFW A 10.20.20.1 TMGSTD A 10.20.20.31 UAG10 A 10.20.20.12 UAG11 A 10.20.20.13 WKSVista A 10.20.20.152 contoso.com. SOA dc01.contoso.com hostmaster.contoso.com. (494 900 600 86400 3600)
Well, this is not nice at all…here it is my whole internal domain zone exposed to all Internet users just because the system administrator forgot to disable one checkbox.
3. Protecting Internal Resources
If system administrator didn’t do his homework, you as firewall admin MUST do it ASAP. However it is very important to emphasize that this band aid solution on the perimeter shouldn’t be the main one, we always need to fix the root cause of the problem, which in this case is to securely configure the DNS Server. The fact that you are protecting the DNS Publishing on your perimeter is just a countermeasure to avoid a potential attempt to transfer the internal DNS zone to an external (non trusted) resource.
The DNS Filter on TMG is enabled by default as shown below:
This filter has the following capabilities pre-configured by default:
As zone transfer can or cannot be a legitimate type of access the parameter DNS_Zone_Transfer is set to 0 by default. In other words: if you want to allow zone transfer you should leave this parameter as is. The recommendation is to make sure to allow zone transfer only to trusted DNS servers, which means that you should configure your DNZ zone to allow transfer only for the server’s IPs that you trust. If all the servers that you trust are behind TMG (on the internal network) then you can safely block zone transfer attempts from traffic coming from the external (Internet) network. To do that you need to run the following script:
Set root = CreateObject("FPC.Root") Set isaArray = root.GetContainingArray() Set attackDetection = isaArray.ArrayPolicy.AttackDetection Set dnsFilter = isaArray.Extensions.ApplicationFilters.Item("{49FE2B2F-3BB4-495C-87C8-3890C3C35756}")
dnsFilter.Enabled = True Set vpSets = dnsFilter.VendorParametersSets
On Error Resume Next vpSets.Add "{D96C5E7F-5B13-4E1A-94A1-36CA7B54604E}", False, False On Error Goto 0 ' If a vendor parameters sets for DNS Filter already exists
vpSets.Item("{D96C5E7F-5B13-4E1A-94A1-36CA7B54604E}").Value("DNS_Zone_Transfer") = "1"
vpSets.Save
…OR you can also use the UI to make this change using the option below:
Note: if you run the script while the console is open and you try to see if this option is enabled on the UI, the option might still be disabled. To refresh it, close TMG console and open it again.
After running this script (or changing it on the UI), make sure to restart Firewall Service and then try again to run the nslookup ls command from an external computer. The expected result is shown below:
[TMGFW]
*** Can't list domain contoso.com: Unspecified error
The DNS server refused to transfer the zone contoso.com to your computer. If this is incorrect, check the zone transfer security settings for contoso.com on the DNS server at IP address 192.168.1.154.
Now this for sure is a better result for an external user to get it.