website statistics
September, 2010 - Yuri Diogenes's Blog - Site Home - TechNet Blogs

Yuri Diogenes's Blog

Thoughts from a Senior Technical Writer @ Microsoft Server and Cloud Division (Solutions Group) - Information Experience

September, 2010

Posts
  • Yuri Diogenes's Blog

    Software Update 1 for Microsoft Forefront Threat Management Gateway (TMG) 2010 Service Pack 1

    • 3 Comments

    We just release a new update for Forefront TMG 2010 called Update 1, this update one includes:

    SafeSearch Enforcement. Forefront TMG can enforce blocking adult text, images and videos from search results by popular search engines. SafeSearch can be enforced on specific groups or to the entire organization.
    Including non-primary URL filtering categorizations. Forefront TMG uses an algorithm to select a URL’s “primary” category from among up to four categorizations provided by Microsoft Reputation Services (MRS). In Update 1 you can control access to sites that match any of the non-primary categorizations provided by MRS. For example, a URL with a primary categorization of News can now match a rule by any of its non-primary categorizations (such as Web Mail).
    Support for Exchange 2010 SP1
    Bug fixes and various other improvements. Details will be soon available at http://go.microsoft.com/fwlink/?LinkId=201151.

    Go get it here.

  • Yuri Diogenes's Blog

    The Curious Case of TMG Stopping Responding in Random days but always during the Morning

    • 2 Comments

    1. Introduction

     

    Talking about curious case to troubleshoot, this was a very good one. First because this was a new deployment, a well planned hardware for the deployment and the amount of traffic hitting TMG was not that huge. Everything seems to be okay, other than the fact that at least twice a week TMG was stopping responding and it was necessary to restart the whole server. To make things even more bizarre, such issue was happening in random days but always within the range of 8AM and 11AM.

     

    2. Gathering Data

     

    On this scenario TMG administrator couldn’t really get a dump of the wspsrv.exe process because he couldn’t even log in locally.  It was necessary to prepare the machine for a kernel dump capture, for that I used the procedures from KB969028.

     

    3. Analyzing the Data

     

    On a scenario where the machine completely stops responding one of the most useful commands to type while analyze a kernel dump is the !locks command. By definition (windbg help), the !locks command will:

     

     “Display all locks held on resources by threads. A lock can be shared or exclusive, which means no other threads can gain access to that resource. This information is useful when a deadlock occurs on a system. A deadlock is caused by one non-executing thread holding an exclusive lock on a resource that the executing thread needs.”

     

    In this case here it is the result of this command:

     

    0: kd> !locks

    **** DUMP OF ALL RESOURCE OBJECTS ****

    KD: Scanning for held locks...........................................................................................................................................

     

    Resource @ 0xfffffa800dbad748    Shared 1 owning threads

        Contention Count = 493

         Threads: fffffa800e09db63-01<*> *** Actual Thread fffffa800e09db60

    KD: Scanning for held locks......

     

    Resource @ 0xfffffa800bc2a7b8    Exclusively owned

         Threads: fffffa800e09db60-01<*>

    KD: Scanning for held locks..........................................................................................

    7461 total locks, 2 locks currently held

     

    Next step is to verify which thread is that:

     

    0: kd> !thread fffffa800e09db60

    THREAD fffffa800e09db60  Cid 0004.0680  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (UserRequest) KernelMode Non-Alertable

        fffffa800cfc4060  SynchronizationEvent

    IRP List:

        fffffa8010231bd0: (0006,0430) Flags: 00060901  Mdl: fffffa8010287000

    Not impersonating

    DeviceMap                 fffff8a000008b30

    Owning Process            fffffa8009a3a040       Image:         System

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      26737768       Ticks: 3 (0:00:00:00.046)

    Context Switch Count      940461            

    UserTime                  00:00:00.000

    KernelTime                00:13:42.562

    Win32 Start Address vhdmp!VhdmpiAsyncOpThread (0xfffff880076bcdb0)

    Stack Init fffff88008267db0 Current fffff88008267440

    Base fffff88008268000 Limit fffff88008262000 Call 0

    Priority 9 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           : Args to Child                                                           : Call Site

    fffff880`08267480 fffff800`01ac9992 : 00000000`0000000c fffffa80`0e09db60 fffffa80`00000000 fffffa80`10231bd0 : nt!KiSwapContext+0x7a

    fffff880`082675c0 fffff800`01acbcff : 00000000`0000139c fffffa80`0cfc4060 00000000`00000000 fffffa80`0bf1a9a0 : nt!KiCommitThreadWait+0x1d2

    fffff880`08267650 fffff800`01dbd1d2 : 00000000`00000000 ffffffff`00000006 00000000`00000000 00000000`00000000 : nt!KeWaitForSingleObject+0x19f

    fffff880`082676f0 fffff800`01ac1993 : fffffa80`0e09db60 00000000`00200200 00000000`00000000 fffffa80`0cfc4060 : nt!NtWaitForSingleObject+0xb2

    fffff880`08267760 fffff800`01abdf30 : fffff880`07694640 fffffa80`0cde80e0 fffffa80`0bc2a000 ffffffff`8000139c : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`08267760)

    fffff880`082678f8 fffff880`07694640 : fffffa80`0cde80e0 fffffa80`0bc2a000 ffffffff`8000139c 00000000`000001d1 : nt!KiServiceLinkage

    fffff880`08267900 fffff880`076b7ee2 : 00000000`00000000 fffff880`076b7e5c fffffa80`0cde80e0 fffffa80`0cde80e0 : vhdmp!VhdmpiReadVhdFileAtPassiveLevel+0x50

    fffff880`08267970 fffff880`076bd9e0 : 00000000`00000000 00000000`000001d1 fffff8a0`0c2f2000 00000000`00001d10 : vhdmp!VhdmpiCopyRawSectorsSync+0x72

    fffff880`082679c0 fffff880`076bdf11 : fffff8a0`01f26000 00000000`00022dbe fffff880`08267b90 fffff880`08267ac8 : vhdmp!VhdmpiRegisterWithRootEnumerator+0x220

    fffff880`08267a60 fffff880`076c05e4 : 00000000`00000001 00000000`00000000 00000000`00022dbe fffffa80`0bc2a000 : vhdmp!VhdmpiShutdownNestingLevels+0x3b1

    fffff880`08267b60 fffff880`076c083a : 00000000`00000000 00000000`00000001 fffffa80`0e573c00 fffffa80`0e573c00 : vhdmp!VhdmpiDoCompaction+0x124

    fffff880`08267ca0 fffff880`076bce02 : ffffffff`ff676980 fffffa80`0e09db60 00000000`00000080 fffffa80`0cde80e0 : vhdmp!VhdmpiCompactThread+0x13a

    fffff880`08267d00 fffff800`01d66c06 : fffffa80`0e09db60 00000000`00000080 fffffa80`09a3a040 00000000`00000001 : vhdmp!VhdmpiAsyncOpThread+0x52

    fffff880`08267d40 fffff800`01aa0c26 : fffff880`009bf180 fffffa80`0e09db60 fffffa80`09a3ab60 fffff880`01452534 : nt!PspSystemThreadStartup+0x5a

    fffff880`08267d80 00000000`00000000 : fffff880`08268000 fffff880`08262000 fffff880`08267450 00000000`00000000 : nt!KxStartSystemThread+0x16

     

    Notice that this thread has an I/O request packet (IRP) and for scenarios like this (server freezes) it is very important to review the IRP for that thread that seems to be causing the issue. To do that you need to use the command !irp, here it is the result for this case:

     

    0: kd> !irp fffffa8010231bd0

    Irp is active with 12 stacks 9 is current (= 0xfffffa8010231ee0)

     Mdl=fffffa8010287000: No System Buffer: Thread fffffa800e09db60:  Irp stack trace. 

         cmd  flg cl Device   File     Completion-Context

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                         Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                         Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                         Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                         Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                         Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                         Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                         Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000    

     

                         Args: 00000000 00000000 00000000 00000000

    >[  3,34]  10 e1 fffffa800a03f060 00000000 fffff88000e34180-fffffa800a04de20 Success Error Cancel pending

                  \Driver\Disk  volmgr!VmpReadWriteCompletionRoutine

                         Args: 00200200 00000000 3bd4e26200 00000000

     [  3, 0]   0 e0 fffffa800a04dcd0 00000000 fffff88001792ee0-fffffa800de75450 Success Error Cancel

                  \Driver\volmgr      volsnap!VspReadCompletionRoutine

                         Args: 00200200 00000000 3bd4d26200 00000000

     [  3, 0]   0 e1 fffffa800a047040 00000000 fffff8800145ef70-fffffa800dabd550 Success Error Cancel pending

                  \Driver\volsnap     Ntfs!NtfsMasterIrpAsyncCompletionRoutine

                         Args: 00200200 00000000 3bd4d26200 00000000

     [  3, 0]   0  1 fffffa800a069030 fffffa800df75880 00000000-00000000    pending

                  \FileSystem\Ntfs

                         Args: 00200200 00000000 21ed477200 00000000

     

    The major function 3,34 has the pending status set, which appears to be stuck on \Drover\Disk (Volume Managr – volmgr). Let’s take a closer look on the value fffffa800a03f060, which we can see by using the command !devobj as shown below:

     

    0: kd> !devobj fffffa800a03f060

    Device object (fffffa800a03f060) is for:

     DR2 \Driver\Disk DriverObject fffffa8009e53e70

    Current Irp 00000000 RefCount 0 Type 00000007 Flags 01002050

    Vpb fffffa8009fd50b0 Dacl fffff9a10043d7f0 DevExt fffffa800a03f1b0 DevObjExt fffffa800a03f858 Dope fffffa8009feb450

    ExtensionFlags (0x00000800) 

                                 Unknown flags 0x00000800

    AttachedDevice (Upper) fffffa800a03fb90 \Driver\partmgr

    AttachedTo (Lower) fffffa8009fe9b70 \Driver\USBSTOR

     

    This is a USB driver, which is something that I was not expecting to have a this TMG box at all, so I decided to before even move forward ask to the TMG administrator what a USB driver is doing on TMG. TMG Administrator then clarified the scenario as follow: “We have a USB external driver attached to the system to store the full backup of the system that we perform every night.”

     

    The dump is quiet clear, in other words: server is waiting for the disk. At this point I can conclude that the server was hanging waiting for this USB driver, which for some reason was not responding, as a consequence the whole box stop responding (including TMG, but clearly not a TMG issue).

     

    4. Conclusion

     

    After start reviewing the logs from the backup software we could conclude that: the backup job was supposed to finish at 6AM, however some days was extending beyond that, when the backup was running ahead of 8AM and the production started on TMG the system presented this behavior. Backup software was performing a full backup of the system, including TMG cache folder (which by itself was 4GB). We do not recommend making a backup of TMG cache folder, this folder needs to be excluded from backup. Making backup of the cache folder can cause other issue of this nature, such as crashing Firewall Service as I mentioned in this post.  To resolve this problem we excluded the TMG cache folder from the backup and re-schedule the backup job to start at 11PM and let it run until 7AM in an attempt to avoid backup job to go beyond commercial business hours.

     

  • Yuri Diogenes's Blog

    New Books Hitting the Market

    • 0 Comments

    I would like to take some time today to bring to y’all attention some new books that are out there on Forefront UAG and TMG technologies. My buddies Ben Ari and Ran Dolev (both from CSS UAG Team) are finishing writing the new Microsoft Forefront UAG 2010 Administrator's Handbook, since those guys are very sharp on UAG you surely can expect a great content. Another new book out there about TMG is from the Forefront MVP Tamás Gál, more info about the book (in Hungarian) is available here. Now that me and Tomas pretty much finished writing the Forefront books series (see previous blog post) I can take some time to finish another project that I’m working since December last year, which is the Security+ Certification Guide book (in Portuguese only). This one I should be done writing by next month (October) and the book will be released in Brazil by April 2011, more info will be available at www.securityplusbr.org.

  • Yuri Diogenes's Blog

    Forefront Books available for Pre-order

    • 0 Comments

    Just a quick update on the Forefront eBooks,

    ebooks

     

     

    Me and Tom just finished writing and reviewing all three books, today we wrote the About the Authors and Acknowledgements. The content is now on the final stage with MSPress and the books are available for pre-order at Amazon.com, check it out at:

     

     

  • Yuri Diogenes's Blog

    TMG Managed Control Service taking too much time to start after applying TMG 2010 SP1

    • 3 Comments

    Recently I assisted an IT Pro on the Forum thread http://social.technet.microsoft.com/Forums/en-US/Forefrontedgesetup/thread/c7606ead-5957-4ef8-a4e9-e5aa85493581 where the problem was related to this behavior described in the title of this post. The problem happens very precisely, after applying TMG SP1 the TMG Managed Control Service takes too long to start, although it does start. As I describe in the Forum, if you are facing this issue and wants to use the private hotfix for that you will need to open a support case with Microsoft. If the hotfix resolves the problem the case won’t be charged, however if the hotfix does not fix the issue this means that the problem is not the same although the symptom could be similar. In other words, if further troubleshooting is necessary the case will be normally charged. To open a support case use this link http://support.microsoft.com/select/Default.aspx?target=assistance , select Forefront Threat Management Gateway (under security) and follow the wizard.

  • Yuri Diogenes's Blog

    TMG E-Mail Protection Feature x Exchange 2010 SP1

    • 0 Comments

    Whatta busy week here with this problem. As you might be aware, Exchange 2010 SP1 breaks the integration that TMG has with Exchange Edge when using E-Mail Protection feature, the issue causes the TMG Managed Control Service to fail to start. This issue is documented on our team blog http://blogs.technet.com/b/isablog/archive/2010/09/01/problems-when-installing-exchange-2010-service-pack-1-on-a-tmg-configured-for-mail-protection.aspx and on Exchange Team blog at http://msexchangeteam.com/archive/2010/09/01/456094.aspx. We are working on a hotfix for this issue and more news will be posted on the TMG Team Blog as soon as we test the fix in our lab and we confirm that works.

    One thing that it is important to keep in mind is the following: don’t make changes directly on Exchange or FPE installed on the TMG Server when this issue happens. What I mean with that? Here it is the issue flow so you can better understand:

    1. TMG Email Protection is working fine.
    2. You install Exchange 2010 SP1
    3. TMG Managed Control Service fails to start.
    4. === Here you identified that you are facing this problem. At this point don’t try to:
    1. Make changes via Powershell (or console) on Exchange Edge.
    2. Make changes via Powershell (or console) on FEP.

    The reason behind that is because if you make changes, there is a possibility that once we fix the original issue (with the hotfix that will come up soon), TMG Managed Control Service will still failing to start with the error message below:

    image 

    …and on Event Viewer you might see the error below:

    Log Name:      Application

    Source:        Microsoft Forefront TMG Control

    Event ID:      31308

    Task Category: None

    Level:         Error

    Keywords:      Classic

    Description:

    The Forefront TMG Managed Control service failed to initialize. Error information: Value does not fall within the expected range.

    This error has nothing to do with the issue introduced by Exchange 2010 SP1, this is another problem that can be caused by changes that are made directly on those products and when TMG Managed Control Service tries to synchronize the config, it fails. It is important to mention that this failure doesn’t always happen, the normal behavior is to TMG detect the changes and undo it. When TMG is able to do that you will see the following alert:

    Solution? Undo the manual changes that were done via those products' console and make sure to always use TMG to manage the configuration among those products (at least this is the behavior now TMG RTM and SP1). In this particular case that TMG Managed Control Service is down, you can’t make changes via TMG console because it will not sync with Exchange Edge and FEP, therefore you should hold any change to be done only after the hotfix for this issue (Exchange 2010 SP1 problem) is released and you had applied on the system.

Page 1 of 1 (6 items)