July, 2012

  • Troubleshooting Pool Leaks Part 1 – Perfmon

    Over the years the NTDebugging Blog has published several articles about pool memory and pool leaks.  However, we haven’t taken a comprehensive approach to understanding and troubleshooting pool memory usage.  This upcoming series of articles ...read more
  • Simplifying printing in Windows 8

    Hello AskPerf Blog Readers!  In case you missed it, Steven Sinofsky published the blog post below last week.  It was authored by one of our lead Program Managers on the Printing team and talks about some of the exciting new printing features included in Windows 8.  Check it out!

    Simplifying printing in Windows 8

    -Blake

  • Kerberos errors in network captures

    Hi guys, Joji Oshima here again. When troubleshooting Kerberos authentication issues, a network capture is one of the best pieces of data to collect. When you review the capture, you may see various Kerberos errors but you may not know what they mean ...read more
  • Common DFSN Configuration Mistakes and Oversights

    Hello all, Dave here again. Over a year ago, Warren created an ASKDS blog post covering common DFS Replication (DFSR) mistakes and oversights. Someone asked me recently where the common DFS Namespaces (DFSN) mistakes article was located. Well, here it ...read more
  • Windows Server 2012 Failover Cluster Sessions at TechEd

    Start getting familiar with Windows Server 2012 Failover Clustering by viewing the sessions that were delivered at TechEd 2012. 

    These sessions are posted online now for your viewing pleasure.  For those not familiar with what TechEd is, I offer this.

    TechEd is Microsoft's premier technology conference for IT professionals and developers, offering the most comprehensive technical education across Microsoft's current and soon-to-be-released suite of products, solutions, tools, and services. TechEd offers hands-on learning, deep product exploration and countless opportunities to build relationships with a community of industry and Microsoft experts that will help your work for years to come.

    For over 20 years industry professionals have found TechEd to be the best opportunity to stay aligned with Microsoft’s current technologies and new product opportunities. You and your colleagues come to TechEd to discuss critical technology issues, gain practical advice, and network with Microsoft and industry experts. Whether you are an IT Professional or a Developer, TechEd has much to offer you.

    This year’s North America event included over 11,000 customers, partners, speakers, and staff. 

    Each session lasts about 1 hour 15 minutes and includes the PowerPoint and presentation that you can view online or download to view at a later time.  The formats in which you can view the sessions are:

    MP3 = audio only

    Mid Quality WMV = lo-band, mobile

    High Quality MP4 = iPad, PC

    Mid Quality MP4 = WP7, HTML5

    MP4 = iPod, Zune HD

    High Quality WMV = PC, Xbox, MCE

    The sessions from the Clustering team at TechEd North America were:

    WSV324 - Building a Highly Available Failover Cluster Solution with Windows Server 2012 from the Ground Up

    Windows Server 2012 delivers innovative new capabilities that enable you to build dynamic availability solutions in which workloads, networks, and storage become more flexible, efficient, and available than ever before. This session covers creating a Windows Server 2012 highly available Failover Cluster leveraging the new technologies in Windows Server 2012. This session walks through a demo leveraging a highly available Space, encrypting data with shared BitLocker disks, asymmetrical storage configurations with CSV I/O redirection… from the bottom up to a highly available solution.

    WSV430 - Cluster Shared Volumes Reborn in Windows Server 2012: Deep Dive

    This session takes a deep technical dive into the new Cluster Shared Volumes (CSV) architecture and new features coming in Windows Server 2012. CSV is now a full-blown clustered file system, and all of the challenges of the past have been addressed, along with many enhancements. This is an in-depth session that covers the CSV architecture, CSV backup integration, and integration with a wealth of new features that enhance CSV and its performance.

    WSV411 - Guest Clustering and VM Monitoring in Windows Server 2012

    In Windows Server 2012 there will be new ways to monitor application health state and have recovery inside of a virtual machine. This session details the new VM Monitoring feature in Windows Server 2012 as well as discusses Guest Clustering and changes in Windows Server 2012 (such as virtual FC), along with pros and cons of when to use each.

    WSV322 - Update Management in Windows Server 2012: Revealing Cluster-Aware Updating and the New Generation of WSUS

    Today, patch management is a required component of any security strategy. In Windows Server 2012, the new Cluster-Aware Updating (CAU) feature delivers Continuous Availability through automated self-updating of failover clusters. In Windows Server 2012, Windows Server Update Services (WSUS) has evolved to become a Server Role with exciting new capabilities. This session introduces CAU with a discussion of its GUI, cmdlets, remote-updating and self-updating capabilities. We then highlight the main functionality of WSUS in Windows Server 2012, including the security enhancements, patch deployment automation, and new Windows PowerShell cmdlets to perform maintenance and to manage and deploy updates.

    VIR401 - Hyper-V High-Availability and Mobility: Designing the Infrastructure for Your Private Cloud

    Private Cloud Technical Evangelist Symon Perriman leads this session discussing Windows Server 2012 and Windows Server 2008 R2 Hyper-V and Failover Clustering design, infrastructure planning and deployment considerations for your highly-available datacenter or Private Cloud. Do you know the pros and cons of how different virtualization solutions can provide continual availability? Do you know how Microsoft System Center 2012 can move the solution closer to a Private Cloud implementation? This session covers licensing, hardware, validation, deployment, upgrades, host clustering, guest clustering, disaster recovery, multi-site clustering, System Center Virtual Machine Manager 2008 and 2012, and offers a wealth of best practices. Prior clustering and Hyper-V knowledge recommended.

    The sessions from the Clustering team at TechEd Europe were:

    WSV324 - Building a Highly Available Failover Cluster Solution with Windows Server 2012 from the Ground Up

    Windows Server 2012 delivers innovative new capabilities that enable you to build dynamic availability solutions in which workloads, networks, and storage become more flexible, efficient, and available than ever before. This session will cover creating a Windows Server 2012 highly available Failover Cluster leveraging the new technologies in Windows Server 2012. This session will walk through a demo leveraging a highly available Space, encrypting data with shared BitLocker disks, asymmetrical storage configurations with CSV I/O redirection… from the bottom up to a highly available solution.

    WSV430 - Cluster Shared Volumes Reborn in Windows Server 2012: Deep Dive

    This session will do a deep technical dive into the new Cluster Shared Volumes (CSV) architecture and new features coming in Windows Server 2012. CSV is now a full-blown clustered file system, and all of the challenges of the past have been addressed, along with many enhancements. This will be an in-depth session that will cover the CSV architecture, CSV backup integration, and integration with a wealth of new features that enhance CSV and its performance.

    All other TechEd content regarding Windows Server 2012 and Windows 8 can be viewed at the same locations:

    North America

    Europe

    I hope that you can take some time to get to know the new products and features that are coming and get as excited about them as Microsoft is.

    Happy Clustering !!

    John Marlin
    Senior Support Escalation Engineer
    Microsoft Enterprise Platforms Support

  • Dynamic Access Control and ISV Goodness

    Hey all, Ned here with a quickie: Robert Paige just published an interesting read on Windows Server 2012 Dynamic Access Control over at the Windows Server blog: http://blogs.technet.com/b/wincat/archive/2012/07/20/diving-deeper-into-windows-server-2012 ...read more
  • Friday Mail Sack: I Don’t Like the Taste of Pants Edition

    Hi all, Ned here again. After a few months of talking about Windows Server 2012 to other ‘softies from around the globe, I’m back with the sack. It was great fun – and not over yet, it turns out – but I am finally home for a bit ...read more
  • How To Deadlock Yourself (Don’t Do This)

    Some APIs should come with a warning in big red letters saying “DANGER!”, or perhaps more subtly “PROCEED WITH CAUTION”.  One such API is ExSetResourceOwnerPointer. Although the documentation contains an explanation of what limited activity ...read more
  • From our PowerShell folks – Windows PowerShell Web Access

    Hello AskPerf!  Not sure if you’ve heard about this new feature in Windows Server 8 Beta or not, but you can now manage your machines via Windows PowerShell in a web browser.  This means you can use Windows PowerShell from a large variety of devices such as mobile phones, tablets, and computers that do not have Windows PowerShell installed.

    Check out their blog post here:

    Introducing Windows PowerShell Web Access in Windows Server 8 Beta

    -Blake Morrison

  • Standardizing Dynamic Access Control Configuration – Exporting and Importing Dynamic Access Control objects between Active Directory Forests

    [This is a guest post from Joe Isenhour, a Senior Program Manager in Windows Server. You may remember him from his previous ADFS claims rule post. If you are not yet up to speed on the DAC security suite in Windows Server 2012, I recommend our own Mike ...read more
  • Managing the Recycle bin with Redirected Folders with Vista or Windows 7

    Hi, Gary here, and I have been seeing a few more questions regarding the recycle bin on redirected folders. With the advent of Windows Vista there was a change in redirected folders and the support for the Recycle bin. Now each redirected folder has ...read more
  • RSA Key Blocking is Coming

    Hey all, Ned here again with one of my rare public service announcement posts: In August 2012, Microsoft will issue a software update for Windows XP, Windows Server 2003, Windows Server 2003 R2, Windows Vista, Windows Server 2008, Windows 7, and Windows ...read more
  • I’m Baaaaaccccck

    Hey all, Ned here again. After a few months of training room huffing, airline food loathing, and PowerPoint shilling, I’m back in Charlotte. I’ve got a backlog of legacy product posts to share from colleagues, and with Windows 8 and Windows Server 2012 ...read more
  • Windows 7 client machines show printers offline on Windows Server 2008 R2

    Hello AskPerf! I’m Craig Marcho, a Senior Support Escalation Engineer in the Microsoft Platforms Core Team. There has been an increase in cases lately with Windows 7 Clients and Windows Server 2008 R2 Print Servers where Clients will show print queues as being offline, while at the same time, other Clients can print just fine and the Print Server shows the queue as online. While there are a few things that can cause this behavior, none of our normal troubleshooting steps provided relief for this particular issue.

    We found that this issue occurs because a restricted client thread that runs in the spooler detects an offline print server. When a client thread detects that a print server is offline, Windows registers a polling loop to check the status of the print server. After the polling loop is registered, Windows queries the print server periodically to check whether it is back online.

    In rare cases, a client thread that has a restricted token detects that a print server is offline. In this situation, Windows registers the polling loop in the context of this thread. However, the thread does not have sufficient rights to query the print server. Therefore, the polling request fails. Restarting the spooler or the client machine ends the thread running under the restricted context, after which the server is queried with the correct security context.

    The hotfix became available on July 11th and you may download it here:

    A network printer is displayed as offline incorrectly on a computer that is running Windows 7 or Windows Server 2008 R2

    So if you have been experiencing this issue, or know someone who has, please spread the word that a fix is now available.

    -Craig Marcho

  • Having a problem with nodes being removed from active Failover Cluster membership?

    Welcome to the AskCore blog. Today, we are going to talk about nodes being removed from active Failover Cluster membership randomly. If you are having problems with a node being removed from membership, you are seeing events like this logged in your System Event Log:

    [Image: Event ID 1135 in the System Event Log]

    This event will be logged on all nodes in the Cluster except for the node that was removed. The event is logged because one of the nodes in the Cluster marked that node as down and then notified all of the other nodes. When the nodes are notified, they discontinue and tear down their heartbeat connections to the downed node.

    What caused the node to be marked down?

    All nodes in a Windows Server 2008 or 2008 R2 Failover Cluster talk to each other over the networks that are set to “Allow cluster network communication on this network”. The nodes send heartbeat packets across these networks to all of the other nodes. The other nodes are supposed to receive these packets and send a response back. Each node in the Cluster monitors its own heartbeats to ensure the network is up and the other nodes are up. The example below should help clarify this:

    [Image: example of heartbeat traffic between the Cluster nodes]

    If any one of these packets is not returned, then the specific heartbeat is considered failed. For example, W2K8-R2-NODE2 sends a heartbeat request to W2K8-R2-NODE1 and receives a response, so it determines that the network and the node are up.  If W2K8-R2-NODE1 sends a request to W2K8-R2-NODE2 and does not get a response, it is considered a lost heartbeat and W2K8-R2-NODE1 keeps track of it.  This missed response can cause W2K8-R2-NODE1 to show the network as down until another heartbeat request is received.

    By default, Cluster nodes have a limit of 5 failures in 5 seconds before the connection is marked down. So if W2K8-R2-NODE1 does not receive the response 5 times in the time period, it considers that particular route to W2K8-R2-NODE2 to be down.  If other routes are still considered to be up, W2K8-R2-NODE2 will remain as an active member.

    If all routes are marked down for W2K8-R2-NODE2, it is removed from active Failover Cluster membership and the Event 1135 that you see in the first section is logged. On W2K8-R2-NODE2, the Cluster Service is terminated and then restarted so it can try to rejoin the Cluster.
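
    The route-tracking behavior described above can be sketched roughly in Python. This is only an illustrative model, not how the Cluster Service is actually implemented: the names RouteMonitor and PeerNode, the network names "Public" and "Private", counting consecutive misses, and resetting the counter on any response are all assumptions. Only the default of 5 missed heartbeats and the "all routes down" rule come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class RouteMonitor:
    """Tracks missed heartbeats on one network route to a peer (hypothetical model)."""
    threshold: int = 5   # default from the post: 5 missed heartbeats
    missed: int = 0
    up: bool = True

    def record(self, got_response: bool) -> None:
        if got_response:
            # Assumption: any response resets the counter and restores the route.
            self.missed = 0
            self.up = True
        else:
            self.missed += 1
            if self.missed >= self.threshold:
                self.up = False


@dataclass
class PeerNode:
    name: str
    routes: dict = field(default_factory=dict)  # network name -> RouteMonitor

    def is_active_member(self) -> bool:
        # The node is removed only when every route to it is marked down.
        return any(route.up for route in self.routes.values())


node2 = PeerNode("W2K8-R2-NODE2",
                 {"Public": RouteMonitor(), "Private": RouteMonitor()})

# Five missed heartbeats on one network: that route goes down,
# but the node remains an active member through the other route.
for _ in range(5):
    node2.routes["Private"].record(False)
print(node2.routes["Private"].up, node2.is_active_member())

# Five missed heartbeats on the remaining network: all routes are down,
# which is the point at which Event 1135 would be logged.
for _ in range(5):
    node2.routes["Public"].record(False)
print(node2.is_active_member())
```

    In this model a single healthy network is enough to keep the node in membership, which is why a failure on one network alone does not produce Event 1135.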

    For more information on how we handle specific routes going down with 3 or more nodes, please refer to the “Partitioned” Cluster Networks blog post that was written by Jeff Hughes.

    Now that we know how the heartbeat process works, what are some of the known causes for the process to fail?

    1. Actual network hardware failures. If the packet is lost on the wire somewhere between the nodes, then the heartbeats will fail. A network trace from both nodes involved will reveal this.

    2. The profile for your network connections could possibly be bouncing from Domain to Public and back to Domain again. During the transition of these changes, network I/O can be blocked. You can check to see if this is the case by looking at the Network Profile Operational log. You can find this log by opening the Event Viewer and navigating to: Applications and Services Logs\Microsoft\Windows\NetworkProfile\Operational. Look at the events in this log on the node that was mentioned in the Event ID: 1135 and see if the profile was changing at this time. If so, please check out the KB article “The network location profile changes from "Domain" to "Public" in Windows 7 or in Windows Server 2008 R2”.

    3. You have IPv6 enabled on the servers, but have the following two rules disabled for Inbound and Outbound in the Windows Firewall:

    • Core Networking - Neighbor Discovery Advertisement
    • Core Networking - Neighbor Discovery Solicitation

    4. Anti-virus software could be interfering with this process also. If you suspect this, test by disabling or uninstalling the software. Do this at your own risk because you will be unprotected from viruses at this point.

    5. Latency on your network could also cause this to happen. The packets may not be lost between the nodes, but they may not get to the nodes fast enough before the timeout period expires.

    6. IPv6 is the default protocol that Failover Clustering will use for its heartbeats. The heartbeat itself is a UDP unicast network packet that communicates over Port 3343. If there are switches, firewalls, or routers not configured properly to allow this traffic through, you can see issues like this.

    7. IPsec security policy refreshes can also cause this problem. During an IPsec group policy update, all IPsec Security Associations (SAs) are torn down by Windows Firewall with Advanced Security (WFAS), and all network connectivity is blocked while this happens. If authentication with Active Directory is delayed while the Security Associations are renegotiated, network communication stays blocked for the duration of the delay. Cluster heartbeats cannot get through, and cluster health monitoring marks nodes as down if they do not respond within the 5 second threshold.

    These are the most common reasons that these events are logged, but there could be other reasons also. The point of this blog was to give you some insight into the process and also give ideas of what to look for. Some administrators raise the following values to their maximums to try to make this problem stop.

     

    Parameter               Default              Range
    SameSubnetDelay         1000 milliseconds    250-2000 milliseconds
    CrossSubnetDelay        1000 milliseconds    250-4000 milliseconds
    SameSubnetThreshold     5                    3-10
    CrossSubnetThreshold    5                    3-10

    Increasing these values to their maximum may make the event and node removal go away, but it only masks the problem. It does not fix anything. The best thing to do is find out the root cause of the heartbeat failures and get it fixed. The only real need for increasing these values is in a multi-site scenario where nodes reside in different locations and network latency cannot be overcome.
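
    To put numbers on what these parameters control, here is a small Python sketch. It assumes that the time before a route is marked down is simply the delay multiplied by the threshold, which matches the default of 5 failures in 5 seconds mentioned earlier. The function name detection_time_seconds is made up for this example.

```python
def detection_time_seconds(delay_ms: int, threshold: int) -> float:
    """Approximate time before a route is marked down.

    Assumption: one heartbeat is sent every `delay_ms` milliseconds and the
    route is marked down after `threshold` missed heartbeats.
    """
    return delay_ms * threshold / 1000.0


# Values taken from the table: defaults, then the maximum of each range.
print(detection_time_seconds(1000, 5))    # defaults -> 5.0
print(detection_time_seconds(2000, 10))   # same-subnet maximums -> 20.0
print(detection_time_seconds(4000, 10))   # cross-subnet maximums -> 40.0
```

    Under that assumption, raising everything to the maximum stretches detection from about 5 seconds to 20 seconds (same subnet) or 40 seconds (cross subnet). A real failure then takes that much longer to notice, which is another reason to treat these settings as a last resort.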

    I hope that this post helps you!

    Thanks,
    James Burrage
    Senior Support Escalation Engineer
    Windows High Availability Group