ADFS 2012 R2 provides an interesting feature called Extranet Lockout Protection, where the intent is to protect AD accounts from malicious lockout from external access attempts. Previous versions of ADFS had no native mechanism to protect AD from such hammering attempts. For details on the feature please review this post.
One issue that can occur when Extranet Lockout Protection is enabled concerns how it deals with AD accounts that have never had a bad password submitted against them. Bad password attempts are recorded in the BadPwdCount attribute in AD. This attribute is not replicated, so each domain controller keeps its own count for the failed logon requests that it processed.
In this post we will look at the account called Vanilla-1 - this is an account that has not had a single wonky password submitted to it. Making sure there were no password typos was the most stressful part of writing this post! To see BadPwdCount on the Windows 2012 R2 DC, we can use the Get-ADDomainController cmdlet to enumerate all domain controllers. This lab has two domain controllers, which is why there are two lines returned. This collection is then used in the ForEach loop to enumerate the user’s properties on each DC passed to it as shown below:
Get-ADDomainController -Filter * | ForEach { Get-ADuser "Vanilla-1" -Properties * -Server $_ } | Format-Table Name, PasswordLastSet, BadPwdCount
For comparison, note that a separate account User-2 has 4 and 1 BadPwdCount reported from different DCs.
In the above example we are looking at one specific account. If you want to dump all user objects, change the filter for the Get-ADUser cmdlet:
Get-ADDomainController -Filter * | ForEach { Get-ADuser -Filter {(ObjectClass -eq "user")} -Properties * -Server $_} | Format-Table Name, PasswordLastSet, BadPwdCount
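Because each DC reports its own value, it can help to collapse the per-DC rows into one line per account. The following is a minimal sketch of that idea, not part of the original commands: the function names Get-BadPwdSummary and Get-BadPwdCountPerDC are hypothetical, and reporting the highest count seen on any DC is an assumption about what is useful to look at.

```powershell
# Hypothetical helper: collapse per-DC rows into one row per account.
# Expects objects that carry Name, Server and BadPwdCount properties.
function Get-BadPwdSummary {
    param(
        [Parameter(Mandatory)]
        [object[]] $Row
    )
    $Row | Group-Object -Property Name | ForEach-Object {
        # An unset BadPwdCount is cast to 0 here
        $counts = @($_.Group | ForEach-Object { [int]$_.BadPwdCount })
        [pscustomobject]@{
            Name           = $_.Name
            MaxBadPwdCount = ($counts | Measure-Object -Maximum).Maximum
            PerDC          = ($_.Group | ForEach-Object { '{0}={1}' -f $_.Server, $_.BadPwdCount }) -join ', '
        }
    }
}

# Hypothetical wrapper: query every DC for one account.
# Requires the ActiveDirectory module; it is only defined here, not called.
function Get-BadPwdCountPerDC {
    param(
        [Parameter(Mandatory)]
        [string] $Identity
    )
    foreach ($dc in Get-ADDomainController -Filter *) {
        Get-ADUser -Identity $Identity -Properties BadPwdCount -Server $dc.HostName |
            Select-Object Name, @{ n = 'Server'; e = { $dc.HostName } }, BadPwdCount
    }
}
```

Something like `Get-BadPwdSummary -Row (Get-BadPwdCountPerDC -Identity 'Vanilla-1')` would then show a single row for the account, with the individual DC values kept in the PerDC column.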
Now that we have verified that the BadPwdCount is not set for Vanilla-1, let’s try to log on to ADFS 2012 R2 using this account. The URL we will hit is:
https://adfs.tailspintoys.ca/adfs/ls/idpinitiatedsignon.htm
And we get the lovely error below:
An error occurred
An error occurred. Contact your administrator for more information
Activity ID: 00000000-0000-0000-0b00-0080000000d2
Error time: Tue, 15 Jul 2014 19:08:55 GMT
Cookie: enabled
User agent string: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.3; WOW64; Trident/7.0; .NET4.0E; .NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729; BRI/2; MS-RTC EA 2; MS-RTC LM 8; InfoPath.3)
(I’ll come back to why the Activity ID is highlighted in a moment)
Since ADFS auditing is enabled (a post on that is coming up soon), we can see the below in the security event log on the ADFS server:
Looking at the details of the failed ADFS events we can see EventId 300 stating that there was an error enumerating the AD account.
Then EventID 413 is logged when processing the request:
How do I know for sure that these events map to the failed logon shown in the IE screen capture? Well apart from the fact that this is my lab and no-one else is using it? Remember the Activity ID that was highlighted in red? Go back and look at the IE screen capture. Notice that they are the same? This is how we can correlate between a client and lots of logged events.
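The same correlation can be done from PowerShell rather than by eye. This is a rough sketch under some assumptions: that the audit events are written to the Security log by the "AD FS Auditing" source, and that the Activity ID appears in the rendered message text of each event. The function names are hypothetical.

```powershell
# Hypothetical filter: pass through only objects whose Message contains the Activity ID.
# -like is case-insensitive, so GUID casing does not matter.
function Select-ByActivityId {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $InputObject,
        [Parameter(Mandatory)] [string] $ActivityId
    )
    process {
        if ($InputObject.Message -like "*$ActivityId*") { $InputObject }
    }
}

# Hypothetical wrapper: read recent ADFS audit events and keep the matching ones.
# Assumes the 'AD FS Auditing' source in the Security log; run elevated on the ADFS server.
function Get-AdfsAuditEventByActivityId {
    param(
        [Parameter(Mandatory)] [string] $ActivityId,
        [int] $MaxEvents = 1000
    )
    Get-WinEvent -FilterHashtable @{ LogName = 'Security'; ProviderName = 'AD FS Auditing' } -MaxEvents $MaxEvents |
        Select-ByActivityId -ActivityId $ActivityId |
        Sort-Object TimeCreated
}
```

For the failed logon above, the call would be `Get-AdfsAuditEventByActivityId -ActivityId '00000000-0000-0000-0b00-0080000000d2'`, piped to `Format-List TimeCreated, Id, Message` to read the details.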
Now that we have the background, how do we address this? Brian Reid, who has forgotten more about transport than I know, kindly added a comment to the initial blog post. As Brian points out, this issue is discussed in KB 2971171 - A new Active Directory user cannot log on from the AD FS server when the server is running from a GMSA account. Before this update can be installed, the April 2014 update for Windows Server 2012 R2 (KB 2919355) must be installed. This lab machine was updated with this when the April update was first released:
What is not totally clear though is the fact that this lab is *NOT* using a GMSA account – it is using a standard service account. However, installing the update did resolve the issue.
After installing the ADFS 2012 R2 update, let’s try this again. Things are much better, and the account Vanilla-1 is able to log on to ADFS.
In the security event log, we now see successful events when Vanilla-1 logs on to ADFS.
EventID 4624 shows Vanilla-1 as a successful logon:
And to make sure that I did not finagle anything in the background (like making a typo in the password), note that the bottom command was taken 3 minutes after the screenshot above. The BadPwdCount is the same as at the start. Phew – I did not typo the password!!
Just like all software components ADFS needs maintenance. In the Office 365 portal, the notification page has been alerting admins that a performance issue exists with ADFS 2012 R2 and that a hotfix must be installed. This is one of the items below.
Not related to ADFS 2012 but the same maintenance is also needed for ADFS 2.0 – Updates are also available for those older builds, for example Update Rollup 3 for ADFS 2.0.
In addition to break fix, KB 2927690 also lights up the alternative logon capability. It pays to stay current on updates!
Cheers,
Rhoderick
After a recent video driver update, my corporate Outlook client started to do some strange things. Within Office 2013, the screen output would be distorted. Menu bars were not painted properly until I moused over them again or moved Office programs around. Other times the display would look corrupted and the navigation tree would not be properly rendered.
Update 16-7-2014: Adding link to Office 2013 known issues. This discusses implications of disabling the hardware acceleration and some issues with certain drivers.
Update 5-11-2014: Adding link to Windows 10 Preview display issue.
I would see issues like this in Outlook -- note that the left hand navigation tree is unreadable in places
Then when composing/reading an email, the ribbon would be corrupted and/or distorted:
The underlying issue is discussed in KB 2768648 - Performance and display issues in Office 2013 client applications.
While the underlying issue is with the video card driver, there is a workaround until the video driver is updated.
On an individual machine, the user can open up the Outlook advanced properties and select the option to “Disable hardware graphics acceleration”
Doing so immediately fixed up my errant display issues. I’ll update this post when I see a video card release that resolves the issue.
There are some alternative methods of implementing this via the registry and GPO.
We can also set a registry key to disable the feature:
Key: HKEY_CURRENT_USER\Software\Microsoft\Office\15.0\Common\Graphics
Type: REG_DWORD
Name: DisableHardwareAcceleration
Value: 0x1
To query this via cmd prompt:
REG.exe Query HKCU\Software\Microsoft\Office\15.0\Common\Graphics /V DisableHardwareAcceleration
To set this via cmd prompt:
REG.exe Add HKCU\Software\Microsoft\Office\15.0\Common\Graphics /T REG_DWORD /V DisableHardwareAcceleration /D 0x1 /F
(Note that the above is one line that may wrap)
In PowerShell, we can retrieve the current configuration using the first command, whilst the second sets the value:
Get-ItemProperty -Path HKCU:\Software\Microsoft\Office\15.0\Common\Graphics -Name DisableHardwareAcceleration | Select-Object DisableHardwareAcceleration | FT -AutoSize
New-ItemProperty -Path HKCU:\Software\Microsoft\Office\15.0\Common\Graphics -Name DisableHardwareAcceleration -PropertyType DWORD -Value "0x1" -Force
(Note that the above are all one line that may wrap)
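On a profile where the Graphics key has not been created yet, New-ItemProperty has nothing to write into. A small wrapper can create the key first and then read the value back. This is only a sketch; the function name Disable-OfficeHardwareAcceleration is hypothetical and the path is the same Office 2013 key used above.

```powershell
# Hypothetical wrapper around the registry commands shown above.
function Disable-OfficeHardwareAcceleration {
    param(
        # Office 2013 per-user graphics key
        [string] $Path = 'HKCU:\Software\Microsoft\Office\15.0\Common\Graphics'
    )

    # Create the key (and any missing parent keys) only if it does not exist yet
    if (-not (Test-Path -Path $Path)) {
        New-Item -Path $Path -Force | Out-Null
    }

    # -Force overwrites an existing value with the same name
    New-ItemProperty -Path $Path -Name 'DisableHardwareAcceleration' `
        -PropertyType DWord -Value 1 -Force | Out-Null

    # Read the value back so the caller can confirm what was written
    (Get-ItemProperty -Path $Path -Name 'DisableHardwareAcceleration').DisableHardwareAcceleration
}
```

Calling `Disable-OfficeHardwareAcceleration` with no arguments should return 1 once the value is in place. Office applications typically need to be restarted before the change is visible.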
In the Office 2013 Administrative Template files there is an option to disable hardware acceleration. To do this:
When resolving issues with on-premises Exchange, sometimes the issue may be directly within Exchange; other times the root cause may lie outside Exchange. Depending upon the exact nature of the case, we may have to investigate network switches, load balancers or storage. When Exchange is virtualized, the hypervisor and its configuration may also require attention.
This was the case with a recent customer engagement. Initially the scope was upon Exchange with symptoms including Exchange servers dropping out of the DAG, databases failing over and poor performance for users. As with most cases that get escalated to me, there is rarely only a single issue in play and multiple items have to be addressed. The customer was using ESX 5 update 1 as a hypervisor solution, and Exchange 2010 SP3. Exchange was deployed in a standard enterprise configuration with a DAG, CASArray and a third party load balancer.
In this case, one of the biggest issues was that of the hypervisor discarding valid packets. Within this environment an Exchange DAG server that was restarted had discarded ~ 35,000 packets during the restart. Exchange servers that had been running for a couple of days had discarded 500,000 packets. That’s a whole lot of packets to lose. This was the cause of servers dropping out of the cluster and generating EventID 1135 errors. This issue is discussed in detail in this previous post, which also contains a PowerShell script that will easily retrieve the performance monitor counter from multiple servers. The script allows you to monitor and track the impact of the issue easily.
Yay – we found the issue and all was well. Time to close the case? NO!
There were multiple other issues involved here and not all of them were immediately obvious when troubleshooting so I wanted to share these notes for awareness purposes. All software needs maintenance, Exchange itself is no exception and it is critical to keep code maintained with the vendors' updates. This ensures that you address known issues, and proactively maintain the system. As always this must be tempered with adequately testing any update in your lab prior to deploying it in production.
This post is only to raise awareness of the below issues and is not intended to be negative to the hypervisor in question. As stated above Exchange, Windows and Hyper-V all require updates. Hyper-V experienced network connectivity issues previously and required an update.
The customer reported that the DAG IP address was causing conflicts on the network. The typical cause for this is the administrator manually adding the DAG IP to one or more cluster nodes. This is an IP address that can be bound to any node, and the cluster service will perform the required steps; the administrator should only add it as a DAG IP address and do no more. In this case the DAG was correctly configured and servers only had their unique host IP address assigned.
Initially there seemed to be a correlation with the duplicate DAG IP address and backups. However this was quickly discarded as the duplicate IP issue would only happen once every several weeks and could not be reproduced on demand by initiating a backup.
There is an issue documented in KB 1028373 - False duplicate IP address detected on Microsoft Windows Vista and later virtual machines on ESX/ESXi when using Cisco devices on the environment. This issue occurs when the Cisco switch has gratuitous ARPs enabled or the ArpProxySvc replied to all ARP requests incorrectly.
This was the initial issue discussed above and is covered here.
It is always prudent to keep working an issue until it is proven that the root cause has been addressed. In this case additional research was done to investigate networking issues on the hypervisor and the below links are included for reference.
The symptom of large guest OS packet loss can include servers being dropped from the cluster. When a node is removed from cluster membership, EventID 1135 is logged into the system event log.
To report on such errors, I wrote a script to enumerate the instances of this EventID. Please see this post for details on the script.
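For a quick look before reaching for the full script, the idea can be sketched as follows. This is not the script from the linked post, just a minimal illustration; the function name and the seven-day default window are assumptions.

```powershell
# Hypothetical helper: count EventID 1135 entries in the System log on each server.
function Get-ClusterNodeRemovedCount {
    param(
        [Parameter(Mandatory)][string[]] $Server,
        [int] $Days = 7   # how far back to look; an arbitrary default
    )

    $filter = @{
        LogName   = 'System'
        Id        = 1135
        StartTime = (Get-Date).AddDays(-$Days)
    }

    foreach ($s in $Server) {
        # Get-WinEvent reports an error when nothing matches, so the error is suppressed.
        # Note that this also hides unreachable servers, which then show a count of 0.
        $found = @(Get-WinEvent -ComputerName $s -FilterHashtable $filter -ErrorAction SilentlyContinue)
        [pscustomobject]@{ Server = $s; Count1135 = $found.Count }
    }
}
```

With placeholder server names, `Get-ClusterNodeRemovedCount -Server 'EXCH-1','EXCH-2' | Format-Table -AutoSize` would give one row per DAG member.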
KB 2055853 - VMXNET3 resets frequently when RSS is enabled in a Windows virtual machine
Disabling RSS within the guest OS is not ideal for high volume machines as this could lead to CPU contention on the first core. Please work to install the requisite update for the hypervisor.
KB 2058692 - Possible data corruption after a Windows 2012 virtual machine network transfer
Modern versions of Windows will typically not be using this virtual NIC – currently they will typically use VMXNet3. However be aware of the other issues on this page affecting VMXNet3 vNICs.
When installing the VMware tools in ESXi 5, selecting the FULL installation option will also install the vShield filter driver. There is a known issue with this filter driver that is discussed in KB 2034490 - Windows network file copy performance after full ESXi 5 VMware Tools installation.
Starting with ESXi 5.0, VMware Tools ships with the vShield Endpoint filter driver. This driver is automatically loaded when VMware Tools is installed using the Full option, rather than the Typical default.
I also saw this TechNet forum post with a related issue to what was observed onsite. Servers would discard a very high number of packets which would severely impact the application users were trying to access.
There are some important items to review when configuring NLB on VMware.
It is critical to discuss the NLB implementation with the hypervisor team and also the network team. Be very specific with what is being implemented and what is expected of both of these teams. Some network teams do not like NLB unicast as it leads to switch flooding, whilst others do not appreciate having to load static ARP entries into routers to ensure remote users can access the NLB VIP. Cisco has Catalyst NLB documentation here. Avaya has some interesting documentation on this page.
For this and other reasons Exchange recommends the use of a third party load balancer. This could be a physical box in a rack or a VM which can run inside Hyper-V or ESX. Please consult with your load balancer vendor so they can best meet your business, technical and price requirements.
Whilst working on a customer’s Exchange 2010 DAG issue, I wrote a short script to quickly grab some performance monitor counters from all of their Exchange servers. The issue that we were investigating was related to discarded packets when the VM was running on a certain hypervisor host. The customer had moved their Exchange VMs to a new host and after doing so they were experiencing cluster issues. Nodes would randomly be dropped from cluster membership, which would impact the Exchange 2010 DAG as any active copies of those Exchange databases would then have to be mounted on another server. The activation was happening automatically (as expected) but it is still not a desired state.
On the Exchange servers we observed EventID 1135 – Cluster node was removed from the active failover cluster membership.
At this point we did not do the typical knee jerk reaction that normally happens -- which is to simply rack up the cluster timeout values. Why you ask? Well that does not address the root cause, and only masks the symptom.
Update 18-11-2014: Please see this post for a script to retrieve the number of 1135 EventId errors on multiple servers.
We quickly checked the basics, and made sure that the Exchange 2010 recommended DAG update (it’s a cluster update but Exchange recommends it strongly) was installed, and also the generic updates recommended for the version of the OS Exchange was installed onto. They are discussed in this post along with other Exchange 2010 deployment tweaks.
None of this made a difference. The cluster still experienced EventID 1135 cluster disconnects. Since this only started after the VMs were moved to the new host, known issues for those hosts were then reviewed. In VMware KB 1010071 and 2039495 these symptoms are discussed and Exchange 2010 is specifically tagged in the second article.
While the hypervisor admins have their own tools to report and investigate such issues, we can use Performance Monitor to see how Windows perceives the lay of the land.
The counter that we were looking at was “Packets Received Discarded”. The sample image below is from Hyper-V and shows the location:
From the Perfmon description: Packets Received Discarded is the number of inbound packets that were chosen to be discarded even though no errors had been detected to prevent their delivery to a higher-layer protocol. One possible reason for discarding packets could be to free up buffer space.
This is great – we can use this counter to look at the issue, but how to do it easily across multiple servers? And then potentially across every single VM that the customer has since if we are hitting the issue on one set of VMs what other VMs are affected? We could:
To see if we were experiencing the issue across multiple Exchange servers, and to gauge the severity I wrote a quick PowerShell script that would pull in the required performance counters from multiple servers quickly and easily. This uses the Get-Counter cmdlet as shown here:
Get-Counter "\Network Interface(*)\Packets Received Discarded" -ComputerName $Server
The script will get a collection of NICs from the specified server, and then loop through them and remove the pseudo ones. For example, we do not want to see Teredo, ISATAP or 6to4 interfaces. For the purposes of this script we are concerned with the physical ones, and that includes the "physical" NICs that are made visible in virtual guest operating systems. NIC names are not hardcoded into the script, else it would not be portable across physical server types and hypervisors.
You can obtain this from the TechNet Gallery under the Get Perfmon Counter Packets Received Discarded On Multiple Servers.
Update 22-10-2014: Updated script to also include OS uptime and OS installation date
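To give a feel for the approach without downloading anything, the loop-and-filter logic described above could look roughly like this. It is a simplified sketch and not the published script; the function names and the name patterns used to spot pseudo interfaces are assumptions.

```powershell
# Hypothetical filter: returns $true for tunnel/pseudo adapters that should be skipped.
# Matching on these name fragments is an assumption; -match is case-insensitive.
function Test-PseudoInterface {
    param([Parameter(Mandatory)][string] $InstanceName)
    return [bool]($InstanceName -match 'isatap|teredo|6to4')
}

# Hypothetical helper: read the counter from each server and keep the real NICs.
function Get-DiscardedPacketCount {
    param([Parameter(Mandatory)][string[]] $Server)

    foreach ($s in $Server) {
        $result = Get-Counter -Counter '\Network Interface(*)\Packets Received Discarded' -ComputerName $s
        foreach ($sample in $result.CounterSamples) {
            if (-not (Test-PseudoInterface -InstanceName $sample.InstanceName)) {
                [pscustomobject]@{
                    Server    = $s
                    Interface = $sample.InstanceName
                    Discarded = $sample.CookedValue
                }
            }
        }
    }
}
```

Piping `Get-DiscardedPacketCount -Server $servers` to `Sort-Object Discarded -Descending` puts the worst affected NICs at the top, which is the kind of comparison that made the trend between hosts visible.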
Using the script, we were able to check all of the customer’s servers and quickly pinpoint trends in the environment. One half of the DAG servers were experiencing discarded packets many times higher than the others, and the trends were noted on both MAPI and REPL networks. This allowed us to focus on particular hypervisor hosts.
Armed with this data, we then could ask why specific VMs were more impacted than others and prove it. It turns out that the Exchange VMs had 64GB of RAM assigned and had been placed onto hosts which had 64GB of physical memory. Since there was no free memory for the hypervisor, this was placing pressure onto the hypervisor and exacerbating the issue.
This is an issue that has received attention in the past from the Exchange community. In addition to the other great posts out there on this topic, I posted this to set the context around the script, as we all want an easy way to check lots of servers and potentially monitor for this issue.
In a little under 6 months, multiple products will experience a support life cycle change. As indicated by the space shuttle being readied, the countdown has started and the dates are set. Make sure you are aware of what is happening on the 13th of January 2015.
2014 has already been a busy year for product transitions, with Exchange 2003, Office 2003 and Windows XP Pro all exiting out of extended support.
While not changing with the products below, please also make sure that the Windows Server 2003 and 2003 R2 end of extended support date is also on your calendars and project plans. Please plan to move off Windows Server 2003 by the 14th of July 2015. I also blogged about the upcoming change for Office 2010, where SP2 will be required after the 14th October 2014 since SP1 will no longer be supported.
There are multiple products that we need to be aware of that will experience support status changes in January 2015. They include:
Mainstream support for Exchange 2010 will end on the 13th of January 2015, and Exchange 2010 will then enter its extended support phase.
Windows 7 will also experience a state change. As with Exchange 2010, it will leave mainstream support on the 13th of January 2015 and enter extended support phase.
Windows Server 2008’s mainstream support will also end on the 13th of January 2015.
As will Windows Server 2008 R2
Forefront Unified Access Gateway (UAG) customers need to update to SP4 since UAG 2010 SP3 support will end on the 13th of January 2015.
Finally, not that I see this much in the wild since Hyper-V was launched, Virtual Server 2005 and Virtual Server 2005 R2 support will end on the 13th of January 2015.
The Lifecycle site’s FAQ has more information and details on support options if you are not able to complete your migration prior to the end of support dates.
Make sure that you are able to migrate to a supported product prior to the support expiration date. Security updates will not be provided for products that are not supported.
Want to grow and expand your technical, customer and soft skills? Do you dream of working with the product groups at Microsoft to influence future product updates? Want to have fun while delivering true mission critical support to top tier Microsoft customers? Then you are in luck!
Microsoft Canada GBS is looking for Exchange and Office 365 experts. Microsoft's Global Business Support (GBS) group is part of our Customer Service and Support team which in turn is part of our worldwide Services Organisation.
You may also know us as Premier Field Engineering (PFE) #MSPFE
As an example of what we do please see:
If you have reached that point in your career where you are craving to expand and grow your sphere of influence, but opportunities are limited, this is a great opportunity to join the team at Microsoft Canada.
The job posting can be found on the Microsoft Canada careers site.
Update 20-7-2014: Replaced placeholder text with the exact job description link.