• Public Folders – Folders Assistant Not Forwarding Messages

    A recent issue occurred where users were trying to create a Public Folder Folder Assistant rule that would forward the posted/inbound message to another Public Folder.  The users were able to configure the rule; however, the message was never forwarded.  This is a short synopsis of how we resolved this problem.

    PREREQUISITES

    • Public Folders – both were mail enabled and visible in the GAL
    • Exchange 2010 SP2
    • Outlook 2010 SP1

     

    CONFIGURATION
    Within Outlook, open the Public Folder properties and select Folder Assistant (General Tab).

    Add a rule:

    Set that rule to forward to another Public Folder

    NOTE: All other settings were optional, and did not impact this issue.

    SYMPTOMS
    • Get-PublicFolder shows that this folder has HasRules set to True
    • Changing the delivery to a mailbox, rather than a Public Folder, works just fine
    • Event Viewer’s Application Log contained the following event:

    Source: MSExchangeIS Public Store
    Event ID: 2028
    Task Category: Transport Delivering
    Level: Error
    Description: The delivery of a message sent by public folder 0000013456F2 has failed.
    To: MyPublicFolder
    The non-delivery report has been deleted.

    RESOLUTION
    As you can see, the above event is related to Transport Delivering, from the MSExchangeIS Public Store.  Since delivery worked to a mailbox, but not to the second Public Folder, this led me to suspect permissions. 
    After modifying the Anonymous Client Permissions to include CREATE ITEMS, the Folder Assistant worked just fine.  By default, Anonymous only had Folder Visible, not Create Items.
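
    To spot other folders that could hit the same problem, one option is to export the client permissions and scan them offline.  The Python sketch below assumes the permissions have already been flattened into (folder, user, rights) rows, for example from a Get-PublicFolderClientPermission export; the row layout and folder names are made up for illustration.

```python
# Hypothetical rows: (folder path, user, access rights). The layout is an
# assumption, e.g. flattened from a Get-PublicFolderClientPermission export.
ROWS = [
    ("\\Sales\\Inbound", "Anonymous", ["FolderVisible"]),
    ("\\Sales\\Archive", "Anonymous", ["CreateItems", "FolderVisible"]),
    ("\\Sales\\Inbound", "Default", ["Author"]),
]


def folders_blocking_anonymous_delivery(rows):
    """Return folders whose Anonymous entry lacks the CreateItems right."""
    blocked = []
    for folder, user, rights in rows:
        if user.lower() == "anonymous" and "CreateItems" not in rights:
            blocked.append(folder)
    return sorted(blocked)


if __name__ == "__main__":
    for folder in folders_blocking_anonymous_delivery(ROWS):
        print(f"Anonymous cannot create items in {folder}")
```

    Any folder this reports is a candidate for the same forwarding failure if it is ever used as a Folder Assistant target.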


    I don’t know how many people will ever run into this, but I figured this post may help reduce troubleshooting time for others.
    Good Luck!
    Doug

  • A Few Recommendations for Exchange 2010


    The following is a partial list of items that I recommend be reviewed for all Exchange 2010 server deployments.  The focus is to ensure that the environment is consistently configured, reliable, and performing optimally.  This is not an official list, just something that I've been using for a while.

    Server Build
    - Confirm that hardware has been updated to the latest driver and firmware builds
    - Verify that the latest software builds have been installed, including Exchange, antivirus, monitoring agents, filter packs, etc.
    - Operating System is running the latest build and has the recommended OS hotfixes


    Server Network Interfaces
    - Know if your environment explicitly denies IPv6 network traffic.  If so, then you may need to disable IPv6 on the NICs
    - NIC teaming is great for the MAPI/Public adapters - but should be configured to use Fault Tolerance (not automatic or load balance)
    - Network settings should be consistent on ALL servers, including driver versions, TCP/IP settings (e.g. DNS), and binding order


    System Settings
    - Server's Page File should be moved off of the system partition
    - Server System Failure should be using Kernel Memory Dump
    - Proper file-level antivirus exclusions should be configured, including exclusions for the file share witness, monitoring agents, cluster, IIS, and Exchange


    Active Directory
    - Verify that Active Directory has been properly configured (e.g. AD site links, no RODC, 64-bit GC/DC running 2008 R2 preferred, etc.)
    - AD replication intervals should be optimally configured and documented, and you should confirm that no replication errors are occurring
    - All domain controllers are responsive (i.e. none are offline) and pass DCDIAG and other AD related tests
    - Subnets should be properly defined within the AD Site design


    Other Dependencies
    - Confirm that the hardware (server, storage, network, etc.) is working properly without any errors or warnings being generated
    - Network performance and reliability should be evaluated.  If network is slow or unreliable, users will feel that pain!
    - DNS should be reviewed for proper records and replication/configuration.  Remove any old records that may impact messaging.


    Client Access
    - All AD sites are defined within your AutoDiscoverSiteScope, including client-only sites
    - Enable Kerberos for the CAS Array
    - Enable logging on IIS and the CAS and track which clients are accessing your environment
    - Have recommended minimum client builds for your environment and know how to parse the logs to determine builds
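
    As a starting point for parsing those logs, here is a minimal Python sketch that counts requests per user agent in a W3C-format IIS log, using the #Fields header to locate the cs(User-Agent) column.  The sample lines, including the user-agent strings, are illustrative assumptions rather than output captured from a real server.

```python
from collections import Counter

# Illustrative W3C-format lines; the field list and user-agent strings are
# assumptions, not captured from a real server.
SAMPLE_LOG = """\
#Fields: date time cs-method cs-uri-stem cs(User-Agent) sc-status
2012-03-01 10:00:01 POST /EWS/Exchange.asmx Microsoft+Office/14.0+(Windows+NT+6.1;+Microsoft+Outlook+14.0.6109;+Pro) 200
2012-03-01 10:00:02 POST /EWS/Exchange.asmx Microsoft+Office/14.0+(Windows+NT+6.1;+Microsoft+Outlook+14.0.6109;+Pro) 200
2012-03-01 10:00:03 POST /EWS/Exchange.asmx Microsoft+Office/12.0+(Windows+NT+5.1;+Microsoft+Office+Outlook+12.0.6550;+Pro) 200
"""


def count_user_agents(log_text):
    """Count requests per user agent, using the #Fields header for columns."""
    fields = []
    counts = Counter()
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or "cs(User-Agent)" not in fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed rows
        agent = values[fields.index("cs(User-Agent)")]
        # IIS writes spaces inside a field as '+', so restore them here.
        counts[agent.replace("+", " ")] += 1
    return counts


if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

    The build number embedded in each agent string can then be compared against whatever minimum build you have set for your environment.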


    Transport
    - Confirm that EWS and OWA are properly configured to allow for your organization's message size limits
    - Verify that message limits are consistently configured (server, global, connectors, etc.)
    - Evaluate routing components and remove any unnecessary transport settings (e.g. Accepted Domains, Connectors, etc.)
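
    One way to review message limits is to write them all down in one place and flag anything smaller than the limit you intend to support.  The Python sketch below does only that; the setting names and values are placeholders you would fill in by hand, and some settings (such as the EWS and OWA request limits) may deliberately need to be larger than the transport limit.

```python
# Hypothetical inventory: where each limit lives and its size in MB.
# The names and values are placeholders, not defaults from any product.
LIMITS_MB = {
    "organization MaxSendSize": 25,
    "organization MaxReceiveSize": 25,
    "receive connector MaxMessageSize": 25,
    "send connector MaxMessageSize": 10,
    "EWS maxRequestLength": 35,
}


def limits_below(limits, intended_mb):
    """Return the settings that would block a message of the intended size."""
    return {name: mb for name, mb in limits.items() if mb < intended_mb}


if __name__ == "__main__":
    for name, mb in limits_below(LIMITS_MB, 25).items():
        print(f"{name} is {mb} MB, below the intended 25 MB")
```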


    Public Folders
    - If using dedicated PF servers, PF should be configured to replicate to all of those servers (min of 2 copies)
    - Does your Exchange-aware antivirus software scan Public Folder replication messages? Should it?
    - To improve Public Folder access performance, remove deleted security objects from the client permissions


    Security
    - Should Administrator Audit Logging be enabled?
    - Windows Firewall should be enabled and properly configured to work with all applications installed on the server
    - Rarely should you modify the default RBAC groups.  Rather make new groups and manage the permissions thru that model
     

    Some other things...
    - Go thru the Exchange Best Practice Analyzer health check
    - Be sure to follow the Mailbox Storage Calculator - either provided by MSFT or by your storage vendor
    - Determine your requirements for custom Client Throttling Policies (ex: service accounts)
    - Have you set the External Post Master Address?


    Hope this helps!
    Doug

  • Exchange 2010 DAG - NetworkManager has not yet been initialized

     

    Recently, on two separate occasions, I had to assist in resolving an issue where members of an Exchange 2010 database availability group (DAG) failed to participate in the DAG's cluster communications and were therefore unable to bring any database on those servers online.  In both instances, this occurred after the server was rebooted.  While each issue had a slightly different resolution, I am fairly confident that they are related.  And since it took a while to isolate and resolve these issues, I thought I would share the experience.

    Before I begin, in neither scenario did we lose quorum of the DAG.  Also, the symptoms of both scenarios were nearly identical. 

     

    SYMPTOMS

    • Viewing these servers from Failover Cluster Manager shows them with a STATUS of DOWN.
    • Network Connections for these members are listed as UNAVAILABLE
    • The Cluster Service starts on these servers; however, the following event is logged in Event Viewer's System Log
      Log Name:      System
      Source:        Microsoft-Windows-FailoverClustering
      Event ID:      1572
      Task Category: Cluster Virtual Adapter
      Level:         Critical
      Description:  Node 'SERVER' failed to join the cluster because it could not send and receive failure detection network messages with other cluster nodes. Please run the Validate a Configuration wizard to ensure network settings. Also verify the Windows Firewall 'Failover Clusters' rules.
    • Attempt to view Exchange DAG status or network returns error:
      A server-side administrative operation has failed. 'GetDagNetworkConfig' failed on the server. Error: The NetworkManager has not yet been initialized. Check the event logs to determine the cause. [Server: SERVER5.Contoso.inc]
          + CategoryInfo          : NotSpecified: (0:Int32) [Get-DatabaseAvailabilityGroup], DagNetworkRpcServerException
          + FullyQualifiedErrorId : A6AA817A,Microsoft.Exchange.Management.SystemConfigurationTasks.GetDatabaseAvailabilityGroup
    • Cluster Log Shows:
      WARN  [API] s_ApiOpenGroupEx: Group Cluster Group failed, status = 70
      DBG   [HM] Connection attempt to SERVER01 failed with error WSAETIMEDOUT(10060): Failed to connect to remote endpoint 1.2.3.45:~3343~.
      INFO  [JPM] Node 7: Selected partition 33910(1 2 3 4 5 6 9 10 11 12 13 14) as a target for join
      WARN  [JPM] Node 7: No connection to node(s) (10 12). Cannot join yet
    • Cluster Validation Report shows:
      Node SERVER01.Contoso.inc is reachable from Node SERVER5.Contoso.inc by only one pair of interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available or consider adding additional networks to the cluster.
      The following are all pings attempted from network interfaces on node SERVER5.Contoso.inc to network interfaces on node SERVER05.Contoso.inc.
    • Network Trace was showing that cluster communication was in fact going thru to all other nodes on port 3343 and responses were returned. 
    • There was no change in errors even after disabling Windows Firewall and removing file level antivirus and security products from the servers.
    • Removing NIC Teaming from the server did not resolve the issue
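
    When the cluster log is large, it can help to pull out just the connection failures.  The Python sketch below uses two regular expressions modeled on the [HM] and [JPM] lines quoted in the symptoms above; other cluster log lines may be formatted differently, so treat the patterns as assumptions.

```python
import re

# Patterns modeled on the two log lines quoted above; other cluster log
# formats may differ, so treat these as assumptions.
CONNECT_FAIL = re.compile(
    r"\[HM\] Connection attempt to (\S+) failed with error (\w+)\((\d+)\)"
)
NO_CONNECTION = re.compile(
    r"\[JPM\] Node (\d+): No connection to node\(s\) \(([\d ]+)\)"
)

SAMPLE_LINES = [
    "DBG   [HM] Connection attempt to SERVER01 failed with error "
    "WSAETIMEDOUT(10060): Failed to connect to remote endpoint 1.2.3.45:~3343~.",
    "WARN  [JPM] Node 7: No connection to node(s) (10 12). Cannot join yet",
]


def summarize_cluster_log(lines):
    """Collect failed connection targets and the node IDs blocking a join."""
    failed_targets = set()
    unreachable = {}
    for line in lines:
        m = CONNECT_FAIL.search(line)
        if m:
            failed_targets.add((m.group(1), m.group(2), int(m.group(3))))
        m = NO_CONNECTION.search(line)
        if m:
            unreachable[int(m.group(1))] = [int(n) for n in m.group(2).split()]
    return failed_targets, unreachable


if __name__ == "__main__":
    targets, blocked = summarize_cluster_log(SAMPLE_LINES)
    print(targets)
    print(blocked)
```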


    RESOLUTION #1
    In this scenario, this occurred within our lab running on Hyper-V.  Based on Hyper-V's network summary output, I could see that the servers really were not communicating properly.  Yes, they could ping and they could authenticate with the domain, but cluster communication was failing. 
    The resolution was to consistently configure the network settings on all DAG members and to reset the Hyper-V network properties.  This meant:

    • Confirm that the networks were identically configured between all DAG node members (i.e. REPL / MAPI Networks, TCP/IP settings, Binding Order, Driver versions, etc)
    • Disabled IPv6 on the servers [NOTE: It is recommended to leave IPv6 enabled, even if you do not have an IPv6-enabled network!  In most scenarios, disabling IPv6 on an Exchange 2010 server should be a last resort.]
    • Edited the Hyper-V Network Properties page for this VM
    • Once rebooted, all was working fine.


    RESOLUTION #2
    In this scenario, this occurred in production.  Ultimately we decided to change the IP address of the 'broken' DAG member and reboot the server again.  This allowed the server to properly register its network connections with the cluster DB (ClusDB) and all other nodes were able to talk properly.  This allowed the DAG member to rejoin the DAG and then all databases were able to mount and/or replicate their copy successfully. 

    We found that not all of the production DAG members were identically configured with their network settings (i.e. 2 DAG members did not have a REPL network configured).  Per http://technet.microsoft.com/en-us/library/dd638104.aspx#NR, "each DAG member must have the same number of networks".  We fixed the networks and updated the servers to include the recommended hotfixes - http://blogs.technet.com/b/dblanch/archive/2012/02/27/a-few-hotfixes-to-consider.aspx
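
    A quick way to catch this kind of drift is to list the networks on each DAG member and flag any member whose network count differs from the rest.  The Python sketch below assumes you have already gathered that inventory by hand; the server and network names are placeholders.

```python
from collections import Counter

# Hypothetical inventory of DAG members and their networks; server and
# network names are placeholders.
MEMBER_NETWORKS = {
    "SERVER01": ["MAPI", "REPL"],
    "SERVER02": ["MAPI", "REPL"],
    "SERVER03": ["MAPI"],
}


def members_with_odd_network_count(member_networks):
    """Return members whose network count differs from the most common one."""
    if not member_networks:
        return []
    counts = Counter(len(nets) for nets in member_networks.values())
    expected = counts.most_common(1)[0][0]
    return sorted(
        name for name, nets in member_networks.items() if len(nets) != expected
    )


if __name__ == "__main__":
    for name in members_with_odd_network_count(MEMBER_NETWORKS):
        print(f"{name} has a different number of networks than its peers")
```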


     

    Questions/Answers
    Why did changing the IP address of the DAG member work?   Well, not exactly sure but we believe that this was either a stale TCP route or something in the CLUSDB was preventing any server with that IP address from joining the cluster.
    Did you reboot all of the DAG member servers before or after changing the IP address?  No, we did not want to risk losing another server within the DAG (we had already lost 2 of the 12 members).  We did, however, reboot all of the servers in the lab scenario.
    Did you ever lose quorum of the DAG? Nope.
    Do you think that you could have prevented this?  Maybe.  If we had applied all of the hotfixes outlined here and confirmed that all network settings were identical on all DAG members, the servers might not have hit this issue.  There may be other things causing this, but it is always recommended to resolve the known issues first.


    Good Luck.
    Doug

  • Exchange Databases Failover Due to Low Memory


    Recently I had the (un)fortunate experience of troubleshooting an issue with Exchange 2010 DAG database copies failing over to other servers.  This happened in several different environments that I was supporting, so I know it can happen to anyone.  Here is a short synopsis of this issue (not all symptoms are listed):

    • SCOM Alert: A significant portion of the database buffer cache has been written out to the system paging file.  This may result in severe performance degradation.
    • SCOM Alert: Hard I/O error will dismount or terminate replication on a database copy.
    • Event Log contained the following:  The database could not allocate memory. Please close some applications to make sure you have enough memory for Exchange Server. The exception is Microsoft.Exchange.Isam.IsamOutOfMemoryException: Out of Memory (-1011)
    • Per perfmon, the disk subsystem itself was not having performance issues.  However, the server was clearly not able to sustain proper performance, leading us to believe that there was not enough memory within the system.
    • Process Explorer showed a number of processes with high memory consumption.  Upon further review, we identified known memory issues with some of these processes.
    • Disabling applications and services (like antivirus, backup, monitoring, etc) did not significantly free up the consumed memory
    • Low memory issues would occur when importing content into mailboxes.  This may be from the additional work load required by Exchange aware antivirus (background scan) and content indexing for this new content.
    • Antivirus exclusions were not properly configured against the SCOM Monitoring services and File Share Witness directory


    Our Resolution:
    Step 1: Confirmed that all servers within the DAG were consistently and properly configured, specifically network configuration and antivirus exclusions
    Step 2: Installed latest drivers, firmware, and recommended hotfixes (2 fixes resolved memory leak issues)
    Step 3: Added additional RAM in the servers (amount may vary on environment/need)
    Step 4: Rebooted the server


    NOTE: Prior to modifying the RAM of an Exchange 2010 server, understand how that will directly impact the database cache. Review "Understanding the Mailbox Database Cache".  Also understand that other factors may need to be adjusted (e.g. paging file configuration).

    Some may ask, “didn’t you follow the RAM guidance within the mailbox storage calculator?”  Yes we did but there were several factors that changed after we completed that phase of the design, including mailbox configuration, additional processes running on the servers, and user profiles/load. 


    Good Luck!
    Doug

  • A Few Hotfixes to Consider…

     UPDATED: June 2013

    I decided to collect my own short list of hotfixes that I have recommended for an Exchange 2010/Windows 2008 R2 environment. 
      NOTE: This does not include security hotfixes (WSUS) and is NOT an official list of hotfixes or build recommendation - this is my list of hotfixes that I have recommended for newly deployed Exchange environments as a means to prevent known issues from occurring.

    Exchange 2010 Hotfixes
      Exchange 2010 Service Pack 3 – http://support.microsoft.com/kb/2808208
      KB2803727 Update Rollup1 for Exchange Server 2010 Service Pack 3 - http://support.microsoft.com/kb/2803727


    Office 2010 Hotfixes
      Office 2010 Service Pack 1 - http://support.microsoft.com/kb/2460049
      KB2597090 Description of the Outlook 2010 update: February 2013 - http://support.microsoft.com/kb/2597090
      KB2553391 Description of the Outlook 2010 hotfix package: December 11, 2012 - http://support.microsoft.com/kb/2553391
      KB2832226 Office 2010 cumulative update for April 2013 - http://support.microsoft.com/kb/2832226

    Office 2013 Hotfixes 
      KB2738013 Description of the Outlook 2013 Update: April 2013 - http://support.microsoft.com/kb/2738013

    Office for Mac 2011 Hotfixes 
      KB2830450 Description of the Microsoft Office for Mac 2011 14.3.4 Update - http://support.microsoft.com/kb/2830450

    Base Windows 2008 R2 SP1 Hotfix
      KB2775511 – An Enterprise Hotfix rollup for Windows Server 2008 R2 SP1 – http://support.microsoft.com/kb/2775511


    Windows 2008 R2 Post-SP1 Hotfixes 
      KB2754704 http://support.microsoft.com/kb/2754704 -- MPIO
      KB2471472 http://support.microsoft.com/kb/2471472 -- NDIS
      KB2769369 http://support.microsoft.com/kb/2769369 -- WIN32SPL
      KB2494016 http://support.microsoft.com/kb/2494016 -- CSVFILTER  
      KB2578113 http://support.microsoft.com/kb/2578113 -- CLUSRES 
      KB2614892 http://support.microsoft.com/kb/2614892 -- MOUNTMGR  
      KB2536275 http://support.microsoft.com/kb/2536275 -- SYS  
      KB2619234 http://support.microsoft.com/kb/2619234 -- RPCHTTP
      KB2724197 http://support.microsoft.com/kb/2724197 -- Kernel   
      KB2779069 http://support.microsoft.com/kb/2779069 -- CLUSSVC 
      KB2699780 http://support.microsoft.com/kb/2699780 -- Registry 
    * Updated from previous hotfix notification release 
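
    To check a server against this list, you can compare the KB numbers above with whatever is installed.  The Python sketch below assumes you have already collected the installed KB IDs yourself (for example from the HotFixID column of Get-HotFix output); note that a hotfix replaced by a later update may not appear under its original KB number, so a "missing" result is only a prompt to look closer.

```python
# KB numbers taken from the Windows 2008 R2 post-SP1 list above.
RECOMMENDED = {
    "KB2754704", "KB2471472", "KB2769369", "KB2494016",
    "KB2578113", "KB2614892", "KB2536275", "KB2619234",
    "KB2724197", "KB2779069", "KB2699780",
}


def missing_hotfixes(installed, recommended=RECOMMENDED):
    """Return recommended KB IDs that are not in the installed list."""
    have = {kb.strip().upper() for kb in installed}
    return sorted(set(recommended) - have)


if __name__ == "__main__":
    # Hypothetical installed list for illustration only.
    installed_on_server = ["KB2754704", "kb2471472"]
    for kb in missing_hotfixes(installed_on_server):
        print(f"{kb} not found")
```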


    Lync Server 2010 Hotfixes
      Updates Resource Center for Lync  - http://technet.microsoft.com/en-us/lync/gg131945

    NOTE: If you cannot download the hotfix from the KB, then first, determine if you really need that hotfix and if so, consider contacting MSFT support for that build or look for another, related hotfix with similar files and build.

    Have Fun...
    Doug