Doug's Think-Tank

An Eruption of Pondering...

  • Perform Your Own IT Operational Assessment

    During my time as a Microsoft PFE, I contributed to numerous IT Operational Assessments.  While there are many tasks within an Operational Assessment, I wanted to provide a ‘simple version’ by reviewing past tickets/outages over a specified time period.  Of course, there are other, more formal reviews, such as Microsoft Operations Strategic Review, but this one is simple and anyone can do it. 

    Why should I care? If done properly, the results allow the management, operations, and engineering teams to better prepare their staff (ex: training), anticipate problems (ex: identifying underlying issues), and manage operations (ex: improving processes).  As a PFE, I have used this to gain insight into:

    • Which IT service (product/solution) is generating the most tickets and consuming the most man-hours to manage
    • How well you align to your service level and operating level agreements (SLAs & OLAs)
    • Whether services are trending positively or negatively in terms of resiliency and availability
    • What areas can be improved to reduce time-to-resolution

    I recommend this as a quarterly report with an end-of-year review, but you can do it more or less frequently based on your needs and the availability of the data. 

    Now there are many ways to do such a review, and I ALWAYS encourage engaging the experts in this area because they will often find things that you won't. Besides, another set of eyes rarely hurts. But this review is a great do-it-yourself starter kit (so to speak) and can often help you identify easily resolved items that have a meaningful impact, while providing justification to accomplish operational health tasks.  Of course, this only works well if the data you are using is accurate and available.

    First, let me explain what this will NOT do:

    • Provide performance analysis of any kind
    • Evaluate how well any specific application, product, or solution is performing in isolation (I prefer to look holistically)
    • Provide HR-related fodder if you are trying to build a case to hire/fire someone
    • Quantify service availability numbers (i.e. did you achieve 99.999% availability)

    OK, let's get started...

    STEP 1: DEFINE YOUR REQUIREMENTS

    With any project, you should define your requirements, scope, and definitions to provide those core elements necessary for a comprehensive operational assessment. This may include the following: (see attachments for how I used them)

    • Intent of the document/OAR
    • Data Collection Frequency: Monthly/Quarterly
    • Report Generation Frequency: Quarterly & Yearly
    • Scope of Data Collection: Organization vs. specific Product/Solution
    • Service Management Categories: People, Process, Environment, Technology, Other, Unknown, etc.
    • Severity Levels: 1-Critical/High Impact, 2-Severe/Significant Impact, 3-Moderate/Moderate Impact, 4-Nominal/No Impact
    • Service Desk Common Resolution Classifications

    NOTE: These items will vary between each organization, so be sure to document what each means to you.

    STEP 2: DATA COLLECTION

    Typically I recommend collecting the data monthly as it provides a good timeline structure without overwhelming me with data, but each person or environment may have their own preference. Start by collecting all trouble tickets, incidents, change/work requests, unscheduled/scheduled maintenance notifications, etc. generated during the time specified. For each item collected, document the following types of data:

    • Highest Severity Level
    • Current Status (ex: Open-OnHold, Open-Active, Closed-Unresolved, Closed-Resolved, etc.)
    • Impacted Technology (ex: Exchange, Active Directory, SQL Server, etc.)
    • Impacted Services (ex: Messaging, Directory Services, Database Services, etc.)
    • Average time (in hours) to acknowledge/react, resolve, & close (ex: Ack: 1hr, Res: .5hr, CL: 1hr)
    • Categorize the item based on solution/root cause
    • Resources Used (ex: 2 Teams / 2 Staff )
    • Scheduled / Expected: Y/N (only applies to approved changes and project implementations)

    NOTE: Again, these will vary within your organization and you might include more or less information. For example, some may include uptime/downtime, perf metrics, storage consumption based on department/office/technology, etc. Just don't collect garbage data that might 'fudge' the numbers, and don't get lost in mounds of data.

    STEP 3: INPUT DATA AND PERFORM SUBJECTIVE ANALYSIS

    Input the data into the spreadsheet (see attached) and then apply some subjective decisions to the information. For example, the Service Management Category might mean one thing to one person and something else to another. Just try to stay consistent and broad. Try not to get too narrow or restrictive, otherwise you'll have 50-100 different paths to choose from.

    STEP 4: GENERATE A REPORT BASED ON THE DATA

    Consolidate the data into a single spreadsheet and report, and provide an analysis of your findings. Attached is a sample report. The key is not to be too subjective; try to stick to the facts. However, when you must be subjective, maintain consistency.
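    The consolidation step can be sketched in a few lines of PowerShell. The ticket records below are hypothetical stand-ins for whatever fields you captured in Step 2; in practice you would Import-Csv your collected data instead of defining it inline.

```powershell
# Hypothetical collected tickets; normally: $Tickets = Import-Csv .\tickets.csv
$Tickets = @(
    [PSCustomObject]@{Technology='Exchange';   Severity=2; ResolveHours=4.0},
    [PSCustomObject]@{Technology='Exchange';   Severity=3; ResolveHours=1.5},
    [PSCustomObject]@{Technology='SQL Server'; Severity=1; ResolveHours=6.0}
)

# Summarize ticket count and total resolution hours per technology,
# sorted so the biggest consumer of man-hours lands on top
$Tickets | Group-Object Technology | ForEach-Object {
    [PSCustomObject]@{
        Technology  = $_.Name
        TicketCount = $_.Count
        TotalHours  = ($_.Group | Measure-Object ResolveHours -Sum).Sum
    }
} | Sort-Object TotalHours -Descending | Format-Table -AutoSize
```

    The same Group-Object pattern works for any of the other fields (severity, service management category, impacted service, etc.).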

    I hope this helps!  Good luck!

    Da

  • Active Directory Permissions and PowerShell

    So what about Active Directory Permissions on an Object using PowerShell?  There are a number of options and methods to manage Active Directory permissions, but here are some common tasks that I might perform using PowerShell.

    NOTE: This blog uses PowerShell with the Active Directory Module (Import-Module ActiveDirectory)
    To use Get-ACL, you may want to set the location to Active Directory ( Set-Location AD: ), otherwise you may have to call AD: within the command.


    FIND IF USER ACCOUNT HAS ANY DENY PERMISSIONS SET
    Using DSACLS:
    Get-ADUser UserName | ForEach { DSACLS $_.DistinguishedName } | Where {$_.Contains("Deny")}

    Using Get-ACL:
    Set-Location AD:
    (Get-Acl (Get-ADUser UserName)).Access | Where {$_.AccessControlType -eq 'Deny'} | FT IdentityReference, AccessControlType, IsInherited -AutoSize



    FIND ALL USERS WHO HAVE NON-INHERITED DENY RIGHTS ASSIGNED
    Get-ADUser -Filter * | ForEach {$X = $_.Name ; (Get-ACL $_.DistinguishedName).Access | Where {($_.AccessControlType -eq 'Deny') -AND ($_.IsInherited -eq $FALSE)}| Select {$X}, IdentityReference, AccessControlType, IsInherited}



    FIND ALL USERS WHO HAVE NON-INHERITED DENY WRITEPROPERTY SET
    Get-ADUser -Filter * | ForEach {$X = $_.Name ; (Get-ACL $_.DistinguishedName).Access | Where {($_.AccessControlType -eq 'Deny') -AND ($_.IsInherited -eq $FALSE) -AND ($_.ActiveDirectoryRights -eq "WriteProperty")}| Select {$X}, IdentityReference, AccessControlType, IsInherited}



    FIND ALL USERS WHO HAVE SPECIFIC GROUP/USER LISTED WITH PERMISSIONS
    Get-ADUser -Filter * | ForEach {$X = $_.Name ; (Get-ACL $_.DistinguishedName).Access | Where {$_.IdentityReference -like "DOMAIN\USERNAME"}| Select {$X}, IdentityReference, AccessControlType, IsInherited -Unique}



    VIEW NON-INHERITED PERMISSIONS ON A SPECIFIC OBJECT (EX: A USER WITHIN AN OU)
    (Get-ACL "AD:\CN=Joe User,OU=Users,DC=Contoso,DC=com").Access | Where {$_.IsInherited -eq $FALSE}| Select IdentityReference, AccessControlType, IsInherited


     
    VIEW ACCESS RIGHTS ON GROUP OBJECT
    (Get-ACL (Get-ADGroup GroupName)).Access


    RESTRICT GROUPX USERS FROM MODIFYING AD ATTRIBUTE ON ALL USERS
    Get-ADUser -Filter * | ForEach { DSACLS $_.DistinguishedName /D 'Contoso\GroupX:WP;employeeID'}
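    The examples above mostly read permissions. For completeness, here is a hedged sketch of removing an explicit Deny entry using Get-ACL/Set-ACL rather than DSACLS; the JoeUser and CONTOSO\GroupX names are placeholders, and it assumes the ActiveDirectory module and a live domain.

```powershell
# Hedged sketch: strip non-inherited Deny entries for one trustee from a user object.
# 'JoeUser' and 'CONTOSO\GroupX' are placeholder names.
Import-Module ActiveDirectory
Set-Location AD:

$User = Get-ADUser JoeUser
$Acl  = Get-Acl $User.DistinguishedName

# Find the explicit Deny entries belonging to the trustee and remove them from the ACL
$Acl.Access |
    Where-Object { $_.AccessControlType -eq 'Deny' -and -not $_.IsInherited -and
                   $_.IdentityReference -like 'CONTOSO\GroupX' } |
    ForEach-Object { [void]$Acl.RemoveAccessRule($_) }

# Write the modified ACL back to the object
Set-Acl -Path $User.DistinguishedName -AclObject $Acl
```

    As always with ACL changes, review the entries being removed (pipe the Where-Object output to Format-Table first) before committing with Set-Acl.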


    There are many other things that you can do with Active Directory permissions, but I thought that I would start with the above items.  If you want something more, try another blog.

    Thanks!

    Da

  • Managing Exchange Public Folder Permissions

    Over the years, there has been a request for finding various permissions on Public Folder objects within Exchange.  I figured that I would share how to do some of these tasks, specific to Exchange 2010 and 2013.

    NOTE: The following commands use the Exchange Management Shell

    Exchange 2010
    List All Top Level Public Folders Default Permissions
    Get-PublicFolder \ -GetChildren | Get-PublicFolderClientPermission | Where {$_.User.IsDefault -eq $True} | FT Identity, User, AccessRights -auto -wrap

    List All Top Level Public Folders Anonymous Permissions
    Get-PublicFolder \ -GetChildren | Get-PublicFolderClientPermission | ?{$_.User.IsAnonymous -eq $True} | FT Identity, User, AccessRights -auto -wrap

    List All Public Folders Where Anonymous is set to Owner
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{($_.User.IsAnonymous -eq $True) -AND ($_.AccessRights -eq 'Owner')} | FT Identity, User, AccessRights -auto -wrap

    List All Public Folders Where Default is NOT Author
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{($_.User.IsDefault -eq $True) -AND ($_.AccessRights -ne 'Author')} | FT Identity, User, AccessRights -auto -wrap

    List All Public Folders Where JoeUser is set to Owner
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{($_.User -like "*JoeUser*") -AND ($_.AccessRights -eq 'Owner')} | FT Identity, User, AccessRights -auto -wrap

    List All Public Folders Containing Old/Deleted Users with Permissions
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{$_.User -like "*NT User:*"} | FT Identity, User, AccessRights -auto -wrap

    Remove Old/Deleted Users from Public Folders (w/ WhatIf)
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{$_.User -like "*NT User:*"} | ForEach {Remove-PublicFolderClientPermission -Identity $_.Identity -User $_.User -AccessRights $_.AccessRights -WhatIf}

     

    Modify/Add JoeUser to be an Owner of a Folder
    Add-PublicFolderClientPermission -Identity "\MyPublicFolder\Reports" -User JoeUser -AccessRights Owner

    Exchange 2013
    List All Top Level Public Folders Default Permissions
    Get-PublicFolder \ -GetChildren | Get-PublicFolderClientPermission | Where {$_.User.UserType -eq 'Default'} | FT Identity, User, AccessRights -auto -wrap

    List All Top Level Public Folders Anonymous Permissions
    Get-PublicFolder \ -GetChildren | Get-PublicFolderClientPermission | ?{$_.User.UserType -eq 'Anonymous'} | FT Identity, User, AccessRights -auto -wrap

    List All Public Folders Where Anonymous is set to Owner
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ? {($_.User.UserType -eq 'Anonymous') -AND ($_.AccessRights -eq 'Owner')} | FT Identity, User, AccessRights -auto -wrap

    List All Public Folders Where Default is NOT Author
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{($_.User.UserType -eq 'Default') -AND ($_.AccessRights -ne 'Author')} | FT Identity, User, AccessRights -auto -wrap

    List All Public Folders Where JoeUser is set to Owner
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{($_.User -like "*JoeUser*") -AND ($_.AccessRights -eq 'Owner')} | FT Identity, User, AccessRights -auto -wrap

    List All Public Folders Containing Old/Deleted Users with Permissions
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{$_.User.UserType -like "Unknown"} | FT Identity, User, AccessRights -auto -wrap

    Remove Old/Deleted Users from Public Folders (w/ WhatIf)
    Get-PublicFolder \ -Recurse | Get-PublicFolderClientPermission | ?{$_.User.UserType -like "Unknown"} | ForEach {Remove-PublicFolderClientPermission -Identity $_.Identity -User $_.User -AccessRights $_.AccessRights -WhatIf}


     

    Modify JoeUser to be an Owner of a Folder
    Add-PublicFolderClientPermission -Identity "\MyPublicFolder\Reports" -User JoeUser -AccessRights Owner

    More information on managing Public Folders can be found on TechNet for Exchange.

    Good Luck

    Da

  • 8dot3 and the Exchange 2010 SP3 LAG Copy

    Recently I deployed Exchange 2010 SP3 and experienced a few headaches with the LAG (lagged) copy.  It turns out that this was probably related to the known issue with Exchange 2010 SP3 and 8dot3Name, even though we never activated these copies and the active copy contained no transaction logs with 8.3 names (weird!). 

    My configuration:

    • Exchange 2010 SP2 RU4 on Windows 2008 R2
    • Circular Logging enabled (backup-less environment)
    • LAG copy containing more than 2 weeks of transaction logs

    First, what is an 8.3 name and why was it enabled?  It is a legacy file naming convention from the old MS-DOS days.  By default, our Windows 2008 R2 build does not have 8.3 naming enabled.  However, we discovered that a GPO had been set which overrode that value as a requirement for a government compliance program.  It turns out that lots of older government compliance programs required this to be enabled. 

    You can see an example of files with and without the 8.3 naming convention by running DIR /X.

    How did we discover it was enabled?  In our testing, a number of LAG database copies went to a FailedAndSuspended state.  Our troubleshooting led us to the known issue mentioned above, and we confirmed it using the FSUTIL command and DIR /X.

    What problems did we experience? If we upgraded a server from SP2 to SP3 that either held the LAG copy or owned the active copy, and the LAG copy contained at least one log with an 8.3 name, then the LAG copy would intermittently go to a FailedAndSuspended state.  Not all copies failed all of the time.  Nor did we have to activate the DB for it to fail; it failed whenever the server was rebooted.

    What did we do to fix it? We knew that if the LAG copy contained any transaction logs with 8.3 names, the DB would fail.  So we made a change to the server using FSUTIL (FSUTIL 8dot3Name Set 1) to disable 8.3 name creation.  A day later we discovered that the setting had reverted, which led us to an old GPO entry.  After changing the GPO and forcing the update to occur, we could see that newly created transaction logs were not getting 8.3 names. 

    Next, we waited for all database copies to cycle through the old logs (those containing 8.3 names) before making any server reboots or significant changes.  Failing a database over to another copy did not help. 

    We verified that no database copies contained transaction logs with these 8.3 file names by running a PowerShell command per server:

    $Databases = Get-MailboxDatabase -Server $Env:ComputerName
    foreach ($DB in $Databases) {
        $LogPath = Join-Path $DB.LogFolderPath '*~1.log'
        # Use cmd.exe's dir because it matches 8.3 short names; PowerShell wildcards only match long names
        $ShortNamed = cmd /c dir /b $LogPath 2>$null
        if ($ShortNamed) { Write-Host "$($DB.Name) - 8dot3 Log Files Found" -ForegroundColor Yellow }
        else { Write-Host $DB.Name }
    }

     

    Basically, before you upgrade to SP3, check whether your server has the 8.3 naming convention enabled (FSUTIL 8dot3Name Query).  If it does, disable it and cycle through all of your transaction logs before deployment. 
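    To illustrate, the check and the fix are both one-liners. Value 1 disables short-name creation on all volumes (it does not strip existing short names), and the commands require an elevated prompt; the log folder path below is a placeholder.

```powershell
# Query the current 8.3 name creation state (0 = enabled on all volumes, 1 = disabled on all volumes)
fsutil 8dot3name query

# Disable 8.3 short-name creation on all volumes
fsutil 8dot3name set 1

# List files along with their short names (hypothetical log folder path)
cmd /c dir /x D:\ExchDBs\DB01\Logs
```

    Remember that the GPO in our case reverted this setting, so re-run the query after the next policy refresh to confirm it sticks.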

    As you can see, you don’t have to actually activate the DB for this to cause issues. 

    Good Luck!

    D

  • Lync Control Panel–401.1 Unauthorized

    Recently I had to install Lync Server 2010 on a repurposed Windows 2008 R2 server within a lab environment. One of the issues that I ran across was preventing me from accessing the Lync Control Panel from the Lync Front End server.

    Attempting to open the Lync Control Panel from the Lync 2010 Front End server returned a 401.1 Unauthorized error.

    I started by confirming that all of the prerequisites were installed, followed TechNet: Troubleshooting Lync Server 2010 Control Panel, and confirmed that Kerberos was configured and working properly. None of these changed the error.

    In the end, the resolution was to set the DisableLoopbackCheck registry value.

    WARNING: This setting should be carefully considered, as it changes the security posture of the server. This scenario was a lab environment with no Internet access. For production environments, consider using the BackConnectionHostNames registry value instead.

    Good Luck!
    Doug

  • Public Folders – Folders Assistant Not Forwarding Messages

    A recent issue occurred where users were configuring a Public Folder's Folder Assistant to forward posted/inbound messages to another Public Folder.  The users were able to configure this, however the messages were never forwarded.  This is a short synopsis of how we resolved the problem.

    PREREQUISITES

    • Public Folders – both were mail enabled and visible in the GAL
    • Exchange 2010 SP2
    • Outlook 2010 SP1

     

    CONFIGURATION
    • Within Outlook, open the Public Folder properties and select Folder Assistant (General Tab)
    • Add a rule
    • Set that rule to forward to another Public Folder

    NOTE: All other settings were optional, and did not impact this issue.

    SYMPTOMS
    • Get-PublicFolder shows that this folder has HasRules set to True
    • Changing the delivery to a mailbox, rather than a Public Folder, works just fine
    • Event Viewer’s Application Log contained the following event:

    Source: MSExchangeIS Public Store
    Event ID: 2028
    Task Category: Transport Delivering
    Level: Error
    Description: The delivery of a message sent by public folder 0000013456F2 has failed.
    To: MyPublicFolder
    The non-delivery report has been deleted.

    RESOLUTION
    As you can see, the above event is related to Transport Delivering, from the MSExchangeIS Public Store.  Since delivery worked to a mailbox, but not to the second Public Folder, this led me to think permissions. 
    After modifying the Anonymous client permissions to include CreateItems, the Folder Assistant worked just fine.  By default, Anonymous only had FolderVisible, not CreateItems.
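    If you hit the same symptom, granting the missing right is a single EMS command. The folder path below is a placeholder; check the current permissions first so you know what you are changing.

```powershell
# Review the current client permissions on the destination folder (hypothetical path)
Get-PublicFolderClientPermission -Identity "\MyPublicFolder\Reports" | FT User, AccessRights

# Grant Anonymous the CreateItems right so the Folder Assistant can deliver to it
Add-PublicFolderClientPermission -Identity "\MyPublicFolder\Reports" -User Anonymous -AccessRights CreateItems
```

    Granting Anonymous only CreateItems keeps the folder contents hidden from anonymous readers while still allowing delivery.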


    I don’t know how many people will ever run into this, but I figured this post may help reduce troubleshooting time for others.
    Good Luck!
    Doug

  • A Few Recommendations for Exchange 2010


    The following is a partial list of items that I recommend be reviewed for all Exchange 2010 server deployments.  The focus is to ensure that the environment is consistently configured, reliable, and performing optimally.  This is not an official list, just something that I've been using for a while.

    Server Build
    - Confirm that hardware has been updated to the latest driver and firmware builds
    - Verify that the latest software builds have been installed, to include for Exchange, antivirus, monitoring agents, filterpacks, etc.
    - Operating System is running the latest build and has the recommended OS hotfixes


    Server Network Interfaces
    - Know if your environment explicitly denies IPv6 network traffic.  If so, then you may need to disable IPv6 on the NICs
    - NIC teaming is great for the MAPI/Public adapters - but should be configured to use Fault Tolerance (not automatic or load balance)
    - Network settings should be consistent on ALL servers, to include driver, TCP/IP Settings (i.e. DNS), and Binding order


    System Settings
    - Server's Page File should be moved off of the system partition
    - Server System Failure should be using Kernel Memory Dump
    - Proper file level antivirus exclusions should be configured - include for the file share witness, monitoring agents, cluster, IIS, and Exchange


    Active Directory
    - Verify that Active Directory has been properly configured (i.e. AD site links, no RODCs; 64-bit GC/DCs running 2008 R2 are preferred, etc.)
    - AD Replication time should be optimally configured, documented, and confirmed that there are no replication errors occurring
    - All domain controllers are responsive (i.e. none are offline) and pass DCDIAG and other AD related tests
    - Subnets should be properly defined within the AD Site design


    Other Dependencies
    - Confirm that the hardware (server, storage, network, etc.) is working properly without any errors or warnings being generated
    - Network performance and reliability should be evaluated.  If network is slow or unreliable, users will feel that pain!
    - DNS should be reviewed for proper records and replication/configuration.  Remove any old records that may impact messaging.


    Client Access
    - All AD sites are defined within your AutoDiscoverSiteScope, including client-only sites
    - Enable Kerberos for the CAS Array
    - Enable logging on IIS and the CAS and track which clients are accessing your environment
    - Have recommended minimum client builds for your environment and know how to parse the logs to determine builds
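    As a sketch of what "parse the logs" can look like, the snippet below tallies Outlook builds from the user-agent field. The sample lines and build numbers are made up; in practice you would pipe Get-Content over your u_ex*.log files.

```powershell
# Hypothetical IIS log lines; IIS replaces spaces in the user-agent field with '+'
$LogLines = @(
    '2013-06-01 10:00:00 /ews/exchange.asmx 443 contoso\jdoe 10.0.0.5 Microsoft+Office/14.0+(Windows+NT+6.1;+Microsoft+Outlook+14.0.6129) 200',
    '2013-06-01 10:00:05 /ews/exchange.asmx 443 contoso\asmith 10.0.0.6 Microsoft+Office/15.0+(Windows+NT+6.1;+Microsoft+Outlook+15.0.4420) 200',
    '2013-06-01 10:00:09 /ews/exchange.asmx 443 contoso\jdoe 10.0.0.5 Microsoft+Office/14.0+(Windows+NT+6.1;+Microsoft+Outlook+14.0.6129) 200'
)

# Pull the Outlook build out of each line, then count how often each build appears
$Builds = $LogLines | ForEach-Object {
    if ($_ -match 'Outlook\+(\d+(\.\d+)+)') { $Matches[1] }
}
$Builds | Group-Object | Sort-Object Count -Descending | Format-Table Name, Count -AutoSize
```

    Comparing the resulting build counts against your recommended minimum client builds quickly shows which clients need updating.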


    Transport
    - Confirm that EWS and OWA are properly configured to allow for your organization's message size limits
    - Verify that message limits are consistently configured (server, global, connectors, etc.)
    - Routing components should be evaluated and remove any unnecessary transport settings (ex: Accepted Domains, Connectors, etc.)


    Public Folders
    - If using dedicated PF servers, PF should be configured to replicate to all of those servers (min of 2 copies)
    - Does your Exchange-aware antivirus software scan Public Folder replication messages? Should it?
    - To improve Public Folder access performance, remove deleted security objects from the client permissions


    Security
    - Should Administrator Audit Logging be enabled?
    - Windows Firewall should be enabled and properly configured to work with all applications installed on the server
    - Rarely should you modify the default RBAC groups.  Rather, make new groups and manage the permissions thru that model
     

    Some other things...
    - Go thru the Exchange Best Practice Analyzer health check
    - Be sure to follow the Mailbox Storage Calculator - either provided by MSFT or by your storage vendor
    - Determine your requirements for custom Client Throttling Policies (ex: service accounts)
    - Have you set the External Post Master Address?
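    For example, a dedicated throttling policy for a service account might look like the following. The policy name, mailbox, and limits are placeholders; pick limits that match your own load testing.

```powershell
# Create a custom throttling policy for service accounts (hypothetical name and limits)
New-ThrottlingPolicy -Name "ServiceAccountPolicy" -EWSMaxConcurrency 25 -EWSPercentTimeInAD 50

# Assign the policy to the service account's mailbox (hypothetical mailbox)
Set-Mailbox -Identity "svc-crm" -ThrottlingPolicy "ServiceAccountPolicy"
```

    Leaving regular users on the default policy while giving service accounts their own keeps one busy account from starving everyone else.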


    Hope this helps!
    Doug

  • Exchange 2010 DAG - NetworkManager has not yet been initialized

     

    Recently, on two separate occasions, I had to assist in resolving an issue where a member of an Exchange 2010 database availability group (DAG) failed to participate in the DAG's cluster communications and was therefore unable to bring any database on that server online.  In both instances, this occurred after the server was rebooted.  While each issue had a slightly different resolution, I am fairly confident that they are related.  And since it took a while to isolate and resolve these issues, I thought I would share the experience.

    Before I begin, in neither scenario did we lose quorum of the DAG.  Also, the symptoms of both scenarios were nearly identical. 

     

    SYMPTOMS

    • Viewing these servers from Failover Cluster Manager shows them with a STATUS of DOWN.
    • Network Connections for these members are listed as UNAVAILABLE
    • The Cluster Service starts on these servers, however the following event is logged in the System event log:
      Log Name:      System
      Source:        Microsoft-Windows-FailoverClustering
      Event ID:      1572
      Task Category: Cluster Virtual Adapter
      Level:         Critical
      Description:  Node 'SERVER' failed to join the cluster because it could not send and receive failure detection network messages with other cluster nodes. Please run the Validate a Configuration wizard to ensure network settings. Also verify the Windows Firewall 'Failover Clusters' rules.
    • Attempting to view the Exchange DAG status or networks returns the error:
      A server-side administrative operation has failed. 'GetDagNetworkConfig' failed on the server. Error: The NetworkManager has not yet been initialized. Check the event logs to determine the cause. [Server: SERVER5.Contoso.inc]
          + CategoryInfo          : NotSpecified: (0:Int32) [Get-DatabaseAvailabilityGroup], DagNetworkRpcServerException
          + FullyQualifiedErrorId : A6AA817A,Microsoft.Exchange.Management.SystemConfigurationTasks.GetDatabaseAvailabilityGroup
    • Cluster Log Shows:
      WARN  [API] s_ApiOpenGroupEx: Group Cluster Group failed, status = 70
      DBG   [HM] Connection attempt to SERVER01 failed with error WSAETIMEDOUT(10060): Failed to connect to remote endpoint 1.2.3.45:~3343~.
      INFO  [JPM] Node 7: Selected partition 33910(1 2 3 4 5 6 9 10 11 12 13 14) as a target for join
      WARN  [JPM] Node 7: No connection to node(s) (10 12). Cannot join yet
    • Cluster Validation Report shows:
      Node SERVER01.Contoso.inc is reachable from Node SERVER5.Contoso.inc by only one pair of interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available or consider adding additional networks to the cluster.
      The following are all pings attempted from network interfaces on node SERVER5.Contoso.inc to network interfaces on node SERVER05.Contoso.inc.
    • Network Trace was showing that cluster communication was in fact going thru to all other nodes on port 3343 and responses were returned. 
    • There was no change in errors even after disabling Windows Firewall and removing file level antivirus and security products from the servers.
    • Removing NIC Teaming from the server did not work


    RESOLUTION #1
    In this scenario, the issue occurred within our lab running on Hyper-V.  Based on Hyper-V's network summary output, I could see that the servers really were not communicating properly.  Yes, they could ping and they could authenticate with the domain, but cluster communication was failing. 
    The resolution was to consistently configure the network settings on all DAG members and to reset the Hyper-V network properties.  This meant:

    • Confirm that the networks were identically configured across all DAG members (i.e. REPL/MAPI networks, TCP/IP settings, binding order, driver versions, etc.)
    • Disable IPv6 on the servers [NOTE: It is recommended to leave IPv6 enabled, even if you do not have an IPv6-enabled network!  In most scenarios, disabling IPv6 on an Exchange 2010 server should be a last option.]
    • Edit the Hyper-V Network Properties page for this VM
    • Once rebooted, all was working fine.


    RESOLUTION #2
    In this scenario, the issue occurred in production.  Ultimately we decided to change the IP address of the 'broken' DAG member and reboot the server again.  This allowed the server to properly register its network connections with the cluster database (ClusDB), and all other nodes were able to communicate properly.  The DAG member could then rejoin the DAG, and all databases were able to mount and/or replicate their copies successfully. 

    We found that not all of the production DAG members were identically configured with their network settings (i.e. 2 DAG members did not have a REPL network configured).  Per http://technet.microsoft.com/en-us/library/dd638104.aspx#NR, "each DAG member must have the same number of networks".  We fixed the networks and updated the servers to include the recommended hotfixes - http://blogs.technet.com/b/dblanch/archive/2012/02/27/a-few-hotfixes-to-consider.aspx
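    A quick consistency check is to list the DAG networks and confirm that every member shows up in the same set of networks with the expected subnets. The DAG name below is a placeholder.

```powershell
# List each DAG network with its subnets and member interfaces; every member
# should appear in the same networks (e.g. both MAPI and REPL)
Get-DatabaseAvailabilityGroupNetwork -Identity "DAG01" | FL Name, Subnets, Interfaces
```

    A member missing from a network, or a network with an unexpected subnet, is exactly the kind of inconsistency that bit us here.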


     

    Questions/Answers
    Why did changing the IP address of the DAG member work?   We're not exactly sure, but we believe that either a stale TCP route or something in the ClusDB was preventing any server with that IP address from joining the cluster.
    Did you reboot all of the DAG member servers before or after changing the IP address?  No, we did not want to risk losing another server within the DAG (we had already lost 2 of the 12 members).  We did, however, reboot all of the servers in the lab scenario.
    Did you ever lose quorum of the DAG? Nope.
    Do you think that you could have prevented this?  Maybe.  If we had applied all of the hotfixes outlined here and confirmed that all network settings were identical on all DAG members, then this issue might not have occurred.   There may be other causes, but it is always recommended to resolve the known issues first.


    Good Luck.
    Doug

  • Exchange Databases Failover Due to Low Memory


    Recently I had the (un)fortunate experience of troubleshooting an issue with Exchange 2010 DAG database copies failing over to other servers.  This happened in several different environments that I was supporting, so I know it can happen to anyone.  Here is a short synopsis of the issue (not all symptoms are listed):

    • SCOM Alert: A significant portion of the database buffer cache has been written out to the system paging file.  This may result in severe performance degradation.
    • SCOM Alert: Hard I/O error will dismount or terminate replication on a database copy.
    • Event Log contained the following:  The database could not allocate memory. Please close some applications to make sure you have enough memory for Exchange Server. The exception is Microsoft.Exchange.Isam.IsamOutOfMemoryException: Out of Memory (-1011)
    • Per perfmon, the disk subsystem itself was not having performance issues.  However, the server was clearly not able to sustain proper performance, leading us to believe that there was not enough memory in the system.
    • Process Explorer showed a number of processes with high memory consumption.  Upon further review, identified known memory issues with some of these processes.
    • Disabling applications and services (like antivirus, backup, monitoring, etc) did not significantly free up the consumed memory
    • Low memory issues would occur when importing content into mailboxes.  This may be from the additional work load required by Exchange aware antivirus (background scan) and content indexing for this new content.
    • Antivirus exclusions were not properly configured against the SCOM Monitoring services and File Share Witness directory


    Our Resolution:
    Step 1: Confirmed that all servers within the DAG were consistently and properly configured, specifically network configuration and antivirus exclusions
    Step 2: Installed latest drivers, firmware, and recommended hotfixes (2 fixes resolved memory leak issues)
    Step 3: Added additional RAM in the servers (amount may vary on environment/need)
    Step 4: Rebooted the servers


    NOTE: Prior to modifying the RAM of an Exchange 2010 server, understand how that will directly impact database cache. Review Understanding the Mailbox Database Cache.  Also understand that other factors may need to be adjusted (ex: paging file config).

    Some may ask, “didn’t you follow the RAM guidance within the mailbox storage calculator?”  Yes we did but there were several factors that changed after we completed that phase of the design, including mailbox configuration, additional processes running on the servers, and user profiles/load. 


    Good Luck!
    Doug

  • A Few Hotfixes to Consider…

     UPDATED: June 2013

    I decided to collect my own short list of hotfixes that I have recommended for an Exchange 2010/Windows 2008 R2 environment. 
      NOTE: This does not include security hotfixes (WSUS) and is NOT an official list of hotfixes or a build recommendation - this is simply the list of hotfixes that I have recommended for newly deployed Exchange environments as a means to prevent known issues from occurring.

    Exchange 2010 Hotfixes
      Exchange 2010 Service Pack 3 – http://support.microsoft.com/kb/2808208
      KB2803727 Update Rollup1 for Exchange Server 2010 Service Pack 3 - http://support.microsoft.com/kb/2803727


    Office 2010 Hotfixes
      Office 2010 Service Pack 1 - http://support.microsoft.com/kb/2460049
      KB2597090 Description of the Outlook 2010 update: February 2013 - http://support.microsoft.com/kb/2597090
      KB2553391 Description of the Outlook 2010 hotfix package: December 11, 2012 - http://support.microsoft.com/kb/2553391
      KB2832226 Office 2010 cumulative update for April 2013 - http://support.microsoft.com/kb/2832226

    Office 2013 Hotfixes 
      KB2738013 Description of the Outlook 2013 Update: April 2013 - http://support.microsoft.com/kb/2738013

    Office for Mac 2011 Hotfixes 
      KB2830450 Description of the Microsoft Office for Mac 2011 14.3.4 Update - http://support.microsoft.com/kb/2830450

    Base Windows 2008 R2 SP1 Hotfix
    KB2775511 An Enterprise Hotfix rollup for Windows Server 2008 R2 SP1 – http://support.microsoft.com/kb/2775511


    Windows 2008 R2 Post-SP1 Hotfixes 
      KB2754704 http://support.microsoft.com/kb/2754704 -- MPIO
      KB2471472 http://support.microsoft.com/kb/2471472 -- NDIS
      KB2769369 http://support.microsoft.com/kb/2769369 -- WIN32SPL
      KB2494016 http://support.microsoft.com/kb/2494016 -- CSVFILTER  
      KB2578113 http://support.microsoft.com/kb/2578113 -- CLUSRES 
      KB2614892 http://support.microsoft.com/kb/2614892 -- MOUNTMGR  
      KB2536275 http://support.microsoft.com/kb/2536275 -- SYS  
      KB2619234 http://support.microsoft.com/kb/2619234 -- RPCHTTP
      KB2724197 http://support.microsoft.com/kb/2724197 -- Kernel   
      KB2779069 http://support.microsoft.com/kb/2779069 -- CLUSSVC 
      KB2699780 http://support.microsoft.com/kb/2699780 -- Registry 
    * Updated from previous hotfix notification release 


    Lync Server 2010 Hotfixes
      Updates Resource Center for Lync  - http://technet.microsoft.com/en-us/lync/gg131945

    NOTE: If you cannot download the hotfix from the KB article, first determine whether you really need that hotfix. If you do, consider contacting MSFT support for that build, or look for another related hotfix with similar files and build.

    Have Fun...
    Doug

  • Exchange 2007 Server Data Collection Script

    Recently I had experienced an issue where an Exchange 2007 server was having some ‘issues’.  However before I was engaged to assist in resolving the issue, the admin rebooted the server and the issue went away…but only for a brief period.  This led me to create a script to collect data from an Exchange 2007 server running Windows 2008 SP2.  This is that script.

    The script uses the following tools:

     

    Here are some of the tasks the script performs:

    • Determine Exchange 2007 Role (Hub, CAS, and Mailbox are supported)
    • Capture Performance Monitor, based on role
    • MPSReports (requires user interaction)
    • DCDiag against servers in local AD Site
    • Collect a 3 minute network trace using Network Monitor
    • NLTEST
    • Capture TCP Connections using TCPView
    • DNSLint
    • Run Exchange Best Practice Analyzer (EXBPA) for this server
    • Run various Exchange cmdlets based on server role
    • Collect Cluster Log (if clustered)
    • Collect minidumps using ProcDump of several common processes (if running)

     

    Some user interaction may be required (specifically MPSReports) and the data should all be copied to the current working directory.  To run this script, you must:

    1. Run the PS1 file within the Exchange Management Shell (Run as Administrator)
    2. Have all of the tools within the directory that you want to run the script from
    3. Install Network Monitor
    4. Have the Perfmon XML templates in that same directory (the attached zip file has a few examples)

     

    Here is a picture of the directory with the tools required:

    [clip_image002: screenshot of the directory containing the required tools]

     

    Attached is the script & XML perfmon examples (right click and save target as for proper formatting)

     

    Enjoy!

    Doug

  • Preparing for the Microsoft Interview

    Occasionally I am asked what it takes for someone to become 'ready' for an interview with Microsoft.  When it comes to a technical position, I often give the same advice.  So I thought that I would share that advice with the world.  The following are my personal recommendations for preparing for the interview process:
      NOTE: These recommendations can be applied across a wide range of technologies/products

    1. Understand the position

    • Search for the position on the Careers website and read the description.
    • If you can, talk with others who work in or with that position and find out the day-to-day tasks and what they did to prepare for their interview.
    • Try to understand what is expected of the position and prepare a few questions that you can ask during the interview process.

     

    2. Assuming that this is a technical position, there is a list of things that you should know about the technology/product, no matter what the technology/product is.  These include:

    • Know how to deploy and configure the technology. 
    • Know the dependencies required to deploy the technology.  Be familiar with how to deploy those dependencies if asked.
    • Be comfortable with performing a disaster recovery of the technology and associated data.
    • Know how to troubleshoot and isolate problems that may happen within the product.
    • Know how to validate whether or not the dependencies are working properly. How would you go about confirming if they are working properly?
    • Find and review the technology/product's team blogs.  At a minimum, know the topics that are discussed and how to access the site.
    • Identify the top tools that can help you administrate or troubleshoot the technology/product.  Become comfortable talking about their purpose and when to use each.
    • Know what steps to take if the performance of the technology/product is not up to par.
    • Know the difference between Site resiliency, High availability, and Redundancy and how you would deploy each within your technology/product.
    • Know several differences between product versions.  Be prepared to answer the question of "why should I upgrade?".

     

    3. Here are some not-so-technical considerations:

    • Be familiar with some best practices when deploying the product.  Either the specific best practices themselves or where you would go to locate those best practices.
    • What lessons have you learned when operating the product, and what would you do differently? Does that align with the industry's best practices?
    • Know your strengths and know your weaknesses.  Know where to locate information for when you are asked a question that you cannot answer.
    • Know your resources.  This includes websites, papers, books, blogs, tools, and people.
    • Be confident in your skills and answers.
    • Take and use feedback to improve yourself. 


    FAQ:
    How will I know if I am ready for the interview?
      One common answer that I give is that if you can answer the questions that are posted on forums, then you are probably ready for the next step.

    What path should I take to improve my IT skills?
      Everyone takes a different path.  But what you should focus on is improving every day, such as learning something new as well as improving upon how you do something now.

    Where should I start?
      The product's TechNet Library and team blogs are good places to start.  Most products will allow you to download and install a trial version of the software.  Set up a virtual lab and 'play' with the software.

    Must I have Microsoft Certifications to interview?
      No, but it can help you communicate your thoughts better by knowing the terminology and baseline product.


    REFERENCES:
     Microsoft Careers Website
     Blog: How Not to Interview
     Blog: How to Interview like a pro
     Blog: Interview Advice Part 2
     Blog: Myths about Working at Microsoft
     Blog: Post-Graduate Active Directory Studies
     Blog: SQL Interview Q&A
     Blog: SharePoint Interview Questions

  • There's something about 2007 Dynamic Distribution Groups

    Recently I ran into some issues with creating dynamic distribution lists or groups (DDL or DDG) in an Exchange 2007 environment. Let me share some things that I uncovered:

      • Outlook users will not be able to see the membership of the DL using the GAL
      • To view the membership, see http://technet.microsoft.com/en-us/library/bb232019(EXCHG.80).aspx
      • Per http://technet.microsoft.com/en-us/library/aa996561(EXCHG.80).aspx, "a dynamic distribution group includes any recipient in the Active Directory directory service with attributes that match its filter"
      • While the preview may show all users, mail sent to the DL may not include all of those users.  This will happen if you do not use the –RecipientContainer option when you initially create the DDL.  By default, when Transport expands the DDL for mail delivery, it will only deliver to the recipients that are within the same OU as the DDL.  To include recipients outside of that OU, you MUST use the –RecipientContainer option.
      • In our scenario, attempting to modify the –RecipientContainer option after the DDL was already created, using the Set-DynamicDistributionGroup cmdlet, failed to update the group or change the mail flow behavior. So I recommend using this option when you first create the dynamic group.
      • If using a custom recipient filter with the Exchange Management Shell, do not place quotation marks around the filter itself, and a space must be placed before and after the { } braces.

    Here are some examples of creating a dynamic distribution group:

    New-DynamicDistributionGroup -Name "All OfficeX Members" -RecipientFilter { (((RecipientType -eq 'UserMailbox') -and (Office -eq 'OfficeX'))) } -OrganizationalUnit "DOMAIN.COM/Distribution Lists" -RecipientContainer "DOMAIN.COM"

    New-DynamicDistributionGroup -Name "Accounting Team" -RecipientFilter { (((RecipientType -eq 'UserMailbox') -and (Company -eq 'X') -and (Department -eq 'Accounting'))) } -OrganizationalUnit "DOMAIN.COM/Distribution Lists" -RecipientContainer "DOMAIN.COM"

    New-DynamicDistributionGroup -Name "All non-HQ Managers" -RecipientFilter { ((((RecipientType -eq 'UserMailbox') -and (Title -like '*Manager'))) -and (-not(Office -like 'HQ'))) } -OrganizationalUnit "DOMAIN.COM/Distribution Lists" -RecipientContainer "DOMAIN.COM"

    Once these groups were created, I would consider changing the following, depending on the group’s requirements and restrictions:
    - Managed By
    - Additional Email Addresses
    - Hidden from GAL or not
    - To send or not send delivery reports
    - To send or not send OOF
    - Message size or message delivery restrictions

    Good luck!

    Doug

  • Exchange Daily Messaging Report

    I was tasked with creating a daily messaging report that would meet the following requirements:

    • Collect essential information about the Exchange 2007 environment
    • Save the data in HTML format
    • Email that report to users

    So I created a powershell script that would collect the following information:

    • Mailbox Stats (total #, over quota, over 5GB, etc)
    • Message Stats (number of msg sent/rcd, NDRs, total size, etc)
    • Identify servers that have volumes with less than 20% of free space available
    • Database Information (last full backup, is mounted?, # of mbx, size, etc)
    • Identify if any SMTP queues have more than 50 msgs pending delivery
    • Collect errors from the event log generated during the past 24 hours
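
    The free-space check from the list above can be sketched generically (the actual script is PowerShell; this is an illustrative Python version, and the function name is my own, not from the script). It flags any volume with less than 20% free space, per the requirement:

    ```python
    import shutil

    FREE_THRESHOLD = 0.20  # flag volumes with less than 20% free space


    def low_space_volumes(paths, threshold=FREE_THRESHOLD):
        """Return (path, fraction_free) for volumes below the free-space threshold."""
        flagged = []
        for path in paths:
            usage = shutil.disk_usage(path)
            fraction_free = usage.free / usage.total
            if fraction_free < threshold:
                flagged.append((path, fraction_free))
        return flagged


    if __name__ == "__main__":
        # On Windows these would be drive roots such as "C:\\" and "D:\\".
        print(low_space_volumes(["/"]))
    ```

    The report would then list each flagged volume alongside its server name.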

    Once collected, it will send a message with the information as an attachment and within the body of the message.


    Thanks to http://blogs.technet.com/b/gary/ & http://gsexdev.blogspot.com/ for their contribution

    Attached is the script (right click and save target as for proper formatting)

    Have fun!

    Doug

  • Exchange 2007 Summary Script

    I was asked to post a blog on my script that collects summary information from an Exchange 2007 organization.  Here is what the script does:

    • Places the information in C:\Exchdata directory
    • Pulls information about Active Directory
    • Collects Server information
    • Collects Routing information
    • Pulls statistical information, including mailboxes, backup status, etc

    Attached is the script (right click and save as for proper formatting)

    Doug

  • Not-So-Common Forefront Registry Keys

    Forefront’s TechNet site covers the common registry keys, but I wanted to provide a little (very little) information about some not-so-common registry keys.  By default, these keys will be listed under the Exchange Server folder within the following registry paths:

    32-bit: HKLM\Software\Microsoft\Forefront Server Security\
    64-bit: HKLM\Software\Wow6432Node\Microsoft\Forefront Server Security\

     

    Advanced Logging Registry Keys

    These keys are related to advanced logging when troubleshooting Forefront issues.  However, by enabling these keys, there is a performance hit that could impact how quickly email is processed by Forefront.

    • EventLogTraceCategory (Set to “2”)
    • TextFileTraceCategory (Set to “2”)
    • TraceEventLog (setting of 0 = off)
    • TraceTextWriter (setting of 0 = off)

    Engine Timeout Keys (often related to event 6014)

    • EngineDownloadTimeout – Controls the time it takes to download engines (default is 300 sec – 5min)
    • GetFileHTTPTimeout – Controls timeout value for WinHttp calls

    Other Registry Keys

    • DisablePurgeOnMultipleScanFailures – Used as a means to get around memory error deletions that might occur on Forefront.  Consult with Microsoft Support prior to implementing.
    • DoNotScanIPMReplicationMessages - Setting this DWORD registry key to 1 prevents the Transport Scan Job from scanning IPM (Public Folder) replication messages
    • ExpirationNotifications - Setting to 0 turns Expiration Notifications off (default is 1 - on)
    • InternetTimeout - Used to prevent timeouts during Transport scanning (http://technet.microsoft.com/en-us/library/bb795154.aspx)

    Good Luck
    Doug

  • Forefront Obsolete Notifications

    After upgrading your Antigen or Forefront for Exchange to SP2, you might start seeing notifications relating to obsolete engines, even though you've disabled those engines in the management tool.

    The Ahnlab Virus Detection Engine scan engine is now obsolete and no longer supported. Updates are no longer available for this engine, and therefore the update check for this engine has been disabled.  Please review the scan engines chosen for your scan jobs and make another selection to ensure up-to-date protection. For more information, see http://go.microsoft.com/fwlink/?LinkId=152864

    My suggestion would be to start by reading http://blogs.technet.com/fss/archive/2009/11/16/how-do-i-disable-these-engine-end-of-life-notifications-i-am-receiving-from-antigen-and-forefront.aspx

    If that does not fix your issue, know that a solution from Microsoft should be out in the coming weeks.  However, if you need a solution today, then take a look at the EngineList registry key on your server, located at HKLM\Software\Wow6432Node\Microsoft\Forefront Server Security\Exchange Server.

    Then use the information below to determine if that key has an old engine enabled.  If you find that an old engine is enabled, here are the steps that you can use to reset them.

       NOTE: There is a risk that other settings will be impacted (like file filter lists).  Document and/or backup the configuration so that you can restore values if needed prior to proceeding.
        1.  Stop the FSCController service
        2.  Modify the EngineList key to a proper value (ex: 0x00008243)
        3.  Rename the Scanjobs.fdb and Templates.fdb files
        4.  Start the service. 

    New Scanjobs and templates should be recreated.  This change enables the COMMAND engine so be sure that you’ve configured the engine to pull updates.

    MORE INFORMATION

    These are the bit values for the obsolete engines:
      SOPHOS          (0x00000008)
      CA_VET           (0x00000020)
      AHNLAB           (0x00000080)
      SPAMCURE      (0x00001000)

    These are the bit values for the current active engines:
      NORMAN          (0x00000001)
      MICROSOFT     (0x00000002)
      COMMAND       (0x00000040)
      SYBARILIST     (0x00000100)
      VBUSTER         (0x00000200)
      KASPERSKY5   (0x00008000) 

    If you take the value from the EngineList key, you can determine which engines are currently enabled.
    Example 1: Current value is 0x0000820b.  Engines enabled = SOPHOS, NORMAN, MICROSOFT, VBUSTER, & KASPERSKY5
    Example 2: Current value is 0x00008223.  Engines enabled = CA_VET, NORMAN, MICROSOFT, VBUSTER, & KASPERSKY5
    Example 3: Current value is 0x000080e2.  Engines enabled = KASPERSKY5, MICROSOFT, COMMAND, AHNLAB, & CA_VET

    Example Proper Value 1: 0x00008342   Engines enabled = KASPERSKY5, VBUSTER, SYBARILIST, COMMAND, MICROSOFT
    Example Proper Value 2: 0x00008243   Engines enabled = NORMAN, MICROSOFT, COMMAND, VBUSTER, & KASPERSKY5
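
    The examples above can be reproduced with a small script. This is an illustrative sketch, not an official tool; the bit values and engine names come from the tables above, and the function name is my own:

    ```python
    # Decode the Forefront EngineList bitmask into engine names.
    # Bit values are taken from the tables above.
    ENGINE_BITS = {
        0x00000001: "NORMAN",
        0x00000002: "MICROSOFT",
        0x00000008: "SOPHOS",      # obsolete
        0x00000020: "CA_VET",      # obsolete
        0x00000040: "COMMAND",
        0x00000080: "AHNLAB",      # obsolete
        0x00000100: "SYBARILIST",
        0x00000200: "VBUSTER",
        0x00001000: "SPAMCURE",    # obsolete
        0x00008000: "KASPERSKY5",
    }

    OBSOLETE = {"SOPHOS", "CA_VET", "AHNLAB", "SPAMCURE"}


    def decode_engine_list(value):
        """Return the set of engine names whose bits are set in the EngineList value."""
        return {name for bit, name in ENGINE_BITS.items() if value & bit}


    if __name__ == "__main__":
        current = 0x0000820B  # Example 1 above
        enabled = decode_engine_list(current)
        print(sorted(enabled))
        print(sorted(enabled & OBSOLETE))  # any obsolete engines still enabled
    ```

    If the second set is non-empty, the EngineList value still has an obsolete engine enabled and should be reset as described above.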

    Doug

  • Understanding the right terminology…

    Over the past 3 months, I've had several discussions around planning of messaging environments.  These discussions always lead to talk of availability and similar topics.  But what I've discovered is that many people confuse some of the terminology, which makes the planning phase more difficult. So I wanted to clarify some of these terms...


    AVAILABILITY
    Availability is the degree to which an application, service, or system is perceived by users to be available.  Availability typically consists of redundancy and fault tolerance as a means to eliminate any single point of failure.  A high-availability solution masks the effects of a hardware or software failure and maintains the availability of applications so that the perceived downtime for users is minimized. A good solution should be able to take appropriate action with little to no user involvement.  Availability is not data protection and recovery, nor is it disaster recovery.


    REDUNDANCY
    Redundancy is a key part of availability.  This is the use of multiple components, services, or systems to ensure that if one fails, another can carry the workload.  Examples of redundancy include the use of multiple servers in a load-balanced environment to improve farm performance or to scale out to accommodate additional users. Redundancy may also be the use of identical backup components, such as power supplies or networking equipment, to provide continued functionality in the event of the failure of the primary component.


    RECOVERABILITY
    Recoverability is the ability to recover from an outage of an application, service, or system.  This includes understanding the process to recover, the time needed to recover, how much data/productivity can be lost, etc.  Basically, this is disaster recovery.


    SITE RESILIENCY
    Site resiliency comes into play when a physical location or datacenter has experienced an issue that may impact user productivity or data.  This might involve only one or a set of applications, systems, or services.  In either case, if the current datacenter is not capable of providing the necessary resources to bring the resource(s) online at 100%, then you might need to fail over to an alternate location.  Often failing over to an alternate site is a manual process.


    So when considering an IT solution, be aware of the differences in these terms and that each may require a different solution. 

    Exchange Server 2010: High Availability and Site Resilience
    http://technet.microsoft.com/en-us/library/dd638121.aspx

    Exchange Server 2010: Disaster Recovery
    http://technet.microsoft.com/en-us/library/dd876874.aspx

    Exchange Server 2007: High Availability
    http://technet.microsoft.com/en-us/library/bb124721(EXCHG.80).aspx

    Exchange Server 2007: Disaster Recovery
    http://technet.microsoft.com/en-us/library/aa998848(EXCHG.80).aspx

     

    Doug

  • Have You Considered?

    When working with Exchange, there are some factors that you may need to consider to get the most out of the product. Here are some aspects that may help you…

    Online Mode

    • Better for mailboxes over 2GB in size (this is due to desktop client system resource limitations – CPU, memory, disk speed, etc)
    • Secure in that all messages are stored on the server and no content is downloaded to the client
    • Mail delivery on the client and viewing of new users in the GAL is instant (no polling or OAB download required)
    • Requires active connection to the server to work. So any service (server) disruptions are felt by Online Clients
    • Cost of the backend mailbox server resources grows when clients are in Online mode (Memory, CPU, Disk IO, etc)
    • Outlook Views limitations apply (if users have more than 5,000 objects in a folder, performance delays occur!!)
    • All work is done on the server (not good)
    • In most cases, Online Client performance is significantly worse than cached mode clients
    • Search is restricted to online client search (all work done on the server)
    • Greater chance of one client impacting the server performance and all other connected clients

    Cached Mode

    • Best option for remote clients and/or mailboxes under 2GB in size
    • Allows users to work offline. Cached mode clients can still function during service (server) disruptions
    • Much better solution for Disaster Recovery scenarios
    • Able to use EFS to secure local cached files (OST and OAB) on desktops
    • Reduce server-side disk IO
    • Outlook views limitations do not apply
    • Reduced chance of 1 client impacting the overall server’s performance
    • Improved Search on the client!!
    • Most work is done on the client – thus releasing server side resources
    • Recommended configuration by most vendors that interact with Outlook or Exchange
    • Not the best solution for mailboxes over 2GB (see online mode comments)
    • Users have to grow accustomed to how mail is downloaded in Cached Mode (easy work around for this!). This sometimes causes a perception that Blackberry receives the mail faster than Outlook.
    • New Users are not shown in the GAL until the client downloads the Offline Address Book (default is every 24hours – can be changed via GPO)

    RPC over TCP

    • Direct Connection to Mailbox Server by Client which can impact Mailbox Server performance
    • Clients cannot connect from the Internet without a VPN solution, unless firewall ports are opened (NOT RECOMMENDED!)
    • Less Secure communications than HTTPS
    • Cost to support and manage RPC OVER TCP is usually higher (i.e. network design, backend mailbox server configuration, managing client connections, security, etc)

    RPC Over HTTPS (Outlook Anywhere)

    • Connects through Client Access Server (CAS) to obtain mailbox data. This reduces performance impact on the Mailbox Server by offloading conversion to the CAS
    • Most hardware network accelerators work with RPC OVER HTTPS to improve performance
    • More secure method of client to server communication based on SSL Certificate
    • Easier to lock down port security between server and client subnets
    • Clients can connect over the Internet without a VPN connection
    • Works well with ISA and IAG for additional security outside of the organization without requiring VPN
    • Requires SSL for Server and Client communication

    Windows XP

    • Users are familiar with OS
    • Most limitations are well known by now because of product maturity
    • Almost all client hardware works well with this OS
    • Product Lifecycle is ½ through support

    Windows Vista

    • A more secure and stable platform than XP
    • Improved performance and reliability (i.e. SMB, Self-Healing NTFS, etc)
    • New client features
    • Product is still well within its Product Support Lifecycle
    • Most Hardware vendors have appropriate drivers readily available for Vista
    • Improved server to client communication performance
    • Typically requires some end-user and administrator education on the changes of the OS

    Windows 7

    • Same as Vista but includes a more secure, stable platform with improved performance and security than previous versions of Windows
    • Offers new features (when used with Windows Server 2008 R2) such as DirectAccess, and BranchCache
    • Able to secure the applications and computer using Bitlocker and Applocker
    • Improved client productivity with Windows 7 enterprise search functionality and the Windows Troubleshooting Platform
    • Product is just entering its Product Support Lifecycle

    Windows Mobile (ActiveSync)

    • Device and connection are secure
    • Certified for government use
    • Does not require a service account to be able to access all mailboxes
    • Activesync support is included in Microsoft Premier contract – no additional contract required
    • Cost is minimal – device & device service only
    • No performance impact to Exchange – equivalent to OWA user
    • Users are able to manage devices in OWA or desktop
    • Use of ActiveSync & Device policies to manage services
    • High availability of Activesync is available based on Exchange HA design – no additional requirements

    Blackberry

    • Device and connection are secure
    • Certified for government use
    • Requires service account access to all mailboxes and root level SQL access
    • Higher cost – device, device service, Blackberry license, BES support
    • Requires additional servers to be installed in environment (SQL & BES)
    • Impact on Exchange is severe (4x IO)
    • Users are familiar with device and technology
    • High Availability is limited - requires 3rd party solution for site resiliency
    • User can manage devices from desktop

    There are many other factors involved but hopefully this provides some insight.

    Doug

  • Using Sharepoint with your Outlook Client

    I find that many people have deployed Sharepoint but have not fully integrated it with their Outlook client. My advice would be to start integrating the various technologies to improve end-user productivity. Here are some links that might help educate on this topic.

    GENERAL
    These sites offer some general information about connecting Outlook to Sharepoint.

    CALENDAR

    Within your Outlook Calendar, you can view and update a SharePoint calendar, set it side-by-side, view an overlay of the calendars, and even copy events between the calendars.

    CONTACTS

    You can also add and remove Sharepoint contacts to your Outlook.

    TASKS

    If you spend a lot of time working with e-mail, you may find it easier to work with a tasks list from a SharePoint site directly in Office Outlook 2007, instead of switching to your site in a Web browser. Within Outlook, you can track, update or categorize a task and even drag or copy tasks back and forth between the folders for Outlook and the SharePoint site.

    SHAREPOINT FILES

    Once you connect your SharePoint library to Outlook, you can browse for and view the file just as you would an e-mail message, without leaving Outlook. Working with files from your SharePoint site in Outlook is best for browsing through and editing routine files that you store in a document library, such as documents, spreadsheets, and presentations. More complex data operations, such as working with custom lists or updating database applications, are better handled directly on the SharePoint site.

    You can also take the files offline to work on them. When you get back to the office, you can update the versions on your SharePoint site. Your changes to a file are not updated automatically on the server while you edit and save the file. This enables you to work more quickly with offline files, because your computer does not need to connect to the server while you are working. When you close a file, you are prompted to update the changes on the server. Only the changes that you made to specific files are updated, which means that Outlook does not have to synchronize the whole library again with the server.

    TROUBLESHOOTING

    If you find that the interaction between Outlook and Sharepoint is not working as you would expect, you could try enabling logging on the Outlook client to see if you can identify an error. To enable logging in Outlook 2007, edit the following registry value:

    HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Outlook\Options\Mail

    REG_DWORD: EnableWSSSyncLogging – Set to a value of 1 to enable (0 to disable)

    Once enabled, restart Outlook and try the interaction activity that you are trying to complete between Sharepoint & Outlook. Then look for wss-sync-log.htm logs on the client (%temp%).
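
    For reference, the change above can also be captured in a .reg file (a sketch assuming Outlook 2007, whose keys live under the 12.0 version node; other Outlook versions use a different version number):

    ```
    Windows Registry Editor Version 5.00

    [HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Outlook\Options\Mail]
    "EnableWSSSyncLogging"=dword:00000001
    ```

    Setting the value back to dword:00000000 disables the logging again.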

    As for understanding the logs, that might be another blog topic.

    Well, that pretty much sums up the basics for Outlook integration with Sharepoint.

    Doug

  • Random Questions on Exchange

    The following are some quick answers to a few random questions that have come my way regarding Exchange:

    1. How often should Online Maintenance (OLM) run? How can I confirm that it completed?

    Configure the Online Maintenance schedule so that each database completes at least once a week. The Application log should have an event 1221 for every database. You may need to alternate the schedule so that other processes, such as backups and antivirus scans, do not impact the OLM process.

     

    2. Can I move the location of the Content Indexing Files in Exchange 2003 or 2007?

    Exchange 2003 – see http://technet.microsoft.com/en-us/library/bb124634(EXCHG.65).aspx
    Exchange 2007 – the catalog files are stored on the same disk as the database and cannot be moved without impacting the database file location

     

    3. What should I know about Exchange 2007 Content Indexing as an Exchange Admin?

    • Content Indexing (CI) is enabled on mailbox databases by default
    • Public Folder databases do not have content indexing
    • Pictures and movies are not indexed
    • CI usually adds a 5% or less overhead when enabled
    • Outlook 2007 – cached local mode using Windows Search 4.0 or later offers the best search performance for users
    • Who uses CI
      • Exchange ActiveSync's "Server search" feature
      • OWA 2007
      • Online Outlook 2007 client
    • To test search for a mailbox, use Test-ExchangeSearch cmdlet
    • To troubleshoot search, review http://technet.microsoft.com/en-us/library/bb123701.aspx

     

    4. How do I ensure that clients always have an end-date for recurring meetings?

    You should stress setting end dates for recurring meetings. You can enforce this by following:

    KB 952144 - You cannot disable the "No end date" option for appointments, meeting requests, tasks, or task requests in Outlook 2003
    KB 955449 - You cannot disable the "No end date" option for appointments, meeting requests, tasks, or task requests in Outlook 2007

     

     

    5. How can I find if the server is Enterprise or Standard Edition in Exchange 2007?

    Open the Exchange Management Shell and run Get-ExchangeServer | FT Name, Edition

    The Output will look like:

         Name       Edition
         ----       -------
         Server1    Enterprise
         Server2    Standard

     

    6. I just repaired (eseutil /p) my database. Now what?

    If you had to /p your DB, then you’ve already done a number on the DB. To get the DB back to a fully functional state, do the following:

    1. Run ESEUTIL /MH and ensure that the State is Clean (consistent)
    2. Try to mount that database to ensure that the server can mount it
    3. If it is working, take an offline copy of this database as a backup
    4. Run ESEUTIL /D to offline defragment the database
    5. Run ISINTEG –S servername –Fix –Test Alltests against the database
    6. Mount the database and then take a FULL backup of the database

     

    7. What impact to Exchange 2007 does enabling the System Cryptography: Use FIPS compliant Algorithms for encryption, hashing, and signing GPO setting have?

    FIPS may impact services such as Autodiscover & Availability. To work around this impact, locate all web.config files in use by the Exchange Server 2007 services and follow KB 911722 to edit the files to support the changes that the FIPS GPO setting makes.

     

    8. Does the delegate user really need to have the same client version?

    If both the manager and delegate are using Outlook, YES, the clients should be at the same build, including version and service pack. By doing this, you reduce the chance of incorrect settings or changes being made to a calendar object.

    If Entourage is involved, the delegate can use Outlook against the manager’s mailbox; however, I recommend turning off any search software that might run against the manager’s mailbox. If search software with different versions is running against the same mailbox, there is a risk of an increase in database search folders and thus an increase in DB size.

     

    Enjoy!

    Doug

  • Tracking Down Exchange 2007 Database Bloat

    I recently dealt with an Exchange 2007 database that was physically larger than expected, so we took a few actions to find out more about the cause of the bloat. This post outlines some of the work we did to isolate it.

    GATHER DATA

    We started by getting more information about the database:
    1. Check the most recent 1221 event in the Application Log to ensure that Online Maintenance has completed
    2. From the Exchange 2007 Management Shell, run Get-MailboxStatistics
    3. Run a PFDavAdmin Item Content report
    4. Run ESEUTIL /MS against that database (ex: Eseutil /ms DBName.edb >C:\MSOutput.txt)
    5. Run ISINTEG -dump against that database

    DETERMINE HOW MUCH BLOAT

    Event 1221 showed us how much whitespace the DB reclaimed during online maintenance:
       Event ID : 1221
       Category : General
       Source : MSExchangeIS Mailbox Store
       Type : Information
       Message : The database "MyStorageGroup\MBXDB" has 5178 megabytes of free space after online defragmentation has terminated.
    Add up the Deleted Item Size & Item Size from the Get-MailboxStatistics output. This gives a rough figure for how much “user data” the database holds.
    Note the physical size of the database (ex: 50GB). Determine how much bloat may exist by adding the event 1221 whitespace (ex: 5GB) to the user data (ex: 24GB) and comparing against the physical size. In our example, we have a total of 29GB accounted for but 21GB unaccounted.
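    The accounting above is simple enough to sketch (figures are the ones from our example; treat this as a rough estimate, since indexes and other structures also consume space):

    ```python
    # Rough bloat accounting from the example above (all figures in GB).
    # These numbers are illustrative; substitute your own measurements.
    physical_size_gb = 50   # size of the .edb file on disk
    whitespace_gb = 5       # free space reported by event 1221
    user_data_gb = 24       # Item Size + Deleted Item Size from Get-MailboxStatistics

    accounted_gb = whitespace_gb + user_data_gb
    unaccounted_gb = physical_size_gb - accounted_gb

    print(f"Accounted: {accounted_gb} GB, unaccounted: {unaccounted_gb} GB")
    ```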

    NOTICE: Before you dig into the /MS output, you should read through the ESE Database Structure TechNet article. At a minimum, understand that Exchange 2007 databases are divided into 8-KB pages, where Exchange 2000 and 2003 (ESE98) use 4-KB pages.

    Do not expect to have a DB that is physically the same size as your whitespace and user data. There are many reasons why the database may require additional space. These might include database structure such as indexes, tables, and search folders as well as fragmented pages and unclaimed whitespace (i.e. changes since expiry and online maintenance).

    SAMPLE /MS OUTPUT

        ********************************** SPACE DUMP *************************************
        Name                  Type     ObjidFDP    PgnoFDP     PriExt     Owned      Available
        =====================================================================================
       Dbname.edb             Db          1           1        256-m      3187862      64000
       1-121                  Tbl        112        426          8-s         8             0
       ?B6708?T668f           Idx       1848        431          1-s         1             0
       MsgFolderIndex7        Idx        113        427          1-s         1             0
       MsgFolderIndexPtagDel  Idx        116        430          1-s         1             0
       MsgFolderIndexURLComp  Idx        115        429          1-s         1             0
       RuleMsgFolderIndex     Idx        114        428          1-s         1             0
       1-24                   Tbl         61        142          2-m       695104          3
       1-611BB71A86           Tbl        312        833          8-m        3014           5
       ?B6708?T668f+B67aa+S1  Idx       1850        476          1-s         1             0
       MsgFolderIndex7        Idx        313        834          1-s         1             0
       MsgFolderIndexPtagDel  Idx        316        837          1-s         1             0
       MsgFolderIndexURLComp  Idx        315        836          1-s         1             0
       RuleMsgFolderIndex     Idx        314        835          1-s         1             0
       S-1-28B913B0D4F        Tbl       1862        705          8-s         8             3
       MsgFolderIndexURLComp  Idx       1863        706          1-s         1             0
       ptagFIDIndex           Idx       1865        708          1-s         1             0
       ptagSearchedFIDIndex   Idx       1864        707          1-s         1             0
       - continued -
    ---------------------------------------------------------------------------------------------------------------------------
                                                                                      647540

    MS Output Field information

    • FDP is a special page in the database which indicates which B+tree this page belongs to. ObjidFDP is the Object ID of the FDP
    • PgnoFDP is the page number of the FDP
    • PriExt is the combination of a number and letter. The number before the dash is the initial number of pages when the object was first created in the B-Tree. The letter after the dash indicates whether the space for the B-Tree is currently represented using multiple pages ("m") or a single page ("s").
    • Owned is the number of pages that contain data and/or are in use
    • Available is the number of free pages available
    • Type may include Table (TBL), Index (IDX), and Long-value (LV)
    • LV may be required because a column or a record in ESE cannot span pages in the data B+tree. There are values that break the 8KB boundary of a page; referred to as long-values (LV). A table's long-value B+tree is used to store these large values.

    READING THE /MS OUTPUT

    We decided to look at 4 things within the /MS output:

    Calculate Actual Whitespace: The number at the end of the dump (647540 in the example above) is the sum of the Available pages across all the tables. Take that number and multiply it by the page size (8 KB for Exchange 2007). In our example, we have roughly 5,180 MB of whitespace.

    Attachment Table: Table 1-24 holds all attachments in the database. In our example, the attachment table has 695104 Owned pages. Multiplying that number by the page size (8 KB) shows that roughly 5.5 GB of space is used for attachments.
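    Both calculations are just page counts multiplied by the page size. A quick sketch using the figures from the sample dump (decimal units, so the results land close to the post's rounded numbers):

    ```python
    PAGE_SIZE_KB = 8  # ESE page size in Exchange 2007; Exchange 2000/2003 use 4 KB

    def pages_to_mb(pages):
        """Convert an Owned/Available page count from ESEUTIL /MS to (decimal) MB."""
        return pages * PAGE_SIZE_KB / 1000

    # Figures from the sample /MS dump above
    whitespace_mb = pages_to_mb(647540)           # total Available pages at the end of the dump
    attachments_gb = pages_to_mb(695104) / 1000   # Owned pages of attachment table 1-24
    print(f"Whitespace: {whitespace_mb:,.0f} MB; attachments: {attachments_gb:.2f} GB")
    ```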

    Search Folders: Search folders are listed by the S- value. In the example above, S-1-28B913B0D4F is a search folder. Look for many S- values in the output and follow DGoldman's blog to identify any users who have a large number of Search Folders.

    Large Consumption Users: Look through the output and see if there is any object that has a large number of owned pages. In our example, we see that 1-611BB71A86 has 3014 pages.

    NOTE: All user mailbox folder tables are numbered, not named. In the example above, 1-611BB71A86 is a mailbox folder table. But also look at other tables, such as MailboxTombstone or Message Tombstone.

    If you find a numbered table that has a large number of owned pages, you can identify which mailbox that table belongs to by looking at the ISINTEG -dump output.

    To do this, copy the numbered value after the dash (611BB71A86) and then search the ISINTEG output file from the bottom up for that value.

    Example:

      [6] RootFID=0001-00611BB71A86
      Owner DN=???
      GUID=D12C30EC 4938E64D 89999899 906A78DA
      Display Name=Mailbox - John Doe
      Comment=
      Sentmail FID=0000-0000309F1E78
      Subtree=0001-00611BB71A87
      Inbox=0001-00611BB71A88
      Outbox=0001-00611BB71A89
      Sentmail=0001-00611BB71A8A
      Finder=0001-00611BB71A8C
      DAF=0001-00611BB71A8D
      Spooler Q=0001-00611BB71A8E
      Size=(ec:ecNotFound-MAPI_E_NOT_FOUND)
      Localized=TRUE
      Locale=0x409
      In some cases, the search results yield something like this:
      Folder FID=0001-00611BB71A86
      Parent FID=0001-00611BB71A92
      Root FID=0001-00611BB71A92
      Folder Type=1
      Msg Count=0
      Msgs Unread=0
      Msgs Submitted=0
      Rcv Count=1
      Subfolders=0
      Name=Shortcuts

    If your results do not show a mailbox name, then this folder may be a subfolder. You can then search the ISINTEG output for the Parent FID value (Ex: 00611BB71A92). You may have to do this several times until you locate the root mailbox name.
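    This bottom-up search and Parent FID walk can be automated. The sketch below assumes a simplified view of the ISINTEG -dump output (Key=Value lines, records separated by blank lines); the real dump is messier, so treat this as a starting point rather than a finished tool:

    ```python
    import re

    def parse_records(dump_text):
        """Split ISINTEG -dump output into records of Key=Value fields.
        Assumes records are separated by blank lines (a simplification)."""
        records = []
        for block in re.split(r"\n\s*\n", dump_text.strip()):
            rec = {}
            for line in block.splitlines():
                if "=" in line:
                    key, _, value = line.strip().partition("=")
                    key = re.sub(r"^\[\d+\]\s*", "", key.strip())  # drop "[6] " prefixes
                    rec[key] = value.strip()
            if rec:
                records.append(rec)
        return records

    def find_owner(dump_text, fid_suffix, max_hops=20):
        """Follow Parent FID links until a record with a Display Name (the
        mailbox root) is found. fid_suffix is the hex value after the dash,
        e.g. '611BB71A86'."""
        records = parse_records(dump_text)
        for _ in range(max_hops):
            rec = next((r for r in records
                        if r.get("RootFID", r.get("Folder FID", "")).endswith(fid_suffix)),
                       None)
            if rec is None:
                return None
            if "Display Name" in rec:
                return rec["Display Name"]
            parent = rec.get("Parent FID", "")
            if not parent or parent.endswith(fid_suffix):
                return None  # no parent, or self-referential: give up
            fid_suffix = parent.split("-")[-1]
        return None
    ```

    As the post notes, you may have to follow the Parent FID several hops before reaching the mailbox root; max_hops simply guards against looping forever.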

    BACK TO OUR ISSUE

    What we found in our issue was a very large number of Search Folders present in the /MS output. We configured the RESET VIEWS registry key for that database and allowed online maintenance to run several more times until more whitespace became available. We then performed an offline defrag of the database. This freed up some of the DB bloat.

    NOTE: If the database is continuing to grow in size, you may want to capture the data on a regular basis and see if there are any patterns for the growth (i.e. types of data or specific users). Then try to isolate why that bloat may be occurring.

     


    Doug

  • Paged and Non-paged Pool Issues on Exchange 2000/2003

    This blog provides some guidance on how to optimize Exchange 2000/2003 on Windows OS for paged and non-paged pool issues.

    Symptoms

    Event log might show
    Event ID 2020 
    Event Type: Error
    Event Source: Srv
    Event ID: 2020
    Description: The server was unable to allocate from the system paged pool because the pool was empty.
    
    Event ID 2019 
    Event Type: Error 
    Event Source: Srv 
    Event ID: 2019 
    Description: The server was unable to allocate from the system NonPaged pool because the pool was empty.

    Other symptoms of pool exhaustion on the system include application or process hangs, out of resource errors reported by drivers or applications, the server becomes slow or refuses additional requests and connections, or all of the above!


    Understanding Pool

    When a machine boots up, the Memory Manager creates two dynamically sized memory pools that kernel-mode components use to allocate system memory: the Paged Pool and the NonPaged Pool. The maximum size of each pool is determined during Windows startup and depends on several factors, including boot switches such as /USERVA and /3GB, registry settings, and physical RAM.

    Pool memory is not the amount of RAM on the system. It is a segment of the virtual memory, or address space, that Windows reserves at boot. These pools are finite because a 32-bit (x86) OS can only address 2^32 bytes = 4 GB. By default, Windows uses 2 GB for applications and 2 GB for the kernel.

    These pools are used by either the kernel directly, indirectly by its support of various structures due to application requests on the system (CreateFile for example), or drivers installed on the system for their memory allocations made via the kernel pool allocation functions.

    NonPaged means that this memory, when allocated, will not be paged to disk and is thus resident at all times, which is an important feature for drivers. Paged memory, conversely, can be paged out to disk. In the end, though, all of this memory is allocated through a common set of functions, the most common being ExAllocatePoolWithTag.


    System PTEs

    Page Tables are built for each process address space. The Page Table maps logical virtual addresses for a process to physical memory locations. System Page Table Entries (PTEs) are used to map system pages such as I/O space, Kernel stacks, and memory descriptor lists. Tuning the memory by using the USERVA switch in conjunction with the /3GB switch can often stave off PTE depletion issues.


    /USERVA

    A boot.ini switch used for more precise tuning of user and kernel virtual memory space in Windows Server 2003. Use this switch in conjunction with the /3GB switch in the Boot.ini file to reduce the User-mode space, allowing the difference to be returned to Kernel mode. The standard configuration is /USERVA=3030. However, there are situations where you need to allow for more PTEs to become available. On an Exchange server, the value of USERVA should not be set lower than 2970 without consulting the Windows Performance team.


    /3GB

    A boot.ini switch that allocates 1 GB to the kernel and 3 GB to the User-mode space. Using this switch reduces the memory available for Nonpaged Pool, Paged Pool, & System Page Table Entries (PTEs).


    /PAE

    A Boot.ini switch. When more than 4GB of physical memory is used on the system, the process of paging memory to the disk increases dramatically, and performance may be negatively impacted. The Windows memory managers use PAE to provide more physical memory to the operating system. This reduces the need to swap the memory in and out of the page file and results in increased performance. All the memory management and allocation of the PAE memory is handled by the memory manager independently of the running programs. It is now a best practice to allow the PAE kernel to load on an Exchange 2003 server.


    Pool Tags

    A pool tag is a four-character (four-byte) value that is associated with a dynamically allocated chunk of pool memory. The tag is specified by a driver when it allocates the memory via the ExAllocatePoolWithTag routine. Pool tags are useful for identifying which drivers are allocating nonpaged and paged pool memory.


    By the Numbers

    Review the chart at the bottom of this document to identify the default boot up of Paged Pool, NonPaged Pool, and Free System PTE as it applies to your system configuration.

    A typical large-scale Exchange 2003 server should use no more than 200 MB of paged pool memory under typical conditions. Paged pool memory use of more than 220 MB requires immediate attention. Under standard load, there should be approximately 50 MB of available paged pool memory. If you have less than 30 megabytes free, you should take immediate steps to reduce the load on the server.

    In general, a system should always have around 10,000 free System PTEs. If the value drops below 5,000, the system could hang temporarily.
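    The rules of thumb above are easy to codify. A hypothetical health-check sketch using those thresholds (the function name and structure are mine, not a Microsoft tool):

    ```python
    def pool_health(paged_pool_mb, free_paged_pool_mb, free_system_ptes):
        """Flag pool conditions using the Exchange 2003 rules of thumb above."""
        warnings = []
        if paged_pool_mb > 220:
            warnings.append("paged pool over 220 MB: needs immediate attention")
        elif paged_pool_mb > 200:
            warnings.append("paged pool over 200 MB: investigate")
        if free_paged_pool_mb < 30:
            warnings.append("under 30 MB free paged pool: reduce server load now")
        if free_system_ptes < 5000:
            warnings.append("under 5,000 free System PTEs: system may hang")
        elif free_system_ptes < 10000:
            warnings.append("under 10,000 free System PTEs: below healthy baseline")
        return warnings

    # Example: a healthy server returns no warnings
    print(pool_health(paged_pool_mb=150, free_paged_pool_mb=50, free_system_ptes=12000))  # []
    ```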


    64-bit resolves this, right?

    Not quite. While 64-bit programs can take advantage of the 16-TB tuning model (8 terabytes User and 8 terabytes Kernel) that 64-bit OS has, 32-bit programs still use the 4-GB tuning model (2 GB User and 2 GB Kernel). This means that 32-bit processes that run on 64-bit versions of Windows run in a 4-GB tuning model. So by running 32-bit programs on 64-bit OS, you could still run into limitations.


    Troubleshooting Memory Depletion

    Here are some basic troubleshooting steps for identifying a Pool Memory depletion problem on an Exchange 2000/2003 server.

    1. Is the system showing any symptoms of pool memory depletion (event log, hangs, etc)?
    2. Check system performance using Task Manager and performance monitor (Memory Counters: Available Mbytes, Free System Page Table Entries, Pool Nonpaged Bytes, & Pool Paged Bytes)
    3. Use Poolmon to capture the system’s memory allocations of paged and non-paged kernel pools. See the section below on using Poolmon. (poolmon -n -b -g)
    4. Review the poolmon output and identify any tags that are consuming a high amount of Paged or NonPaged memory. Compare the output to your baseline or another system.
    5. To search for a driver that uses a specific tag, use the findstr /m /l /s xxxx *.sys command from the cmd prompt (where xxxx is the tag name).
    6. If you suspect a tag or driver, research that driver (KB, Internet, etc) for any known issues and latest hotfixes.
    7. If needed, review the pool information using debugging tools (see debugging section below)
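    The findstr scan in step 5 can be approximated with a short portable sketch. Pool tags are embedded as literal ASCII bytes in the driver binaries, so we search bytes rather than text (the default path is the usual driver location and is an assumption you may need to adjust):

    ```python
    from pathlib import Path

    def drivers_using_tag(tag, root=r"C:\Windows\System32\drivers"):
        """Find driver binaries containing a pool-tag string, similar in spirit
        to `findstr /m /l /s <tag> *.sys`."""
        needle = tag.encode("ascii")
        return sorted(p.name for p in Path(root).rglob("*.sys")
                      if needle in p.read_bytes())
    ```

    As with findstr, a hit only means the four bytes appear in the file; confirm the suspect driver against the Pooltag.txt mapping before blaming it.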

    Common Causes of Pool Depletion

    • Problem with a driver or application taking more than needed (leak)
    • With Exchange 2000/2003, having more than 4 GB of physical RAM (Each byte of physical RAM that is installed in a server requires some kernel memory to address and manage it. The more RAM that is installed, the more kernel address space must be reserved for it. Address space may be borrowed from paged pool memory to satisfy this demand.)
    • Having a Public Folder store mounted on a mailbox server that has many mailboxes. If mailboxes are connecting to the PF store, the additional connections cause additional system resources to be consumed. 
    • User token size (http://support.microsoft.com/kb/912376)

    NOTE: Size of the PF store does not matter – it is the connections to the store


    Data Collection

    If experiencing symptoms, collect the following data:


    Common Best Practices

    The following are some best practices for managing paged and nonpaged pool issues on Exchange 2000/2003.

    NOTE: Before performing these steps, run Poolmon on the server at different intervals to keep track of system changes

    • Run Exchange Best Practice Analyzer (EXBPA) and resolve all of the issues that it identifies. EXBPA identifies many of the common configuration settings for improving memory.
    • Check for proper Boot.ini configuration (/3GB, /USERVA, /basevideo, /PAE, etc)
    • Within the System BIOS, disable Hot-Add memory -and- set the DynamicMemory registry key to a value of 1 (http://technet.microsoft.com/en-us/library/aa996104.aspx)
    • Disable Windows Scalable Networking Pack and TCP Chimney on the server
    • Disable Checksum Offloading by setting the DisableTaskOffload regkey to a value of 1 (http://support.microsoft.com/kb/904946)
    • Check the Network card settings and make sure that “Number of Receive Descriptors” is set to default (NIC Properties > Advanced)
    • Disable or remove unwanted Applications or drivers from the system
    • Distribute heavy-connection users evenly across multiple servers. Heavy-connection users are likely to be those who have multiple computers or devices and those who are mobile users.  This includes mobile, telephony, archiving, and mail client connections.
    • A Public Folder store can impact paged and nonpaged memory. Consider using a dedicated PF server for large environments.
    • Identify user TOKEN (TOKE) size (http://support.microsoft.com/kb/912376). If this is high, refer to KB 912376 to reduce token size or distribute users with large tokens across multiple servers.
    • Apply relevant hotfixes & updated drivers to the system
    • Restrict unauthorized clients and applications from accessing Exchange server. This can be done by isolating the subnet between clients and servers and by setting the Outlook connection range value on the server (also see KB 288894)
    • Using terminal services in application server mode on an Exchange 2000/2003 server can drastically impact performance on the server. Do not use terminal services in application mode!
    • Reduce File Cache Lifetime to 5 minutes (300 decimal) (http://support.microsoft.com/kb/267551)
    • Set POP3/IMAP Protocol Logging to 0 (http://support.microsoft.com/kb/299778)
    • Implement MaxPercentPoolThreads and AdditionalPoolThreadsPerProc (http://technet.microsoft.com/en-us/library/aa997700.aspx)
    • Move the EXIFS free list (Flst) and auxiliary free list (AuxL) to paged pool memory
      • HKLM\SYSTEM\CurrentControlSet\Services\EXIFS\Parameters
    • Set “AuxFreeListInPagedPool” = REG_DWORD 0x00000001
    • Set “FreeListInPagedPool” = REG_DWORD 0x00000001
    • Set EnableAggressiveMemoryUsage for IIS to 1 (http://support.microsoft.com/kb/820129)

    NOTE: There are other settings and adjustments that can be made to the system to potentially improve Paged and NonPaged Pool performance. Those adjustments should only be made after completing the above steps and consulting with Microsoft Services.


    Using Poolmon

    Poolmon is Windows Support Tool that displays data that the operating system collects about memory allocations from the system paged and non-paged kernel pools, and the memory pools used for Terminal Services sessions. The data is grouped by pool allocation tag.

    NOTE: Pool tagging is permanently enabled on Windows Server 2003. However, in earlier OS versions (including XP), you must use Gflags.exe to enable pool tagging.

    1. Open a command prompt and view the available options: Poolmon /?
      -s Display session pool
      -n [Logfile] Take a pool snapshot (Logfile may be specified, default is poolsnap.log)
      -c [LocalTagFile] Display driver information using LocalTagFile
      -g [PoolTagFile] Display driver information using PoolTagFile
      -itag Include the tag
      -xtag Exclude the tag
      -e Display totals
      -t Sort by tags
      -a Sort by allocs
      -u | -b Sort by Bytes
      -f Sort by free
      -d Sort by diff
      -m Sort by each
      -l Highlight
      -p First turns on nonpaged, second turns on paged
      -( | -) Increase parenthesis
      -r Print memory summary information
    2. Copy the Pooltag.txt file from the Windows Debugging tools directory to the system running poolmon. This will allow you to map the driver to the tag.
    3. Next run Poolmon -n -b -g
    4. Review the poolsnap.log. The tags should be sorted by byte usage.
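    If you want to post-process snapshots programmatically, a simplified parser can rank tags by byte usage. The real poolsnap.log includes header lines and extra formatting; this sketch assumes plain whitespace-separated columns with the byte count in the sixth field (an assumption to adjust against your actual log):

    ```python
    def top_tags(poolsnap_lines, count=5):
        """Return (tag, pool type, bytes) tuples with the highest byte usage
        from simplified poolmon snapshot lines."""
        rows = []
        for line in poolsnap_lines:
            parts = line.split()
            # Expect: Tag Type Allocs Frees Diff Bytes PerAlloc
            if len(parts) >= 6 and parts[5].isdigit():
                rows.append((parts[0], parts[1], int(parts[5])))
        return sorted(rows, key=lambda r: r[2], reverse=True)[:count]

    # Synthetic example lines in the assumed column layout
    snapshot = [
        "MmSt  Paged   120000  110000  10000  8388608  83",
        "Thre  Nonp     50000   49000   1000  1048576  1048",
        "File  Nonp    900000  880000  20000 16777216  838",
    ]
    for tag, pool, nbytes in top_tags(snapshot):
        print(f"{tag:>4}  {pool:<5}  {nbytes:>10,} bytes")
    ```

    Compare the top tags against your baseline or another healthy server, as in step 4 of the troubleshooting list above.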

  • Some Public Folder content is not replicating from Exchange 2003 to Exchange 2007

    I’ve recently experienced several different cases where replication between Exchange 2003 and 2007 was not working 100%. Much of the content had come over, but some had not. This blog talks about some of the steps that we took to isolate and resolve this.

    NOTE: Before you go any further, be sure to read the Public Folder Replication Troubleshooting blog: http://msexchangeteam.com/archive/2008/01/10/447843.aspx

    IS THE PROBLEM REAL

    Checking ESM may show you that the size and total items are mismatched between an Exchange 2003 and a 2007 server. If so, determine which folders are experiencing this and what type of content they hold. Log onto the Public Folder store using Outlook or MFCMapi to spot check that the content is in fact out of sync.

    Know that Public Folder calendars may not be 100% equal from one server to another. This is because when you use OWA or any CDO application to access a calendar Public Folder, an instance of recurring items is created and stored within the Public Folder. While this instance is not visible in Outlook, it does add to the item count on the public folder, as seen in ESM. Additionally, those instances are not replicated between public folders, so it is possible that a calendar PF on ServerX lists many more items than the one on ServerY even though their content is in sync.

    TRY THE BASICS

    Once you have identified the Public Folder(s) that are not in sync, do some of the basic steps to force replication:

    NOTE: See KB 842273 How to troubleshoot public folder replication problems in Exchange 2000 Server and in Exchange Server 2003

    1. Turn up MSExchangeIS\Public Diagnostic Logging related to Replication on the source and target Exchange servers
    2. Enable Message Tracking on both source and target servers
    3. Within ESM, right click on the Organization object and make sure that Public Folder replication is not paused.
    4. Ensure that the Exchange 2007 Public Folder store is listed as a Public Folder replica and that the store is mounted
    5. Check the Exchange 2007 server Public Folder Administration tool (toolbox) and verify that the hierarchy has replicated to the server. Hierarchy before data!
    6. Check the configuration of the antivirus software – ensure that it does not include Public Folder messages (DoNotScanIPMReplicationMessages)
    7. Add new content to the source public folder and ensure that the new content has replicated over to the target server
    8. Use PFDavAdmin against the source server to remove item-level permissions
    9. Right click the problem folder and select Resend Changes. Send all changes over the past 1000 days or so.
    10. Review the application log on the source server to see if a replication message for that folder has left the source server.

    If so, check the target server’s application log to see if a replication message came in. If there is no inbound application log message for replication, message track that message from the source server.

    If not, search for warnings and errors related to PF Replication.

    NEXT STEPS

    At this point, you should have isolated which folders work and which do not. You also know whether the message is leaving the source server and whether the target server is accepting it. Additionally, message tracking would tell you if an NDR was returned (ex: 554 5.6.0 STOREDRV.Deliver.Exception propertyValidationException).

    An NDR may indicate that there is an incorrect property value set (or missing) on the Public Folder. To find that property, try:

    1. On the Hub Transport Server, enable ContentConversionTracing & PipelineTracing
    2. Modify some items in the source Public Folder
    3. Search the content conversion file for Exception. Here are 3 different issues that we experienced:

    ERROR 1: Microsoft.Exchange.Data.Storage.PropertyValidationException: Property validation failed. Property = [{00020329-0000-0000-c000-000000000046}:'Keywords'] Categories 

    Error = Element 0 in the multivalue property is invalid..

    CAUSE: Content Conversion logging showed us that there was an issue with the Categories field on a Public Folder. So we opened Outlook against the source folder and viewed the content by Category (Views > Current Views). What we found was that some categories contained an invalid character, such as an @, comma, or space. We edited those objects and removed the invalid character from the category field, and replication then worked properly for those objects.

     

    ERROR 2: Microsoft.Exchange.Data.Storage.PropertyValidationException: Property validation failed. Property = [{00062004-0000-0000-c000-000000000046}:0x8092] Email2AddrType

    Error = Email2AddrType is too long: maximum length is 9, actual length is 25.

    CAUSE: Here we see that Email2AddrType is the problem. This particular folder was a Contacts folder. What we found was that some of the contacts had invalid addresses listed. We modified those addresses and replication completed properly for those objects.

     

    ERROR 3: FAIL 554 5.6.0 STOREDRV.Deliver.Exception.ObjectValidation Failed to process message due to a permanent exception

    Microsoft.Exchange.Data.Storage.ObjectValidationException: The object is invalid and cannot be saved.

    CAUSE: With this error, we were trying to replicate recurring appointments. Taking a closer look at those appointments, we realized that some did not have any end-date (EndTimeProperty) set. We edited the endtime of the meeting. In doing so, an update to the meeting was sent out to all of the attendees.

     

    WHAT ELSE

    If ContentConversion or Pipeline Tracing do not help, you could use the tracing features from the Exchange Troubleshooting Assistant (EXTRA) to potentially isolate the problem.

    1. Start > Run > EXTRA
    2. Select Trace Control from the Tasks
    3. Select Manual Trace Tags
    4. Select all types & select the following tags for the STORE component: tagPFReplHier, tagPFReplInbound, tagPFReplInit, TagAccessDenied, tagAccessDeniedDetails, tagDeliverMail, tagDispatchRpcCalls, tagDispatchRpcReturns, tagError, tagPFDLocalDelivery, & tagRpcIntError
    5. Replicate PF content again and then stop tracing
    6. Review the trace logs

    This should get you closer to the problem…

    Doug

  • Considerations for Exchange 2007 CCR Stretch Clusters

    Since Exchange 2007 was released, I’ve been asked many times what should be considered when designing a stretch Cluster Continuous Replication (CCR) server configuration. A stretch CCR typically has one node in one physical datacenter and the other CCR node in a separate physical datacenter, usually connected over a private WAN link. This blog provides some of those considerations.

    Consider a stretch CCR environment that spans 2 different data centers.

    Consider the following potential issues when using stretched CCR nodes across 2 datacenters in the same AD Site:

    Network

    • If using Windows 2003 operating system, each node must be on the same subnet. If using Windows 2008, cluster nodes can span across multiple subnets.
    • With Windows 2003 Cluster server, round-trip cluster communication must complete within 0.5 seconds. With Windows 2008, this is a little more flexible, but that does not improve performance.
    • For many configurations, testing has shown that that network latency between CCR nodes should be below 50ms.

    Exchange Features

    • CCR allows for a maximum of 2 physical nodes running Exchange 2007 per cluster, whereas Standby Continuous Replication (SCR) can have multiple targets
    • Only the Exchange Server 2007 Mailbox Role can be on a CCR server. Since the mailbox role needs to have a HUB and CAS server available in the local AD site, this means that you need to place additional hardware in both datacenters to support the HUB and CAS role. This is in case 1 datacenter fails – the other can continue with all roles.
    • If planning on using Public Folders, know the limitations of PF on CCR (Planning for Cluster Continuous Replication)

    Active Directory

    • Per Planning for Cluster Continuous Replication, all CCR nodes MUST be in the same AD Site.
    • Typically, Active Directory communication is controlled based on the AD Site design. Since both datacenters are using the same AD site, there is a risk that servers and clients in 1 datacenter may be communicating to domain controllers in the secondary datacenter.

    Mail Flow

    • By default, the Mailbox Server’s Microsoft Exchange Mail Submission Service load balances notification events between the Hub Transport servers that are located in the same AD site as the Mailbox Server. In a stretch cluster configuration, the Mailbox Server could be talking to a HUB server in the remote datacenter. This could generate additional network traffic and make it more difficult to control scheduled outages (i.e. HUB server reboot could impact production mailbox servers). To reduce this risk, additional configuration of the Mailbox Server may be required using the Set-MailboxServer –SubmissionServerOverrideList parameter.

    Client Access

    • In most scenarios, a Client Access Server (CAS) will be required to provide access to Outlook Anywhere, the Availability Service, Autodiscover, POP3/IMAP4, OWA, and ActiveSync. For redundancy and load balancing purposes, you might need multiple CAS servers in each datacenter.
    • Exchange 2007’s Autodiscover service will automatically detect which Client Access server is closest to the user's mailbox based on AD site design. If it selects the CAS server in the secondary datacenter, this creates additional network traffic and might generate client performance issues.

    Maintenance & Monitoring

    • If monitoring both CCR nodes from 1 datacenter, this might generate additional network traffic to be sent over the WAN link.
    • Performing routine maintenance of a node (i.e. install service pack) might require you to failover to the secondary datacenter. This forces you to perform additional tasks for scheduled outages which might cause longer delays in restoring service back to the original datacenter.

    Backups

    • In most environments, the primary backup solution is located in the primary datacenter and backs up more than just Exchange. Exchange 2007 CCR allows you to back up the passive node using an Exchange-aware Volume Shadow Copy Service (VSS)-based solution.
      • If you back up the passive node from the primary datacenter, a significant amount of backup traffic crosses the WAN link during every backup cycle.
      • If you back up the passive node from the secondary datacenter but need to restore the data to a Recovery Storage Group (RSG) running in the primary datacenter, additional WAN traffic is generated. That traffic might force you to perform the restore outside of normal business hours, whereas in most RSG scenarios the process can be done during business hours with no impact to production.
      • If you back up the active node from the primary datacenter, the backup cycle has a much greater impact on production users and services, whereas a backup of the passive node has none.
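    For context, the RSG restore mentioned above is typically driven from the Exchange Management Shell. A minimal sketch under assumed names (the server MBX01, the storage group and database names, and the mailbox jsmith are all hypothetical):

    ```powershell
    # Create a Recovery Storage Group on the primary-datacenter server.
    New-StorageGroup -Server "MBX01" -Name "RSG" -Recovery

    # Link the database to recover into the RSG, then allow it to be
    # overwritten by the restored files.
    New-MailboxDatabase -MailboxDatabaseToRecover "First Storage Group\Mailbox Database" -StorageGroup "MBX01\RSG"
    Set-MailboxDatabase -Identity "MBX01\RSG\Mailbox Database" -AllowFileRestore:$true

    # After the backup application restores the files, mount the
    # database and merge the recovered content back into the mailbox.
    Mount-Database -Identity "MBX01\RSG\Mailbox Database"
    Restore-Mailbox -Identity "jsmith" -RSGDatabase "MBX01\RSG\Mailbox Database"
    ```

    The point of the WAN consideration is simply where the restored files land: if the backup data lives in the secondary datacenter, every byte of that restore crosses the link before the merge can begin.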

    The considerations listed above are the common ones I run into. You may find others that apply to your environment.  At this point, you might be asking, “OK, so what should I consider as a replacement option?”  For most scenarios, the requirements are typically:

    • Provide high availability and redundancy in the primary datacenter
    • Provide site resiliency between two physical datacenters with minimal downtime and configuration
    • Keep the design simple
    • Reduce cost and overhead wherever possible
    • Perform very few site failovers (only when required, or once or twice a year)

    With that in mind, here is a common solution:

    This solution places the CCR nodes in the primary datacenter and uses Standby Continuous Replication (SCR) to replicate the data to the secondary datacenter.

    SCR can provide the remote-datacenter failover capability that most organizations are looking for, and it is often a better solution than trying to geo-cluster a CCR server.
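    Assuming Exchange 2007 SP1 (where SCR was introduced), enabling SCR for a storage group is a single cmdlet. A hedged sketch with hypothetical names:

    ```powershell
    # Enable SCR replication of the storage group on the CCR mailbox
    # server to a standby server in the secondary datacenter.
    # (CCRMBX, SCRTARGET, and the storage group name are examples.)
    Enable-StorageGroupCopy -Identity "CCRMBX\First Storage Group" -StandbyMachine "SCRTARGET"

    # Check replication health for the SCR target.
    Get-StorageGroupCopyStatus -Identity "CCRMBX\First Storage Group" -StandbyMachine "SCRTARGET"
    ```

    Because the SCR target is a separate server rather than a cluster node, the CCR cluster itself never has to stretch across the WAN, which sidesteps most of the considerations listed above.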

    Doug