Tim McMichael

Navigating the world of high availability...and occasionally sticking my head in the cloud...

MSExchangeRepl 2147 / MSExchangeRepl 2104 / MSExchangeRepl 2127 occurring on Windows 2008 or Windows 2008 R2 with Exchange 2007 Cluster Continuous Replication (CCR)


When Exchange 2007 CCR is installed on Windows 2008 or Windows 2008 R2, the following error may be noted in the application log of the passive node:

Log Name: Application
Source: MSExchangeRepl
Event ID: 2104
Task Category: Service
Level: Error
Keywords: Classic
User: N/A
Computer: MACHINE
Description:
Log file action LogCopy failed for storage group EXCLUST01\SG2. Reason:
CreateFile(
\\Server\StorageGroupGUID$\LogFile.log) = 2

If the CCR cluster is not utilizing continuous replication host names the following event series may also be noted:

Event ID : 2147
Raw Event ID : 2147
Source : MSExchangeRepl
Type : Error
Machine : SERVER
Message : There was a problem with 'ActiveNode', which is an alternate name for 'ActiveNode'. The list of aliases is now 'ActiveNode', and the alias 'was' removed from the list. The specific problem is 'CreateFile(
\\ActiveNode\StorageGroupGuid$\LogFile.log) = 2'.

ID:       2127
Level:    Information
Provider: MSExchangeRepl
Machine:  SERVER
Message:  The system has detected a change in the available replication networks.  The system is now using network 'ActiveNode' instead of network 'ActiveNode' for log copying from node ActiveNode.

In this situation, if the solution is aggressively monitored, you may note that replication temporarily fails and then automatically resumes as healthy.  This occurs because replication pauses briefly when the error condition is detected while the replication service looks for other replication paths and then automatically retries the same copy operation.

If the CCR cluster is utilizing continuous replication host names the following event series may also be noted:

Event ID : 2147
Raw Event ID : 2147
Source : MSExchangeRepl
Type : Error
Machine : SERVER
Message : There was a problem with 'ReplicationHostName', which is an alternate name for 'ActiveNode'. The list of aliases is now 'ActiveNode', and the alias 'was' removed from the list. The specific problem is 'CreateFile(
\\ReplicationHostName\StorageGroupGUID$\LogFile.log) = 2'.

ID:       2127
Level:    Information
Provider: MSExchangeRepl
Machine:  SERVER
Message:  The system has detected a change in the available replication networks.  The system is now using network 'ActiveNode' instead of network 'ReplicationHostName' for log copying from node ActiveNode.

The value 2 returned by CreateFile in these events is the Win32 error ERROR_FILE_NOT_FOUND.
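As an illustration of reading that code out of an event description, here is a minimal Python sketch. The function name parse_createfile_error and the small lookup table are assumptions made for this example; the table is only a hand-picked subset of the Win32 system error codes, and the sample text is the 2104 description shown earlier in this post.

```python
import re

# Small, hand-picked subset of Win32 system error codes (assumption:
# only the codes most likely to matter when a file copy fails).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    32: "ERROR_SHARING_VIOLATION",
}

# Matches "CreateFile(<path>) = <code>", allowing the line break that the
# event text places between "CreateFile(" and the UNC path.
CREATEFILE_RE = re.compile(
    r"CreateFile\(\s*(?P<path>[^)]*?)\s*\)\s*=\s*(?P<code>\d+)"
)


def parse_createfile_error(description):
    """Return (path, code, name) from an event description, or None."""
    match = CREATEFILE_RE.search(description)
    if match is None:
        return None
    code = int(match.group("code"))
    return match.group("path"), code, WIN32_ERRORS.get(code, "UNKNOWN")


# Description text of the 2104 event shown earlier in this post.
sample = (
    "Log file action LogCopy failed for storage group EXCLUST01\\SG2. Reason:\n"
    "CreateFile(\n"
    "\\\\Server\\StorageGroupGUID$\\LogFile.log) = 2"
)

print(parse_createfile_error(sample))
```

For the sample event this reports code 2 and the name ERROR_FILE_NOT_FOUND, together with the UNC path that the copy attempt targeted.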

In this situation the error is detected on the replication host name.  The replication service temporarily pauses replication while other network paths are enumerated.  If other continuous replication host names are in use, the replication service selects an alternate replication host name and automatically resumes log copying.  If the only valid path is the “public” path, the replication service begins copying log files over the “public” network.  Eventually the same error occurs on the public network, forcing network re-enumeration and an automatic switch back to the replication network.  If the solution is aggressively monitored, the replication status may show as failed during this switch, but it automatically returns to healthy.

In almost all instances these errors are considered benign to the operation of the Exchange Server.

The replication service is extremely aggressive in its attempts to copy log files.  The replication service is always aware of the next log file in the series that requires copying to the passive node.  As part of normal processing, the replication service may query multiple times for the presence of this file and make copy attempts.  These attempts may result in the replication service querying for a log file that is not yet fully available.  Under Windows 2003 this was not necessarily an issue.  Windows 2008 introduces a component in SMBv2 that may cause this to be a problem.

SMBv2 introduces status caching into the LanManWorkstation service.  When an application requests information from a file share, the workstation service caches the response from the server hosting the share.  Subsequent requests for the same information are answered from the cache rather than by contacting the server again.  Eventually this cache expires (in our case it has expired by the time replication is failed and resumed, or by the time a switch between replication host names occurs).  The replication service has received feedback that the log file in question should now be available for copy, attempts to copy it, and receives an older cached status indicating that the file is not present (even though the file does exist on the source at the time the attempt is made).  In turn, the replication service detects this as an error condition and takes action.
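The effect of a cached "file not found" answer can be sketched with a toy model in Python. This is not the real SMB redirector: the StatusCache class, the simulate function, and the 5-second lifetime are all assumptions chosen for illustration. The sketch only shows why a lookup made shortly after a failed lookup can still report the file as missing, and why a lifetime of zero avoids that.

```python
class StatusCache:
    """Toy model of a client-side 'file not found' cache.

    Hypothetical illustration only; it does not reproduce the behavior
    of the LanManWorkstation service in detail.
    """

    def __init__(self, lifetime_seconds):
        self.lifetime = lifetime_seconds
        self._not_found = {}  # path -> time the negative result was cached

    def exists(self, path, server_files, now):
        cached_at = self._not_found.get(path)
        if cached_at is not None and now - cached_at < self.lifetime:
            # Answered from cache; the server is not contacted.
            return False
        found = path in server_files
        if found:
            self._not_found.pop(path, None)
        elif self.lifetime > 0:
            # Remember the negative result until the lifetime elapses.
            self._not_found[path] = now
        return found


def simulate(lifetime_seconds):
    """Query at t=0, t=2 and t=6; the file appears on the server at t=1."""
    cache = StatusCache(lifetime_seconds)
    server_files = set()
    path = r"\\ActiveNode\StorageGroupGuid$\LogFile.log"
    results = [cache.exists(path, server_files, now=0)]  # log not ready yet
    server_files.add(path)                               # log becomes available
    results.append(cache.exists(path, server_files, now=2))
    results.append(cache.exists(path, server_files, now=6))
    return results


print(simulate(5))  # [False, False, True]
print(simulate(0))  # [False, True, True]
```

With a non-zero lifetime the second query is answered from the stale cache entry even though the file already exists on the source, which corresponds to the condition the replication service reports as an error. With a lifetime of zero, every query goes to the server.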

From a Windows 2008 / Windows 2008 R2 perspective this is by design.

To correct these errors on an Exchange 2007 / Windows 2008 or Exchange 2007 / Windows 2008 R2 implementation, the following registry values should be set to zero (0) and the nodes rebooted:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters

FileInfoCacheLifetime [DWORD]

FileNotFoundCacheLifetime [DWORD]

DirectoryCacheLifetime [DWORD]

If the DWORDs are not present they may need to be created.  The recommended value is HEX / DEC 0.
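One way to apply the same setting consistently on both nodes is to generate the commands rather than type them. The Python sketch below only builds command lines for the built-in reg.exe tool (reg add with /v, /t, /d and /f); it does not execute anything. The function name build_reg_commands is an assumption for this example, and the resulting commands should be reviewed and then run from an elevated prompt on each node before the reboot.

```python
# Registry key and value names exactly as listed in this post.
KEY = r"HKLM\System\CurrentControlSet\Services\Lanmanworkstation\Parameters"
VALUES = (
    "FileInfoCacheLifetime",
    "FileNotFoundCacheLifetime",
    "DirectoryCacheLifetime",
)


def build_reg_commands(key=KEY, values=VALUES, data=0):
    """Return one 'reg add' command line per value; nothing is executed."""
    return [
        f'reg add "{key}" /v {name} /t REG_DWORD /d {data} /f'
        for name in values
    ]


for command in build_reg_commands():
    print(command)
```

The /f switch overwrites an existing value without prompting, so the same commands work whether or not the DWORDs are already present.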

More information on these keys can be found here: http://technet.microsoft.com/en-us/library/ff686200(WS.10).aspx  (Note that the registry path in that article is missing the Services key; the correct path is the one given in this blog post.)

Comments
  • Hello Tim,

    Thanks for the info. Can you please let me know whether we should create FileInfoCacheLifetime as a 32-bit or 64-bit DWORD for these keys? Thanks in advance!

  • @Harry:

    A 64 bit entry is a QWORD.  We need to use 32 bit or DWORD.

    Tim

  • Is this error likely to occur with SCR replication as well?

  • Hello TIMMCMIC,

    You have mentioned that Error 2 is "ERROR_FILE_NOT_FOUND"

    Do you have any list that has the other error codes.

    can you tell me what are all the error codes & what do they correspond to.

    Daivs

  • @Davis:

    Can you be more specific? The only error code to interpret in this blog post is error 2.

    TIMMCMIC

  • @Anonymous:

    Regarding your question about whether this can apply to SCR: it can, as SCR is monitoring for similar notifications.

    TIMMCMIC

  • Those errors seem to be solved by exchange 2007 SP2 rollup 2 (support.microsoft.com/.../en-us)

  • @Stef:

    If the error is the same as referenced here, it will not be fixed in an Exchange rollup.  Editing the keys is the only way to disable the functionality in Windows that causes this condition.

    TIMMCMIC

  • Are these errors known to cause any issues with backup and log file truncation?

    Log files are not truncating, and we only see these errors once backups are finished.

  • Getting 2026 events along with 2147. After adding the above registry keys, the 2147 events stopped.

    What is this event and how can it be resolved?  Exchange 2007 SP2 RU4. The Exchange writer status goes to a failed state as well.

    Time:     13-07-2011 15:34:42
    ID:       2026
    Level:    Error
    Source:   MSExchangeRepl
    Machine:  
    Message:  The Microsoft Exchange Replication Service VSS writer (instance 5c2e6cec-f200-4714-ad20-37d095e09473) failed with error code C7FF07D7 when preparing for snapshot.

  • Tim, this saved my life today. Thanks.
