Hi again,

I think the post title is pretty self-explanatory.

Just to clarify it a little, the customer who hit this problem found that:

  1. A work-around was to run the service as LocalSystem.
  2. Mails between mailboxes on the same server were not delivered while the Hub Transport service was running as NetworkService.
  3. In Outlook running in cached mode, those messages moved to the Sent Items folder; in OWA and in Outlook running in online mode, they sat in Drafts and were shown in italics.
  4. Messages coming in from the internet were always delivered just fine.

OK, so what could be causing the problem?

Well, let’s first define what the 3 built-in security contexts for running services are and how they differ from each other:

 

Account          Local permissions   Can act on the network
LocalService     Limited             No
NetworkService   Limited             Yes
LocalSystem      Full                Yes

 

So any service running as LocalService cannot use the computer’s identity on the network and cannot authenticate to domain-joined resources on the network (and networked computers cannot authenticate to this service). A service running as NetworkService can authenticate with remote resources. Both of these accounts have very limited permissions to access files and registry keys on the local system.

LocalSystem has no restrictions on the local computer and can also authenticate on the network and have networked computers authenticate with it. That makes it a bad context for the Hub Transport service, as it has far more access to the local server than the service needs.

So my first thought was this: because sending works under LocalSystem and fails under NetworkService, and both of these accounts can authenticate on the network, network authentication shouldn’t be the problem. That would make it a local permission problem. Process Monitor from Sysinternals is a great tool for highlighting missing permissions to local resources.
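That elimination step can be written out as a small sketch. The capability values come from the table above; the dictionary layout and the function name differing_capabilities are only illustrative choices.

```python
# Capabilities of the three built-in service accounts, as listed in the table above.
ACCOUNTS = {
    "LocalService":   {"local_permissions": "Limited", "acts_on_network": False},
    "NetworkService": {"local_permissions": "Limited", "acts_on_network": True},
    "LocalSystem":    {"local_permissions": "Full",    "acts_on_network": True},
}


def differing_capabilities(working, failing):
    """Return the capability names whose values differ between two accounts."""
    a, b = ACCOUNTS[working], ACCOUNTS[failing]
    return sorted(key for key in a if a[key] != b[key])


if __name__ == "__main__":
    # LocalSystem works and NetworkService fails: what differs between them?
    print(differing_capabilities("LocalSystem", "NetworkService"))
```

On paper the only difference is local permissions, which is why a local permission problem looked like the obvious first suspect.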

We shut down all but one of the Hub Transport servers, and on the remaining Hub Transport server we started the Hub Transport service as NetworkService. We then started up Process Monitor with a filter to show only events where Result is ACCESS DENIED.

But when we sent an email from one mailbox to another, Process Monitor didn’t record anything of interest at all.

So, back to the drawing board. What about the scenario we excluded at the start, network authentication? Well, what is happening with authentication is that the Mailbox server is trying to authenticate to the Hub Transport server to let it know that there are new messages that it needs to process.

To get started we needed to know how it was authenticating and whether there were any problems during authentication. We looked at the Security event logs on both the Mailbox server and the Hub Transport server, focusing on the time the test message was sent. What we saw were “Audit Failure” events with Event ID 4625. The interesting parts of the event were that Kerberos was attempted, the SID authenticating was NULL and the error was “invalid key”.

We needed to know which Kerberos tickets were in use for the LocalSystem logon session on the Mailbox server (the Information Store service runs as LocalSystem). We ran LogonSessions.exe from Sysinternals and got output like this:

C:\>logonsessions.exe

Logonsesions v1.21
Copyright (C) 2004-2010 Bryce Cogswell and Mark Russinovich
Sysinternals - wwww.sysinternals.com


[0] Logon session 00000000:000003e7:
    User name:    CONTOSO\SERVER-1$
    Auth package: Negotiate
    Logon type:   (none)
    Session:      0
    Sid:          S-1-5-18
    Logon time:   10/10/2012 12:04:25
    Logon server:
    DNS Domain:   contoso.com
    UPN:          SERVER-1$@contoso.com

[1] Logon session 00000000:0000ae9f:
    User name:
    Auth package: NTLM
    Logon type:   (none)
    Session:      0
    Sid:          (none)
    Logon time:   10/10/2012 12:04:25
    Logon server:
    DNS Domain:
    UPN:

[2] Logon session 00000000:000003e4:
    User name:    CONTOSO\SERVER-1$
    Auth package: Negotiate
    Logon type:   Service
    Session:      0
    Sid:          S-1-5-20
    Logon time:   10/10/2012 12:04:26
    Logon server:
    DNS Domain:   contoso.com
    UPN:          SERVER-1$@contoso.com


[3] Logon session 00000000:000003e5:
    User name:    NT AUTHORITY\LOCAL SERVICE
    Auth package: Negotiate
    Logon type:   Service
    Session:      0
    Sid:          S-1-5-19
    Logon time:   10/10/2012 12:04:26
    Logon server:
    DNS Domain:
    UPN:

The first entry [0] is LocalSystem using Kerberos. Next, [1] is an NTLM session. Then [2] is NetworkService and lastly [3] is LocalService, which has no ability to authenticate on the network. So the logon session ID we want to target is 0x3e7, the low part of the session ID shown in entry [0].
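If you have many sessions to wade through, picking the session ID out of the output can be scripted. This is a minimal sketch that assumes the text layout shown above; the function name session_ids_by_sid and the shortened sample are mine, not part of the tool.

```python
import re

# Shortened sample in the same layout as the LogonSessions output above.
SAMPLE = """\
[0] Logon session 00000000:000003e7:
    User name:    CONTOSO\\SERVER-1$
    Auth package: Negotiate
    Sid:          S-1-5-18

[1] Logon session 00000000:0000ae9f:
    User name:
    Auth package: NTLM
    Sid:          (none)

[2] Logon session 00000000:000003e4:
    User name:    CONTOSO\\SERVER-1$
    Auth package: Negotiate
    Sid:          S-1-5-20
"""

HEADER = re.compile(r"\[\d+\] Logon session ([0-9a-fA-F]+):([0-9a-fA-F]+):")
SID = re.compile(r"Sid:\s+(S-[\d-]+)")


def session_ids_by_sid(text):
    """Map each SID to the low part of its logon session ID (as an int).

    Sessions without a SID are skipped; if a SID appears more than once,
    the last session wins.
    """
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = int(header.group(2), 16)
            continue
        sid = SID.match(line)
        if sid and current is not None:
            result[sid.group(1)] = current
    return result


if __name__ == "__main__":
    sessions = session_ids_by_sid(SAMPLE)
    # S-1-5-18 is the well-known SID for LocalSystem.
    print(hex(sessions["S-1-5-18"]))
```

The hex value it prints for S-1-5-18 is the one to pass to klist with the -li switch.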

We then ran klist tickets -li 0x3e7 on the Mailbox server to view the Kerberos service tickets held by the LocalSystem logon session. This logon session needs a Kerberos ticket which is valid on the Hub Transport server, and there was indeed such a service ticket: the encryption type (AES256) was appropriate, as all Exchange servers are running on Windows Server 2008 SP2, the valid date range was correct and the clocks were in sync. So everything looked OK and the Mailbox server should have been able to authenticate with the NetworkService logon session on the Hub Transport server. But it couldn’t. Why? Because the key which NetworkService on the Hub Transport servers should have been able to use to decrypt the incoming authentication message (the Kerberos service ticket) from the Mailbox server was broken for NetworkService, as explained here:

http://support.microsoft.com/kb/2566059
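The three checks we made on the service ticket (encryption type, validity window, clock sync) can be summarised in a short sketch. The ticket dictionary and its field names are hypothetical and are not klist’s output format; the 5-minute tolerance is the Kerberos default and is assumed not to have been changed by policy.

```python
from datetime import datetime, timedelta

# IANA-registered Kerberos encryption type numbers.
ETYPE_NAMES = {17: "AES128", 18: "AES256", 23: "RC4-HMAC"}

# Default Kerberos clock-skew tolerance; assumed not to have been changed by policy.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def ticket_problems(ticket, server_etypes, client_now, server_now):
    """Return a list of problems found; an empty list means the ticket looks usable."""
    problems = []
    if ticket["etype"] not in server_etypes:
        name = ETYPE_NAMES.get(ticket["etype"], str(ticket["etype"]))
        problems.append("server does not support " + name)
    if not ticket["start"] <= client_now <= ticket["end"]:
        problems.append("ticket is outside its validity window")
    if abs(client_now - server_now) > MAX_CLOCK_SKEW:
        problems.append("clock skew exceeds tolerance")
    return problems


if __name__ == "__main__":
    now = datetime(2012, 10, 10, 13, 0, 0)
    ticket = {  # hypothetical values, not copied from real klist output
        "etype": 18,
        "start": datetime(2012, 10, 10, 12, 5, 0),
        "end": datetime(2012, 10, 10, 22, 5, 0),
    }
    print(ticket_problems(ticket, {17, 18, 23}, now, now + timedelta(seconds=30)))
```

In our case a check like this comes back clean, which is what made the failure puzzling: the ticket itself was fine, and the fault was in the key on the receiving side.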

The domain functional level was Windows Server 2003 and there was one 2003 DC remaining in the domain, meaning that the pre-authentication key was encrypted using RC4, as the newer AES128 and AES256 are not understood by 2003 DCs. The first Windows Server 2008 member servers added to the domain were these Exchange 2007 servers. The 2003 DCs started logging errors each time one of these 2008 clients requested a TGT or a service ticket, because the clients would request AES256, which the 2003 OS didn’t understand. The request would then negotiate down to RC4 and just work. In the meantime, the 2003 DCs logged an error in the System event log about not understanding AES256.
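The “negotiate down” behaviour can be pictured with a very simplified sketch. Real Kerberos negotiation involves more than this (for example, which keys exist for the account), so treat it purely as an illustration; the function name and the etype sets are my assumptions about a 2008 client and a 2003 versus 2008 DC.

```python
# IANA-registered Kerberos encryption type numbers.
AES256, AES128, RC4 = 18, 17, 23
DES_CBC_MD5, DES_CBC_CRC = 3, 1


def negotiate_etype(client_preference, kdc_supported):
    """Return the first client-preferred etype the KDC supports, or None."""
    for etype in client_preference:
        if etype in kdc_supported:
            return etype
    return None


if __name__ == "__main__":
    client = [AES256, AES128, RC4]                      # 2008 client, strongest first
    dc_2003 = {RC4, DES_CBC_MD5, DES_CBC_CRC}           # no AES support
    dc_2008 = {AES256, AES128, RC4, DES_CBC_MD5, DES_CBC_CRC}
    print(negotiate_etype(client, dc_2003))  # falls back to RC4
    print(negotiate_etype(client, dc_2008))  # stays on AES256
```

The fallback itself works; the only side effect on the 2003 DC is the logged error about the AES256 request it could not understand.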

As a workaround to stop the errors from filling the event logs on the 2003 DCs and the monitoring application window, they implemented a registry setting on the Exchange servers to force them to always request RC4-encrypted tickets. They found this hint on a third-party user forum site.

We removed this key on the Mailbox servers:

HKLM\System\CurrentControlSet\Control\Lsa\Kerberos\Parameters\DefaultEncryptionType

We then removed all the Kerberos tickets which were cached on the Mailbox server using this command:

klist purge -li 0x3e7

We then verified that this hotfix was installed on the remaining 2003 DCs, so that they wouldn’t log the errors which had flooded the event viewer and led the customer to implement the key we removed:

http://support.microsoft.com/kb/948963

We couldn’t install the hotfix mentioned in KB2566059 on the Mailbox servers, as they were running on Windows Server 2008 SP2 and the hotfix was only built and released for Windows Server 2008 R2; it was not back-ported to Windows Server 2008 SP2.

 

 

As a final note, why did internet messages work? Well, those messages come from unauthenticated senders on the internet. Messages travelling from one mailbox to another go from one authenticated user to another, so authentication has to work for mailbox-to-mailbox delivery to succeed. Unauthenticated messages from the internet never depended on it, so they just worked.

 

I hope this helps someone else with their troubleshooting in the future.