I have been involved in a number of discussions recently regarding Outlook performance troubleshooting in the cloud. Mostly these discussions were in the context of why the customer didn't want to move to the cloud, since they figured it would be impossible to troubleshoot Outlook performance afterwards.
When we have clients and Exchange on terra firma we can monitor some performance counters such as RPC Average Latency on Exchange and use the Outlook and client performance counters to establish if a poor end user experience is being caused by the Exchange Server, the Network or the Client machine. If we move the messaging service out to the Office 365 cloud we can no longer monitor RPC Average Latency so we don't know if poor performance at the client is being caused by network or the Exchange server.
This started me thinking about how to deal with this situation and what items make up the client experience from an Outlook performance perspective.
The following items can both have a fairly dramatic effect on Outlook client performance and either could cause the end customer to pick up the phone to support and say that "E-mail is slow".
If we make the assumption that our service is running in the Office 365 cloud, how do we go about determining the actual cause of Outlook performance problems?
RPC latency is made up of two parts: network latency and server-side RPC processing latency.
Network latency is probably the easiest to examine on the surface, since we really just need to use ping.exe to find out what our network round-trip time (RTT) is to the target server. There is a snag though, as you might expect…
Here is the ping response from my Office 365 server…
Not exactly useful, since ICMP Echo is blocked at the external firewall. So, if we can't use ping.exe, how do we determine our network RTT? Well, luckily Outlook has us covered here and keeps track of some statistics that can help us out…
In your system tray you should see an Outlook icon. If you hold down CTRL and right-click this icon it will show the Outlook context menu…
From the Outlook context menu select "Connection Status".
In the Connection Status dialog box find the columns called Avg Resp and Avg Proc. The difference between these two values represents the network latency for each connection.
In this example you can see that I have two logical connections listed as Mail (to see the physical TCP connections use TCPView). This is normal for a cached mode Outlook 2007+ client. One connection is used for item synchronisation and the other is reserved for sending new messages. This architecture prevents sending a large message from blocking Outlook receiving new items, as it did in Outlook 2003.
Generally speaking the connection with the larger Req count is for synchronisation which is the one we will use in this example.
This means that my network RTT (Avg Resp minus Avg Proc) is 77ms and the server-side RPC processing latency (Avg Proc) is 10ms.
In my example this makes perfect sense since I am based in the UK and my mailbox is hosted on Office 365 in North America. It also shows that my Network and Server latency are within acceptable limits.
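The subtraction described above can be written as a small helper. This is only a sketch: the function name `split_rpc_latency` is hypothetical, and the 87ms Avg Resp value is inferred from the 77ms and 10ms figures quoted in the text rather than read from the dialog itself.

```python
def split_rpc_latency(avg_resp_ms: float, avg_proc_ms: float) -> dict:
    """Split Outlook's Avg Resp value into its network and server parts.

    avg_resp_ms: the Avg Resp column from the Connection Status dialog
    avg_proc_ms: the Avg Proc column from the Connection Status dialog
    """
    if avg_resp_ms < avg_proc_ms:
        # The total response time cannot be smaller than the server's share of it.
        raise ValueError("Avg Resp should not be smaller than Avg Proc")
    return {
        "network_ms": avg_resp_ms - avg_proc_ms,  # time spent on the wire
        "server_ms": avg_proc_ms,                 # time spent inside Exchange
    }


# Inferred example values: 87ms Avg Resp, 10ms Avg Proc.
print(split_rpc_latency(87, 10))
```

Reading both columns for the synchronisation connection and feeding them in gives the two numbers needed to decide whether to look at the network or the service first.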
Generally speaking I use the following recommendations to maintain a good client experience in Cached mode for Outlook 2007 and later.
Once armed with these values it is possible to direct troubleshooting more specifically. For example, if Network RTT is high you could look at your network links or firewalls. If the Avg Proc time is high then a call to Office 365 support might be in order.
One final point here is to check the Req/Fail column. A high value for Fail represents a high number of network disconnection events. If this is combined with a high Avg Proc time it potentially points to a service issue in Office 365; however, if Avg Proc is good then it suggests that you may have a network connectivity problem between the client and the service. A common cause of this is source port exhaustion in environments with more than 2000 users.
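The decision process above could be sketched roughly as follows. Everything named here (`ConnectionStats`, `triage`, and the three threshold parameters) is hypothetical, and the thresholds are deliberately left for the caller to supply because this text does not state specific limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConnectionStats:
    """Values read by hand from one row of the Connection Status dialog."""
    avg_resp_ms: float
    avg_proc_ms: float
    requests: int
    failures: int


def triage(stats: ConnectionStats, max_network_ms: float,
           max_proc_ms: float, max_fail_ratio: float) -> List[str]:
    """Return hints about where to look first. Thresholds are caller-supplied assumptions."""
    findings = []
    network_ms = stats.avg_resp_ms - stats.avg_proc_ms
    fail_ratio = stats.failures / stats.requests if stats.requests else 0.0
    high_proc = stats.avg_proc_ms > max_proc_ms

    if network_ms > max_network_ms:
        findings.append("High network RTT: check network links and firewalls")
    if high_proc:
        findings.append("High Avg Proc: a call to Office 365 support may be in order")
    if fail_ratio > max_fail_ratio:
        if high_proc:
            findings.append("Many failures with high Avg Proc: possible service issue")
        else:
            findings.append("Many failures with good Avg Proc: possible connectivity "
                            "problem between client and service")
    return findings
```

An empty list means none of the supplied limits were exceeded, which is the cue to move on to the client workstation.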
So what happens if the Network and Exchange RPC metrics are all good but the end customer is still experiencing poor Outlook performance? Since we have ruled out Network and Exchange performance the most likely culprit is the client workstation.
So what could be causing Outlook performance problems on the local workstation?
For this we need to look at the usual trinity of performance areas within the operating system
To take a look at these further I am going to use Process Explorer.
Outlook is generally not that CPU intensive, however if your CPU is flat out doing other stuff then Outlook will respond slowly. To check this, open Process Explorer and arrange the table in descending CPU order.
We are looking for a few things here. Firstly what is our System Idle Process value? This tells us how much CPU time we have spare. Generally speaking if this value is less than 20% the system will feel sluggish. In my example you can see that I have plenty of CPU time available and so it is unlikely CPU is an issue here.
If Outlook appears at or near the top of this list then the most likely culprits are a faulty add-in (try running Outlook in safe mode) or a damaged OST file (try running the Inbox Repair Tool).
To get a better idea of how Outlook is consuming resources, find OUTLOOK.EXE in the Process list, double click it and then open the Performance Graph tab in the properties dialogue box.
This will show some historical values for CPU Usage for the Outlook process. Even a large Mailbox (mine is 10GB) shouldn't require Outlook to take up a large amount of CPU time.
Insufficient RAM has a number of effects on Outlook. Firstly the process can be starved of physical RAM and so run slowly; secondly the operating system will have to page large chunks of memory to disk which will cause disk I/O problems. Since we are going to look at disk I/O in the next section, I will just look at identifying client memory problems here.
It is important to realise that Windows will page out an amount of memory to the page file and that this is both normal and advantageous. However, where the system has significantly more committed memory than it can accommodate in physical RAM, we may run into performance problems when a process accesses data in virtual memory and has to wait for that data to come from disk. This process is known as hard paging. A sustained high level (>5) of hard page faults is a strong indicator that there is not enough RAM in the system.
Unfortunately Process Explorer can't help us here since it shows a combination of hard and soft paging. For this task we are going to need to break out Performance Monitor (perfmon.exe).
Open Perfmon and then add the Memory\Page Reads/sec counter.
You can see clearly that this system has to retrieve data from the page file frequently. In fact this is from a virtual machine running Windows 7 and Outlook 2010 in 512MB RAM with a 5GB mailbox. Almost every single action performed within Outlook triggers a spike in Page Reads/sec. The user experience is very slow, however if we look at the Exchange Connection status for this client…
This clearly shows that the poor user experience is being driven by the client and not by the server or network and, more importantly, that a bit of extra RAM is likely to make this customer happy. Clever, right? :)
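To make the "sustained" part of the hard paging check concrete, here is a minimal sketch that scans a series of Page Reads/sec samples. The function name and the `window` parameter are assumptions of mine: the text gives the >5 threshold but does not say how long the level has to persist, so five consecutive samples is only an illustrative choice.

```python
from typing import Iterable


def sustained_hard_paging(page_reads_per_sec: Iterable[float],
                          threshold: float = 5.0,
                          window: int = 5) -> bool:
    """Return True if `window` consecutive samples all exceed `threshold`.

    threshold: the >5 level mentioned in the text
    window:    assumed meaning of "sustained" (not defined in the text)
    """
    run = 0
    for value in page_reads_per_sec:
        # Extend the current run of high samples, or reset it on a low one.
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False


# Occasional spikes are normal; a long unbroken run suggests too little RAM.
print(sustained_hard_paging([0, 1, 50, 0, 2, 0]))      # isolated spike
print(sustained_hard_paging([12, 30, 8, 40, 9, 22]))   # continuously high
```

The samples would come from whatever you used to log the counter, for example values copied out of a Performance Monitor log.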
This is a bit of a soapbox of mine at the moment. As Exchange professionals we go to great lengths to monitor our messaging service on the basis that we want to provide the best user experience. However, the reality is that the most likely cause of poor user experience is accessing a large Outlook OST file stored on an underperforming client system. Over time these OST files generally reach roughly double the size of your mailbox. If we take Office 365 as an example with a 25GB quota, this means that it's not impossible for a user to have a 50GB OST file on their laptop HDD. Let's think about that for a minute… we have a 50GB file containing data that we need to access quickly. If that were a Word document or Access database, most users would accept a minute or two's delay as it opened, and yet we expect Outlook to open it in 5 seconds or we think something is broken :)
Speedy access to a large OST file relies on two things…
If the hard disk drive is busy doing other things then Outlook in cached mode will perform slowly and deliver a poor user experience.
Disk queue length is a measure of how many requests are waiting for the disk. Ideally the disk queue length should be no more than 1.5-2 times the number of disks that make up the volume. For most client workstations this means that disk queue length should be <2. As a general rule, don't give very large mailboxes to users on older laptops with 5400rpm HDDs, i.e. don't just give everyone a 25GB mailbox without checking to see if their hardware is going to cope :)
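The queue length rule of thumb above is simple arithmetic, sketched here for completeness. The function names are hypothetical, and treating anything above the upper bound (2 × disks) as a problem is my reading of the guidance rather than a hard limit.

```python
from typing import Tuple


def queue_length_limit(disk_count: int) -> Tuple[float, float]:
    """Return the (lower, upper) guideline of 1.5x to 2x the number of disks."""
    return (1.5 * disk_count, 2.0 * disk_count)


def disk_is_saturated(avg_queue_length: float, disk_count: int = 1) -> bool:
    """True if the observed average queue length exceeds the upper guideline."""
    _, upper = queue_length_limit(disk_count)
    return avg_queue_length > upper


# A typical laptop has a single disk, so the guideline is 1.5 to 2.
print(queue_length_limit(1))
print(disk_is_saturated(3.2))   # well above the guideline for one disk
print(disk_is_saturated(0.4))   # comfortably below it
```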
Since Outlook makes frequent reads and writes to the OST file it can become very fragmented over time. A heavily fragmented OST file (>1000 frags/file) can lead to poor Outlook client performance.
The easiest way to check and defragment the OST file is via the Sysinternals tool contig.exe. First close down any applications that may be accessing the OST file, such as Outlook or Lync.
To check the OST file for fragmentation use the command
Note: You must be running contig.exe with elevated privileges to perform defragmentation.
To defragment the OST file use the command.
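The exact commands are not reproduced in this text, so the following is a hedged sketch of how the two Contig steps might be scripted. It assumes Contig's `-a` switch (analyse fragmentation without changing the file) and that running Contig with only a file name defragments that file. The helper names and the example path are hypothetical.

```python
import subprocess
from typing import List


def contig_analyze_cmd(ost_path: str, contig: str = "contig.exe") -> List[str]:
    """Build the command that reports fragmentation without modifying the file."""
    return [contig, "-a", ost_path]


def contig_defrag_cmd(ost_path: str, contig: str = "contig.exe") -> List[str]:
    """Build the command that defragments the file (needs an elevated prompt)."""
    return [contig, ost_path]


def run_contig(cmd: List[str]) -> str:
    """Run a Contig command and return its text output; raises on a non-zero exit."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


# Example usage (hypothetical path; close Outlook and Lync first):
# ost = r"C:\Users\example\AppData\Local\Microsoft\Outlook\example.ost"
# print(run_contig(contig_analyze_cmd(ost)))
# print(run_contig(contig_defrag_cmd(ost)))
```

Building the argument list separately from running it makes it easy to print and check the command before letting it touch a large OST file.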
Outlook performance is just like any other application. It relies on CPU, Memory, Disk and Network. If any of those resources are performing badly then the end customer is likely to experience poor performance. Exactly the same troubleshooting processes apply on-premises or for Office 365 users. The only real difference that applies to Office 365 performance troubleshooting is that we cannot directly observe Exchange performance counters and so we need to rely on the data that Outlook provides us.