EDIT 8/6/2008: Please see this post for additional information related to Hub throughput and different message sizes.
Large message size: effect of transport database cache size on throughput.
Recently one of our support engineers came to us requesting performance data for a client deploying Exchange 2007 SP1 (E2K7 from now on).
The client wanted to know what level of steady state throughput a Hub Transport server could achieve when receiving four widely different average message sizes: 25KB, 1MB, 5MB and 10MB.
We had some of the data but needed to complete the table, so we employed the test bed used to measure transport performance for E2K7 and E2K7 SP1.
Hub Hardware: 2 processors x 2 cores, 2.2 GHz, 800 MHz FSB, 1MB L2 cache per core, 4 GB RAM, 400 MHz memory, Ultra 3 SCSI disk controller ("entry level") with 128 MB write-back cache, 3 x Ultra320 Universal SCSI 15K RPM disks.
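Optimized E2K7 transport database queue configuration: the queue database and its transaction logs are on separate volumes (QueueDatabasePath and QueueDatabaseLoggingPath in the EdgeTransport.exe.config fragment below).
The transport dumpster is not used in this environment: 1 Hub and 1 Mailbox server, without replication.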
Mailbox Hardware: A "good" Mailbox server with enough CPU cycles and storage bandwidth to accept message delivery without slowing down the Hub.
Gigabit network.
The battery of tests was based on the benchmarking automation we used during Exchange 2007 development, changing the "message mix" for each test to inject a different average message size (25KB, 1MB...).
The benchmarking infrastructure is designed to inject messages into transport through an SMTP Receive connector at a constant rate, seeking a steady state throughput while monitoring baseline performance counters.
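Constant-rate injection of this kind can be approximated with a minimal Python sketch built on the standard library's email package. This is not the benchmarking automation the team used: the pacing loop, the function names and the sender and recipient addresses (loadgen@example.com, user@example.com) are all hypothetical. The send callable is passed in so the pacing logic can be tested without a server; against a real Hub you would pass the send_message method of an smtplib.SMTP connection to the Receive connector.

```python
import time
from email.message import EmailMessage
from typing import Callable

# Hypothetical test addresses; replace with accounts that exist in the test environment.
SENDER = "loadgen@example.com"
RECIPIENT = "user@example.com"


def build_message(size_bytes: int, seq: int) -> EmailMessage:
    """Build a plain-text message whose body is at least size_bytes long."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    msg["Subject"] = f"load test message {seq}"
    # Short lines keep the body 7bit-safe; headers add a little on top of the body size.
    line = "x" * 76 + "\n"
    msg.set_content(line * (size_bytes // len(line) + 1))
    return msg


def inject_at_constant_rate(
    send: Callable[[EmailMessage], object],
    rate_msgs_per_sec: float,
    duration_sec: float,
    size_bytes: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one message every 1/rate seconds for duration_sec; return the number sent.

    Send times are scheduled against the start time, so if a send runs long the
    following messages go out back-to-back until the loop catches up.
    """
    interval = 1.0 / rate_msgs_per_sec
    start = clock()
    sent = 0
    while sent * interval < duration_sec:
        delay = start + sent * interval - clock()
        if delay > 0:
            sleep(delay)
        send(build_message(size_bytes, sent))
        sent += 1
    return sent
```

For example, inside "with smtplib.SMTP("hub01.example.com", 25) as smtp:", calling inject_at_constant_rate(smtp.send_message, 2.0, 1200, 5 * 1024 * 1024) would attempt a 20-minute run of 5MB messages at 2 msg/sec; hub01.example.com is a placeholder host name.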
Ideally the test stabilizes after a few minutes of "warm-up" flow, when the DB cache reaches a stable size (128MB with the default DatabaseMaxCacheSize setting). Steady state is verified by watching the throughput and queue length counters.
Yes, I said "ideally," but sometimes the test doesn't stabilize: throughput oscillates, frequently dropping to 0, or a queue builds up (the Remote Delivery, Mailbox Delivery or Submission queue).
Then you have to work a bit to understand why. Start the investigation by looking at the server event logs.
One possibility is heavy resource pressure, in which case transport applies back pressure to the system and logs Event ID 15004. The event details show which resource is under strain.
You then have to diagnose why the server went into back pressure, as happened in the 3rd test of the suite shown below. At the end of the post you'll find more detail on what to look for when analyzing performance bottlenecks.
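The two symptoms of a run that has not stabilized, throughput repeatedly dropping to 0 or a queue that keeps building up, can be checked mechanically once counter samples have been collected after the warm-up period. The Python sketch below is only an illustrative heuristic: the function name and both thresholds (no zero-throughput samples allowed; a queue whose second-half average exceeds 1.5x its first-half average counts as building up) are hypothetical choices, not the criteria of the original test harness.

```python
from statistics import mean
from typing import Sequence


def is_steady_state(
    throughput: Sequence[float],
    queue_length: Sequence[float],
    max_zero_samples: int = 0,
    max_queue_growth: float = 1.5,
) -> bool:
    r"""Heuristic check of post-warm-up counter samples (thresholds are hypothetical).

    throughput:   e.g. samples of SmtpReceive(_total)\Messages Received/sec
    queue_length: e.g. samples of Queues(_total)\Aggregate Delivery Queue Length (All Queues)
    """
    if not throughput or len(queue_length) < 2:
        raise ValueError("need throughput samples and at least two queue samples")
    # Symptom 1: throughput keeps dropping to zero (for example, during back pressure).
    if sum(1 for t in throughput if t == 0) > max_zero_samples:
        return False
    # Symptom 2: a queue keeps building up; compare second-half and first-half averages.
    half = len(queue_length) // 2
    early, late = mean(queue_length[:half]), mean(queue_length[half:])
    # max(early, 1.0) keeps small wobbles in a nearly empty queue from counting as growth.
    return late <= max_queue_growth * max(early, 1.0)
```

The window split and thresholds are judgment calls that would need tuning for each test bed.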
Cache Size                         128MB DB Cache                              512MB DB Cache
                                   Test 1     Test 2     Test 3                Test 4     Test 5
Limiting Resource                  CPU Bound  IO Bound   Configuration Bound*
Message Size                       25KB       1MB        5MB                   5MB        10MB
SMTP Receive Throughput (msg/sec)  159.32     14.05      0.40                  2.03       1.34
Aggregate Queue Length (max)       329        63         29                    27         2
Queue Size in MB (max)             8.65       64.51      148.48                138.24     20.48
%CPU                               69.86      56.03      15.37                 48.00      40.03
Msg Cost (MCyc/msg)                38.68      351.07     3131.76               2068.69    2591.67
Msg Cost (Cycles/Byte)             1470.72    342.84     611.67                404.04     253.09
Disk Writes/sec (log)              92.80      185.00     -                     133.00     181.00
Disk Writes/sec (queue)            35.30      729.00     -                     876.00     622.00
Disk WriteKB/sec (log)             9,876      32,800     -                     23,279     30,796
Disk WriteKB/sec (queue)           1,086      23,900     -                     19,788     23,556
Disk Writes/msg (log)              0.58       13.17      -                     64.53      138.17
Disk Writes/msg (queue)            0.22       51.89      -                     425.04     474.81
Disk WriteKB/msg (log)             61.99      2,335      -                     11,295     23,508
Disk WriteKB/msg (queue)           6.82       1,701      -                     9,601      17,982
Disk Reads/sec (log)                                     0.00
Disk Reads/sec (queue)                                   567
*Back pressure, High Version Buckets: Event Log ID 15004
In the 3rd test, because the transport service rapidly transitions in and out of back pressure, the disk counters show a heavily serrated pattern and Performance Monitor does not compute accurate averages. Those inaccurate values were left out of the table.
Throughput for that test, however, is computed as (Total Messages Received)/(Test Duration), so it is accurate. See below for summary data that compares the two 5MB runs.
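The derived rows of the table can be recomputed from the raw counters. The Python sketch below assumes that message cost is CPU megacycles per second (%CPU x 4 cores x 2,200 MHz, from the Hub hardware listed earlier) divided by messages per second, and that the per-message disk rows are the per-second counters divided by throughput. These formulas are our reconstruction, not documented by the test harness, but they reproduce the 25KB and 1MB columns to within about 0.5%. The per-byte message cost works out to CPU cycles per byte: for example, 351.07 million cycles spread over a 1,024,000-byte message is about 343 cycles per byte. Function names are hypothetical.

```python
# Hub hardware from the test bed description: 2 processors x 2 cores at 2.2 GHz.
CORES = 4
CLOCK_MHZ = 2200.0


def throughput_msgs_per_sec(total_messages: int, duration_min: float) -> float:
    """(Total Messages Received) / (Test Duration), in messages per second."""
    return total_messages / (duration_min * 60.0)


def msg_cost_mcycles(cpu_percent: float, msgs_per_sec: float,
                     cores: int = CORES, clock_mhz: float = CLOCK_MHZ) -> float:
    """CPU megacycles spent per message (assumed formula, see the lead-in)."""
    mcycles_per_sec = (cpu_percent / 100.0) * cores * clock_mhz
    return mcycles_per_sec / msgs_per_sec


def cycles_per_byte(mcycles_per_msg: float, avg_msg_bytes: float) -> float:
    """Message cost normalized by message size, in CPU cycles per byte."""
    return mcycles_per_msg * 1e6 / avg_msg_bytes


def per_message(counter_per_sec: float, msgs_per_sec: float) -> float:
    """Turn a per-second disk counter (writes/sec or KB/sec) into a per-message figure."""
    return counter_per_sec / msgs_per_sec


if __name__ == "__main__":
    print(f"5MB, 128MB cache: {throughput_msgs_per_sec(475, 20):.2f} msg/sec")
    print(f"5MB, 512MB cache: {throughput_msgs_per_sec(2436, 20):.2f} msg/sec")
    print(f"25KB cost: {msg_cost_mcycles(69.86, 159.32):.2f} MCyc/msg")
    print(f"25KB log writes: {per_message(92.80, 159.32):.2f} per msg")
```

The first two lines reproduce the 0.40 and 2.03 msg/sec throughput of the two 5MB runs from their message counts and 20-minute durations.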
After testing the first two message sizes (25KB and 1MB), we couldn't reach steady state throughput for the 3rd and 4th sizes (5MB and 10MB) with default server settings.
Attempting to inject a steady flow of large (5MB) messages triggered back pressure, with the well-known Event ID 15004 reporting that version buckets were above the high watermark.
The first suspect to examine when version buckets are high is disk I/O performance. We immediately discovered that the flow of large messages produces a queue that is large in size, if not in message count. In this case the queue "only" contained 29 messages, but at this message size that translates to roughly 148.5MB in the queue, overflowing the default 128MB database cache.
In the table above, notice that in the previous tests the queue size (in MB) never approached the DB cache size. Looking at the disk counters, we found that overflowing the cache triggered a large number of disk reads, which don't appear in the regular steady state tests.
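The overflow check behind this diagnosis is simple arithmetic: multiply the maximum queue length by the average message size and compare the result with the DB cache size. A minimal Python sketch with hypothetical function names; the 5.12MB and 1.024MB average sizes are what the table's queue size row implies (148.48MB / 29 messages and 64.51MB / 63 messages), not separately measured values.

```python
DEFAULT_CACHE_MB = 128.0  # default DatabaseMaxCacheSize (134217728 bytes)


def queued_mb(queue_length: int, avg_msg_mb: float) -> float:
    """Approximate volume of message data sitting in the transport queues, in MB."""
    return queue_length * avg_msg_mb


def overflows_cache(queue_length: int, avg_msg_mb: float,
                    cache_mb: float = DEFAULT_CACHE_MB) -> bool:
    """True when the queued data alone no longer fits in the transport DB cache."""
    return queued_mb(queue_length, avg_msg_mb) > cache_mb


if __name__ == "__main__":
    for label, qlen, size, cache in [
        ("1MB, 128MB cache", 63, 1.024, 128.0),
        ("5MB, 128MB cache", 29, 5.12, 128.0),
        ("5MB, 512MB cache", 27, 5.12, 512.0),
    ]:
        print(label, round(queued_mb(qlen, size), 2), "MB queued, overflow:",
              overflows_cache(qlen, size, cache))
```

This is a rough indicator only; for actual sizing, see the official DatabaseMaxCacheSize guidance referenced later in this post.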
To avoid overflowing the cache and triggering back pressure, we decided to experiment with increasing the transport DB cache size. Initially we tested with a 1GB cache, but found that 512MB (up from the default 128MB) was enough to eliminate the overhead of additional disk reads associated with the flow of very large messages.
Here is a fragment from the EdgeTransport.exe.config file that shows the changes made:
<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
  <appSettings>
    <!-- Optimized Transport DB storage -->
    <add key="QueueDatabasePath" value="e:\data\" />
    <add key="QueueDatabaseLoggingPath" value="c:\logfiles\" />
    ....
    <!-- For very large message test: commented out the default 128MB DB cache -->
    <!-- add key="DatabaseMaxCacheSize" value="134217728" / -->
    <!-- Using a 512MB DB cache: -->
    <add key="DatabaseMaxCacheSize" value="536870912" />
    ...
  </appSettings>
</configuration>
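DatabaseMaxCacheSize is specified in bytes, so the two values in the fragment are 128 x 1024 x 1024 and 512 x 1024 x 1024. The short Python sketch below reproduces them and renders the matching add element; the helper names are hypothetical, and the script does not edit EdgeTransport.exe.config itself.

```python
def cache_size_bytes(megabytes: int) -> int:
    """DatabaseMaxCacheSize value in bytes (1MB = 1024 * 1024 bytes)."""
    return megabytes * 1024 * 1024


def app_setting(key: str, value: int) -> str:
    """Render an <add> element for the <appSettings> section."""
    return f'<add key="{key}" value="{value}" />'


if __name__ == "__main__":
    print(app_setting("DatabaseMaxCacheSize", cache_size_bytes(128)))  # default
    print(app_setting("DatabaseMaxCacheSize", cache_size_bytes(512)))  # large-message tests
```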
Here are a few more statistics for the test that triggers back pressure: 31% of the time the server is not receiving messages, and even during the intervals without back pressure throughput is only 0.57 msg/sec, compared with a steady 2.03 msg/sec when back pressure is avoided by using a bigger DB cache.
5MB message size stats: back pressure vs. steady state
Database Cache Size (MB)                             128       512
Duration (min)                                       20        20
Total Messages Received                              475       2436
# of Transitions into Back Pressure                  41        0
Total Minutes in Back Pressure Mode                  6.17      0
% of Time in Back Pressure                           31%       0%
Max Back Pressure Window (sec)                       65        -
Average Throughput (msg/sec)                         0.40      2.03
Throughput in Non Back Pressure Intervals (msg/sec)  0.57      2.03
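The derived rows of this table follow from the raw ones. The Python sketch below assumes, as the text states, that the server receives nothing while in back pressure, so all 475 messages of the 128MB run arrived during the roughly 13.8 minutes spent outside back pressure. The names are hypothetical.

```python
from typing import NamedTuple


class BackPressureStats(NamedTuple):
    pct_time_in_back_pressure: float
    avg_throughput: float      # msg/sec over the whole run
    non_bp_throughput: float   # msg/sec over the intervals without back pressure


def back_pressure_stats(duration_min: float, minutes_in_bp: float,
                        total_messages: int) -> BackPressureStats:
    """Summarize a run, assuming no messages are received while in back pressure."""
    if not 0 <= minutes_in_bp < duration_min:
        raise ValueError("minutes_in_bp must be in [0, duration_min)")
    pct = 100.0 * minutes_in_bp / duration_min
    avg = total_messages / (duration_min * 60.0)
    non_bp = total_messages / ((duration_min - minutes_in_bp) * 60.0)
    return BackPressureStats(pct, avg, non_bp)


if __name__ == "__main__":
    print(back_pressure_stats(20, 6.17, 475))   # 128MB cache run
    print(back_pressure_stats(20, 0, 2436))     # 512MB cache run
```

The first call gives about 30.9% of the time in back pressure and about 0.57 msg/sec outside it, in line with the 31% and 0.57 msg/sec reported above.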
Bill Thompson, from the Exchange Center of Excellence, gives the official guidance on which DatabaseMaxCacheSize settings to use in his blog post "New maximum database cache size guidance for Exchange 2007 Hub Transport Server role".
A disclaimer: storage is key to transport performance. All of the data above applies only to a Hub server with at least an "entry level" SCSI controller with 128 MB of BBWC (battery-backed write-back cache), which optimizes the I/O pattern transport performs during steady state flow: continuous writes with very few or no reads.
Some useful counters when doing E2K7 transport benchmarking:
1. Throughput counters
MSExchangeTransport SmtpReceive(_total)\Average bytes/message
MSExchangeTransport SmtpReceive(_total)\Messages Received/sec
MSExchangeTransport SmtpSend(_total)\Messages Sent/sec
MSExchange Store Driver(_total)\Inbound: MessageDeliveryAttemptsPerSecond
MSExchange Store Driver(_total)\Inbound: Recipients Delivered Per Second
MSExchangeTransport Queues(_total)\Messages Queued for Delivery Per Second
MSExchangeTransport Queues(_total)\Messages Completed Delivery Per Second
2. Queue counters, others
MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues)
MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length
MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length
MSExchangeTransport Queues(_total)\Submission Queue Length
MSExchangeTransport DSN(_total)\Failure DSNs Total
MSExchangeTransport Dumpster\Dumpster Size
MSExchange Database(edgetransport)\Database Cache Size (MB)
MSExchange Database(edgetransport)\Version buckets allocated
3. Additional counters to diagnose whether the server is CPU, disk or network bound (see Bottleneck-Detection Counters)
PhysicalDisk(_Total)\Current Disk Queue Length
PhysicalDisk(_Total)\Disk Writes/sec
PhysicalDisk(_Total)\Disk Reads/sec
PhysicalDisk(_Total)\Avg. Disk sec/Write
PhysicalDisk(_Total)\Avg. Disk sec/Read
Processor(_Total)\% Processor Time
Process(Edgetransport)\% Processor Time
Process(Edgetransport)\Private Bytes
Memory\Available MBytes
Network Interface\Bytes Total/sec
.....
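When these counters are logged to a CSV file (for example with the Windows typeperf tool: typeperf -cf counters.txt -si 5 -o hub.csv, where counters.txt lists one counter path per line), averages can be computed outside Performance Monitor, and the raw samples stay available for closer inspection. The Python sketch below assumes the usual perfmon CSV layout: a timestamp column followed by one column per full counter path, with blank cells for missing samples. The function name and file names are hypothetical.

```python
import csv
import io
from statistics import mean
from typing import Optional, TextIO


def counter_average(csv_file: TextIO, counter_suffix: str) -> Optional[float]:
    r"""Average one counter column of a perfmon-style CSV log.

    Header cells are full counter paths such as
    \\HUB01\MSExchangeTransport SmtpReceive(_total)\Messages Received/sec;
    the first header cell ending with counter_suffix (case-insensitive) is used.
    Blank cells (missing samples) are skipped. Returns None if no samples exist.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    suffix = counter_suffix.lower()
    matches = [i for i, name in enumerate(header) if name.lower().endswith(suffix)]
    if not matches:
        raise KeyError(f"counter not found: {counter_suffix}")
    col = matches[0]
    values = [float(row[col]) for row in reader if col < len(row) and row[col].strip()]
    return mean(values) if values else None


if __name__ == "__main__":
    sample = (
        '"(PDH-CSV 4.0)","\\\\HUB01\\MSExchangeTransport SmtpReceive(_total)\\Messages Received/sec"\n'
        '"08/06/2008 10:00:00.000","2.0"\n'
        '"08/06/2008 10:00:05.000"," "\n'
        '"08/06/2008 10:00:10.000","4.0"\n'
    )
    print(counter_average(io.StringIO(sample), r"SmtpReceive(_total)\Messages Received/sec"))
```

Each sample is weighted equally, so a run that flips in and out of back pressure still needs its raw samples examined alongside the average.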
If you're wondering how the results differ for other average sizes, we'll be posting more data on some other sizes (40KB, 70KB) later, so stay tuned.
We are currently testing servers with different storage: SATA disk, 7200 RPM, without the advantage of BBWC. More data on this scenario will be coming in a future blog post.
Elias Kaplan