NOTE: The content of this article has been published in the official Exchange 2007 documentation. We recommend that you check the documentation for the most up-to-date version. Please go here:

http://technet.microsoft.com/en-us/library/bb738141.aspx

Introduction

This blog post focuses on the capacity and transactional I/O requirements for both Edge Transport and Hub Transport servers. Having enough capacity is a prerequisite for a highly available server. After determining your capacity needs, the next step is to understand your transactional I/O requirements so that you can make sure your storage subsystem can meet them.

Edge Transport Server

The Edge Transport server is your first line of defense against spam, viruses, and other unwanted messages. After inspecting incoming messages, it securely routes clean mail to the proper Hub Transport server. For more information on the capabilities of the Edge Transport server, see http://technet.microsoft.com/en-us/library/aa996562.aspx.

Edge Transport servers must be designed to meet the capacity and transactional I/O requirements of each organization. It is critical to keep queue growth under control and to route mail as fast as possible, so that service level agreements are not adversely affected.

Capacity

There are several factors that affect the overall capacity of an Edge Transport server:

  • Message Tracking Logs
  • Protocol Logs
  • Mail Database
  • Connectivity Logs
  • Agent Logs

A minimum of 4 GB of free space (free disk space and free database space) must exist on the drive containing the message queue database, or the transport system will activate back pressure, a system resource monitoring feature of the Exchange Transport service. The back pressure threshold is controlled by the PercentageDatabaseDiskSpaceUsedHighThreshold parameter, which can be modified if necessary. For more information about back pressure, and the options to configure back pressure, see Understanding Back Pressure at http://technet.microsoft.com/en-us/library/bb201658.aspx.
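As a rough monitoring aid, the 4 GB floor can be checked against a volume's free space. This is a minimal sketch under stated assumptions: the function and constant names are hypothetical, "GB" is taken as 1024^3 bytes, the path is a placeholder for the volume that holds mail.que, and it checks only free disk space on the volume. It does not see free space inside mail.que, and it does not reproduce the PercentageDatabaseDiskSpaceUsedHighThreshold calculation that the transport service actually uses.

```python
import shutil

# Minimum free space cited above for the drive holding the queue database.
# Assumption: 1 GB = 1024**3 bytes.
BACK_PRESSURE_FLOOR_BYTES = 4 * 1024**3


def below_back_pressure_floor(free_bytes: int) -> bool:
    """Return True if the volume's free space is under the 4 GB minimum."""
    return free_bytes < BACK_PRESSURE_FLOOR_BYTES


if __name__ == "__main__":
    # Hypothetical placeholder: point this at the volume that holds mail.que.
    queue_volume = "."
    usage = shutil.disk_usage(queue_volume)
    status = "at risk of back pressure" if below_back_pressure_floor(usage.free) else "OK"
    print(f"{usage.free / 1024**3:.1f} GB free: {status}")
```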

If message tracking logs are enabled (and most companies do enable them), additional capacity is required. Message tracking capacity requirements depend on the number of messages received by the transport server. You can base your estimate on your Exchange 2003 log generation rate and set a hard limit on the number of days to keep data, such as 10 days. At Microsoft, we generate 220 MB of message tracking logs each business day (less on the weekend), and we ensure enough capacity for a week's worth of logs (~1.3 GB for us). Protocol, connectivity, and agent log sizes vary with activity. At Microsoft, we generate:

  • 5-15 GB of protocol logs per day on our Edge Transport servers. We also ensure that there is enough capacity for our protocol log quota, which is 15 GB.
  • 100 MB of connectivity logs per day on our Edge Transport servers. We also ensure that there is enough capacity for a week's worth of logs (~600 MB for us).
  • 250 MB of Agent logs per day on our Edge Transport servers. We also ensure that there is enough capacity for a week's worth of logs (~1.5 GB for us).
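Log retention capacity is simply the daily generation rate multiplied by the number of days kept. The sketch below applies that to the Edge Transport figures above; the function name is hypothetical. The "week's worth" figures in the list work out to roughly six days at the weekday rate (presumably because weekends are lighter), so `DAYS_KEPT = 6` is an assumption chosen to match them. Protocol logs are sized by their 15 GB quota instead, so they are left out.

```python
def retention_capacity_mb(daily_mb: float, days: float) -> float:
    """Disk space (MB) needed to keep `days` worth of logs generated at `daily_mb` per day."""
    return daily_mb * days


# Weekday generation rates (MB/day) quoted above for Microsoft's Edge Transport servers.
edge_daily_mb = {"message_tracking": 220, "connectivity": 100, "agent": 250}

# Assumption: about six weekday-equivalents per week, since weekend volume is lower.
DAYS_KEPT = 6

for log_type, daily in edge_daily_mb.items():
    print(f"{log_type}: {retention_capacity_mb(daily, DAYS_KEPT):,.0f} MB")
```

This yields roughly 1.3 GB, 600 MB, and 1.5 GB, in line with the figures above. To hold a fixed retention window such as 10 days, pass that number of days instead.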

Transaction logs do not require much disk capacity because normal log creation is limited by the use of circular logging. As a result, transaction logs can be placed on the LUN containing the operating system. At Microsoft, we use a two-disk mirror for this LUN.

The database (mail.que) does not store items indefinitely. Reserve capacity equal to the average message size multiplied by the maximum queue length, which covers the case where the queue is at its maximum when the server is shut down. A 500,000-item queue with an average message size of 50 KB is approximately 25 GB of data in the database.
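The queue reservation is a single multiplication. In this minimal sketch the function name is hypothetical, and it uses decimal units (1 GB = 1,000,000 KB) to match the rounding in the text.

```python
def queue_capacity_gb(max_queue_items: int, avg_message_kb: float) -> float:
    """Space mail.que needs if the queue is full when the server shuts down.

    Uses decimal units (1 GB = 1,000,000 KB), matching the figures in the text.
    """
    return max_queue_items * avg_message_kb / 1_000_000


print(queue_capacity_gb(500_000, 50))  # 25.0
```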

Edge Transport servers that run antivirus scans on incoming mail need enough space for the antivirus quarantine. The disk I/O requirements depend on the percentage of incoming mail that is infected with viruses, which is typically very small. The amount of space the quarantine requires depends on the number of infected messages and attachments and on how long they remain in quarantine. 1 GB is a good starting point, although each organization's actual needs differ.

Database Growth Factor

For most Edge Transport server deployments, we recommend that you add an overhead factor (aka "fluff factor") of 20% to the database size (after all other factors have been considered). This value accounts for internal structures within the database and ensures adequate headroom should a spike or change in mail flow result in database growth.

Example

In this example we will place the transaction logs on the operating system partition (C:), which is hosted by a battery-backed, caching RAID controller. The capacity requirements will be small (in the megabytes).

Step 1: Database Size

Let's start with an Edge Transport server that receives an average of 5 messages per second (with an average size of 50 KB) over a 24-hour period, with a maximum queue of 500,000 items.

Queue Maximum:            500,000
Queue Capacity:           ~25 GB (500,000 * 50 KB)
Protocol Logs:            15 GB
Message Tracking Logs:    1.3 GB
Antivirus Quarantine:     1 GB
Connectivity Logs:        600 MB
Agent Logs:               1.5 GB
Free Space:               4 GB
Total Size on Disk:       58 GB (48 GB + 20%)
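The total is the sum of the components with the 20% growth factor applied on top. The sketch below reproduces that arithmetic. The dictionary keys are hypothetical labels, and MB values are converted to GB decimally.

```python
# Capacity components (GB) from the Edge Transport example above.
edge_components_gb = {
    "queue_database": 25.0,  # 500,000 items x 50 KB
    "protocol_logs": 15.0,  # protocol log quota
    "message_tracking_logs": 1.3,
    "antivirus_quarantine": 1.0,
    "connectivity_logs": 0.6,
    "agent_logs": 1.5,
    "free_space": 4.0,  # back pressure minimum
}

GROWTH_FACTOR = 0.20  # the 20% "fluff factor"

subtotal = sum(edge_components_gb.values())
total = subtotal * (1 + GROWTH_FACTOR)
print(f"subtotal {subtotal:.1f} GB, total {total:.1f} GB")
```

The subtotal comes to about 48.4 GB, which the example rounds to 48 GB. The total with the growth factor is roughly 58 GB.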

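Step 2: Transaction Log Size

Because the transaction logs use circular logging and are placed on the operating system partition (C:), their capacity requirements are small (in the megabytes).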

Transactional I/O

If the server has enough available memory, incoming mail is stored in RAM and the transaction log, minimizing the disk impact. When memory resources are low, only the first 128 KB of each message is stored in memory and the transaction log; the rest of the message is stored in the database. During content conversion, data is streamed to the temp directory (%TEMP%), so it is important to place the temp directory on the same LUN as the database. When no large queue is building, very few of the disk I/Os are reads. When a queue is present, a message may no longer be in the database cache, which requires more disk I/O. For this reason, set your storage controller cache to 50% read and 50% write.
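The 128 KB rule can be expressed as a small model of where an incoming message first lands. This is a simplified sketch, and the function name is hypothetical. It covers only the initial placement described above and ignores content conversion in %TEMP%, later database writes for queued mail, and checkpointing.

```python
# Under memory pressure, only the first 128 KB of a message stays in memory
# and the transaction log (per the description above).
FIRST_CHUNK_KB = 128


def initial_placement_kb(size_kb: float, memory_is_low: bool) -> tuple[float, float]:
    """Return (KB kept in memory and the transaction log, KB written to the database).

    Simplified model of initial placement only; it does not account for
    %TEMP% streaming, later database writes, or checkpointing.
    """
    if not memory_is_low:
        # Enough memory: the whole message goes to RAM and the transaction log.
        return size_kb, 0.0
    in_memory = min(size_kb, FIRST_CHUNK_KB)
    return in_memory, size_kb - in_memory


print(initial_placement_kb(50, memory_is_low=True))   # (50, 0)
print(initial_placement_kb(300, memory_is_low=True))  # (128, 172)
```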

Other Disk I/O

Enabling message tracking logs requires an additional 2-5% overhead on disk I/O.

Enabling protocol and connectivity logs has a small overhead on disk I/O that depends on the amount of incoming mail.

Enabling the default agent logs has a small overhead on disk I/O, though if custom agents are in use, more disk resources may be required.

Anti-spam and antivirus operations occur in memory and require more CPU resources. Be sure to test your Edge Transport servers with all of the services you expect to use in production running during the test.

Database IOPS per Message

In our internal testing, we used an average message size of 60 KB. Many organizations size their transport servers with a particular message rate in mind, for example, 20 messages per second. It would require 140 (20 x (4.5+2.5)) database I/Os and 220 (20 x 11) log I/Os to service an incoming message rate of 20 messages per second.

When a queue forms, more reads are required, particularly in the case of RAID10, as every physical disk responds to the read requests.

Edge Transport Database I/O (steady state)

Total IOPS/Message (~60 KB):                   ~18
Log Write I/Os per message (sequential):       ~11
Database Write I/Os per message (random):      ~4.5
Database Read I/Os per message (random):       ~2.5

Note: These numbers are averages of many servers in production with variances up to +/- 30%. Extra features such as Journaling and transport rules also have an impact on the expected I/O per message, and these features would affect the example production numbers I have provided.
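The per-message figures translate into LUN requirements by multiplying by the message rate. In this sketch, the function name is hypothetical and the constants are the averages from the table above. It also shows the ±30% band mentioned in the note.

```python
# Approximate per-message I/O for a ~60 KB message, from the table above.
LOG_WRITES = 11.0
DB_WRITES = 4.5
DB_READS = 2.5
VARIANCE = 0.30  # production servers varied by up to +/-30%


def edge_iops(messages_per_second: float) -> dict[str, float]:
    """Steady-state database and log IOPS for a given incoming message rate."""
    database = messages_per_second * (DB_WRITES + DB_READS)
    log = messages_per_second * LOG_WRITES
    return {"database": database, "log": log}


rates = edge_iops(20)
print(rates)  # {'database': 140.0, 'log': 220.0}
for lun, iops in rates.items():
    low, high = iops * (1 - VARIANCE), iops * (1 + VARIANCE)
    print(f"{lun}: {iops:.0f} IOPS (roughly {low:.0f}-{high:.0f})")
```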

Apply to the Hardware Design

Once you have your capacity and transactional I/O requirements, you can apply them to a proposed hardware design. For processor and memory configurations, see Planning Processor and Memory Configurations. When designing an Edge Transport server, it is important to have enough RAM in the system (each message needs 8-9 KB of memory) to prevent the temporary caching of queued message bodies to disk.
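To get a feel for the RAM figure, the sketch applies the 8-9 KB per message to the 500,000-item maximum queue from the capacity example. It assumes the per-message memory scales linearly with queue depth. The function name is hypothetical, and it uses decimal MB.

```python
def queue_memory_mb(queued_messages: int, kb_per_message: float) -> float:
    """Approximate RAM (decimal MB) for queued messages at a fixed per-message cost."""
    return queued_messages * kb_per_message / 1000


# 500,000-item maximum queue from the capacity example; 8-9 KB per message.
for kb in (8, 9):
    print(f"{kb} KB/message: ~{queue_memory_mb(500_000, kb):,.0f} MB")
```

Under that assumption, a full 500,000-item queue needs roughly 4-4.5 GB of RAM for message state alone.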

Edge Transport uses an ESE database, and for best performance it is important to place the log and database files on separate physical disks in environments where there will be a large queue. In smaller deployments with lower disk I/O requirements, it may be feasible to place both the transaction logs and the database on the same LUN. Edge Transport, like the Mailbox server, requires I/O response times of less than 20 ms.

In my last blog, Configuring, validating and monitoring your Exchange 2007 storage, I talked about the importance of choosing the storage technology that will best meet your needs. It is important to use battery-backed, caching RAID controllers, to run database maintenance nightly, and to choose a disk type that provides the right balance of capacity and performance.

Example:

This example illustrates how to design your storage around the expected messages per second. In this example, we have an Edge Transport server that handles 20 messages per second, requiring 140 IOPS for the database LUN and 220 IOPS for the log LUN. Always add a 20% growth factor for disk I/O performance to handle heavier-than-normal days. The database LUN uses RAID10, and the transaction log/system LUN uses RAID1.

DISKS (1) & (2) – RAID1
    Transaction Logs/System: 220 + 20% = 264 IOPS

DISKS (3) & (4) & (5) & (6) – RAID10
    DB, Protocol & Message Tracking Logs, AV Quarantine: 140 + 20% = 168 IOPS
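The growth factor is applied to each LUN's steady-state IOPS. In this sketch the function name is hypothetical. It works in integer percentages and rounds up, so a fractional result never leaves a LUN under-provisioned.

```python
import math

GROWTH_PERCENT = 20  # headroom for heavier-than-normal days


def with_growth(iops: int, growth_percent: int = GROWTH_PERCENT) -> int:
    """Return required IOPS plus growth headroom, rounded up to a whole number."""
    return math.ceil(iops * (100 + growth_percent) / 100)


# Edge Transport example: transaction log/system LUN and database LUN.
print(with_growth(220))  # 264
print(with_growth(140))  # 168
```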

This example has a database LUN capacity requirement of approximately 70 GB for a week of data. If you require 2 weeks' worth of data, double the capacity requirement to 140 GB. Using 146 GB physical disks would allow a LUN of 292 GB in a RAID10 configuration.
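For the mirrored layouts used in these examples, usable capacity is half the raw capacity: a RAID1 pair yields one disk's worth, and RAID10 yields half of its disks' total. The sketch below covers only those two levels. The function name is hypothetical, and it ignores formatting overhead and the difference between marketed and formatted disk sizes.

```python
def usable_capacity_gb(disk_gb: float, disks: int, raid_level: str) -> float:
    """Usable capacity of a mirrored array (RAID1 pair or RAID10)."""
    if raid_level == "RAID1":
        if disks != 2:
            raise ValueError("RAID1 here means a two-disk mirror")
        return disk_gb
    if raid_level == "RAID10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        return disk_gb * disks / 2
    raise ValueError(f"unsupported RAID level: {raid_level}")


lun_gb = usable_capacity_gb(146, 4, "RAID10")
print(lun_gb)  # 292.0
print(lun_gb >= 140)  # True: two weeks of data fits
```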

Hub Transport Server

The Hub Transport server routes mail to the proper Mailbox server; it also hosts the transport dumpster for storage groups in a cluster continuous replication (CCR) environment. For more information about the Hub Transport server role, see http://technet.microsoft.com/en-us/library/aa998616.aspx.

The Hub Transport server must be designed to meet the capacity and transactional I/O requirements of the organization.

Capacity

As with the Edge Transport server, a minimum of 4 GB of free space (free disk space and free database space) must exist on the drive containing the message queue database, or the transport system will activate back pressure. You can modify the default value for PercentageDatabaseDiskSpaceUsedHighThreshold on Hub Transport servers as well.

Message tracking log capacity depends on the number of messages received by the transport server. You can base your estimate on your Exchange 2003 log generation rate and set a hard limit on the number of days to keep data, such as 10 days. At Microsoft, we generate 700 MB of message tracking logs each business day (less on the weekend) on our Hub Transport servers, and we ensure enough capacity for a week's worth of logs (~4.5 GB for us).

Protocol log sizes vary depending on the activity. At Microsoft, we generate 2.7 GB of protocol logs per day on our Hub Transport servers, and we also ensure that there is enough capacity for a week's worth of logs (~16 GB for us).

Transaction logs do not require much disk capacity because normal log creation is limited by the use of circular logging. As a result, the transaction logs can be placed on the operating system LUN. At Microsoft, we use a two-disk mirror for this LUN.

The database (mail.que) does not store items indefinitely. Reserve capacity equal to the average message size multiplied by the maximum queue length, which covers the case where the queue is at its maximum when the server is shut down. A 500,000-item queue at an average message size of 50 KB is approximately 25 GB of data in the database.

Database Growth Factor

For most Hub Transport server deployments, we recommend that you also add the extra "fluff factor" of 20% to the database size (after all other factors have been considered).

Transport Dumpster

Special consideration is necessary for Hub Transport servers in sites containing clustered mailbox servers deployed in a CCR environment. When deploying CCR, design your Hub Transport servers with enough capacity to retain mail for all CCR storage groups in the site long enough that messages can be recovered in the event of an unscheduled outage of the active node. This feature is known as the transport dumpster. For more information on how the transport dumpster works, see http://technet.microsoft.com/en-us/library/bb124521.aspx.

The I/O overhead of the transport dumpster is similar to that of a growing queue. Two settings control how long a message stays in the transport dumpster: MaxDumpsterSizePerStorageGroup and MaxDumpsterTime. The default value for MaxDumpsterSizePerStorageGroup is 18 MB. To size the transport dumpster properly for your environment, take your largest acceptable message size and increase it by 50%. For example, if the maximum message size is 10 MB, set MaxDumpsterSizePerStorageGroup to 15 MB. If more than one Hub Transport server is in the same Active Directory site as the clustered mailbox server in the CCR environment, the aggregate dumpster storage for the storage groups on that clustered mailbox server is spread across all of those Hub Transport servers. For example, with 4 Hub Transport servers each configured with a 15 MB dumpster, there would be a 60 MB dumpster for that storage group.

For organizations without message size limits, we recommend that you set MaxDumpsterSizePerStorageGroup to 1.5 times the average size of messages sent within the organization. Note that if no maximum message size is set, you cannot guarantee that a given message can be recovered after an unscheduled failover in a CCR environment.

We recommend setting MaxDumpsterTime to 7 days, which is the default value.

Size the disk capacity consumed by the dumpster as the number of storage groups with the transport dumpster enabled multiplied by the maximum dumpster size. If the maximum dumpster size is 15 MB and the Hub Transport server services 100 storage groups in a CCR environment, then 1.5 GB should be allocated for the transport dumpster.
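The dumpster guidance reduces to two multiplications: the per-storage-group setting is 1.5 times the largest accepted message (or 1.5 times the average message size when there is no limit), and the disk reservation is that setting multiplied by the number of storage groups. In this sketch the function names are hypothetical. It only computes the numbers and does not configure Exchange.

```python
def dumpster_size_per_sg_mb(message_size_mb: float) -> float:
    """MaxDumpsterSizePerStorageGroup guidance: message size plus 50%.

    Pass the maximum message size, or the average size if no limit is set.
    """
    return message_size_mb * 1.5


def dumpster_disk_mb(storage_groups: int, dumpster_mb: float) -> float:
    """Disk to reserve on a Hub Transport server for the transport dumpster."""
    return storage_groups * dumpster_mb


per_sg = dumpster_size_per_sg_mb(10)  # 10 MB message limit -> 15.0 MB
print(per_sg)
print(dumpster_disk_mb(100, per_sg))  # 1500.0 MB, about 1.5 GB
print(4 * per_sg)  # 60.0 MB per storage group across 4 Hub Transport servers
```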

Example:

In this example, we will place the transaction logs on the operating system partition (C:), which is hosted by a battery-backed, caching RAID controller. The capacity requirements will be small (in the megabytes).

Step 1: Database Size

Let's start with a Hub Transport server that receives an average of 5 messages per second (with an average size of 50 KB) over a 24-hour period, with a maximum queue of 500,000 items.

Queue Maximum:            500,000
Queue Capacity:           ~25 GB (500,000 * 50 KB)
Protocol Logs:            15 GB
Message Tracking Logs:    4.5 GB
Transport Dumpster:       1.5 GB
Total Size on Disk:       55 GB (46 GB + 20%)
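The Hub Transport total follows the same pattern. This sketch derives the queue and dumpster entries from their inputs rather than hard-coding them. It assumes the 100 storage groups at 15 MB from the dumpster discussion apply to this example, uses decimal units, and the dictionary keys are hypothetical labels.

```python
# Hub Transport example components (GB), decimal units.
hub_components_gb = {
    "queue_database": 500_000 * 50 / 1_000_000,  # 500,000 items x 50 KB = 25 GB
    "protocol_logs": 15.0,
    "message_tracking_logs": 4.5,
    "transport_dumpster": 100 * 15 / 1000,  # assumed: 100 storage groups x 15 MB = 1.5 GB
}

subtotal = sum(hub_components_gb.values())
total = subtotal * 1.2  # add the 20% growth factor
print(f"{subtotal:.1f} GB + 20% = {total:.1f} GB")
```

The subtotal is 46 GB, and the result of about 55 GB matches the example.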

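Step 2: Transaction Log Size

Because the transaction logs use circular logging and are placed on the operating system partition (C:), their capacity requirements are small (in the megabytes).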

Transactional I/O

The same guidance on transactional I/O listed earlier for Edge Transport servers applies to Hub Transport servers, as well. As mentioned previously, it is especially important to configure the cache settings on your storage controller as follows: 50% read, 50% write.

Transport Dumpster

When you enable the transport dumpster for storage groups in a CCR environment, disk I/O increases. Database writes increase, and database reads now also occur; on Microsoft's production servers, these average about 3 reads per message.

Other Disk I/O

The same guidance on other disk I/O listed earlier for Edge Transport servers applies to Hub Transport servers as well. As mentioned previously, it is especially important to test your Hub Transport servers with all of the services you expect to use in production running during the test.

Database IOPS per Message

In our internal testing, using an average message size of 40 KB, we found that enabling the transport dumpster requires more disk resources on the Hub Transport server. Many enterprises size their transport servers with a particular message rate in mind, for example, 20 messages per second. If the transport dumpster is enabled, it would require 200 (20 x (7+3)) DB I/Os and 140 (20 x 7) Log I/Os to service an incoming message rate of 20 messages per second. With the transport dumpster disabled, it would require 40 (20 x 2) database I/Os and 40 (20 x 2) log I/Os to service an incoming message rate of 20 messages per second.

When a queue forms, more reads are required, particularly in the case of RAID10, as every physical disk responds to the read requests.

Hub Transport Database I/O (steady state)

                                              Transport Dumpster Enabled    Transport Dumpster Disabled
Total IOPS/Message (~40 KB)                   ~17                           ~4
Log Write I/Os per message (sequential)       ~7                            ~2
Database Write I/Os per message (random)      ~7                            ~2
Database Read I/Os per message (random)       ~3                            ~0

Note: These numbers are averages of many servers in production with variances up to +/- 30%. Extra features such as Journaling and transport rules also affect the expected I/O per message, and these features would change the example production numbers I have provided.
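The same rate-times-per-message calculation shows how much the dumpster changes the Hub Transport requirements. In this sketch the names are hypothetical, and the per-message constants are the table's averages for a ~40 KB message. The results are steady-state figures before the 20% growth factor.

```python
# Approximate per-message I/O for a ~40 KB message, from the table above.
PER_MESSAGE_IO = {
    "dumpster_enabled": {"log_writes": 7, "db_writes": 7, "db_reads": 3},
    "dumpster_disabled": {"log_writes": 2, "db_writes": 2, "db_reads": 0},
}


def hub_iops(messages_per_second: int, dumpster_enabled: bool) -> tuple[int, int]:
    """Return (database IOPS, log IOPS) at steady state, before growth headroom."""
    key = "dumpster_enabled" if dumpster_enabled else "dumpster_disabled"
    io = PER_MESSAGE_IO[key]
    database = messages_per_second * (io["db_writes"] + io["db_reads"])
    log = messages_per_second * io["log_writes"]
    return database, log


print(hub_iops(20, dumpster_enabled=True))   # (200, 140)
print(hub_iops(20, dumpster_enabled=False))  # (40, 40)
```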

Apply to the Hardware Design

Once you have your capacity and transactional I/O requirements, you can apply them to a proposed hardware design. As with Edge Transport servers, see Planning Processor and Memory Configurations for processor and memory configurations for Hub Transport servers. When designing a Hub Transport server, it is important to have enough RAM in the system (each message needs 8-9 KB of memory) to prevent the temporary caching of queued message bodies to disk.

Hub Transport uses an ESE database, and for best performance it is important to place the log and database files on separate physical disks in environments where there will be a large queue or where the transport dumpster is used. For smaller deployments with lower disk I/O requirements, it may be feasible to place both the transaction logs and the database on the same LUN. Hub Transport, like Edge Transport, requires I/O response times of less than 20 ms.

Example:

It is important to design your storage around the expected messages per second. In this example, we have a Hub Transport server that handles 20 messages per second with the transport dumpster disabled, requiring 40 IOPS for the database LUN and 40 IOPS for the log LUN. Always add a 20% growth factor for disk I/O performance to handle heavier-than-normal days. Both LUNs use RAID1. This example has a database LUN capacity requirement of approximately 55 GB for a week of data; double it to 110 GB if you require 2 weeks' worth of data. Using 140 GB physical disks would provide a database LUN of 140 GB in a RAID1 configuration and a log LUN of 140 GB in a RAID1 configuration.

DISKS (1) & (2) – RAID1
    Transaction Logs/System: 40 + 20% = 48 IOPS

DISKS (3) & (4) – RAID1
    DB, Protocol & Message Tracking Logs, AV Quarantine: 40 + 20% = 48 IOPS

In this next example, we have a Hub Transport server, with the transport dumpster enabled, that handles 20 messages per second. This configuration requires 200 IOPS for the database LUN and 140 IOPS for the log LUN, plus the extra 20% growth factor. The disk layout is RAID10. This example has a database LUN capacity requirement of approximately 55 GB for a week of data, or 110 GB if 2 weeks' worth of data is required. Using 140 GB physical disks would provide a database LUN of 280 GB in a RAID10 configuration and a log LUN of 140 GB in a RAID1 configuration.

DISKS (1) & (2) – RAID1
    Transaction Logs/System: 140 + 20% = 168 IOPS

DISKS (3) & (4) & (5) & (6) – RAID10
    DB, Protocol & Message Tracking Logs, AV Quarantine: 200 + 20% = 240 IOPS

- Robert Quimbey