Exchange DAG tuning

Deploying an Exchange DAG is the cornerstone of a highly available and site-resilient messaging infrastructure. If you are reading this, you probably already know that an Exchange DAG is a group of no more than 16 mailbox servers that can span different subnets as well as different geographic regions.

In some environments there will be underlying network issues that are beyond your control as an application owner. Problems such as latency and brief connectivity outages between data centers can be an Exchange administrator's greatest headache. They can trigger unexpected failovers that wreak havoc on your SLA.

So what can you do about this? Although Microsoft has in the past advised against changing failover clustering settings for an Exchange deployment, real-world experience with the product has shown that tuning is sometimes necessary.

Tim McMichael is a well-known high availability expert when it comes to Exchange DAG deployments. He recently wrote a post about tuning Exchange high availability, in which he states:

"As subnet thresholds are adjusted up, this increases the amount of time it takes to detect a failure. The higher the values, the longer it takes to detect a failure, and therefore the longer it takes to act on that failure. There is a balance between reacting quickly to a failure and providing resiliency to transient networking issues."
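To put rough numbers on that balance, here is a minimal sketch in Python. It assumes the simple model in which a node is declared down after a set number of consecutive missed heartbeats (the threshold), each sent at a fixed interval (the delay), so the detection time is roughly delay multiplied by threshold. The function name and the sample values are illustrative assumptions, not recommendations; check the actual values on your own cluster before changing anything.

```python
def detection_time_seconds(delay_ms: int, threshold: int) -> float:
    """Approximate time before an unresponsive node is declared down.

    Assumed model: one heartbeat every `delay_ms` milliseconds, and the
    node is considered failed after `threshold` consecutive misses.
    """
    if delay_ms <= 0 or threshold <= 0:
        raise ValueError("delay_ms and threshold must be positive")
    return delay_ms * threshold / 1000.0


# Illustrative (hypothetical) settings as (delay in ms, threshold) pairs.
examples = {
    "tight":   (1000, 5),
    "relaxed": (1000, 10),
    "loose":   (2000, 20),
}

for name, (delay_ms, threshold) in examples.items():
    seconds = detection_time_seconds(delay_ms, threshold)
    print(f"{name}: delay={delay_ms} ms, threshold={threshold}, "
          f"detection takes about {seconds:.0f} s")
```

Under this model, doubling either value doubles the time the cluster tolerates a network blip, and also doubles how long a genuinely failed server goes unnoticed.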

His post was inspired by an older (2012) post by Elden Christensen about tuning failover cluster network thresholds. It is still very relevant to Exchange administrators, whether they are planning a future deployment or chasing issues with their current Exchange DAG. Both articles are well worth reading.
