Deploying an Exchange DAG is the cornerstone of a highly available and site-resilient messaging infrastructure. If you are reading this, you probably already know that an Exchange DAG is a group of no more than 16 Mailbox servers, which can sit in different subnets as well as different geographic regions.
In some environments there will be underlying network issues that are beyond your control as an application owner. Problems such as latency and brief connectivity outages between data centers can be an Exchange administrator's greatest headache, because they can trigger unexpected failovers that wreak havoc on your SLA.
So what can you do about this? Although Microsoft has advised in the past against modifying the failover clustering settings underneath an Exchange deployment, real-world experience with the product has shown that tuning them is sometimes necessary.
Each subnet threshold is the number of consecutive heartbeats a node can miss before the cluster considers it down, so raising the thresholds increases the time it takes to detect a failure. The higher the values, the longer detection takes, and therefore the longer it takes to act on that failure. You have to strike a balance between reacting quickly to a real failure and riding out transient networking issues.
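To make the trade-off concrete, the sketch below uses the common rule of thumb that detection time is roughly the heartbeat interval multiplied by the threshold. In Windows failover clustering these correspond to the SameSubnetDelay/SameSubnetThreshold and CrossSubnetDelay/CrossSubnetThreshold cluster properties. The function names and the sample values are hypothetical and for illustration only; they are not recommendations or documented defaults, so check the actual values on your own cluster before changing anything.

```python
def detection_time_seconds(delay_ms: int, threshold: int) -> float:
    """Approximate seconds of missed heartbeats before a node is declared down.

    Rule of thumb: heartbeat interval (delay, in ms) x number of missed
    heartbeats allowed (threshold). Hypothetical helper for illustration.
    """
    if delay_ms <= 0 or threshold <= 0:
        raise ValueError("delay_ms and threshold must be positive")
    return delay_ms * threshold / 1000.0


def survives_outage(outage_s: float, delay_ms: int, threshold: int) -> bool:
    """True if a network blip of outage_s seconds ends before the node is declared down."""
    return outage_s < detection_time_seconds(delay_ms, threshold)


# Hypothetical example settings: (heartbeat delay in ms, threshold).
settings = {
    "threshold 5": (1000, 5),
    "threshold 10": (1000, 10),
    "threshold 20": (1000, 20),
}

blip_s = 8.0  # a hypothetical 8-second inter-site connectivity blip
for name, (delay, threshold) in settings.items():
    seconds = detection_time_seconds(delay, threshold)
    outcome = "rides it out" if survives_outage(blip_s, delay, threshold) else "fails over"
    print(f"{name}: ~{seconds:.0f} s to detect, {outcome} on an {blip_s:.0f} s blip")
```

With a 1-second heartbeat, a threshold of 5 declares the node down after about 5 seconds and fails over on the 8-second blip, while thresholds of 10 or 20 tolerate it at the cost of taking 10 or 20 seconds to notice a genuine failure.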
His post was inspired by an older (2012) post by Elden Christensen about tuning failover cluster network thresholds, which is still very relevant to Exchange administrators, whether they are planning a future deployment or chasing issues with their current Exchange DAG. Both articles are a very good read!