I have searched high and low on this, and through the first dozen or more pages of Google results, for a good answer to the following. Even here on EE I have seen dozens of wrong answers accepted as solutions to this problem (see below). The scenario:
A Windows Server 2003 machine running Exchange (one NIC, IP 192.168.100.95) sits in city A on a network that connects to the internet over DSL through router 192.168.100.235. A Motorola Canopy wireless link (10 Mbps) spans the 30 miles from city A to city B and connects an office in city B to the same network on the same subnet. No router is needed because it is all "LAN" traffic, not a VPN configuration. City B has a cable internet connection on router 192.168.100.99. The goal is to configure both routers as default gateways on every network node so that if router 1 goes down, router 2 takes over.
Because the backup internet connection is in a different city, a SonicWall or other dual-WAN router will not help in this scenario.
The (wrong) advice I keep seeing is "oh, it simply can't be done; you are trying to do something TCP/IP can't do". It's called Dead Gateway Detection, and the technology has been around a long time. See below:
and specifically an excerpt:
"If multiple routers are available on the same subnet, configure one (or more) default gateways on the same network adapter."
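For anyone unfamiliar with it, this is roughly how I understand dead gateway detection to behave. The following is a simplified Python sketch, not Microsoft's implementation: the `Gateway` and `DeadGatewayDetector` names, the per-remote-host bookkeeping and the 25% switch threshold are my own assumptions for illustration. It shows gateways being tried in metric order (my metrics of 1 and 2) and the active gateway moving to the next one once a large enough share of TCP connections are retransmitting.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gateway:
    address: str
    metric: int


@dataclass
class DeadGatewayDetector:
    """Simplified model of dead gateway detection; an illustration, not Windows' code."""

    gateways: list[Gateway]
    # Assumed threshold: the share of open connections that must be retransmitting
    # before the host abandons the current default gateway.
    switch_fraction: float = 0.25
    failing: set[str] = field(default_factory=set)  # remote hosts whose connections are retransmitting
    active_index: int = 0

    def __post_init__(self) -> None:
        # Gateways are tried in order of increasing metric (1 = preferred).
        self.gateways = sorted(self.gateways, key=lambda g: g.metric)

    @property
    def active(self) -> Gateway:
        return self.gateways[self.active_index]

    def report(self, remote: str, retransmitting: bool, open_connections: int) -> Gateway:
        """Record one connection's state; switch gateways if too many connections are failing."""
        if retransmitting:
            self.failing.add(remote)
        else:
            self.failing.discard(remote)
        if open_connections and len(self.failing) / open_connections >= self.switch_fraction:
            # Move to the next gateway by metric (wrapping around) and start counting afresh.
            self.active_index = (self.active_index + 1) % len(self.gateways)
            self.failing.clear()
        return self.active


dgd = DeadGatewayDetector([Gateway("192.168.100.99", 2), Gateway("192.168.100.235", 1)])
print(dgd.active.address)  # 192.168.100.235 (metric 1 wins)
```

In this model, with 10 open connections, two retransmitting connections leave router 1 in place and a third moves the host to router 2.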
To be fair, the questions I have seen are really phrased as "how do I do this" or "it just isn't working", which is not exactly the same as my issue. Configuring multiple gateways is easy enough (it is right there in the TCP/IP properties window, which should have been a red flag for the experts saying it was not possible), and with proper metrics the failover appears to work.

What happens, however, is that when multiple gateways are in place, even with different metrics (1 for router 1 and 2 for router 2), internet connectivity on the Exchange server is hampered. Browsing sites seems to work perfectly in limited testing, and the IP returned by IPChicken.com always matches the gateway in use (router 1 whenever it is active), but email flow into the Exchange server is very hit or miss when both routers are up. For example, one user found that email arrived only at 7am, 10:35am, 2:20pm and 4pm. Knowing that he normally receives far more email in a day, we removed the second gateway from the Exchange server, and within minutes the rest of his email for the day, sent at various times throughout the day, arrived as expected.
One last piece of the puzzle: we use the DynDNS MailHop relay to queue our mail, which is then delivered to a DDNS hostname whose updater runs on the Exchange server. The DDNS updater is working fine, and even with both gateways configured it never sent an IP update for the hostname, so it doesn't look like the server ever tried to fail over to the second router. All ping tests to both routers show zero packet loss.
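The reason I read the missing DDNS update as "no failover happened" is that an updater of this kind only contacts DynDNS when the public IP it observes changes. A minimal Python sketch of that compare-and-push logic, assuming a simple design (the `DdnsUpdater` class, the `send_update` callback, the hostname and both WAN addresses are hypothetical placeholders, not the actual DynDNS client or our real addresses):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DdnsUpdater:
    """Hypothetical DDNS client: pushes an update only when the observed public IP changes."""

    hostname: str
    last_registered_ip: Optional[str] = None

    def check(self, public_ip: str, send_update: Callable[[str, str], None]) -> bool:
        """Compare the observed public IP with the last registered one; return True if an update was sent."""
        if public_ip == self.last_registered_ip:
            return False  # traffic still leaves through the same router, so DNS is left alone
        send_update(self.hostname, public_ip)
        self.last_registered_ip = public_ip
        return True


# Placeholder WAN addresses from the documentation ranges, not our real ones.
ROUTER1_WAN = "203.0.113.10"
ROUTER2_WAN = "198.51.100.20"

updater = DdnsUpdater("mail.example.com")
updater.check(ROUTER1_WAN, lambda host, ip: print("update", host, ip))
```

Under this model, as long as outbound traffic keeps leaving through router 1, repeated checks return `False` and no update is ever sent.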