Vuurvos
asked on
Cluster traffic in Microsoft high-availability clusters: routing mess-up
Hi,
I'm managing a 3-node Exchange DAG cluster (not my own design) that shows unexpected network traffic I don't understand. I hope someone can tell me what is going on. Each node has 3 network adapters.
Server x has the following IPs assigned to its network adapters:
NIC 1: 10.10.10.x/24, Server network - default gateway, DNS servers, and the adapter registers in DNS
NIC 2: 10.20.20.x/24, Replication network - no additional configuration
NIC 3: 10.30.30.x/24, Backup network - no additional IP configuration, all Microsoft services unbound in the adapter settings
NIC 1 is the only adapter with a default gateway. The other networks are dedicated, unrouted, closed VLANs: they are not connected to the firewall, and never should be. When I look at the logs of the firewall that acts as the default gateway, I see the following traffic being dropped:
10.10.10.1:3343 udp 10.30.30.2:3343
10.10.10.1:3343 udp 10.30.30.3:3343
10.10.10.2:3343 udp 10.30.30.1:3343
10.10.10.2:3343 udp 10.30.30.3:3343
10.10.10.3:3343 udp 10.30.30.1:3343
10.10.10.3:3343 udp 10.30.30.2:3343
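To make the pattern in these entries explicit, here is a small Python sketch that parses them (assuming the `source:port protocol destination:port` layout shown above) and flags every flow whose source and destination are not in the same /24. It only confirms that every dropped flow pairs a server VLAN source with a backup VLAN destination; it does not explain why the cluster sends them. The function names are illustrative, not from any tool.

```python
import ipaddress
import re

# Dropped flows as listed in the firewall log above.
LOG_LINES = [
    "10.10.10.1:3343 udp 10.30.30.2:3343",
    "10.10.10.1:3343 udp 10.30.30.3:3343",
    "10.10.10.2:3343 udp 10.30.30.1:3343",
    "10.10.10.2:3343 udp 10.30.30.3:3343",
    "10.10.10.3:3343 udp 10.30.30.1:3343",
    "10.10.10.3:3343 udp 10.30.30.2:3343",
]

# Assumed layout "<src ip>:<port> <protocol> <dst ip>:<port>"; a space before the
# destination port (as in the original paste) is tolerated.
LINE_RE = re.compile(r"([\d.]+):(\d+)\s+(\w+)\s+([\d.]+)\s*:(\d+)")


def parse(line):
    """Split one log line into (src, src_port, proto, dst, dst_port)."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    src, sport, proto, dst, dport = m.groups()
    return (ipaddress.ip_address(src), int(sport), proto,
            ipaddress.ip_address(dst), int(dport))


def same_subnet(a, b, prefix=24):
    """True if addresses a and b fall inside the same /prefix network."""
    return b in ipaddress.ip_network(f"{a}/{prefix}", strict=False)


def cross_subnet_flows(lines, prefix=24):
    """Return (src, dst, proto, dst_port) for every flow that leaves the source's subnet."""
    flows = []
    for line in lines:
        src, _sport, proto, dst, dport = parse(line)
        if not same_subnet(src, dst, prefix):
            flows.append((str(src), str(dst), proto, dport))
    return flows


if __name__ == "__main__":
    for src, dst, proto, port in cross_subnet_flows(LOG_LINES):
        print(f"{src} -> {dst} ({proto}/{port}): not in the same /24")
```

All six entries are reported, since each one goes from a 10.10.10.x address to a 10.30.30.x address.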
10.30.30.0/24 is a directly connected network on each of the Exchange servers, so the Windows routing table shows that traffic for the 10.30.30.0/24 network should go out through the 10.30.30.x interface.
The big question now is: why are the NICs attached to the server VLAN trying to reach the adapters in the backup VLAN?
- I have checked the DNS zone files, the local DNS cache and the local hosts file: none of the servers has any entry for the other two that refers to their IP addresses on the backup VLAN.
- I have checked the IP routing table, and all expected networks are listed as "On-link" with the matching interface address (route print output below).
- I have tried to disable the adapters for cluster communication in Failover Cluster Manager, but the setting won't stick: when I disable the interface and then recheck the setting, it is enabled again. The traffic still keeps arriving at my firewall.
IPv4 Route Table
==================================== ========== ========== ========== =========
Active Routes:
Network Destination  Netmask          Gateway       Interface      Metric
0.0.0.0              0.0.0.0          10.10.10.254  10.10.10.1     261
10.10.10.0           255.255.255.0    On-link       10.10.10.1     261
10.10.10.1           255.255.255.255  On-link       10.10.10.1     261
10.10.10.255         255.255.255.255  On-link       10.10.10.1     261
10.20.20.0           255.255.255.0    On-link       10.20.20.1     261
10.20.20.1           255.255.255.255  On-link       10.20.20.1     261
10.20.20.255         255.255.255.255  On-link       10.20.20.1     261
127.0.0.0            255.0.0.0        On-link       127.0.0.1      306
127.0.0.1            255.255.255.255  On-link       127.0.0.1      306
127.255.255.255      255.255.255.255  On-link       127.0.0.1      306
169.254.0.0          255.255.0.0      On-link       169.254.1.105  261
169.254.1.105        255.255.255.255  On-link       169.254.1.105  261
169.254.255.255      255.255.255.255  On-link       169.254.1.105  261
10.30.30.0           255.255.0.0      On-link       10.30.30.1     261
10.30.30.1           255.255.255.255  On-link       10.30.30.1     261
172.29.255.255       255.255.255.255  On-link       10.30.30.1     261
224.0.0.0            240.0.0.0        On-link       127.0.0.1      306
224.0.0.0            240.0.0.0        On-link       10.10.10.1     261
224.0.0.0            240.0.0.0        On-link       10.20.20.1     261
224.0.0.0            240.0.0.0        On-link       10.30.30.1     261
224.0.0.0            240.0.0.0        On-link       169.254.1.105  261
255.255.255.255      255.255.255.255  On-link       127.0.0.1      306
255.255.255.255      255.255.255.255  On-link       10.10.10.1     261
255.255.255.255      255.255.255.255  On-link       10.20.20.1     261
255.255.255.255      255.255.255.255  On-link       10.30.30.1     261
255.255.255.255      255.255.255.255  On-link       169.254.1.105  261
==================================== ========== ========== ========== =========
Persistent Routes:
Network Address  Netmask  Gateway Address  Metric
0.0.0.0          0.0.0.0  10.10.10.254     Default
I have been seeing similar traffic from a Hyper-V cluster, where packets destined for the heartbeat and live migration VLANs are sent from the server VLANs as well, so I think this is a generic Cluster Service issue rather than something Exchange-related.
Since I am experiencing a lot of network performance issues, I suspect that part of the cause lies in network packets being sent from the wrong interfaces, but this example keeps me puzzled. Can anyone explain why this is happening?
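The route table above can be checked with a small longest-prefix-match lookup. This Python sketch copies four unicast routes as printed (including the 255.255.0.0 mask on the 10.30.30.0 entry; with the /24 from the NIC list the outcome is the same) and picks the most specific matching route, breaking ties on the lower metric. It is a simplification of how Windows chooses a route, used only to show that, when nothing restricts which interface may be used, 10.30.30.2 resolves to the on-link route on 10.30.30.1 rather than to the default gateway. `Route`, `make_route` and `lookup` are hypothetical names.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    network: ipaddress.IPv4Network
    gateway: Optional[str]  # None stands for "On-link"
    interface: str          # interface address, as in the route print "Interface" column
    metric: int


def make_route(dest, mask, gateway, interface, metric):
    # strict=False tolerates entries whose destination has host bits set for the printed mask.
    network = ipaddress.ip_network(f"{dest}/{mask}", strict=False)
    return Route(network, None if gateway == "On-link" else gateway, interface, metric)


# Unicast routes copied from the route print output above (loopback, APIPA,
# multicast and broadcast entries left out).
ROUTES = [
    make_route("0.0.0.0", "0.0.0.0", "10.10.10.254", "10.10.10.1", 261),
    make_route("10.10.10.0", "255.255.255.0", "On-link", "10.10.10.1", 261),
    make_route("10.20.20.0", "255.255.255.0", "On-link", "10.20.20.1", 261),
    make_route("10.30.30.0", "255.255.0.0", "On-link", "10.30.30.1", 261),
]


def lookup(dst, routes):
    """Longest-prefix match on dst; equal prefixes are decided by the lower metric."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in r.network]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.network.prefixlen, -r.metric))


if __name__ == "__main__":
    for target in ("10.30.30.2", "10.20.20.3", "8.8.8.8"):
        r = lookup(target, ROUTES)
        via = r.gateway or "On-link"
        print(f"{target}: {r.network} via {via} on {r.interface}")
```

For example, `lookup("10.30.30.2", ROUTES)` returns the on-link route on interface 10.30.30.1, while an off-net address such as 8.8.8.8 falls through to the default route via 10.10.10.254.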
ASKER
Hi Amit, thank you so much for pointing out the strong/weak host send/receive models to me; I wasn't aware of them in these terms yet. I've put the output you asked for, and some more, in the attached xlsx. This is a test setup, so none of the included information is security-sensitive.
When I try to add the rule you suggest, I get the following response:
C:\Windows\system32>route add 10.30.30.0 mask 255.255.255.0 10.30.30.6 METRIC 1 IF 18
The route addition failed: The object already exists.
C:\Windows\system32>route add -r 10.30.30.0 mask 255.255.255.0 10.30.30.6 METRIC 1 IF 18
OK!
I fail to see what the benefit would be of adding this permanently to the static route table, given that the route is already added dynamically anyway. Could you please elaborate on that?
That said, I've been reading the article, and if I understand it correctly, I need to have weak host sends/receives disabled. As you can see, this is disabled on the interface, and the other interfaces, which I've also checked, are set the same way:
Weak Host Sends : disabled
Weak Host Receives : disabled
I don't see any attachment.
ASKER
First time adding files on EE... apparently I forgot to click the upload button.
Experts-Exchange.xlsx
Can you check the DAG replication settings from the EMC? You should enable replication only on the replication NIC and disable it on the MAPI/backup NICs.
ASKER
I see that by default all networks are enabled for replication. I'll check with the network team tomorrow whether this traffic has indeed stopped. That might explain where the packets come from in this case, but it still doesn't explain why they are sent this way.
The way I understand it, the Cluster Service (I have seen the same behavior on Hyper-V clusters and file server clusters too) deliberately uses the IP address of the server LAN as the source address of its cluster packets. Since the host acts as a strong host sender on all interfaces, it will only consider the routes bound to that interface. Finding the default gateway as a usable route on that interface, it then sends the cluster packet out through the server LAN adapter. If that is right, could I also solve this by explicitly enabling weak host sends on the server LAN adapter? And if so, what would be the security implications of such a change?
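The strong host send theory in this post can be illustrated with a deliberately simplified Python model. It assumes, as the post describes, that a strong host sender only considers routes on the interface that owns the packet's source address, while a weak host sender considers all routes. Real Windows route selection has more rules, and why the Cluster Service picks a 10.10.10.x source address is not modeled at all. The routes are copied from the route print output in the question; `select_route` is a hypothetical name.

```python
import ipaddress

# Unicast routes copied from the route print output in the question:
# (destination/netmask, gateway or None for On-link, interface address, metric)
ROUTES = [
    ("0.0.0.0/0.0.0.0", "10.10.10.254", "10.10.10.1", 261),
    ("10.10.10.0/255.255.255.0", None, "10.10.10.1", 261),
    ("10.20.20.0/255.255.255.0", None, "10.20.20.1", 261),
    ("10.30.30.0/255.255.0.0", None, "10.30.30.1", 261),
]


def select_route(src, dst, strong_host_send=True):
    """Return (next_hop, outgoing_interface) for a packet from src to dst.

    Simplified model: with strong host sends, only routes on the interface that
    owns the source address are candidates; with weak host sends, all routes are.
    Longest prefix wins, then the lower metric.
    """
    dst_ip = ipaddress.ip_address(dst)
    candidates = []
    for prefix, gateway, iface, metric in ROUTES:
        net = ipaddress.ip_network(prefix, strict=False)
        if dst_ip not in net:
            continue
        if strong_host_send and iface != src:
            continue  # strong host send: the route must belong to the source's interface
        candidates.append((net.prefixlen, -metric, gateway, iface))
    if not candidates:
        raise LookupError(f"no route from {src} to {dst}")
    _, _, gateway, iface = max(candidates, key=lambda c: (c[0], c[1]))
    # On-link routes deliver straight to the destination; otherwise use the gateway.
    return (gateway if gateway is not None else dst), iface


if __name__ == "__main__":
    for strong in (True, False):
        hop, iface = select_route("10.10.10.1", "10.30.30.2", strong_host_send=strong)
        label = "strong" if strong else "weak"
        print(f"{label} host send: next hop {hop} via interface {iface}")
```

With a 10.10.10.1 source and strong host sends, the only matching route on that interface is the default route, so the packet goes to 10.10.10.254, which matches the dropped flows in the firewall log. With weak host sends, the same packet would leave on-link through 10.30.30.1.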
I don't see any issue with enabling it. However, I would suggest you open a case with Microsoft to get more clarity on this issue.
Please check the DAG settings, or read the article below for your reference:
http://www.msexchange.org/articles-tutorials/exchange-server-2010/high-availability-recovery/uncovering-exchange-2010-database-availability-groups-dags-part1.html
ASKER CERTIFIED SOLUTION
Thanks for sharing.
ASKER
During the discussion in the Microsoft Partner Channel, I came across the PlumbAllCrossSubnetRoutes property and how it affects path discovery in cluster services. With this post I'd like to pass on to other readers what I found there.
Read this:
strong host model: https://technet.microsoft.com/en-us/magazine/2007.09.cableguy.aspx
On each server run this command:
Route add -p 10.38.16.0 MASK 255.255.255.0 <GatewayIP> METRIC 1 IF 12
12 is the interface list number. If you still have doubts, send me the route print output for all servers together with their NIC configuration details. To collect the NIC details, run: ipconfig /all > ip.txt