We wanted to provide an in-depth explanation of the Ping Node, clarifying its function and usage. An incorrectly configured Ping Node can cause problems in HA clusters, so understanding it is critical for a proper iSCSI Failover setup.
Why do we need a Ping Node, or rather Ping Nodes?
DSS V6 iSCSI Failover (and soon NFS Failover) uses a heartbeat so that the Primary and Secondary hosts can check each other's status. At least 2 NICs must be configured for the heartbeat. Additionally, we strongly recommend using a direct crossover (point-to-point) connection for the Volume Replication. This path must have the heartbeat enabled as well. With a direct connection, both hosts can still communicate during a switch failure, and you save 2 switch ports.
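The heartbeat rules above can be expressed as a simple configuration check. This is only a sketch: the `HeartbeatPath` model, its fields, and the interface names `eth0`/`eth2` are hypothetical and do not reflect how DSS V6 stores its settings. It flags fewer than 2 heartbeat paths, a replication path without the heartbeat, and a replication path that is not point-to-point.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatPath:
    """One network path between Primary and Secondary (hypothetical model, not a DSS API)."""
    name: str
    heartbeat_enabled: bool
    used_for_replication: bool = False
    point_to_point: bool = False


def check_heartbeat_setup(paths: list[HeartbeatPath]) -> list[str]:
    """Return a list of problems; an empty list means the rules are met."""
    problems = []
    hb_paths = [p for p in paths if p.heartbeat_enabled]
    # Rule: at least 2 NICs must carry the heartbeat.
    if len(hb_paths) < 2:
        problems.append(f"only {len(hb_paths)} heartbeat path(s); at least 2 are required")
    for p in paths:
        # Rule: the Volume Replication path must carry the heartbeat too.
        if p.used_for_replication and not p.heartbeat_enabled:
            problems.append(f"{p.name}: replication path must also carry the heartbeat")
        # Recommendation: use a direct point-to-point link for replication.
        if p.used_for_replication and not p.point_to_point:
            problems.append(f"{p.name}: a direct point-to-point link is recommended for replication")
    return problems


# Hypothetical interfaces: eth0 faces the switch, eth2 is the direct replication link.
paths = [
    HeartbeatPath("eth0", heartbeat_enabled=True),
    HeartbeatPath("eth2", heartbeat_enabled=True, used_for_replication=True, point_to_point=True),
]
print(check_heartbeat_setup(paths))  # → []
```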
So, what would happen if both the Primary and Secondary hosts are functioning well and can communicate with each other (e.g. via the direct connection mentioned above), but the storage client has lost its network connection to the Primary host? For example, the switch port or NIC in that path may have failed.
The heartbeat will NOT trigger a failover, because both hosts “think” they are OK, yet the storage client still cannot access the storage. This is where the Ping Node comes into play and prevents such situations. The cluster manager detects that the Primary host has lost access to the Ping Node(s) while the Secondary host still has access, and executes a failover. If only a single Ping Node is configured, losing access to it will cause a failover, so it is strongly recommended to use at least 2 Ping Nodes for every network segment that needs Ping Node monitoring. This minimizes unnecessary failover events when one Ping Node is unreliable.
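The failover decision described above can be sketched as follows. This is not the DSS cluster manager's actual code; it assumes that a segment counts as reachable from a host if at least one of its Ping Nodes answers (similar to a ping group in Linux-HA Heartbeat), and it takes ping results as plain booleans instead of sending real ICMP requests. The segment name `bond0` and the Ping Node addresses are hypothetical.

```python
def segment_reachable(ping_results: dict[str, bool], ping_nodes: list[str]) -> bool:
    """Assumed semantics: a segment is reachable if at least one Ping Node answers."""
    return any(ping_results.get(node, False) for node in ping_nodes)


def should_fail_over(segments: dict[str, list[str]],
                     primary: dict[str, bool],
                     secondary: dict[str, bool]) -> bool:
    """Fail over if the Secondary reaches a client segment that the Primary cannot."""
    for nodes in segments.values():
        if not segment_reachable(primary, nodes) and segment_reachable(secondary, nodes):
            return True
    return False


# Hypothetical segment with two Ping Nodes.
segments = {"bond0": ["192.168.1.201", "192.168.1.202"]}

# The Primary's switch port fails: neither Ping Node answers the Primary.
print(should_fail_over(segments,
                       primary={"192.168.1.201": False, "192.168.1.202": False},
                       secondary={"192.168.1.201": True, "192.168.1.202": True}))  # → True

# One unreliable Ping Node stops answering, but the other still does: no failover.
print(should_fail_over(segments,
                       primary={"192.168.1.201": False, "192.168.1.202": True},
                       secondary={"192.168.1.201": False, "192.168.1.202": True}))  # → False
```

The second case shows why 2 Ping Nodes per segment are recommended: under this assumed rule, a single misbehaving Ping Node is not enough to trigger a failover.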
Now, which network segments need Ping Node(s) for monitoring? Certainly not every NIC; only the network paths connected to storage clients need to be monitored with Ping Node(s).
This is best explained with examples. Let’s first consider an example with bonding:
[Figure: Failover with Bonding]
Here the storage clients (VMware, XenServer, Windows) are connected via a single bonded network segment, so the Ping Nodes must be in the 192.168.1.x subnet. A minimum of one Ping Node is required, but we recommend at least 2.
Now let’s consider a second example with MPIO:
[Figure: Failover with MPIO]
In this case the storage clients are connected via both network segment paths, so Ping Nodes are needed in both subnets, 192.168.10.x and 192.168.20.x. A minimum of two Ping Nodes is required (one per segment), but we recommend at least 4 (two per segment) due to the multipath setup.
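The counting rule used in both examples can be captured in a few lines: only segments that carry storage-client traffic are monitored, each needs at least 1 Ping Node, and 2 per segment are recommended. This is a sketch; the `Segment` type is hypothetical, and the "direct replication link" entries stand in for the point-to-point replication path, which is not client-facing.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A network segment on the cluster hosts (hypothetical model)."""
    subnet: str
    serves_storage_clients: bool


def ping_node_counts(segments: list[Segment], per_segment_recommended: int = 2) -> tuple[int, int]:
    """Return (minimum, recommended) total Ping Nodes.

    Only client-facing segments are monitored: 1 Ping Node each is the
    minimum, and per_segment_recommended (default 2) each is recommended.
    """
    monitored = [s for s in segments if s.serves_storage_clients]
    return len(monitored), len(monitored) * per_segment_recommended


bonding = [Segment("192.168.1.x", True), Segment("direct replication link", False)]
mpio = [Segment("192.168.10.x", True), Segment("192.168.20.x", True),
        Segment("direct replication link", False)]

print(ping_node_counts(bonding))  # → (1, 2)
print(ping_node_counts(mpio))     # → (2, 4)
```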