Friday, May 23, 2014

Understanding Quorum Configurations in a Failover Cluster

Applies To: Windows Server 2008 R2
This topic contains the following sections:
  • How the quorum configuration affects the cluster
  • Quorum configuration choices
  • Illustrations of quorum configurations
  • Why quorum is necessary
For information about how to configure quorum options, see Select Quorum Options for a Failover Cluster.

How the quorum configuration affects the cluster

The quorum configuration in a failover cluster determines the number of failures that the cluster can sustain. If an additional failure occurs, the cluster must stop running. The relevant failures in this context are failures of nodes or, in some cases, of a disk witness (which contains a copy of the cluster configuration) or file share witness. It is essential that the cluster stop running if too many failures occur or if there is a problem with communication between the cluster nodes. For a more detailed explanation, see Why quorum is necessary later in this topic.
Important
In most situations, use the quorum configuration that the cluster software identifies as appropriate for your cluster. Change the quorum configuration only if you have determined that the change is appropriate for your cluster.

Note that full function of a cluster depends not just on quorum, but on the capacity of each node to support the services and applications that fail over to that node. For example, a cluster that has five nodes could still have quorum after two nodes fail, but the level of service provided by each remaining cluster node would depend on the capacity of that node to support the services and applications that failed over to it.
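
To make the vote counting concrete, the following minimal sketch (in Python; the helper name and structure are illustrative and not part of the cluster software) checks whether a cluster counting only node votes still has quorum after a given number of node failures, using the five-node example above:

    def has_quorum(total_votes, votes_online):
        # A set of voters has quorum only when it holds a strict majority
        # of all configured votes.
        return votes_online > total_votes // 2

    total_nodes = 5
    failed_nodes = 2
    # Three of five node votes remain, so the cluster keeps quorum.
    print(has_quorum(total_nodes, total_nodes - failed_nodes))  # True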

Quorum configuration choices

You can choose from among four possible quorum configurations (the failure counts for each are illustrated in the sketch after this list):
  • Node Majority (recommended for clusters with an odd number of nodes)

    Can sustain failures of half the nodes (rounding up) minus one. For example, a seven-node cluster can sustain three node failures.
  • Node and Disk Majority (recommended for clusters with an even number of nodes)

    Can sustain failures of half the nodes (rounding up) if the disk witness remains online. For example, a six-node cluster in which the disk witness is online could sustain three node failures.

    Can sustain failures of half the nodes (rounding up) minus one if the disk witness goes offline or fails. For example, a six-node cluster with a failed disk witness could sustain two (3 - 1 = 2) node failures.
  • Node and File Share Majority (for clusters with special configurations)

    Works in a similar way to Node and Disk Majority, but instead of a disk witness, this cluster uses a file share witness.

    Note that if you use Node and File Share Majority, at least one of the available cluster nodes must contain a current copy of the cluster configuration before you can start the cluster. Otherwise, you must force the starting of the cluster through a particular node. For more information, see "Additional considerations" in Start or Stop the Cluster Service on a Cluster Node.
  • No Majority: Disk Only (not recommended)

    Can sustain failures of all nodes except one (if the disk is online). However, this configuration is not recommended because the disk might be a single point of failure.
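
As a rough illustration of the failure counts listed above, the following sketch (illustrative only; the helper and configuration names are assumptions, not part of the Windows Server tooling) computes the maximum number of node failures each configuration can sustain:

    import math

    def max_node_failures(nodes, configuration, witness_online=True):
        # Largest number of node failures the cluster can sustain and still have quorum.
        half_rounded_up = math.ceil(nodes / 2)
        if configuration == "node_majority":
            return half_rounded_up - 1
        if configuration in ("node_and_disk_majority", "node_and_file_share_majority"):
            # With the witness online, half the nodes (rounding up) can fail;
            # without it, one fewer failure can be sustained.
            return half_rounded_up if witness_online else half_rounded_up - 1
        if configuration == "disk_only":
            return nodes - 1  # all nodes but one, provided the quorum disk stays online
        raise ValueError("unknown configuration: " + configuration)

    print(max_node_failures(7, "node_majority"))                                  # 3
    print(max_node_failures(6, "node_and_disk_majority"))                         # 3
    print(max_node_failures(6, "node_and_disk_majority", witness_online=False))   # 2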

Illustrations of quorum configurations

The following descriptions show how each of the four quorum configurations works.
Note
For every configuration other than Disk Only, the cluster continues to function as long as a majority of the relevant voting elements are in communication (regardless of the total number of elements). When a majority is not in communication, the cluster stops functioning.

Cluster with Node Majority quorum configuration
In a cluster with the Node Majority configuration, only the nodes are counted when calculating a majority.

Cluster with Node and Disk Majority quorum configuration
In a cluster with the Node and Disk Majority configuration, the nodes and the disk witness are counted when calculating a majority.

Node and File Share Majority quorum configuration
In a cluster with the Node and File Share Majority configuration, the nodes and the file share witness are counted when calculating a majority. This is similar to the Node and Disk Majority configuration, except that the witness is a file share that all nodes in the cluster can access rather than a disk in cluster storage.

Cluster with Disk Only quorum configuration
In a cluster with the Disk Only configuration, the number of nodes does not affect how quorum is achieved. The disk is the quorum. However, if communication with the disk is lost, the cluster becomes unavailable.

Why quorum is necessary

When network problems occur, they can interfere with communication between cluster nodes. A small set of nodes might be able to communicate together across a functioning part of a network but not be able to communicate with a different set of nodes in another part of the network. This can cause serious issues. In this "split" situation, at least one of the sets of nodes must stop running as a cluster.
To prevent the issues caused by a split in the cluster, the cluster software requires that any set of nodes running as a cluster use a voting algorithm to determine whether, at a given time, that set has quorum. Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster knows how many "votes" constitute a majority (that is, a quorum). If the number drops below the majority, the cluster stops running. Nodes still listen for the presence of other nodes, in case another node appears again on the network, but they do not begin to function as a cluster until quorum exists again.
For example, in a five-node cluster that uses Node Majority, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, so they continue running as a cluster. Nodes 4 and 5, being a minority, stop running as a cluster. If node 3 then loses communication with the other nodes, all nodes stop running as a cluster. However, all functioning nodes continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run.
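
The partition example above can be expressed with the same majority rule. The sketch below (illustrative names only, not the cluster service's actual voting protocol) shows which side of a split, if any, keeps running:

    def surviving_partitions(partitions, total_nodes):
        # Only a partition holding a strict majority of node votes keeps
        # running as a cluster; at most one partition can satisfy this.
        return [p for p in partitions if len(p) > total_nodes // 2]

    total = 5

    # Nodes 1, 2, and 3 can communicate; nodes 4 and 5 are cut off from them.
    print(surviving_partitions([{1, 2, 3}, {4, 5}], total))    # [{1, 2, 3}] keeps running

    # Node 3 then loses communication as well: no partition has a majority,
    # so all nodes stop running as a cluster.
    print(surviving_partitions([{1, 2}, {3}, {4, 5}], total))  # []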
