In distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Before the task is begun, all network nodes are either unaware which node will serve as the "leader" (or coordinator) of the task, or unable to communicate with the current coordinator. After a leader election algorithm has been run, however, each node throughout the network recognizes a particular, unique node as the task leader.
The network nodes communicate among themselves to decide which of them will enter the "leader" state. To do so, they need some method of breaking the symmetry among them. For example, if each node has a unique and comparable identity, the nodes can compare their identities and decide that the node with the highest identity is the leader.
The definition of this problem is often attributed to LeLann, who formalized it as a method to create a new token in a token ring network in which the token has been lost.
Leader election algorithms are designed to be economical in terms of total bytes transmitted, and time. The algorithm suggested by Gallager, Humblet, and Spira for general undirected graphs has had a strong impact on the design of distributed algorithms in general, and won the Dijkstra Prize for an influential paper in distributed computing.
Many other algorithms have been suggested for different kinds of network graphs, such as undirected rings, unidirectional rings, complete graphs, grids, directed Euler graphs, and others. A general method that decouples the issue of the graph family from the design of the leader election algorithm was suggested by Korach, Kutten, and Moran.
The leader election problem requires each processor to eventually decide whether or not it is the leader, subject to the constraint that exactly one processor decides that it is the leader. An algorithm solves the leader election problem if:
A valid leader election algorithm must meet the following conditions:
An algorithm for leader election may vary in the following aspects:
A ring network is a connected-graph topology in which each node is connected to exactly two other nodes, i.e., for a graph with n nodes, there are exactly n edges connecting the nodes. A ring can be unidirectional, which means processors only communicate in one direction (a node can only send messages to the left or only send messages to the right), or bidirectional, meaning processors may transmit and receive messages in both directions (a node can send messages to the left and to the right).
A ring is said to be anonymous if every processor is identical. More formally, the system has the same state machine for every processor. There is no deterministic algorithm to elect a leader in anonymous rings, even when the size of the network is known to the processes. This is due to the fact that there is no possibility of breaking symmetry in an anonymous ring if all processes run at the same speed. The state of processors after some steps only depends on the initial state of neighbouring nodes. So, because their states are identical and execute the same procedures, in every round the same messages are sent by each processor. Therefore, each processor state also changes identically and as a result if one processor is elected as a leader, so are all the others.
For simplicity, the result is proved here for anonymous synchronous rings, by contradiction. Consider an anonymous ring R of size n > 1, and assume there exists an algorithm A that solves leader election in R.
Lemma: after round k of an execution of A in R, all the processes are in the same state.
Proof. By induction on k.
Base case: k = 0. All the processes are in the initial state, so all the processes are identical.
Induction hypothesis: assume the lemma is true for k-1 rounds.
Inductive step: in round k, every process sends the same message m_r to the right and the same message m_l to the left. Since all the processes are in the same state after round k-1, in round k every process receives the message m_r from its left edge and the message m_l from its right edge. Since all processes receive the same messages in round k, they are all in the same state after round k.
The above lemma contradicts the fact that after some finite number of rounds in an execution of A, one process entered the elected state and other processes entered the non-elected state.
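The symmetry argument above can be observed in a small simulation. The following is a minimal sketch, not taken from the source: the function names `run_anonymous_ring`, `example_send` and `example_update` are hypothetical, and the transition rules are arbitrary placeholders standing in for any deterministic algorithm A.

```python
def run_anonymous_ring(n, rounds, send, update, initial=0):
    """Run a synchronous anonymous ring and return the state list after each round.

    send(state) -> (msg_to_left, msg_to_right)
    update(state, msg_from_left, msg_from_right) -> new state
    Both functions are the same for every process (the ring is anonymous).
    """
    states = [initial] * n
    history = [list(states)]
    for _ in range(rounds):
        outgoing = [send(s) for s in states]
        new_states = []
        for i in range(n):
            # The left neighbour's rightward message and the right
            # neighbour's leftward message arrive at process i.
            from_left = outgoing[(i - 1) % n][1]
            from_right = outgoing[(i + 1) % n][0]
            new_states.append(update(states[i], from_left, from_right))
        states = new_states
        history.append(list(states))
    return history


# Arbitrary deterministic rules (assumptions, chosen only for illustration).
def example_send(state):
    return (3 * state + 1, 5 * state + 2)


def example_update(state, from_left, from_right):
    return (7 * state + 11 * from_left + 13 * from_right) % 1009


history = run_anonymous_ring(5, 10, example_send, example_update)
print(all(len(set(states)) == 1 for states in history))
```

Whatever deterministic rules are substituted, every round leaves all processes in the same state, so no single process can ever be distinguished as the leader.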
A common approach to solve the problem of leader election in anonymous rings is the use of probabilistic algorithms. In such approaches, generally processors assume some identities based on a probabilistic function and communicate it to the rest of the network. At the end, through the application of an algorithm, a leader is selected (with high probability).
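As a rough illustration of the probabilistic idea, the sketch below lets every node draw a random identity and retries until the largest identity is unique. It is an assumption-laden toy, not any specific published algorithm: the name `randomized_election` and the parameter `id_space` are hypothetical, and the tie check is done by global inspection where a real distributed algorithm would have to use message passing.

```python
import random


def randomized_election(n, id_space, rng):
    """Return (index of the elected node, number of attempts needed).

    Each attempt, every node independently draws an identity from
    range(id_space). The attempt succeeds when the maximum is unique.
    """
    attempts = 0
    while True:
        attempts += 1
        ids = [rng.randrange(id_space) for _ in range(n)]
        top = max(ids)
        if ids.count(top) == 1:  # symmetry is broken: one clear winner
            return ids.index(top), attempts


rng = random.Random(0)  # seeded only to make the run repeatable
leader, attempts = randomized_election(8, 1000, rng)
print(leader, attempts)
```

A larger `id_space` makes ties less likely, so fewer attempts are needed on average.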
Since there is no deterministic algorithm for anonymous rings (proved above), asynchronous rings are considered here to be non-anonymous. In non-anonymous rings, each process has a unique id, and the processes do not know the size of the ring. Leader election in asynchronous rings can be solved by algorithms using O(n^2) messages or O(n log n) messages.
In the O(n^2) algorithm, every process sends a message with its id to its left edge, and then waits for a message from its right edge. If the id in the message is greater than its own id, it forwards the message to the left edge; if it is smaller, it ignores the message and does nothing. If the id in the message is equal to its own id, it sends a message to the left announcing that it is elected. The other processes forward the announcement to the left and become non-elected. The upper bound for this algorithm is O(n^2) messages, since each of the n ids is forwarded at most n times.
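The ring algorithm just described can be simulated in a few lines. This is a minimal sketch under stated assumptions: the function name `ring_election` is hypothetical, ids are assumed unique, the "left" neighbour of index i is taken to be index (i + 1) mod n, and a FIFO queue stands in for one possible asynchronous schedule.

```python
from collections import deque


def ring_election(ids):
    """Simulate the forwarding algorithm; return (leader id, message count)."""
    n = len(ids)
    messages = 0
    queue = deque()  # entries are (destination index, id carried)
    for i, pid in enumerate(ids):
        queue.append(((i + 1) % n, pid))  # every process sends its own id
        messages += 1
    leader = None
    while queue:
        dest, carried = queue.popleft()
        if carried > ids[dest]:
            queue.append(((dest + 1) % n, carried))  # forward larger ids
            messages += 1
        elif carried == ids[dest]:
            leader = dest  # own id came back: this process is elected
        # smaller ids are ignored
    messages += n  # the announcement travels once around the ring
    return ids[leader], messages


print(ring_election([3, 1, 4, 5, 2]))  # a typical ring
print(ring_election([4, 3, 2, 1]))     # ids decreasing in the travel direction
```

When the ids decrease in the direction of travel, the id k is forwarded k times, which gives the quadratic worst case.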
The O(n log n) algorithm runs in phases. In the k-th phase, a process determines whether it is the winner among its 2^k left and 2^k right neighbours; if it is, it proceeds to the next phase. In phase 0, each process P determines whether it is a winner by sending a message with its id to its left and right neighbours (the neighbours do not forward the message). A neighbour replies with an ACK only if the id in the message is larger than its own id; otherwise it replies with a negative acknowledgement. If P receives two ACKs, one from the left and one from the right, then P is a winner in phase 0. In phase k, the winners of phase k-1 send a message with their id to their 2^k left and 2^k right neighbours. Each neighbour on the path forwards the message to the next neighbour if the id in the message is larger than its own id, and otherwise replies with a negative acknowledgement. If the 2^k-th neighbour receives an id larger than its own, it sends back an ACK; otherwise it replies with a negative acknowledgement. A process that receives two ACKs is a winner in phase k. In the last phase, the final winner receives its own id in the message, terminates, and sends a termination message to the other processes. In the worst case, there are at most n/(2^k+1) winners in phase k. There are O(log n) phases in total, and each winner sends on the order of 2^k messages in each phase, so the message complexity is O(n log n).
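The phase structure above can be sketched as a centralized simulation. The function name `phased_ring_election` is hypothetical, ids are assumed unique, and the message count is only an approximation of what a real asynchronous execution would incur: each probe hop and each reply hop is counted once.

```python
def phased_ring_election(ids):
    """Return (leader id, approximate message count, last phase number)."""
    n = len(ids)
    candidates = list(range(n))
    messages = 0
    phase = 0
    while True:
        reach = 2 ** phase  # probes travel up to 2^phase hops each way
        winners = []
        leader = None
        for c in candidates:
            acks = 0
            for step in (-1, 1):  # probe left, then right
                pos, hops, survived = c, 0, True
                while hops < reach:
                    pos = (pos + step) % n
                    hops += 1
                    messages += 1  # probe moves one hop
                    if pos == c:  # probe went all the way round
                        leader = c
                        break
                    if ids[pos] > ids[c]:  # blocked by a larger id
                        survived = False
                        break
                if leader == c:
                    break
                messages += hops  # reply retraces the probe's path
                if survived:
                    acks += 1
            if acks == 2:
                winners.append(c)
        if leader is not None:
            messages += n - 1  # termination notice to the other processes
            return ids[leader], messages, phase
        candidates = winners
        phase += 1


print(phased_ring_election([3, 1, 4, 5, 2]))
```

Only the process with the largest id can have a probe return to itself, which is what ends the loop.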
In their book Distributed Computing, Attiya and Welch describe a non-uniform algorithm that uses O(n) messages in a synchronous ring with known ring size n. The algorithm operates in phases; each phase has n rounds, and each round is one time unit. In phase 0, if there is a process with id 0, then that process sends a termination message to the other processes (sending the termination message costs n rounds). Otherwise, the algorithm goes to the next phase. In general, in phase i the algorithm checks whether there is a process whose id equals i, and if so performs the same steps as in phase 0. At the end of the execution, the process with the minimal id is elected as the leader. The algorithm uses exactly n messages and n(z+1) rounds, where z is the minimal id.
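The trade-off between messages and time in this synchronous algorithm can be made concrete with a short sketch. The function name `sync_min_id_election` is hypothetical, and ids are assumed to be distinct non-negative integers; the simulation only counts phases rather than modelling individual rounds.

```python
def sync_min_id_election(ids):
    """Return (leader id, messages, rounds) for a synchronous ring of known size."""
    n = len(ids)
    present = set(ids)
    phase = 0
    # Silent phases: n rounds pass in each one and nobody sends anything.
    while phase not in present:
        phase += 1
    leader = phase            # the first phase that matches an id is the minimum
    messages = n              # the termination message circles the ring once
    rounds = n * (phase + 1)  # phases 0..phase, n rounds each
    return leader, messages, rounds


print(sync_min_id_election([5, 3, 8]))
```

The message count never depends on the ids, but the running time grows with the value of the smallest id.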
Itai and Rodeh introduced an algorithm for a unidirectional ring with synchronized processes. They assume the size of the ring (number of nodes) is known to the processes. For a ring of size n, a There are also algorithms for rings of special sizes such as prime size and odd size.
In typical approaches to leader election, the size of the ring is assumed to be known to the processes. In anonymous rings, it is not possible to elect a leader without using an external entity. Even if an algorithm existed, the leader could not estimate the size of the ring; that is, in any anonymous ring there is a positive probability that an algorithm computes a wrong ring size. To overcome this problem, Fischer and Jiang used a so-called leader oracle that each processor can ask whether there is a unique leader. They show that from some point onward, it is guaranteed to return the same answer to all processes.
In one of the early works, Chang and Roberts proposed a uniform algorithm in which the processor with the highest ID is selected as the leader. Each processor sends its ID in a clockwise direction. A process receiving a message compares the ID it carries with its own: if the received ID is bigger, the process passes the message on; otherwise it discards the message. They show that this algorithm uses O(n^2) messages in the worst case and O(n log n) messages in the average case.
Hirschberg and Sinclair improved on this algorithm, achieving O(n log n) message complexity by introducing a bidirectional message-passing scheme that allows the processors to send messages in both directions.
The mesh is another popular form of network topology, especially in parallel systems, redundant memory systems and interconnection networks.
In a mesh structure, nodes are either corner (only two neighbours), border (only three neighbours) or interior (with four neighbours). The number of edges in a mesh of size a x b is m=2ab-a-b.
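The node classification and the edge formula can be checked by brute force. This is a small sketch with a hypothetical function name, `mesh_census`, and it assumes both dimensions are at least 2 so that every node has two, three or four neighbours.

```python
def mesh_census(a, b):
    """Count edges and node kinds in an a x b mesh (a, b >= 2 assumed)."""
    kinds = {"corner": 0, "border": 0, "interior": 0}
    names = {2: "corner", 3: "border", 4: "interior"}
    degree_sum = 0
    for r in range(a):
        for c in range(b):
            # One neighbour for each direction that stays inside the mesh.
            degree = (r > 0) + (r < a - 1) + (c > 0) + (c < b - 1)
            kinds[names[degree]] += 1
            degree_sum += degree
    return degree_sum // 2, kinds  # every edge is counted from both ends


edges, kinds = mesh_census(3, 4)
print(edges == 2 * 3 * 4 - 3 - 4, kinds)
```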
A typical algorithm to solve the leader election in an unoriented mesh is to only elect one of the four corner nodes as the leader. Since the corner nodes might not be aware of the state of other processes, the algorithm should first wake up the corner nodes. A leader can be elected as follows.
The message complexity is at most 6(a+b) - 16, and if the mesh is square-shaped, O(√n).
An oriented mesh is a special case where port numbers are compass labels, i.e. north, south, east and west. Leader election in an oriented mesh is trivial. We only need to nominate a corner, e.g. the north-east one, and make sure that node knows it is the leader.
A special case of mesh architecture is a torus, which is a mesh with "wrap-around". In this structure, every node has exactly 4 connecting edges. One approach to electing a leader in such a structure is known as electoral stages. Similar to procedures in ring structures, this method eliminates potential candidates in each stage until eventually one candidate node is left. This node becomes the leader and then notifies all other processes of termination. This approach can be used to achieve a complexity of O(n). More practical approaches have also been introduced for dealing with faulty links in the network.
A hypercube H_k is a network consisting of n=2^k nodes, each with degree k, and O(n log n) edges. An electoral-stages approach similar to the one above can be used to solve the problem of leader election. In each stage two nodes (called duelists) compete and the winner is promoted to the next stage, so only half of the duelists enter the next stage. This procedure continues until only one duelist is left, and it becomes the leader. Once selected, it notifies all other processes. This algorithm requires O(n) messages. In the case of unoriented hypercubes, a similar approach can be used but with a higher message complexity of O(n log log n).
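The halving of duelists per stage can be sketched as a tournament over subcubes. This is a simplified illustration, not the actual protocol: the function name `hypercube_tournament` is hypothetical, ids are assumed unique, and each duel is charged the Hamming distance between the two duelists, which assumes the challenge travels on a shortest path that a real protocol would first have to establish.

```python
def hypercube_tournament(ids):
    """ids[x] is the id of the node labelled x; len(ids) must be a power of two.

    Returns (leader id, message count under the shortest-path assumption).
    """
    n = len(ids)
    k = n.bit_length() - 1
    if n != 2 ** k:
        raise ValueError("number of nodes must be a power of two")
    champions = list(range(n))  # stage 0: every node is its own champion
    messages = 0
    for _stage in range(1, k + 1):
        merged = []
        # Adjacent entries are champions of two subcubes that differ in one bit.
        for j in range(0, len(champions), 2):
            x, y = champions[j], champions[j + 1]
            messages += bin(x ^ y).count("1")  # hops between the duelists
            merged.append(x if ids[x] > ids[y] else y)
        champions = merged
    messages += n - 1  # the leader notifies all other nodes
    return ids[champions[0]], messages


print(hypercube_tournament([5, 2, 9, 1, 7, 3, 8, 6]))
```

In stage i there are n/2^i duels of at most i hops each, so under this assumption the total stays below 3n messages.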
Complete networks are structures in which all processes are connected to one another, i.e., the degree of each node is n-1, n being the size of the network. An optimal solution with O(n) message and space complexity is known. In this algorithm, processes have the following states:
To elect a leader, a virtual ring is considered in the network. All processors initially start in a passive state until they are woken up. Once the nodes are awake, they are candidates to become the leader. Based on a priority scheme, candidate nodes collaborate in the virtual ring. At some point, candidates become aware of the identity of candidates that precede them in the ring. The higher priority candidates ask the lower ones about their predecessors. The candidates with lower priority become dummies after replying to the candidates with higher priority. Based on this scheme, the highest priority candidate eventually knows that all nodes in the system are dummies except itself, at which point it knows it is the leader.
As the name implies, these algorithms are designed to be used in every form of process networks without any prior knowledge of the topology of a network or its properties, such as its size.
The Shout protocol builds a spanning tree on a generic graph and elects its root as leader. The algorithm has a total cost linear in the number of edges.
The mega-merger technique is in essence similar to finding a minimum spanning tree (MST) in which the root of the tree becomes the leader. The basic idea is that individual nodes merge with each other to form bigger structures. The result of this algorithm is a tree (a graph with no cycle) whose root is the leader of the entire system. The cost of the mega-merger method is O(m + n log n), where m is the number of edges and n is the number of nodes.
Yo-yo is a minimum-finding algorithm consisting of two parts: a preprocessing phase and a series of iterations. In the first phase, or setup, each node exchanges its id with all its neighbours and orients its incident edges based on the values. For instance, if node x has a smaller id than y, the edge is oriented from x towards y. A node with a smaller id than all its neighbours becomes a source. In contrast, a node with all inward edges (i.e., with an id larger than those of all its neighbours) is a sink. All other nodes are internal nodes.
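The setup phase described above is easy to sketch. The function name `yoyo_setup` and the input format (a dictionary of ids plus a list of undirected edges) are assumptions made for illustration; the graph is assumed connected with unique ids.

```python
def yoyo_setup(ids, edges):
    """Orient each edge from the smaller id to the larger id and classify nodes.

    ids:   dict mapping node name -> unique id
    edges: iterable of undirected (u, v) pairs
    Returns (list of oriented (from, to) pairs, dict node -> role).
    """
    in_deg = {v: 0 for v in ids}
    out_deg = {v: 0 for v in ids}
    oriented = []
    for u, v in edges:
        lo, hi = (u, v) if ids[u] < ids[v] else (v, u)
        oriented.append((lo, hi))
        out_deg[lo] += 1
        in_deg[hi] += 1
    roles = {}
    for v in ids:
        if in_deg[v] == 0:
            roles[v] = "source"    # smaller than every neighbour
        elif out_deg[v] == 0:
            roles[v] = "sink"      # larger than every neighbour
        else:
            roles[v] = "internal"
    return oriented, roles


ids = {"a": 2, "b": 7, "c": 4, "d": 9}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
print(yoyo_setup(ids, edges))
```

Because edges always point from smaller to larger ids, the oriented graph has no directed cycle.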
Once all the edges are oriented, the iteration phase starts. Each iteration is an electoral stage in which some candidates are removed. Each iteration has two phases: YO- and -YO. In the YO- phase, the sources start the process of propagating to each sink the smallest value among the sources connected to that sink.
After the final stage, any source that receives a NO is no longer a source and becomes a sink. An additional stage, pruning, is also introduced to remove nodes that are useless, i.e. whose existence has no impact on the following iterations.
This method has a total cost of O(m log n) messages. Its real message complexity including pruning is unknown and remains an open research problem.
In radio network protocols, leader election is often used as a first step towards more advanced communication primitives, such as message gathering or broadcasts. The very nature of wireless networks induces collisions when adjacent nodes transmit at the same time; electing a leader allows this process to be better coordinated. While the diameter D of a network is a natural lower bound for the time needed to elect a leader, upper and lower bounds for the leader election problem depend on the specific radio model studied.
In radio networks, each of the n nodes may in every round choose either to transmit or to receive a message. If no collision detection is available, then a node cannot distinguish between silence and receiving more than one message at a time. If collision detection is available, then a node can detect more than one incoming message at the same time, even though the messages themselves cannot be decoded in that case. In the beeping model, nodes can only distinguish between silence and at least one message, via carrier sensing.
Known runtimes for single-hop networks range from a constant (expected, with collision detection) to O(n log n) rounds (deterministic, no collision detection). In multi-hop networks, known runtimes range from roughly O((D + log n)(log² log n)) rounds (with high probability in the beeping model), through O(D log n) (deterministic in the beeping model) and O(n) (deterministic with collision detection), to O(n log^(3/2) n (log log n)^(0.5)) rounds (deterministic, no collision detection).