Chernoff Bound

## The generic bound

## Example

## Additive form (absolute error)

## Multiplicative form (relative error)

## Applications

## Matrix bound

### Theorem without the dependency on the dimensions

## Sampling variant

## Proofs

### Chernoff-Hoeffding Theorem (additive form)

### Multiplicative form

## See also

## References

## Further reading

This article uses material from the Wikipedia page available here. It is released under the Creative Commons Attribution-Share-Alike License 3.0.

In probability theory, the **Chernoff bound**, named after Herman Chernoff but due to Herman Rubin,^{[1]} gives exponentially decreasing bounds on tail distributions of sums of independent random variables. It is sharper than the known first- or second-moment-based tail bounds such as Markov's inequality or Chebyshev's inequality, which only yield power-law bounds on tail decay. However, the Chernoff bound requires that the variates be mutually independent, a condition that neither Markov's inequality nor Chebyshev's inequality requires, although Chebyshev's inequality, when applied to a sum, does require the variates to be pairwise independent.

It is related to the (historically prior) Bernstein inequalities and to Hoeffding's inequality.

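The generic Chernoff bound for a random variable $X$ is attained by applying Markov's inequality to $e^{tX}$.^{[2]} For every $t > 0$:

$$\Pr(X \ge a) = \Pr\left(e^{tX} \ge e^{ta}\right) \le \frac{\operatorname{E}\left[e^{tX}\right]}{e^{ta}}.$$

When $X$ is the sum of $n$ random variables $X_1, \dots, X_n$, we get for any $t > 0$,

$$\Pr(X \ge a) \le e^{-ta}\, \operatorname{E}\left[\prod_{i} e^{tX_i}\right].$$

In particular, optimizing over $t$ and using the assumption that the $X_i$ are independent, we obtain

$$\Pr(X \ge a) \le \inf_{t > 0} e^{-ta} \prod_{i} \operatorname{E}\left[e^{tX_i}\right]. \qquad (1)$$

Similarly,

$$\Pr(X \le a) = \Pr\left(e^{-tX} \ge e^{-ta}\right)$$

and so,

$$\Pr(X \le a) \le \inf_{t > 0} e^{ta} \prod_{i} \operatorname{E}\left[e^{-tX_i}\right].$$

Specific Chernoff bounds are attained by calculating $\operatorname{E}\left[e^{-tX_i}\right]$ for specific instances of the basic variables $X_i$.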
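As a rough numerical illustration of the generic bound, the following Python sketch approximates $\inf_{t>0} e^{-ta}\prod_i \operatorname{E}[e^{tX_i}]$ on a grid of $t$ values for a sum of i.i.d. Bernoulli variables and compares it with the exact binomial tail. The function names, the grid search, and the parameters `n = 100`, `p = 0.5`, `a = 70` are assumptions chosen for illustration, not part of the original article.

```python
import math


def binomial_upper_tail(n: int, p: float, a: int) -> float:
    """Exact Pr(X >= a) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, n + 1))


def generic_chernoff_bound(n: int, p: float, a: float,
                           grid: int = 2000, t_max: float = 10.0) -> float:
    """Approximate inf over t > 0 of exp(-t*a) * (E[exp(t*X_i)])**n on a grid.

    For a Bernoulli(p) variable, E[exp(t*X_i)] = 1 - p + p*exp(t).
    The grid search is an illustrative stand-in for exact optimization.
    """
    best = 1.0  # letting t tend to 0 gives the trivial bound 1
    for j in range(1, grid + 1):
        t = t_max * j / grid
        log_bound = -t * a + n * math.log(1 - p + p * math.exp(t))
        best = min(best, math.exp(log_bound))
    return best


n, p, a = 100, 0.5, 70  # illustrative parameters
exact = binomial_upper_tail(n, p, a)
bound = generic_chernoff_bound(n, p, a)
print(f"exact tail = {exact:.3e}, Chernoff bound = {bound:.3e}")
```

Any value of $t > 0$ gives a valid upper bound, so a coarse grid can only make the result slightly looser, never invalid.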

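Let $X_1, \dots, X_n$ be independent Bernoulli random variables, whose sum is $X$, each equal to 1 with probability $p > 1/2$. For a Bernoulli variable:

$$\operatorname{E}\left[e^{tX_i}\right] = (1-p)e^{0} + pe^{t} = 1 + p\left(e^{t} - 1\right) \le e^{p(e^{t}-1)}.$$

So:

$$\operatorname{E}\left[e^{tX}\right] \le e^{np(e^{t}-1)}.$$

For any $\delta > 0$, taking $t = \ln(1+\delta) > 0$ and $a = (1+\delta)np$ gives:

$$\operatorname{E}\left[e^{tX}\right] \le e^{\delta np} \quad\text{and}\quad e^{-ta} = \frac{1}{(1+\delta)^{(1+\delta)np}},$$

and the generic Chernoff bound gives:^{[3]}^{:64}

$$\Pr[X \ge (1+\delta)np] \le \frac{e^{\delta np}}{(1+\delta)^{(1+\delta)np}} = \left[\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right]^{np}.$$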

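The probability of simultaneous occurrence of more than $n/2$ of the events $\{X_k = 1\}$ has an exact value:

$$\Pr[X > n/2] = \sum_{i=\lfloor n/2 \rfloor + 1}^{n} \binom{n}{i} p^{i} (1-p)^{n-i}.$$

A lower bound on this probability can be calculated based on Chernoff's inequality:

$$\Pr[X > n/2] \ge 1 - e^{-\frac{n}{2p}\left(p - \frac{1}{2}\right)^2}.$$

Indeed, noticing that $\mu = np$ and taking $\delta = 1 - \frac{1}{2p}$, which lies in $(0, 1)$ because $p > 1/2$, we get by the multiplicative form of the Chernoff bound (see below or Corollary 13.3 in Sinclair's class notes),^{[4]}

$$\Pr\left(X \le \lfloor n/2 \rfloor\right) \le \Pr\left(X \le \left(1 - \left(1 - \frac{1}{2p}\right)\right)\mu\right) \le e^{-\frac{\mu}{2}\left(1 - \frac{1}{2p}\right)^2} = e^{-\frac{n}{2p}\left(p - \frac{1}{2}\right)^2}.$$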
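To see how the exact value and the Chernoff lower bound compare, here is a small Python sketch. It assumes the bound $\Pr[X > n/2] \ge 1 - \exp\left(-\frac{n}{2p}(p - \tfrac12)^2\right)$ for $p > 1/2$; the function names and the choice $p = 0.6$ are illustrative assumptions.

```python
import math


def majority_probability_exact(n: int, p: float) -> float:
    """Exact Pr(X > n/2) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(n // 2 + 1, n + 1)
    )


def majority_probability_chernoff(n: int, p: float) -> float:
    """Chernoff lower bound on Pr(X > n/2), valid for p > 1/2."""
    return 1.0 - math.exp(-n * (p - 0.5) ** 2 / (2 * p))


for n in (10, 100, 500):  # illustrative sample sizes
    exact = majority_probability_exact(n, 0.6)
    lower = majority_probability_chernoff(n, 0.6)
    print(f"n={n:4d}  exact={exact:.6f}  Chernoff lower bound={lower:.6f}")
```

The lower bound is loose for small $n$ but approaches 1 exponentially fast as $n$ grows, which is the qualitative behavior the inequality is meant to capture.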

This result admits various generalizations as outlined below. One can encounter many flavors of Chernoff bounds: the original *additive form* (which gives a bound on the absolute error) or the more practical *multiplicative form* (which bounds the error relative to the mean).

The following Theorem is due to Wassily Hoeffding^{[5]} and hence is called the Chernoff-Hoeffding theorem.

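**Chernoff-Hoeffding Theorem.** Suppose $X_1, \dots, X_n$ are i.i.d. random variables, taking values in $\{0, 1\}$. Let $p = \operatorname{E}[X_i]$ and $\varepsilon > 0$. Then

$$\Pr\left(\frac{1}{n}\sum X_i \ge p + \varepsilon\right) \le \left(\left(\frac{p}{p+\varepsilon}\right)^{p+\varepsilon}\left(\frac{1-p}{1-p-\varepsilon}\right)^{1-p-\varepsilon}\right)^{n} = e^{-D(p+\varepsilon \,\|\, p)\, n},$$

$$\Pr\left(\frac{1}{n}\sum X_i \le p - \varepsilon\right) \le \left(\left(\frac{p}{p-\varepsilon}\right)^{p-\varepsilon}\left(\frac{1-p}{1-p+\varepsilon}\right)^{1-p+\varepsilon}\right)^{n} = e^{-D(p-\varepsilon \,\|\, p)\, n},$$

where

$$D(x \,\|\, y) = x \ln\frac{x}{y} + (1-x)\ln\frac{1-x}{1-y}$$

is the Kullback-Leibler divergence between Bernoulli distributed random variables with parameters $x$ and $y$ respectively. If $p \ge 1/2$, then $D(p+\varepsilon \,\|\, p) \ge \frac{\varepsilon^2}{2p(1-p)}$, which means

$$\Pr\left(\frac{1}{n}\sum X_i > p + \varepsilon\right) \le \exp\left(-\frac{\varepsilon^2 n}{2p(1-p)}\right).$$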

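A simpler bound follows by relaxing the theorem using $D(p+\varepsilon \,\|\, p) \ge 2\varepsilon^2$, which follows from the convexity of $D(p+\varepsilon \,\|\, p)$, the fact that both sides and their first derivatives in $\varepsilon$ vanish at $\varepsilon = 0$, and the fact that

$$\frac{d^2}{d\varepsilon^2} D(p+\varepsilon \,\|\, p) = \frac{1}{(p+\varepsilon)(1-p-\varepsilon)} \ge 4 = \frac{d^2}{d\varepsilon^2}\left(2\varepsilon^2\right).$$

The resulting bound, $\Pr\left(\frac{1}{n}\sum X_i \ge p + \varepsilon\right) \le e^{-2n\varepsilon^2}$, is a special case of Hoeffding's inequality. Sometimes, the bounds

$$D\big((1+x)p \,\|\, p\big) \ge \tfrac{1}{4}x^2 p, \quad -\tfrac12 \le x \le \tfrac12,$$

$$D(x \,\|\, y) \ge \frac{3(x-y)^2}{2(2y+x)},$$

$$D(x \,\|\, y) \ge \frac{(x-y)^2}{2y}, \quad x \le y,$$

$$D(x \,\|\, y) \ge \frac{(x-y)^2}{2x}, \quad x \ge y,$$

which are stronger for $p < 1/8$, are also used.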
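The gap between the Kullback-Leibler form of the bound and its Hoeffding relaxation can be checked numerically. The sketch below assumes the two bounds $e^{-nD(p+\varepsilon\|p)}$ and $e^{-2n\varepsilon^2}$ for the upper tail; the helper names and test parameters are illustrative assumptions.

```python
import math


def kl_bernoulli(x: float, y: float) -> float:
    """D(x || y) in nats between Bernoulli(x) and Bernoulli(y), with 0 < y < 1."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def chernoff_hoeffding_bound(n: int, p: float, eps: float) -> float:
    """Upper-tail bound exp(-n * D(p + eps || p))."""
    return math.exp(-n * kl_bernoulli(p + eps, p))


def hoeffding_bound(n: int, eps: float) -> float:
    """Relaxed bound exp(-2 * n * eps**2), independent of p."""
    return math.exp(-2 * n * eps**2)


n, eps = 200, 0.1  # illustrative parameters
for p in (0.05, 0.1, 0.5):
    tight = chernoff_hoeffding_bound(n, p, eps)
    loose = hoeffding_bound(n, eps)
    print(f"p={p:.2f}  KL bound={tight:.3e}  Hoeffding bound={loose:.3e}")
```

For $p$ near $1/2$ the two bounds are close, while for small $p$ the divergence-based bound is markedly smaller, which is why the $p$-dependent relaxations are sometimes preferred.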

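**Multiplicative Chernoff Bound.** Suppose $X_1, \dots, X_n$ are independent random variables taking values in $\{0, 1\}$. Let $X$ denote their sum and let $\mu = \operatorname{E}[X]$ denote the sum's expected value. Then for any $\delta > 0$,

$$\Pr\big(X > (1+\delta)\mu\big) < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.$$

A similar proof strategy can be used to show that, for $0 < \delta < 1$,

$$\Pr\big(X < (1-\delta)\mu\big) < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$

The above formula is often unwieldy in practice,^{[3]} so the following looser but more convenient bounds are often used:

$$\Pr\big(X \ge (1+\delta)\mu\big) \le e^{-\delta^2\mu/3}, \quad 0 < \delta \le 1,$$

$$\Pr\big(X \le (1-\delta)\mu\big) \le e^{-\delta^2\mu/2}, \quad 0 < \delta < 1.$$

Or looser still, combining both tails:

$$\Pr\big(|X - \mu| \ge \delta\mu\big) \le 2e^{-\delta^2\mu/3}, \quad 0 < \delta < 1.$$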
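The following Python sketch compares the full multiplicative bound with the simplified exponential forms $e^{-\delta^2\mu/3}$ (upper tail) and $e^{-\delta^2\mu/2}$ (lower tail), both standard relaxations for $0 < \delta < 1$. The function names and the values `mu = 50`, `delta = 0.5` are assumptions chosen for illustration.

```python
import math


def multiplicative_upper(mu: float, delta: float) -> float:
    """(e^delta / (1+delta)^(1+delta))^mu, computed in log space."""
    return math.exp(mu * (delta - (1 + delta) * math.log1p(delta)))


def multiplicative_lower(mu: float, delta: float) -> float:
    """(e^-delta / (1-delta)^(1-delta))^mu for 0 < delta < 1, in log space."""
    return math.exp(mu * (-delta - (1 - delta) * math.log1p(-delta)))


def simplified_upper(mu: float, delta: float) -> float:
    """Looser upper-tail bound exp(-delta^2 * mu / 3), for 0 < delta <= 1."""
    return math.exp(-delta**2 * mu / 3)


def simplified_lower(mu: float, delta: float) -> float:
    """Looser lower-tail bound exp(-delta^2 * mu / 2), for 0 < delta < 1."""
    return math.exp(-delta**2 * mu / 2)


mu, delta = 50.0, 0.5  # illustrative parameters
print("upper tail:", multiplicative_upper(mu, delta), "<=", simplified_upper(mu, delta))
print("lower tail:", multiplicative_lower(mu, delta), "<=", simplified_lower(mu, delta))
```

Working in log space avoids overflow in $(1+\delta)^{(1+\delta)\mu}$ when $\mu$ is large.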

Chernoff bounds have very useful applications in set balancing and packet routing in sparse networks.

The set balancing problem arises when designing statistical experiments. Given the features of each participant in the experiment, the goal is to divide the participants into two disjoint groups such that each feature is as balanced as possible between the two groups. Refer to this book section for more information on the problem.

Chernoff bounds are also used to obtain tight bounds for permutation routing problems which reduce network congestion while routing packets in sparse networks. Refer to this book section for a thorough treatment of the problem.

Chernoff bounds are used in computational learning theory to prove that a learning algorithm is probably approximately correct, i.e. with high probability the algorithm has small error on a sufficiently large training data set.^{[6]}
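As a simple illustration of this kind of argument, consider estimating the error rate of a single fixed hypothesis from $n$ independent samples. Assuming the two-sided Hoeffding bound $\Pr(|\hat{p} - p| \ge \varepsilon) \le 2e^{-2n\varepsilon^2}$, it suffices to take $n \ge \ln(2/\eta)/(2\varepsilon^2)$ samples to achieve accuracy $\varepsilon$ with failure probability at most $\eta$. The sketch below computes that sample size; the function name and the symbol $\eta$ are assumptions introduced here, and the calculation covers only one fixed hypothesis, not a full PAC analysis.

```python
import math


def samples_needed(eps: float, fail_prob: float) -> int:
    """Smallest n with 2 * exp(-2 * n * eps**2) <= fail_prob (two-sided Hoeffding)."""
    return math.ceil(math.log(2 / fail_prob) / (2 * eps**2))


def failure_bound(n: int, eps: float) -> float:
    """Two-sided Hoeffding bound on Pr(|empirical mean - p| >= eps)."""
    return 2 * math.exp(-2 * n * eps**2)


n = samples_needed(eps=0.05, fail_prob=0.01)
print(n, failure_bound(n, 0.05))
```

The required sample size grows only logarithmically in $1/\eta$ but quadratically in $1/\varepsilon$.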

Chernoff bounds can be effectively used to evaluate the "robustness level" of an application or algorithm by exploring its perturbation space with randomization.^{[7]} Using the Chernoff bound makes it possible to abandon the strong, and mostly unrealistic, small-perturbation hypothesis (that the perturbation magnitude is small). The robustness level can, in turn, be used either to validate or reject a specific algorithmic choice, a hardware implementation, or the appropriateness of a solution whose structural parameters are affected by uncertainties.

Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables.^{[8]} The following version of the inequality can be found in the work of Tropp.^{[9]}

Let $M_1, \dots, M_t$ be independent matrix-valued random variables such that $M_i \in \mathbb{C}^{d_1 \times d_2}$ and $\operatorname{E}[M_i] = 0$. Let us denote by $\|M\|$ the operator norm of the matrix $M$. If $\|M_i\| \le \gamma$ holds almost surely for all $i \in \{1, \dots, t\}$, then for every $\varepsilon > 0$

$$\Pr\left(\left\|\frac{1}{t}\sum_{i=1}^{t} M_i\right\| > \varepsilon\right) \le (d_1 + d_2)\exp\left(-\frac{3\varepsilon^2 t}{8\gamma^2}\right).$$

Notice that in order to conclude that the deviation from 0 is bounded by $\varepsilon$ with high probability, we need to choose a number of samples $t$ proportional to the logarithm of $d_1 + d_2$. In general, unfortunately, a dependence on $\log(\min(d_1, d_2))$ is inevitable: take for example a diagonal random sign matrix of dimension $d \times d$. The operator norm of the sum of $t$ independent samples is precisely the maximum deviation among $d$ independent random walks of length $t$. In order to achieve a fixed bound on the maximum deviation with constant probability, it is easy to see that $t$ should grow logarithmically with $d$ in this scenario.^{[10]}
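The logarithmic dependence on the dimension can be made concrete with a short calculation. The sketch below assumes the matrix bound has the form $(d_1 + d_2)\exp\left(-3\varepsilon^2 t/(8\gamma^2)\right)$ and solves for the smallest number of samples $t$ that pushes it below a target failure probability. The function names and parameter values are illustrative assumptions.

```python
import math


def matrix_chernoff_bound(d1: int, d2: int, eps: float, gamma: float, t: int) -> float:
    """Assumed bound (d1 + d2) * exp(-3 * eps^2 * t / (8 * gamma^2))."""
    return (d1 + d2) * math.exp(-3 * eps**2 * t / (8 * gamma**2))


def matrix_chernoff_samples(d1: int, d2: int, eps: float, gamma: float,
                            fail_prob: float) -> int:
    """Smallest t for which the assumed bound is at most fail_prob."""
    return math.ceil(8 * gamma**2 * math.log((d1 + d2) / fail_prob) / (3 * eps**2))


for d in (10, 100, 1000, 10000):  # square d x d matrices, illustrative sizes
    t = matrix_chernoff_samples(d, d, eps=0.1, gamma=1.0, fail_prob=0.01)
    print(f"d={d:6d}  samples needed t={t}")
```

Each tenfold increase in $d$ adds the same fixed number of samples, reflecting the $\log(d_1 + d_2)$ factor.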

The following theorem can be obtained by assuming *M* has low rank, in order to avoid the dependency on the dimensions.

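Let $0 < \varepsilon < 1$ and $M$ be a random symmetric real matrix with $\|\operatorname{E}[M]\| \le 1$ and $\|M\| \le \gamma$ almost surely. Assume that each element on the support of $M$ has at most rank $r$. Set

$$t = \Omega\left(\frac{\gamma \log(\gamma/\varepsilon^2)}{\varepsilon^2}\right).$$

If $r \le t$ holds almost surely, then

$$\Pr\left(\left\|\frac{1}{t}\sum_{i=1}^{t} M_i - \operatorname{E}[M]\right\| > \varepsilon\right) \le \frac{1}{\mathrm{poly}(t)},$$

where $M_1, \dots, M_t$ are i.i.d. copies of $M$.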

The following variant of Chernoff's bound can be used to bound the probability that a majority in a population will become a minority in a sample, or vice versa.^{[11]}

Suppose there is a general population $A$ and a sub-population $B \subseteq A$. Mark the relative size of the sub-population ($|B|/|A|$) by $r$.

Suppose we pick an integer $k$ and a random sample $S \subset A$ of size $k$. Mark the relative size of the sub-population in the sample ($|B \cap S|/|S|$) by $r_S$.

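Then, for every fraction $d \in [0, 1]$:

$$\Pr\big(r_S < (1-d)\cdot r\big) < \exp\left(-r \cdot d^2 \cdot \frac{k}{2}\right).$$

In particular, if $B$ is a majority in $A$ (i.e. $r > 0.5$) we can bound the probability that $B$ does not become a minority in $S$ ($r_S \ge 0.5$) by taking $d = 1 - \frac{1}{2r}$:

$$\Pr\big(r_S \ge 0.5\big) > 1 - \exp\left(-r \cdot \left(1 - \frac{1}{2r}\right)^2 \cdot \frac{k}{2}\right).$$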

This bound is of course not tight at all. For example, when *r*=0.5 we get a trivial bound *Prob* > 0.
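A small Python sketch makes the behavior of this bound concrete. It assumes the sampling bound $\Pr(r_S < (1-d) r) < \exp(-r d^2 k/2)$ with $d = 1 - 1/(2r)$, so that $(1-d) r = 0.5$; the function name and the sample values are illustrative assumptions.

```python
import math


def majority_loss_bound(r: float, k: int) -> float:
    """Upper bound on Pr(r_S < 0.5) for a majority of relative size r > 0.5
    and a random sample of size k, using d = 1 - 1/(2r)."""
    d = 1 - 1 / (2 * r)
    return math.exp(-r * d**2 * k / 2)


for r in (0.5, 0.55, 0.6, 0.75):  # illustrative majority sizes
    for k in (100, 1000):
        print(f"r={r:.2f}  k={k:5d}  Pr(majority lost) < {majority_loss_bound(r, k):.3e}")
```

At $r = 0.5$ the exponent is zero and the bound equals 1, which is the trivial case noted above; for larger $r$ the bound decays exponentially in the sample size $k$.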

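To prove the Chernoff-Hoeffding theorem (additive form), let $q = p + \varepsilon$. Taking $a = nq$ in (1), we obtain:

$$\Pr\left(\frac{1}{n}\sum X_i \ge q\right) \le \inf_{t > 0} \frac{\operatorname{E}\left[\prod e^{tX_i}\right]}{e^{tnq}} = \inf_{t > 0} \left(\frac{\operatorname{E}\left[e^{tX_i}\right]}{e^{tq}}\right)^{n}.$$

Now, knowing that $\Pr(X_i = 1) = p$ and $\Pr(X_i = 0) = 1 - p$, we have

$$\left(\frac{\operatorname{E}\left[e^{tX_i}\right]}{e^{tq}}\right)^{n} = \left(\frac{pe^{t} + (1-p)}{e^{tq}}\right)^{n} = \left(pe^{(1-q)t} + (1-p)e^{-qt}\right)^{n}.$$

Therefore, we can easily compute the infimum, using calculus:

$$\frac{d}{dt}\left(pe^{(1-q)t} + (1-p)e^{-qt}\right) = (1-q)pe^{(1-q)t} - q(1-p)e^{-qt}.$$

Setting the derivative to zero and solving, we have

$$(1-q)pe^{(1-q)t} = q(1-p)e^{-qt}, \qquad (1-q)pe^{t} = q(1-p),$$

so that

$$e^{t} = \frac{(1-p)q}{(1-q)p}.$$

Thus, with $\log$ denoting the natural logarithm,

$$t = \log\left(\frac{(1-p)q}{(1-q)p}\right).$$

As $q = p + \varepsilon > p$, we see that $t > 0$, so the constraint $t > 0$ is satisfied. Having solved for $t$, we can plug back into the equations above to find that

$$\begin{aligned}
\log\left(pe^{(1-q)t} + (1-p)e^{-qt}\right) &= \log\left(e^{-qt}\left(1 - p + pe^{t}\right)\right) \\
&= -q\log\frac{(1-p)q}{(1-q)p} + \log\left(1 - p + p\cdot\frac{(1-p)q}{(1-q)p}\right) \\
&= -q\log\frac{1-p}{1-q} - q\log\frac{q}{p} + \log\left(\frac{(1-p)(1-q)}{1-q} + \frac{(1-p)q}{1-q}\right) \\
&= -q\log\frac{q}{p} + \left(-q\log\frac{1-p}{1-q} + \log\frac{1-p}{1-q}\right) \\
&= -q\log\frac{q}{p} + (1-q)\log\frac{1-p}{1-q} \\
&= -D(q \,\|\, p).
\end{aligned}$$

We now have our desired result, that

$$\Pr\left(\frac{1}{n}\sum X_i \ge p + \varepsilon\right) \le e^{-D(p+\varepsilon \,\|\, p)\, n}.$$

To complete the proof for the symmetric case, we simply define the random variable $Y_i = 1 - X_i$, apply the same proof, and plug it into our bound.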
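The key step of this proof, that the minimizing $t$ turns $pe^{(1-q)t} + (1-p)e^{-qt}$ into $e^{-D(q\|p)}$, can be checked numerically. The sketch below assumes the closed form $t^* = \log\frac{(1-p)q}{(1-q)p}$ for the minimizer; the function names and the values $p = 0.3$, $q = 0.5$ are illustrative assumptions.

```python
import math


def objective(t: float, p: float, q: float) -> float:
    """Per-variable factor p*e^{(1-q)t} + (1-p)*e^{-qt} from the proof."""
    return p * math.exp((1 - q) * t) + (1 - p) * math.exp(-q * t)


def optimal_t(p: float, q: float) -> float:
    """Assumed minimizer t* = log((1-p)q / ((1-q)p)), positive when q > p."""
    return math.log((1 - p) * q / ((1 - q) * p))


def kl_bernoulli(q: float, p: float) -> float:
    """D(q || p) in nats for 0 < p, q < 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


p, q = 0.3, 0.5  # illustrative values with q > p
t_star = optimal_t(p, q)
print("t* =", t_star)
print("log objective at t* =", math.log(objective(t_star, p, q)))
print("-D(q || p)          =", -kl_bernoulli(q, p))
```

Perturbing `t_star` slightly in either direction should increase `objective`, consistent with it being the minimizer.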

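To prove the multiplicative form, set $\Pr(X_i = 1) = p_i$. According to (1),

$$\begin{aligned}
\Pr\big(X > (1+\delta)\mu\big) &\le \inf_{t > 0} \frac{\operatorname{E}\left[\prod_{i=1}^{n} \exp(tX_i)\right]}{\exp\big(t(1+\delta)\mu\big)} \\
&= \inf_{t > 0} \frac{\prod_{i=1}^{n} \operatorname{E}\left[e^{tX_i}\right]}{\exp\big(t(1+\delta)\mu\big)} \\
&= \inf_{t > 0} \frac{\prod_{i=1}^{n} \left[p_i e^{t} + (1 - p_i)\right]}{\exp\big(t(1+\delta)\mu\big)}.
\end{aligned}$$

The third line above follows because $e^{tX_i}$ takes the value $e^{t}$ with probability $p_i$ and the value 1 with probability $1 - p_i$. This is identical to the calculation above in the proof of the Theorem for additive form (absolute error).

Rewriting $p_i e^{t} + (1 - p_i)$ as $p_i(e^{t} - 1) + 1$ and recalling that $1 + x \le e^{x}$ (with strict inequality if $x > 0$), we set $x = p_i(e^{t} - 1)$. The same result can be obtained by directly replacing $a$ in the equation for the Chernoff bound with $(1 + \delta)\mu$.^{[13]}

Thus,

$$\Pr\big(X > (1+\delta)\mu\big) < \frac{\prod_{i=1}^{n} \exp\big(p_i(e^{t} - 1)\big)}{\exp\big(t(1+\delta)\mu\big)} = \frac{\exp\left((e^{t} - 1)\sum_{i=1}^{n} p_i\right)}{\exp\big(t(1+\delta)\mu\big)} = \frac{\exp\big((e^{t} - 1)\mu\big)}{\exp\big(t(1+\delta)\mu\big)}.$$

If we simply set $t = \log(1 + \delta)$ so that $t > 0$ for $\delta > 0$, we can substitute and find

$$\frac{\exp\big((e^{t} - 1)\mu\big)}{\exp\big(t(1+\delta)\mu\big)} = \frac{\exp\big((1 + \delta - 1)\mu\big)}{(1+\delta)^{(1+\delta)\mu}} = \left[\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\mu}.$$

This proves the result desired.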

- Concentration inequality - a summary of tail-bounds on random variables.
- Entropic value at risk

1. Chernoff, Herman (2014). "A career in statistics" (PDF). In Lin, Xihong; Genest, Christian; Banks, David L.; Molenberghs, Geert; Scott, David W.; Wang, Jane-Ling. *Past, Present, and Future of Statistics*. CRC Press. p. 35. ISBN 9781482204964.
2. This method was first applied by Sergei Bernstein to prove the related Bernstein inequalities.
3. Mitzenmacher, Michael; Upfal, Eli (2005). *Probability and Computing: Randomized Algorithms and Probabilistic Analysis*. Cambridge University Press. ISBN 0-521-83540-2.
4. Sinclair, Alistair (Fall 2011). "Class notes for the course 'Randomness and Computation'" (PDF). Retrieved 2014.
5. Hoeffding, W. (1963). "Probability Inequalities for Sums of Bounded Random Variables". *Journal of the American Statistical Association*. **58** (301): 13-30. doi:10.2307/2282952. JSTOR 2282952.
6. M. Kearns, U. Vazirani. *An Introduction to Computational Learning Theory.* Chapter 9 (Appendix), pages 190-192. MIT Press, 1994.
7. C. Alippi: "Randomized Algorithms" chapter in *Intelligence for Embedded Systems.* Springer, 2014, 283pp, ISBN 978-3-319-05278-6.
8. Ahlswede, R.; Winter, A. (2003). "Strong Converse for Identification via Quantum Channels". *IEEE Transactions on Information Theory*. **48** (3): 569-579. arXiv:quant-ph/0012127. doi:10.1109/18.985947.
9. Tropp, J. (2010). "User-friendly tail bounds for sums of random matrices". *Foundations of Computational Mathematics*. **12**: 389-434. arXiv:1004.4389. doi:10.1007/s10208-011-9099-z.
10. Magen, A.; Zouzias, A. (2011). "Low Rank Matrix-Valued Chernoff Bounds and Approximate Matrix Multiplication". arXiv:1005.2724 [cs.DM].
11. Goldberg, A. V.; Hartline, J. D. (2001). "Competitive Auctions for Multiple Digital Goods". *Algorithms -- ESA 2001*. Lecture Notes in Computer Science. **2161**. p. 416. doi:10.1007/3-540-44676-1_35. ISBN 978-3-540-42493-2; lemma 6.1.
12. See graphs of: the bound as a function of *r* when *k* changes and the bound as a function of *k* when *r* changes.
13. Refer to the proof above.

- Chernoff, H. (1952). "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations". *Annals of Mathematical Statistics*. **23** (4): 493-507. doi:10.1214/aoms/1177729330. JSTOR 2236576. MR 0057518. Zbl 0048.11804.
- Chernoff, H. (1981). "A Note on an Inequality Involving the Normal Distribution". *Annals of Probability*. **9** (3): 533. doi:10.1214/aop/1176994428. JSTOR 2243541. MR 0614640. Zbl 0457.60014.
- Hagerup, T.; Rüb, C. (1990). "A guided tour of Chernoff bounds". *Information Processing Letters*. **33** (6): 305. doi:10.1016/0020-0190(90)90214-I.
- Nielsen, F. (2011). "Chernoff information of exponential families". arXiv:1102.2684 [cs.IT].
