Key Probabilistic Concentration Bounds

Probabilistic bounds play a pivotal role in the analysis of randomized algorithms and in proving probabilistic theorems more generally. This blog post aims to be a reference for the key results used in the randomized analyses conducted in upcoming posts. First, we cover the all-important Markov and Chebyshev bounds, complete with their simple proofs. Then, we state the Chernoff and Hoeffding concentration bounds; the proofs of these bounds are probabilistically quite involved and will not be detailed here. Lastly, we conclude with other useful bounds, such as the union bound, for future reference.

Figure: a normal distribution with its upper tail highlighted.

For the foundational probability theory background needed to internalize the material here, look into a previous post. The content presented here is inspired by and adapted from Mitzenmacher and Upfal’s Probability and Computing, and the notes of C. Seshadhri’s CSE 290A course — the course webpage can be found here.

Markov’s Bound

The Russian mathematician Andrey Markov was instrumental in the fields of probability theory and statistics, and he is most famously known for introducing the stochastic processes now called Markov chains. Here we present one of the fundamental inequalities in probability theory, known as Markov’s inequality, along with its proof.

Theorem (Markov’s Bound): Let X be a nonnegative random variable (i.e. all possible values of X are nonnegative). Then,

    \begin{align*}  \Pr[X \ge a] \le \frac{\EX[X]}{a} \end{align*}

or equivalently written as

    \begin{align*} \Pr[X \ge a\EX[X]] \le \frac{1}{a} \end{align*}

for every constant a > 0.

Proof: Consider some random variable X such that all its possible values are nonnegative, and also let a > 0 be some constant. Now define the following indicator:

    \begin{align*} I = \begin{cases} 1 & \text{ if $X \ge a$} \\ 0 & \text{ otherwise} \end{cases} \end{align*}

and thus we can write that

    \begin{align*} I \le \frac{X}{a} \end{align*}

at all times because of how I is defined and since X is nonnegative. By definition of expectation, we get that

    \begin{align*} \EX[I] = 0 \cdot \Pr[I = 0] + 1 \cdot \Pr[I = 1] = \Pr[X \ge a]. \end{align*}

since the event I = 1 is equivalent to the event X \ge a. Thus, by substitution, we see that

    \begin{align*} \Pr[X \ge a] = \EX[I] \le \EX\left[ \frac{X}{a} \right] = \frac{\EX[X]}{a} \end{align*}

where the inequality holds by taking expectations of both sides of I \le \frac{X}{a} (expectation preserves pointwise inequalities), and the final equality holds since a is a constant. Thus we have completed the derivation of the claim as desired. \quad \blacksquare

The only assumption we need to apply Markov’s inequality is that the given random variable is nonnegative. Other than that, we don’t require any other information about the random variable (e.g. its variance), only its expectation. This inequality, although crude, is extremely useful due to its convenient generality. Further, it is tight under the assumptions made to use Markov’s inequality, i.e. there exist a random variable X and a constant a for which equality holds.
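
To make the bound concrete, here is a minimal simulation sketch in Python (assuming NumPy is available; the exponential distribution and the threshold a = 10 are arbitrary choices made purely for illustration) comparing the empirical tail of a nonnegative random variable against the Markov bound:

    import numpy as np

    rng = np.random.default_rng(0)

    # Nonnegative random variable: X ~ Exponential with mean 2.
    samples = rng.exponential(scale=2.0, size=1_000_000)
    a = 10.0

    empirical = np.mean(samples >= a)   # estimate of Pr[X >= a]
    markov = samples.mean() / a         # Markov's bound E[X] / a

    print(f"empirical Pr[X >= {a}]: {empirical:.5f}")   # roughly e^(-5), about 0.0067
    print(f"Markov bound E[X]/a:    {markov:.5f}")      # roughly 0.2

The bound holds but is far from the true tail here; equality is attained, for instance, by the two-point random variable that equals a with probability \EX[X]/a and 0 otherwise.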

Chebyshev’s Bound

Before diving into the next bound, recall the definition of variance, denoted by \var[X], for a random variable X:

    \begin{align*} \var[X] = \EX[(X - \EX[X])^2] = \EX[X^2] - (\EX[X])^2. \end{align*}

Intuitively, the variance quantifies how much the random variable X deviates from its expected value. The variance can sometimes be computed for a given random variable, and in these cases the Chebyshev bound is applicable, as we will see further on.
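
As a quick concrete example, consider an indicator random variable X that equals 1 with probability p and 0 otherwise; since X^2 = X, both forms of the definition agree:

    \begin{align*} \var[X] = \EX[X^2] - (\EX[X])^2 = p - p^2 = p(1 - p). \end{align*}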

Another renowned Russian mathematician is Pafnuty Chebyshev, who in fact taught Markov himself along with many other prominent Russian mathematicians. His most famous contribution to probability theory is the inequality we introduce here, and from this result followed many other important results in the early days of the field. Let’s now discuss Chebyshev’s inequality below.

Theorem (Chebyshev’s Bound): Let X be a random variable. Then,

    \begin{align*} \Pr[|X - \EX[X]| \ge a] \le \frac{\var[X]}{a^2} \end{align*}

for every constant a > 0.

Proof: Notice that since a > 0, the inequalities |X - \EX[X]| \ge a and (X - \EX[X])^2 \ge a^2 are equivalent, so we can write

    \begin{align*} \Pr[|X - \EX[X]| \ge a] = \Pr[(X - \EX[X])^2 \ge a^2]. \end{align*}

Furthermore recall that \var[X] = \EX[(X - \EX[X])^2], so simply by applying Markov’s inequality to the random variable (X - \EX[X])^2, we get that

    \begin{align*} \Pr[(X - \EX[X])^2 \ge a^2] \le \frac{\EX[(X - \EX[X])^2]}{a^2} = \frac{\var[X]}{a^2}, \end{align*}

and thus by substitution we get that \Pr[|X - \EX[X]| \ge a] \le \frac{\var[X]}{a^2} as claimed. \quad \blacksquare

As with Markov’s bound, there are a few more useful expressions of the same inequality: we list them below. Recall that the standard deviation of a random variable X, denoted by \sigma[X], is defined as \sigma[X] = \sqrt{\var[X]}.

Corollary: Consider a random variable X. Then,

    \begin{align*} \Pr[|X - \EX[X]| \ge t\sigma[X]] \le \frac{1}{t^2} \end{align*}

and

    \begin{align*} \Pr[|X - \EX[X]| \ge t\EX[X]] \le \frac{\var[X]}{t^2(\EX[X])^2} \end{align*}

for any constant t > 1.

The proof of the above corollary follows immediately by substitution into the Chebyshev bound, and so we don’t state the contents of the derivation here.

As we can see, the Chebyshev bound (and its corollary) is in one sense more general than the Markov bound, as we make no assumptions on the random variable X. However, we do require the variance, which may be difficult to compute, so at the same time the Chebyshev inequality isn’t as general as we might want it to be. Nevertheless, this bound is extremely useful in many steps of probabilistic analyses. Moreover, similar to Markov’s bound, this bound is also tight under the assumptions made.
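
The following minimal sketch (assuming NumPy is available; the Binomial(100, 1/2) example and the threshold 75 are arbitrary choices for illustration) compares the Markov and Chebyshev bounds on the same tail event:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 0.5
    samples = rng.binomial(n, p, size=1_000_000)

    mean, var = n * p, n * p * (1 - p)      # E[X] = 50, Var[X] = 25
    threshold = 75                          # a deviation of a = 25 from the mean

    empirical = np.mean(samples >= threshold)
    markov = mean / threshold                   # Pr[X >= 75] <= E[X] / 75
    chebyshev = var / (threshold - mean) ** 2   # Pr[|X - E[X]| >= 25] <= Var[X] / 25^2

    print(f"empirical Pr[X >= 75]: {empirical:.2e}")   # essentially zero
    print(f"Markov bound:          {markov:.3f}")      # about 0.667
    print(f"Chebyshev bound:       {chebyshev:.3f}")   # 0.040

Knowing the variance buys a much sharper bound here, though both bounds are still far above the true tail probability.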

Poisson Trials

Though the general formulations of the Chernoff and Hoeffding bounds only require the moment generating function (which essentially provides a way of accessing the “moments” of a random variable), the most compelling renderings of the bounds are for the case of analyzing Poisson trials, which we first define here.

Recall that a 0-1 random variable is a random variable X that takes only the values 0 and 1, and its “probability” is defined by p \triangleq \Pr[X = 1], i.e. the probability that X equals 1; the definition is quite evident in the nomenclature for this class of random variables. Now let’s define Poisson trials formally.

Definition (Poisson Trials): Consider n mutually independent 0-1 random variables X_1, \ldots, X_n with corresponding probabilities p_1, \ldots, p_n (not necessarily identical); this sequence is known as a sequence of Poisson trials. Then the random variable defined by X = \sum_{i = 1}^n X_i is known as the sum of Poisson trials.

Now it’s no surprise that this might seem similar to a sequence of Bernoulli trials. In fact, Bernoulli trials are simply the special case of Poisson trials in which the corresponding probabilities are all identical, and the sum of such identical trials follows a binomial distribution. Further, the sum of independent random variables brings to mind the Central Limit Theorem, which asserts that the appropriately normalized sum of independent random variables from a fixed distribution approaches a Gaussian distribution. Note that in Poisson trials each trial can come from its own distribution, yet the sum of Poisson trials can still be deemed sub-Gaussian, i.e. it retains certain properties of the Gaussian distribution, especially its tail behavior. Keeping this in the back of our minds makes it easier to understand the intuition behind the Chernoff bounds.

The allowance of varying probabilities for the 0-1 random variables is what makes Poisson trials interesting, and in fact quite ubiquitous, in the world of probabilistic analysis. The following bounds give very particular results for sums of Poisson trials primarily, and we delve into these in detail.
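
As a small illustration of the definition, the sketch below (assuming NumPy is available; the probabilities are drawn uniformly at random purely for illustration) samples one realization of a sum of Poisson trials with heterogeneous probabilities:

    import numpy as np

    rng = np.random.default_rng(0)

    p = rng.uniform(0.1, 0.9, size=1000)    # heterogeneous probabilities p_1, ..., p_n
    trials = rng.random(size=p.size) < p    # X_i = 1 with probability p_i, independently
    X = trials.sum()                        # the sum of Poisson trials

    print(f"X = {X}, E[X] = {p.sum():.1f}")  # by linearity, E[X] = p_1 + ... + p_n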

The Chernoff Bounds

The general derivation of the Chernoff bound (via moment generating functions) yields the following conclusions for a given random variable X:

    \begin{align*} \Pr[X \ge a] \le \min_{t > 0} \frac{\EX[e^{tX}]}{e^{ta}} \end{align*}

and

    \begin{align*} \Pr[X \le a] \le \min_{t  < 0} \frac{\EX[e^{tX}]}{e^{ta}} \end{align*}

where we must choose our t wisely to obtain useful bounds. Quite underwhelming for the most famous set of concentration inequalities in probability theory.
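
To see the recipe in action, the sketch below (assuming NumPy is available; the Binomial(100, 1/2) example, the threshold a = 75, and the grid of t values are arbitrary choices for illustration) performs the minimization over t numerically, using the closed-form moment generating function \EX[e^{tX}] = (1 - p + pe^t)^n of a binomial:

    import numpy as np

    n, p, a = 100, 0.5, 75.0
    ts = np.linspace(0.01, 3.0, 1000)       # grid of candidate values of t > 0

    mgf = (1 - p + p * np.exp(ts)) ** n     # E[e^{tX}] for each t
    bounds = mgf / np.exp(ts * a)           # E[e^{tX}] / e^{ta}

    best_t = ts[np.argmin(bounds)]
    print(f"best t on the grid:           {best_t:.3f}")    # near ln(3), about 1.099
    print(f"Chernoff bound on Pr[X >= a]: {bounds.min():.2e}")

The optimal t here gives a bound of roughly 2 \times 10^{-6}, dramatically sharper than what Markov or Chebyshev provide for the same event.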

However, by this process, there are numerous well-established results for Poisson trials specifically, and this is quite useful since many analyses can be transformed into an equivalent problem involving a sum of Poisson trials. To introduce some “probability theory slang,” the terms “upper tail” and “lower tail” refer to the right and left ends, respectively, of the roughly Gaussian-shaped distribution that a sum of Poisson trials generally follows.

So we first take a look at the multiplicative Chernoff upper tail inequalities.

Theorem (Multiplicative Chernoff Bounds for the Upper Tail): Let X_1, \ldots, X_n be independent Poisson trials with corresponding probabilities p_1, \ldots, p_n, respectively. Also let X = \sum_{i = 1}^n X_i. Then we have the following bounds:

  • for \varepsilon > 0,

    \begin{align*} \Pr[X \ge (1 + \varepsilon)\EX[X]] \le \lp \frac{e^\varepsilon}{(1 + \varepsilon)^{(1 + \varepsilon)}} \rp^{\EX[X]},\end{align*}

  • for 0 < \varepsilon \le 1,

    \begin{align*} \Pr[X \ge (1 + \varepsilon)\EX[X]] \le \exp\lp -\frac{\varepsilon^2\EX[X]}{3} \rp, \end{align*}

  • for R > 6\EX[X],

    \begin{align*} \Pr[X \ge R] \le 2^{-R}, \end{align*}

which we call the upper tail multiplicative Chernoff bounds.

On the other hand, for the lower tail, we have similar bounds as follows.

Theorem (Multiplicative Chernoff Bounds for the Lower Tail): Let X_1, \ldots, X_n be independent Poisson trials with corresponding probabilities p_1, \ldots, p_n, respectively. Also let X = \sum_{i = 1}^n X_i. Then we have the following bounds:

  • for 0 < \varepsilon \le 1,

    \begin{align*} \Pr[X \le (1 - \varepsilon)\EX[X]] \le \lp \frac{e^{-\varepsilon}}{(1 - \varepsilon)^{(1 - \varepsilon)}} \rp^{\EX[X]},\end{align*}

  • for 0 < \varepsilon \le 1,

    \begin{align*} \Pr[X \le (1 - \varepsilon)\EX[X]] \le \exp\lp -\frac{\varepsilon^2\EX[X]}{2} \rp, \end{align*}

which we call the lower tail multiplicative Chernoff bounds.

Note that we write “multiplicative” since the probabilities we are bounding concern deviations that are some multiplicative factor away from the expectation. Further, for the lower tail, we only consider 0 < \varepsilon \le 1 since X \ge 0, as it is a sum of Poisson trials.

Further, although the applications of these bounds may not be apparent now, in upcoming blog posts and in many randomized analyses these bounds do come into play; one example is the high-probability analysis of the running time of Randomized QuickSort.
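
As a quick empirical check of the second upper tail bound, the sketch below (assuming NumPy is available; the number of trials, the probabilities, and \varepsilon = 0.3 are arbitrary choices for illustration) compares the empirical upper tail of a sum of Poisson trials against \exp(-\varepsilon^2\EX[X]/3):

    import numpy as np

    rng = np.random.default_rng(0)

    p = rng.uniform(0.2, 0.8, size=100)     # heterogeneous probabilities p_1, ..., p_n
    mu = p.sum()                            # E[X]
    eps = 0.3

    # 100,000 independent draws of X = X_1 + ... + X_n.
    samples = (rng.random((100_000, p.size)) < p).sum(axis=1)

    empirical = np.mean(samples >= (1 + eps) * mu)
    chernoff = np.exp(-eps**2 * mu / 3)

    print(f"E[X] = {mu:.1f}")
    print(f"empirical Pr[X >= (1 + eps) E[X]]: {empirical:.2e}")
    print(f"Chernoff bound:                    {chernoff:.2e}")

The bound is loose for this small example, but its strength is that it decays exponentially in \EX[X], which is exactly what high-probability arguments need.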

Hoeffding’s Bound

The Hoeffding bound is derived from the Chernoff bound, but it is in an additive form rather than the multiplicative form of the above bounds. Firstly, let’s dive into the generalized theorem.

Theorem (Hoeffding’s Bound): Let X_1, \ldots, X_n be independent random variables with identical expectation and such that for all 1 \le i \le n, we have \Pr[a \le X_i \le b] = 1, i.e. X_i \in [a, b]. Also let X = \sum_{i = 1}^n X_i and note that \EX[X] = n\EX[X_i] for any 1 \le i \le n. Then,

    \begin{align*} \Pr[|X - \EX[X]| \ge t] \le 2\exp\lp -\frac{2t^2}{n(b-a)^2} \rp \end{align*}

for all t > 0.

Notice the type of probability we wish to bound: it concerns an additive deviation of some amount t away from the expectation of X. Thus we use Hoeffding bounds when we wish to analyze such additive deviations rather than multiplicative ones.

Now let’s introduce the same Hoeffding bound but for a more specific case, and this case tends to be very crucial and ubiquitous in itself, so it’s worth presenting here.

Corollary: Let X_1, \ldots, X_n be independent random variables with identical expectation and such that for all 1 \le i \le n we have X_i \in [0, 1]. Also let X = \sum_{i = 1}^n X_i. Then,

    \begin{align*} \Pr[|X - \EX[X]| \ge t] \le 2 \exp \lp -\frac{2t^2}{n} \rp \end{align*}

for all t > 0.

As with the Chernoff bounds, the results here may seem quite arbitrary without examples, but they are key to understand and remember for future posts and, of course, for all kinds of randomized analyses out there. The Hoeffding and Chernoff bounds are the principal concentration bounds employed in probabilistic analysis.
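
To illustrate the corollary, the sketch below (assuming NumPy is available; n = 100 fair coin flips and the deviation t = 15 are arbitrary choices for illustration) compares the empirical two-sided tail of a sum of coin flips with the bound 2\exp(-2t^2/n):

    import numpy as np

    rng = np.random.default_rng(0)

    n, t = 100, 15
    # X is the sum of n fair coin flips, so each X_i lies in [0, 1] and E[X] = n / 2.
    X = rng.binomial(n, 0.5, size=100_000)

    empirical = np.mean(np.abs(X - n / 2) >= t)
    hoeffding = 2 * np.exp(-2 * t**2 / n)

    print(f"empirical Pr[|X - E[X]| >= {t}]: {empirical:.4f}")
    print(f"Hoeffding bound:                 {hoeffding:.4f}")   # about 0.022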

The Union Bound

Taking a break from the rather intense results presented above in the form of the Chernoff and Hoeffding bounds, let’s take a look at a simple but very useful bound in probability theory. Recall that for any events A and B, we have that

    \begin{align*} \Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B] \end{align*}

by basic set theory (inclusion-exclusion) and the additivity of the probability function over disjoint events. Also recall that the union of events signifies an “or,” and similarly the intersection of events signifies an “and,” which can also be denoted simply by a comma (e.g. \Pr[A, B]). From the above expression, it is easy to see that

    \begin{align*} \Pr[A \cup B] \le \Pr[A] + \Pr[B] \end{align*}

for any events A and B, since probability is always nonnegative. With this knowledge in hand, let’s dive into the bound and its proof.

Theorem (Union Bound): Consider the events \varepsilon_1, \ldots, \varepsilon_n. Then,

    \begin{align*} \Pr\left[\bigcup_{i = 1}^n \varepsilon_i\rb \le \sum_{i = 1}^n \Pr[\varepsilon_i] \end{align*}

with equality when the n events are mutually disjoint.

Proof: We prove this by induction on n. The base case n = 1 holds trivially (with equality), and the case n = 2 is exactly the two-event observation above. For the inductive step, our inductive hypothesis assumes that

    \begin{align*} \Pr\left[\bigcup_{i = 1}^{n-1} \varepsilon_i\rb \le \sum_{i = 1}^{n-1} \Pr[\varepsilon_i], \end{align*}

and so by substitution we get that

    \begin{align*} \Pr\left[\bigcup_{i = 1}^{n} \varepsilon_i\rb &= \Pr\left[\lp\bigcup_{i = 1}^{n - 1} \varepsilon_i \rp \cup \varepsilon_n\rb \\ &\le \Pr\left[\bigcup_{i = 1}^{n-1} \varepsilon_i\rb + \Pr[\varepsilon_n] \\ &\le \sum_{i = 1}^{n-1} \Pr[\varepsilon_i] + \Pr[\varepsilon_n] \\ &= \sum_{i = 1}^n \Pr[\varepsilon_i] \end{align*}

where the first inequality uses the two-event bound stated earlier in this section and the second uses the inductive hypothesis. Thus we have achieved our bound as desired. Similarly, when the events are mutually disjoint, equality follows from the fact that \Pr[A \cup B] = \Pr[A] + \Pr[B] when events A and B are disjoint, since then \Pr[A \cap B] = 0. \quad \blacksquare

This bound is universal: in almost any situation involving events or random variables that are not well understood, the union bound is extremely helpful in reaching a clean answer. In my experience, whenever an upper bound on the probability of a union of events is needed, this bound almost always comes into play, so it’s definitely a key one to remember.
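
For a quick empirical illustration, the sketch below (assuming NumPy is available; the five standard normal coordinates and the threshold 2 are arbitrary choices) estimates the probability that at least one of several events occurs and compares it against the sum of the individual probabilities:

    import numpy as np

    rng = np.random.default_rng(0)

    samples = rng.normal(size=(100_000, 5))     # five standard normal coordinates per sample
    events = samples > 2.0                      # event i: coordinate i exceeds 2

    at_least_one = np.mean(events.any(axis=1))  # empirical Pr[union of the events]
    union_bound = events.mean(axis=0).sum()     # sum of the empirical Pr[event i]

    print(f"empirical Pr[at least one event]: {at_least_one:.4f}")
    print(f"union bound (sum of Pr's):        {union_bound:.4f}")

When the individual events are rare, the loss from double counting is small, which is why the union bound appears so often in high-probability arguments.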

Looking Ahead

All the bounds presented above are key to all kinds of probabilistic analysis, especially in the world of randomized algorithms. However, the material is admittedly quite dry and seemingly arbitrary without examples. In upcoming posts, we will employ many of the results here in our randomized algorithm analyses; in the meantime, the books Probability and Computing and Randomized Algorithms are great sources of examples of these bounds in use. Again, this post should act as a reference for other posts or for personal archiving.
