The Lovász Local Lemma
In this post, I’ll explain the statement and proof of the Lovász local lemma.
Motivation
There are many situations where we have a random process and a collection of ‘bad’ events $A_1$, $\ldots$, $A_n$, and we want to show that there’s a positive probability that none of these bad events happen — i.e., that
\[\mathbb{P}[\overline{A_1} \wedge \cdots \wedge \overline{A_n}] > 0.\]
This especially happens when we’re using the probabilistic method — often we’re trying to prove the existence of an object with certain properties by taking a random construction and showing that it works with positive probability, and in many cases, the construction working corresponds to avoiding a collection of bad events.
There are two extreme cases where it’s easy to show that we avoid all the bad events with positive probability. One is when there aren’t too many events and their probabilities are all small (relative to the number of events). In this case, we can simply use a union bound — we have
\[\mathbb{P}[A_1 \vee \cdots \vee A_n] \leq \mathbb{P}[A_1] + \cdots + \mathbb{P}[A_n],\]
so if $\mathbb{P}[A_1] + \cdots + \mathbb{P}[A_n] < 1$, then we get that
\[\mathbb{P}[\overline{A_1} \wedge \cdots \wedge \overline{A_n}] \geq 1 - (\mathbb{P}[A_1] + \cdots + \mathbb{P}[A_n]) > 0.\]
In particular, if we know that $\mathbb{P}[A_i] \leq p$ for all $i$ and $pn < 1$, then this means we avoid all the bad events with positive probability.
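As a quick numeric instance of this condition (the numbers here are mine, chosen just for illustration): if $n = 1000$ and $\mathbb{P}[A_i] \leq 10^{-4}$ for every $i$, then $pn = 0.1 < 1$, and the union bound gives
\[\mathbb{P}[\overline{A_1} \wedge \cdots \wedge \overline{A_n}] \geq 1 - pn = 0.9.\]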
The other extreme case is when the events are all independent. In this case, it doesn’t matter how many events there are or how large their probabilities are (as long as they aren’t $1$, of course) — we simply have
\[\mathbb{P}[\overline{A_1} \wedge \cdots \wedge \overline{A_n}] = \prod_{i = 1}^n (1 - \mathbb{P}[A_i]) > 0.\]
The motivation for the Lovász local lemma is the question of whether we can ‘interpolate’ between these two extreme cases.
It turns out the answer is yes! And that’s exactly what the Lovász local lemma states.
The Lovász local lemma
To state the Lovász local lemma, we first need to formalize what it means for each event to only be dependent on a few others.

Definition. We say an event $A$ is independent of a collection of events $\mathcal{B}$ if for every logical combination $B$ of the events in $\mathcal{B}$ (i.e., every event built out of them using intersections, unions, and complements) with $\mathbb{P}[B] > 0$, we have $\mathbb{P}[A \mid B] = \mathbb{P}[A]$.

For example, if $A$ is independent of $\{A_1, \ldots, A_5\}$, this means $\mathbb{P}[A \mid A_1 \wedge \overline{A_3} \wedge A_5] = \mathbb{P}[A]$.
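It’s worth noting that this is genuinely stronger than asking for $A$ to be independent of each event in $\mathcal{B}$ separately. For instance, if $X_1$ and $X_2$ are independent fair coin flips, $A_1 = \{X_1 = 1\}$, $A_2 = \{X_2 = 1\}$, and $A = \{X_1 \neq X_2\}$, then $A$ is independent of each of $A_1$ and $A_2$ on its own, but
\[\mathbb{P}[A \mid A_1 \wedge A_2] = 0 \neq \tfrac{1}{2} = \mathbb{P}[A].\]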
We’ll first state a ‘symmetric’ form of the Lovász local lemma (which is often what’s used in applications when our bad events have comparable probabilities). To do so, we’ll assume throughout that for each $i \in [n]$, we’ve fixed a set $\mathcal{D}_i \subseteq [n] \setminus \{i\}$ such that $A_i$ is independent of the collection \(\{A_j \mid j \notin \mathcal{D}_i \cup \{i\}\}\) — think of $\mathcal{D}_i$ as the set of (indices of) events that $A_i$ is allowed to depend on.

Theorem (Lovász local lemma, symmetric version). Suppose that $\mathbb{P}[A_i] \leq p$ and $\lvert \mathcal{D}_i \rvert \leq d$ for all $i \in [n]$, and that $ep(d + 1) \leq 1$. Then
\[\mathbb{P}[\overline{A_1} \wedge \cdots \wedge \overline{A_n}] > 0.\]

(Here $e$ denotes the constant $e \approx 2.718$.)
The important feature of this theorem is that the condition $ep(d + 1) \leq 1$ doesn’t reference the number of events at all — it only considers the number of dependencies.
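As an illustration of how this condition gets used, here’s the standard application to satisfiability (a classical example, not anything specific to this post): suppose we have a CNF formula in which every clause contains exactly $k$ distinct variables, and every clause shares a variable with at most $2^k/e - 1$ other clauses. Assign each variable a truth value independently and uniformly at random, and let $A_C$ be the event that clause $C$ is violated, so that $\mathbb{P}[A_C] = 2^{-k}$ and each $A_C$ is independent of the events of all clauses sharing no variables with $C$. Taking $p = 2^{-k}$ and $d = 2^k/e - 1$ gives
\[ep(d + 1) = e \cdot 2^{-k} \cdot \frac{2^k}{e} = 1,\]
so the symmetric version guarantees that with positive probability no clause is violated — i.e., the formula is satisfiable.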
We’ll actually deduce the symmetric form of the Lovász local lemma from a more general ‘asymmetric’ form (which is useful even when the bad events don’t have comparable probabilities).
Theorem 4 (Lovász local lemma, asymmetric version). Suppose there exist $x_1, \ldots, x_n \in (0, 1)$ such that $\mathbb{P}[A_i] \leq x_i\prod_{j \in \mathcal{D}_i} (1 - x_j)$ for all $i \in [n]$. Then \[\mathbb{P}[\overline{A_1} \wedge \cdots \wedge \overline{A_n}] \geq \prod_{i = 1}^n (1 - x_i) > 0.\]
First, here’s why the asymmetric version of the Lovász local lemma implies the symmetric version — given $A_1$, $\ldots$, $A_n$ as in the symmetric version, we can set $x_i = 1/(d + 1)$ for each $i$. This satisfies the hypothesis of the asymmetric version — for each $i$, we have $\lvert \mathcal{D}_i \rvert \leq d$, so
\[x_i \prod_{j \in \mathcal{D}_i} (1 - x_j) \geq \frac{1}{d + 1}\left(1 - \frac{1}{d + 1}\right)^d \geq \frac{1}{e(d + 1)}\]
(where the last inequality is the standard bound $(1 + 1/d)^d \leq e$),
and the assumption that $ep(d + 1) \leq 1$ means that this is at least $p$ (which was an upper bound on $\mathbb{P}[A_i]$). So the asymmetric version tells us we avoid all the bad events with positive probability, as desired.
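If you’d like to double-check the inequality above numerically, here’s a throwaway Python snippet (purely a sanity check, not part of the proof) that verifies it for small values of $d$:

```python
import math

# Check that (1/(d+1)) * (1 - 1/(d+1))^d >= 1/(e(d+1)) for d = 1, ..., 50.
# This is the inequality used above to deduce the symmetric version.
for d in range(1, 51):
    lhs = (1 / (d + 1)) * (1 - 1 / (d + 1)) ** d
    rhs = 1 / (math.e * (d + 1))
    assert lhs >= rhs, f"inequality fails at d = {d}"
    print(f"d = {d:2d}: {lhs:.6f} >= {rhs:.6f}")
```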
In the rest of this post, we’ll prove this asymmetric version of the Lovász local lemma.
Proof of the asymmetric version
We’re going to actually prove the following claim.

Claim 5. For every $i \in [n]$ and every $\mathcal{S} \subseteq [n] \setminus \{i\}$ (such that the event being conditioned on below has positive probability), we have
\[\mathbb{P}\Bigl[A_i \Bigm\vert \bigwedge_{j \in \mathcal{S}} \overline{A_j}\Bigr] \leq x_i.\]

Once we have this claim, we can deduce Theorem 4 by writing
\[\mathbb{P}[\overline{A_1} \wedge \cdots \wedge \overline{A_n}] = \prod_{i = 1}^n \mathbb{P}[\overline{A_i} \mid \overline{A_1} \wedge \cdots \wedge \overline{A_{i - 1}}] \geq \prod_{i = 1}^n (1 - x_i), \tag{1}\]
since Claim 5 tells us that each factor is at least $1 - x_i$. (This also shows, by induction on $i$, that each event $\overline{A_1} \wedge \cdots \wedge \overline{A_i}$ has positive probability, so all the conditional probabilities here make sense.)
We’ll prove this claim by induction on $\lvert\mathcal{S}\rvert$. When $\mathcal{S}$ is empty, the conditional probability in question is just $\mathbb{P}[A_i]$, and we have $\mathbb{P}[A_i] \leq x_i\prod_{j \in \mathcal{D}_i} (1 - x_j) \leq x_i$, so the claim is true. Now suppose that $\mathcal{S}$ is not empty, and that we’ve proven the claim for all smaller sets $\mathcal{S}'$.
We’re trying to consider the probability of $A_i$ conditioned on a bunch of events, so we’ll start by splitting these events into two groups — we’ll separately consider the ones that are independent from $A_i$ and the ones that aren’t. So we split up the event that we’re trying to condition on as $B_{\text{ind}} \wedge B_{\text{dep}}$, where
\[B_{\text{ind}} = \bigwedge_{j \in \mathcal{S} \setminus \mathcal{D}_i} \overline{A_j} \qquad \text{and} \qquad B_{\text{dep}} = \bigwedge_{j \in \mathcal{S} \cap \mathcal{D}_i} \overline{A_j}.\]
First, since $A_i$ is independent of \(\{A_j \mid j \not\in \mathcal{D}_i\}\) and $B_{\text{ind}}$ is a logical combination of these events, we know that conditioning on $B_{\text{ind}}$ doesn’t affect the probability of $A_i$ — i.e., we have $\mathbb{P}[A_i \mid B_{\text{ind}}] = \mathbb{P}[A_i]$. But we don’t know anything about how conditioning on $B_{\text{dep}}$ affects $A_i$, so the best thing we can do is try to get rid of it — so we drop the conditioning on $B_{\text{dep}}$ using the fact that
\[\mathbb{P}[A_i \mid B_{\text{ind}} \wedge B_{\text{dep}}] = \frac{\mathbb{P}[A_i \wedge B_{\text{dep}} \mid B_{\text{ind}}]}{\mathbb{P}[B_{\text{dep}} \mid B_{\text{ind}}]} \leq \frac{\mathbb{P}[A_i \mid B_{\text{ind}}]}{\mathbb{P}[B_{\text{dep}} \mid B_{\text{ind}}]}. \tag{2}\]
(We can’t really do anything better with $B_{\text{dep}}$ because we know nothing about how the events $A_j$ that go into it interact with $A_i$.)
Now in the numerator, we’re only conditioning on $B_{\text{ind}}$, and we know how to handle this conditioning — we simply have $\mathbb{P}[A_i \mid B_{\text{ind}}] = \mathbb{P}[A_i]$. So now it remains to get a lower bound on the denominator $\mathbb{P}[B_{\text{dep}} \mid B_{\text{ind}}]$.
But $B_{\text{dep}}$ is an event of the form \(\overline{A_{j_1}} \wedge \cdots \wedge \overline{A_{j_m}}\) (where $j_1$, $\ldots$, $j_m$ are the elements of $\mathcal{S} \cap \mathcal{D}_i$), so we can actually do this in the same way as in (1) using the inductive hypothesis! We can write
\[\mathbb{P}[B_{\text{dep}} \mid B_{\text{ind}}] = \prod_{k = 1}^m \mathbb{P}[\overline{A_{j_k}} \mid \overline{A_{j_1}} \wedge \cdots \wedge \overline{A_{j_{k - 1}}} \wedge B_{\text{ind}}]. \tag{3}\]
And for the $k$th term on the right-hand side, the set of indices of the events we’re conditioning on is $\{j_1, \ldots, j_{k - 1}\} \cup (\mathcal{S} \setminus \mathcal{D}_i)$, which is a proper subset of $\mathcal{S}$ (it’s missing $j_k$) — which means in particular that we’ve already proven the claim for this set (in our induction). So for each $k$, we have
\[\mathbb{P}[A_{j_k} \mid \overline{A_{j_1}} \wedge \cdots \wedge \overline{A_{j_{k - 1}}} \wedge B_{\text{ind}}] \leq x_{j_k},\]
which means that the $k$th term on the right-hand side of (3) is at least $1 - x_{j_k}$. This gives the lower bound
\[\mathbb{P}[B_{\text{dep}} \mid B_{\text{ind}}] \geq \prod_{k = 1}^m (1 - x_{j_k}) \geq \prod_{j \in \mathcal{D}_i} (1 - x_j)\]
(the second inequality holds because $\{j_1, \ldots, j_m\} \subseteq \mathcal{D}_i$ and every factor $1 - x_j$ is at most $1$).
Plugging this into (2), we get that
\[\mathbb{P}\Bigl[A_i \Bigm\vert \bigwedge_{j \in \mathcal{S}} \overline{A_j}\Bigr] \leq \frac{\mathbb{P}[A_i]}{\prod_{j \in \mathcal{D}_i} (1 - x_j)} \leq x_i\]
(using the bound on $\mathbb{P}[A_i]$ from the given hypothesis), as desired.
Once we have the statement of Claim 5, the rest of the proof is reasonably intuitive — we split the event we're conditioning on into a part that we know how to deal with ($B_{\text{ind}}$, which $A_i$ is independent from) and a part we don't know how to deal with ($B_{\text{dep}}$, whose interaction with $A_i$ we can't really say anything about), and get rid of the latter by dropping conditioning (which costs us a factor of $\mathbb{P}[B_{\text{dep}}]$). Then we need a lower bound on $\mathbb{P}[B_{\text{dep}}]$, and if we expand out this probability as a product of conditional probabilities, we realize that we can get a lower bound on each by using induction (each of the terms in the product is of the same form as the one we're trying to bound, but with fewer events being conditioned on).
And when we plug in that bound, we find that the condition on $\mathbb{P}[A_i]$ that we need to make the proof of the claim go through is precisely the one in Theorem 4.
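To close, here’s a small numerical illustration of what the theorem promises (the setup and all parameters below are my own choices for a toy example, not anything from the proof). Take $n$ independent bits, each equal to $1$ with probability $q$, arranged in a cycle, and let the bad event $A_i$ be that bits $i$ and $i + 1$ are both $1$. Then $\mathbb{P}[A_i] = q^2$, and each $A_i$ depends on only its two neighbouring events, so with $x_i = 1/3$ the hypothesis of Theorem 4 holds as long as $q^2 \leq \frac{1}{3}\left(\frac{2}{3}\right)^2 = \frac{4}{27}$, and the conclusion is that $\mathbb{P}[\text{no bad event}] \geq (2/3)^n$. A quick Monte Carlo sketch in Python:

```python
import random

# Toy example (all parameters are my own choices, just for illustration):
# n independent biased bits X_1, ..., X_n with P[X_t = 1] = q, arranged in
# a cycle. The bad event A_i is "X_i = X_{i+1} = 1" (indices mod n), so
# P[A_i] = q^2, and A_i is independent of every A_j except its two cyclic
# neighbours, giving d = 2.
n, q = 10, 0.3
x = 1 / 3  # take x_i = 1/(d + 1) with d = 2

# Hypothesis of Theorem 4: P[A_i] <= x_i * prod_{j in D_i} (1 - x_j).
assert q * q <= x * (1 - x) ** 2

trials, good = 200_000, 0
rng = random.Random(0)
for _ in range(trials):
    bits = [rng.random() < q for _ in range(n)]
    if not any(bits[i] and bits[(i + 1) % n] for i in range(n)):
        good += 1

print(f"estimated P[no bad event] = {good / trials:.4f}")
print(f"LLL lower bound (2/3)^n   = {(1 - x) ** n:.4f}")
```

(As you might expect, the simulated probability comes out well above the guaranteed bound — the local lemma promises positivity, not a sharp estimate.)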