Randomized rounding


Within computer science and operations research,
many combinatorial optimization problems are computationally intractable to solve exactly.
Many such problems do admit fast approximation algorithms—that is, algorithms that are guaranteed to return an approximately optimal solution given any input.
Randomized rounding
is a widely used approach for designing and analyzing such approximation algorithms.
The basic idea is to use the probabilistic method
to convert an optimal solution of a relaxation
of the problem into an approximately optimal solution to the original problem.

Overview

The basic approach has three steps:
  1. Formulate the problem to be solved as an integer linear program (ILP).
  2. Compute an optimal fractional solution $x$ to the linear programming relaxation (LP) of the ILP.
  3. Round the fractional solution $x$ of the LP to an integer solution $x'$ of the ILP.
The challenge in the first step is to choose a suitable integer linear program.
This requires familiarity with linear programming, in particular with
how to model problems using linear programs and integer linear programs.
But, for many problems, there is a natural integer linear program that works well,
such as in the Set Cover example below.
In the second step, the optimal fractional solution can typically be computed
in polynomial time
using any standard linear programming algorithm.
In the third step, the fractional solution must be converted into an integer solution
(and thus a solution to the original problem).
This is called rounding the fractional solution.
The resulting integer solution should have cost
not much larger than the cost of the fractional solution.
This will ensure that the cost of the integer solution
is not much larger than the cost of the optimal integer solution.
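The main technique used to do the third step is to use randomization,
and then to use probabilistic arguments to bound the increase in cost due to the rounding,
following the probabilistic method from combinatorics.
There, probabilistic arguments are used to show the existence of discrete structures with
desired properties. In this context, one uses such arguments to show the following:
given any fractional solution $x$ of the LP, with positive probability the randomized rounding process produces an integer solution $x'$ that approximates $x$ according to some desired criterion.
Finally, to make the third step computationally efficient,
one either shows that $x'$ approximates $x$
with high probability (so that the step can remain randomized)
or one derandomizes the rounding step,
typically using the method of conditional probabilities.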
The latter method converts the randomized rounding process
into an efficient deterministic process that is guaranteed
to reach a good outcome.

Comparison to other applications of the probabilistic method

The randomized rounding step differs from most applications of the probabilistic method in two respects:
  1. The computational complexity of the rounding step is important. It should be implementable by a fast algorithm.
  2. The probability distribution underlying the random experiment is a function of the solution of a relaxation of the problem instance. This fact is crucial to proving the performance guarantee of the approximation algorithm, namely that for any problem instance, the algorithm returns a solution that approximates the optimal solution for that specific instance. In comparison, applications of the probabilistic method in combinatorics typically show the existence of structures whose features depend on other parameters of the input. For example, consider Turán's theorem, which can be stated as "any graph with $n$ vertices of average degree $d$ must have an independent set of size at least $n/(d+1)$". While there are graphs for which this bound is tight, there are also graphs which have independent sets much larger than $n/(d+1)$. Thus, the size of the independent set shown to exist by Turán's theorem in a graph may, in general, be much smaller than the maximum independent set for that graph.
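The gap between the Turán bound and the true maximum independent set can be seen on a small graph. The following is a minimal sketch, not part of the original discussion: the helper names `turan_bound` and `max_independent_set_size` are hypothetical, the star graph is an illustrative choice, and the brute-force search is only practical for tiny inputs.

```python
from itertools import combinations

def turan_bound(n, edges):
    """Lower bound n/(d+1) on the independence number, where d is the average degree."""
    d = 2 * len(edges) / n
    return n / (d + 1)

def max_independent_set_size(n, edges):
    """Brute-force search over vertex subsets, largest first (tiny graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return k
    return 0

# Star graph on 10 vertices: vertex 0 is joined to vertices 1..9.
n = 10
star = [(0, v) for v in range(1, n)]
bound = turan_bound(n, star)               # 10 / 2.8, about 3.57
alpha = max_independent_set_size(n, star)  # the 9 leaves form an independent set
print(bound, alpha)
```

On the star graph the bound guarantees fewer than four vertices, while the actual maximum independent set has nine. On a complete graph the bound is tight.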

Set cover example

The following example illustrates how randomized rounding can be used to design an approximation algorithm for the Set Cover problem.
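Fix any instance $\langle c, \mathcal S\rangle$ of set cover over a universe $\mathcal U$, where $\mathcal S$ is the collection of sets and $c(s)$ is the cost of set $s$.
For step 1, let IP be the standard integer linear program for set cover for this instance.
For step 2, let LP be the linear programming relaxation of IP,
and compute an optimal solution $x^*$ to LP
using any standard linear programming algorithm.
The feasible solutions to LP are the vectors $x$ that assign each set $s\in\mathcal S$ a non-negative weight $x_s$ such that, for each element $e\in\mathcal U$, the total weight of the sets containing $e$ is at least 1, that is, $\sum_{s\ni e} x_s \ge 1$.
The optimal solution $x^*$ is a feasible solution whose cost $\sum_{s\in\mathcal S} c(s)\,x^*_s$ is as small as possible.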
----
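Note that any set cover $\mathcal C \subseteq \mathcal S$
gives a feasible solution $x$
(where $x_s = 1$ for $s\in\mathcal C$ and $x_s = 0$ otherwise).
The cost of this $\mathcal C$ equals the cost of $x$, that is,
$$\sum_{s\in\mathcal C} c(s) = \sum_{s\in\mathcal S} c(s)\,x_s.$$
In other words, the linear program LP is a relaxation
of the given set-cover problem.
Since $x^*$ has minimum cost among feasible solutions to the LP,
the cost of $x^*$ is a lower bound on the cost of the optimal set cover.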
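The relaxation argument can be checked on a toy instance. This is a minimal sketch under stated assumptions: the three-set instance, the dictionary representation, and the helper names `is_fractional_cover` and `cost` are illustrative choices, not taken from the text. A feasible fractional solution is one in which every element receives total weight at least 1.

```python
def is_fractional_cover(x, sets, universe, tol=1e-9):
    """True if all weights are non-negative and each element gets total weight >= 1."""
    if any(w < -tol for w in x.values()):
        return False
    return all(
        sum(x[s] for s in sets if e in sets[s]) >= 1 - tol
        for e in universe
    )

def cost(x, c):
    """Cost of a (fractional or integer) solution: sum of c(s) * x_s."""
    return sum(c[s] * x[s] for s in x)

# Hypothetical instance: three elements, three sets, unit costs.
universe = {1, 2, 3}
sets = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}}
c = {"A": 1.0, "B": 1.0, "C": 1.0}

# The indicator vector of a true set cover is LP-feasible with the same cost.
cover = {"A", "B"}
x_int = {s: 1.0 if s in cover else 0.0 for s in sets}

# A fractional solution can be cheaper than every integer cover.
x_frac = {s: 0.5 for s in sets}

print(is_fractional_cover(x_int, sets, universe), cost(x_int, c))
print(is_fractional_cover(x_frac, sets, universe), cost(x_frac, c))
```

In this instance every set cover needs two sets, while the fractional solution costs 1.5. Adding the three covering constraints shows that no feasible fractional solution costs less than 1.5, so the LP value is a strict lower bound here.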

Step 3: The randomized rounding step

Here is a description of the third step, the rounding step,
which must convert the minimum-cost fractional set cover $x^*$
into a feasible integer solution $x'$ (corresponding to a true set cover).
The rounding step should produce an $x'$ that, with positive probability,
has cost within a small factor of the cost of $x^*$.
Then, since the cost of $x^*$ is a lower bound on the cost of the optimal set cover,
the cost of $x'$ will be within a small factor of the optimal cost.
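As a starting point, consider the most natural rounding scheme:
for each set $s\in\mathcal S$ in turn, take $x'_s = 1$ with probability $\min(1, x^*_s)$, otherwise take $x'_s = 0$.
With this rounding scheme,
the expected cost of the chosen sets is at most $\sum_s c(s)\,x^*_s$,
the cost of the fractional cover.
This is good. Unfortunately the coverage is not good.
When the variables $x^*_s$ are small and the covering constraint for an element $e$ is tight,
the probability that $e$ is not covered is about
$$\prod_{s\ni e} (1 - x^*_s) \approx \prod_{s\ni e} \exp(-x^*_s) = \exp\Big(-\sum_{s\ni e} x^*_s\Big) \approx \exp(-1).$$
So only a constant fraction of the elements will be covered in expectation.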
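The estimate above can be checked numerically. The sketch below is illustrative only: the function name `miss_probability` and the choice of an element contained in $k$ sets of weight $1/k$ each are assumptions made for the example.

```python
import math

def miss_probability(weights):
    """Probability that none of the sets containing an element is picked,
    when each set is picked independently with probability min(1, weight)."""
    prob = 1.0
    for w in weights:
        prob *= 1.0 - min(1.0, w)
    return prob

# An element contained in k sets, each of fractional weight 1/k (total weight 1).
for k in (2, 10, 100):
    print(k, miss_probability([1.0 / k] * k))
print("exp(-1) =", math.exp(-1))
```

As $k$ grows, the miss probability $(1 - 1/k)^k$ approaches $\exp(-1)$, so roughly a third of such elements would remain uncovered under the natural scheme.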
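To make $x'$ cover every element with high probability,
the standard rounding scheme
first scales up the rounding probabilities
by an appropriate factor $\lambda > 1$.
Here is the standard rounding scheme:
fix a parameter $\lambda \ge 1$; for each set $s\in\mathcal S$ in turn, take $x'_s = 1$ with probability $\min(\lambda x^*_s, 1)$, otherwise take $x'_s = 0$.
Scaling the probabilities up by $\lambda$
increases the expected cost by a factor of at most $\lambda$,
but makes coverage of all elements likely.
The idea is to choose $\lambda$ as small
as possible so that all elements are provably
covered with non-zero probability.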
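A scaled rounding step of this kind might be sketched as follows. The instance, the fractional solution `x_star` (supplied by hand rather than computed by an LP solver), and the helper names `randomized_round` and `is_cover` are all assumptions for illustration. The scaling factor is taken to be $\ln(2|\mathcal U|)$, where $|\mathcal U|$ is the number of elements.

```python
import math
import random

def randomized_round(x_star, lam, rng):
    """Pick each set s independently with probability min(lam * x_star[s], 1)."""
    return {s for s, w in x_star.items() if rng.random() < min(lam * w, 1.0)}

def is_cover(chosen, sets, universe):
    """True if the chosen sets together contain every element of the universe."""
    covered = set()
    for s in chosen:
        covered |= sets[s]
    return covered >= universe

# Hypothetical instance with a hand-supplied fractional solution.
universe = {1, 2, 3}
sets = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}}
x_star = {"A": 0.5, "B": 0.5, "C": 0.5}

lam = math.log(2 * len(universe))  # ln 6, about 1.79
rng = random.Random(0)             # fixed seed so runs are repeatable

trials = [randomized_round(x_star, lam, rng) for _ in range(1000)]
success_rate = sum(is_cover(t, sets, universe) for t in trials) / len(trials)
print(success_rate)
```

Each call draws one candidate solution. Repeating the draw estimates how often the scaled scheme produces a full cover on this instance.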
Here is a detailed analysis.
----

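Lemma (approximation guarantee for rounding scheme)

Fix $\lambda = \ln(2|\mathcal U|)$. With positive probability, the rounding scheme returns a set cover $x'$ of cost at most $2\ln(2|\mathcal U|)\, c\cdot x^*$ (and thus of cost $O(\log|\mathcal U|)$ times the cost of the optimal set cover).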

Proof

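The output $x'$ of the random rounding scheme has the desired properties
as long as none of the following "bad" events occur:
  1. the cost $c\cdot x'$ of $x'$ exceeds $2\lambda\, c\cdot x^*$, or
  2. for some element $e$, $x'$ fails to cover $e$.
The expectation of each $x'_s$ is at most $\lambda x^*_s$.
By linearity of expectation,
the expectation of $c\cdot x'$
is at most $\sum_s c(s)\,\lambda x^*_s = \lambda\, c\cdot x^*$.
Thus, by Markov's inequality, the probability of the first bad event
above is at most $1/2$.
For the remaining bad events (one for each element $e$), note that,
since $\sum_{s\ni e} x^*_s \ge 1$ for any given element $e$,
the probability that $e$ is not covered is
$$\prod_{s\ni e}\big(1 - \min(\lambda x^*_s, 1)\big) < \prod_{s\ni e}\exp(-\lambda x^*_s) = \exp\Big(-\lambda\sum_{s\ni e} x^*_s\Big) \le \exp(-\lambda) = \frac{1}{2|\mathcal U|},$$
where the last equality holds because $\lambda = \ln(2|\mathcal U|)$.
(This uses the inequality $1 + z \le e^z$, which is strict for $z \ne 0$.)
Thus, for each of the $|\mathcal U|$ elements,
the probability that the element is not covered is less than $1/(2|\mathcal U|)$.
By the naive union bound,
the probability that one of the $1 + |\mathcal U|$ bad events happens
is less than $1/2 + |\mathcal U|/(2|\mathcal U|) = 1$.
Thus, with positive probability there are no bad events
and $x'$ is a set cover of cost at most $2\lambda\, c\cdot x^*$.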
QED

Derandomization using the method of conditional probabilities

The lemma above shows the existence of a set cover
of cost $O(\log(|\mathcal U|)\, c\cdot x^*)$.
In this context our goal is an efficient approximation algorithm,
not just an existence proof, so we are not done.
One approach would be to increase the scaling factor $\lambda$
a little bit, then show that the probability of success is at least, say, 1/4.
With this modification, repeating the random rounding step a few times
is enough to ensure a successful outcome with high probability.
That approach weakens the approximation ratio.
We next describe a different approach that yields
a deterministic algorithm that is guaranteed to
match the approximation ratio of the existence proof above.
The approach is called the method of conditional probabilities.
The deterministic algorithm emulates the randomized rounding scheme:
it considers each set $s\in\mathcal S$ in turn,
and chooses $x'_s \in \{0, 1\}$.
But instead of making each choice randomly based on $x^*$,
it makes the choice deterministically, so as to
keep the conditional probability of failure, given the choices so far, below 1.

Bounding the conditional probability of failure

We want to be able to set each variable in turn
so as to keep the conditional probability of failure below 1.
To do this, we need a good bound on the conditional probability of failure.
The bound will come by refining the original existence proof.
That proof implicitly bounds the probability of failure
by the expectation of the random variable
$$F = \frac{c\cdot x'}{2\lambda\, c\cdot x^*} + |\mathcal U^{(m)}|,$$
where $m = |\mathcal S|$ and
$\mathcal U^{(m)}$ is the set of elements left uncovered at the end.
The random variable $F$ may appear a bit mysterious,
but it mirrors the probabilistic proof in a systematic way.
The first term in $F$ comes from applying Markov's inequality
to bound the probability of the first bad event.
It contributes at least 1 to $F$ if the cost of $x'$ is too high.
The second term
counts the number of bad events of the second kind (uncovered elements).
It contributes at least 1 to $F$ if $x'$ leaves any element uncovered.
Thus, in any outcome where $F$ is less than 1,
$x'$ must cover all the elements
and have cost meeting the desired bound from the lemma.
In short, if the rounding step fails, then $F \ge 1$.
This implies (by Markov's inequality) that
$E[F]$ is an upper bound on the probability of failure.
Note that the argument above is implicit already in the proof of the lemma,
which also shows by calculation that $E[F] < 1$.
To apply the method of conditional probabilities,
we need to extend the argument to bound the conditional probability of failure
as the rounding step proceeds.
Usually, this can be done in a systematic way,
although it can be technically tedious.
So, what about the conditional probability of failure as the rounding step iterates through the sets?
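Since $F \ge 1$ in any outcome where the rounding step fails,
by Markov's inequality, the conditional probability of failure
is at most the conditional expectation of $F$.
Next we calculate the conditional expectation of $F$,
much as we calculated the unconditioned expectation of $F$ in the original proof.
Consider the state of the rounding process at the end of some iteration $t$.
Let $S^{(t)}$ denote the sets considered so far
(the first $t$ sets in $\mathcal S$).
Let $x^{(t)}$ denote the partially assigned vector $x'$
(so $x^{(t)}_s$ is determined only if $s \in S^{(t)}$).
For each set $s \notin S^{(t)}$,
let $p_s = \min(\lambda x^*_s, 1)$
denote the probability with which $x'_s$ will be set to 1.
Let $\mathcal U^{(t)}$ contain the not-yet-covered elements.
Then the conditional expectation of $F$,
given the choices made so far, that is, given $x^{(t)}$, is
$$E[F \mid x^{(t)}] = \frac{\sum_{s\in S^{(t)}} c(s)\,x'_s + \sum_{s\notin S^{(t)}} c(s)\,p_s}{2\lambda\, c\cdot x^*} + \sum_{e\in\mathcal U^{(t)}} \prod_{s\notin S^{(t)},\, s\ni e} (1 - p_s).$$
Note that $E[F \mid x^{(t)}]$ is determined only after iteration $t$.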
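The conditional expectation can be evaluated directly from a partial assignment. The following is a sketch under stated assumptions: the function name `conditional_expectation`, the dictionary-based instance, and the hand-supplied fractional solution `x_star` are illustrative, the scaling factor is taken to be $\ln(2|\mathcal U|)$, and the fractional cost is assumed to be positive.

```python
import math

def conditional_expectation(decided, sets, c, x_star, universe):
    """Conditional expectation of F given a partial assignment.

    decided maps already-considered set names to 0 or 1; every other set
    is still random and is picked with probability p[s].
    """
    lam = math.log(2 * len(universe))
    p = {s: min(lam * x_star[s], 1.0) for s in sets}
    lp_cost = sum(c[s] * x_star[s] for s in sets)  # assumed positive
    undecided = [s for s in sets if s not in decided]

    # Cost term: fixed cost so far plus expected cost of the remaining sets.
    cost_term = (sum(c[s] * v for s, v in decided.items())
                 + sum(c[s] * p[s] for s in undecided)) / (2 * lam * lp_cost)

    # Coverage term: expected number of elements left uncovered at the end.
    covered = set()
    for s, v in decided.items():
        if v == 1:
            covered |= sets[s]
    miss_term = 0.0
    for e in universe - covered:
        prob = 1.0
        for s in undecided:
            if e in sets[s]:
                prob *= 1.0 - p[s]
        miss_term += prob
    return cost_term + miss_term

# Hypothetical instance with a hand-supplied fractional solution.
universe = {1, 2, 3}
sets = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}}
c = {"A": 1.0, "B": 1.0, "C": 1.0}
x_star = {"A": 0.5, "B": 0.5, "C": 0.5}

e_start = conditional_expectation({}, sets, c, x_star, universe)
e_in = conditional_expectation({"A": 1}, sets, c, x_star, universe)
e_out = conditional_expectation({"A": 0}, sets, c, x_star, universe)
print(e_start, e_in, e_out)
```

The starting value is the weighted average of the two values obtained by fixing the first set, with weights given by that set's rounding probability. At least one of the two choices therefore does not increase the expectation.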

Keeping the conditional probability of failure below 1

To keep the conditional probability of failure below 1,
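it suffices to keep the conditional expectation of $F$ below 1.
To do this, it suffices to keep the conditional expectation of $F$ from increasing.
This is what the algorithm will do.
It will set $x'_s$ in each iteration to ensure that
$$E[F \mid x^{(m)}] \le E[F \mid x^{(m-1)}] \le \cdots \le E[F \mid x^{(1)}] \le E[F \mid x^{(0)}] < 1$$
(where $m = |\mathcal S|$).
In the $t$th iteration, let $s'$ denote the set being considered.
How can the algorithm set $x'_{s'}$
to ensure that $E[F \mid x^{(t)}] \le E[F \mid x^{(t-1)}]$?
It turns out that it can simply set $x'_{s'}$
so as to minimize the resulting value of $E[F \mid x^{(t)}]$.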
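To see why, focus on the point in time when iteration $t$ starts.
At that time, $E[F \mid x^{(t-1)}]$ is determined,
but $E[F \mid x^{(t)}]$ is not yet determined:
it can take two possible values depending on how
$x'_{s'}$ is set in iteration $t$, where $s'$ is the set considered in that iteration.
Let $E^{(t-1)}$ denote the value of $E[F \mid x^{(t-1)}]$.
Let $E^{(t)}_0$ and $E^{(t)}_1$
denote the two possible values of $E[F \mid x^{(t)}]$,
depending on whether $x'_{s'}$ is set to 0 or 1, respectively.
By the definition of conditional expectation,
$$E^{(t-1)} = \Pr[x'_{s'} = 0]\, E^{(t)}_0 + \Pr[x'_{s'} = 1]\, E^{(t)}_1.$$
Since a weighted average of two quantities
is always at least the minimum of those two quantities,
it follows that
$$E^{(t-1)} \ge \min\big(E^{(t)}_0, E^{(t)}_1\big).$$
Thus, setting $x'_{s'}$
so as to minimize the resulting value of $E[F \mid x^{(t)}]$
will guarantee that $E[F \mid x^{(t)}] \le E[F \mid x^{(t-1)}]$.
This is what the algorithm will do.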
In detail, what does this mean?
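Considered as a function of $x'_{s'}$ (with all other quantities fixed),
$E[F \mid x^{(t)}]$ is a linear function of $x'_{s'}$,
and the coefficient of $x'_{s'}$ in that function is
$$\frac{c(s')}{2\lambda\, c\cdot x^*} - \sum_{e\in s'\cap\mathcal U^{(t-1)}} \prod_{s\notin S^{(t)},\, s\ni e} (1 - p_s).$$
Thus, the algorithm should set $x'_{s'}$ to 0 if this expression is positive,
and 1 otherwise. This gives the following algorithm.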

Randomized-rounding algorithm for set cover

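input: set system $\mathcal S$, universe $\mathcal U$, cost vector $c$
output: set cover $x'$ (a solution to the standard integer linear program for set cover)
----
  1. Compute a min-cost fractional set cover $x^*$ (an optimal solution to the LP relaxation).
  2. Let $\lambda \leftarrow \ln(2|\mathcal U|)$. Let $p_s \leftarrow \min(\lambda x^*_s, 1)$ for each $s \in \mathcal S$.
  3. For each $s' \in \mathcal S$ do:
     a. Let $\mathcal S \leftarrow \mathcal S \setminus \{s'\}$. ($\mathcal S$ contains the not-yet-decided sets.)
     b. If $\frac{c(s')}{2\lambda\, c\cdot x^*} > \sum_{e\in s'\cap\mathcal U} \prod_{s\in\mathcal S,\, s\ni e} (1 - p_s)$
        then set $x'_{s'} \leftarrow 0$,
        else set $x'_{s'} \leftarrow 1$ and $\mathcal U \leftarrow \mathcal U \setminus s'$.
        ($\mathcal U$ contains the not-yet-covered elements.)
  4. Return $x'$.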
----
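One possible Python rendering of this procedure is sketched below. It is not a definitive implementation: the function name `derandomized_round` and the dictionary-based input format are assumptions, the fractional solution `x_star` is supplied by the caller rather than computed by an LP solver, and the fractional cost is assumed to be positive. The scaling factor is $\ln(2|\mathcal U|)$ computed from the full universe.

```python
import math

def derandomized_round(sets, universe, c, x_star):
    """Deterministic rounding guided by the conditional expectation.

    sets: dict mapping set name -> set of elements
    c: dict mapping set name -> cost
    x_star: dict mapping set name -> fractional weight (assumed LP-feasible)
    Returns the list of chosen set names, in the order they were considered.
    """
    lam = math.log(2 * len(universe))
    p = {s: min(lam * x_star[s], 1.0) for s in sets}
    scale = 2 * lam * sum(c[s] * x_star[s] for s in sets)  # assumed positive
    undecided = list(sets)     # not-yet-decided sets
    uncovered = set(universe)  # not-yet-covered elements
    chosen = []
    while undecided:
        current = undecided.pop(0)
        # Expected number of elements of `current` that would stay uncovered
        # if `current` were skipped and the rest were rounded at random.
        miss = sum(
            math.prod(1.0 - p[s] for s in undecided if e in sets[s])
            for e in sets[current] & uncovered
        )
        if c[current] / scale > miss:
            continue           # skipping `current` keeps the expectation lower
        chosen.append(current)
        uncovered -= sets[current]
    return chosen

# Hypothetical instance with a hand-supplied fractional solution.
universe = {1, 2, 3}
sets = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}}
c = {"A": 1.0, "B": 1.0, "C": 1.0}
x_star = {"A": 0.5, "B": 0.5, "C": 0.5}

chosen = derandomized_round(sets, universe, c, x_star)
print(chosen, sum(c[s] for s in chosen))
```

Sets are considered in dictionary order. A different order may yield a different cover, but the same cost bound applies as long as `x_star` is feasible for the relaxation.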

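Lemma (approximation guarantee for algorithm)

The algorithm above returns a set cover of cost at most $2\ln(2|\mathcal U|)$ times the minimum cost of any (fractional) set cover.

Proof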

----
The algorithm ensures that the conditional expectation of $F$,
$E[F \mid x^{(t)}]$, does not increase at each iteration.
Since this conditional expectation is initially less than 1,
the algorithm ensures that the conditional expectation stays below 1.
Since the conditional probability of failure
is at most the conditional expectation of $F$,
in this way the algorithm
ensures that the conditional probability of failure stays below 1.
Thus, at the end, when all choices are determined,
the algorithm reaches a successful outcome.
That is, the algorithm above returns a set cover $x'$
of cost at most $2\ln(2|\mathcal U|)$ times
the minimum cost of any (fractional) set cover.

Remarks

In the example above, the algorithm was guided by the conditional expectation of a random variable $F$.
In some cases, instead of an exact conditional expectation,
an upper bound (or sometimes a lower bound)
on some conditional expectation is used.
This is called a pessimistic estimator.