### DIMACS Workshop on Probabilistic Analysis of Algorithms for Hard Problem

#### November 1 - 3, 1999 DIMACS Center, Rutgers University, Piscataway, NJ

Principal Organizer:
Martin Dyer, University of Leeds, dyer@scs.leeds.ac.uk
Alan Frieze, Carnegie Mellon University, alan@random.math.cmu.edu

Dimitris Achlioptas, Microsoft

Title: Mick gets more than pie.

Let $X=\{x_1,\ldots,x_n\}$ be a set of $n$ Boolean variables. Let $C(X)$ denote the set of all 3-clauses" over $X$, i.e.\ the set containing all $8 {n \choose 3}$ possible disjunctions of three, distinct, non-complimentary members of $\{x_1,\overline{x}_1,\ldots,x_n,\overline{x}_n\}$. Consider now a random 3-SAT formula $F(n,m)$ generated by selecting $m$ clauses uniformly at random from $C(X)$ and taking their conjunction. The {\em satisfiability threshold conjecture\/} asserts that there exists a constant $r_c$ such that $F(n,rn)$ is almost surely satisfiable for $r < r_c$ and almost surely unsatisfiable for $r > r_c$. Experimental evidence suggests $r_c \approx 4.2$. We prove $r_c > 3.145 > \pi$ improving over $r_c > 3.003$ due to Frieze and Suen. For this, we introduce a new satisfiability heuristic and analyze its performance. The framework for our analysis is simple and intuitive, allowing us to recover this and previous lower bounds with relatively little effort.

Christian Borgs, Microsoft

Title: Slow Mixing on the Hypercubic Lattice

We provide a general framework for analyzing certain Monte Carlo Markov chain algorithms for systems in statistical mechanics. Our framework uses expansion techniques from mathematical statistical mechanics, notably Pirogov-Sinai theory. Among our results is exponentially slow mixing for the Swendsen Wang algorithm at the transition point of the Potts model. We also establish exponentially slow mixing of Glauber dynamics in the independent set model at high fugacities.

This is joint work with J.T. Chayes, A. Frieze, J.H. Kim, P. Tetali, E. Vigoda and V.H. Vu.

Ed Coffman, New Jersey Institute of Technology

Title: Packing Random Intervals in One and Two Dimensions

Consider a collection of $n$ random intervals, where the endpoints of each are determined by two i.i.d. random draws from $[0,1]$; and consider algorithms that select/pack subsets of mutually disjoint intervals to (a) maximize the number packed, or (b) minimize the wasted space (the fraction of $[0,1]$ uncovered by the intervals packed). We know for problem (a) that the expected number packed is $\Theta(n^{1/2})$ and for problem (b) that the expected wasted space is $\Theta(\log^2 n/n)$. The problems have obvious generalizations to two dimensions, where intervals become rectangles. We will discuss a proof that $\Theta(n^{1/2})$ also applies to the expected number of rectangles in a maximal disjoint subset (this work is with George Lueker, Joel Spencer, and Peter Winkler). We will also mention applications, cognate problems/results, and some intriguing open questions.

Title: Towards an Algorithmic Version of the General Lovasz Local Lemma

The Lovasz Local Lemma is a powerfull tool that is increasingly playing a valuable role in computer science. It has led to solutions for numerous problems in many different areas, reaching from problems in pure combinatorics to problems in routing, scheduling and approximation theory. However, since the original lemma is nonconstructive, many of these solutions were first purely existential. A breakthrough result by Beck and its generalizations by Alon, Molloy & Reed) have led to polynomial time algorithms for many of these problems. However, these methods can only be applied to a simple, symmetric form of the Lovasz Local Lemma. In this talk I will present a new, recently developed approach to design polynomial-time algorithms that require the Lovasz Local Lemma in its general (i.e., non-symmetric) form. I will demonstrate the technique on the problem of 2-coloring non-uniform hypergraphs. Further, I briefly describe how this method can be applied to a certain partitioning problem which has applications to design approximation solutions to a large class of NP-hard problems called minimax integer programs.

Joint work with Christian Scheideler.

J. Franco, University of Cincinnati

Title: A Perspective on Certain Polynomial Time Solvable Classes of Satisfiability

The scope of certain well-studied polynomial-time solvable classes of Satisfiability is investigated relative to a polynomial-time solvable class consisting of what we call \emph{matched} formulas. The class of matched formulas has not been studied in the literature, probably because it seems not to contain many challenging formulas. Yet, we find that, in some sense, the matched formulas are more numerous than Horn, extended Horn, renamable Horn, q-Horn, CC-balanced, or Single Lookahead Unit Resolution (SLUR) formulas. In addition, we find that relatively few unsatisfiable formulas are in any of the above classes. However, there are many unsatisfiable formulas that can be solved in polynomial time by restricting resolution so as not to generate resolvents of size greater than the number of literals in a maximum length clause.

We use the well-studied random $k$-SAT model $M(n,m,k)$ for generating CNF formulas with $m$ clauses, each with $k$ distinct literals, from $n$ variables. We show, for all $m/n > 2/k(k-1)$, the probability that a random formula is SLUR, q-Horn, extended Horn, CC-balanced, or renamable Horn tends to 0 as $n$ goes to infinity. We show, for all $m/n < .64$, that random formulas are matched formulas with probability tending to 1 as $n$ goes to infinity.

The propositional connection graph is introduced to represent clause structure for formulas with general-width clauses. The probabilistic analysis highlights the vulnerability of previously studied polynomial-time solvable classes to certain cyclic substructures in this graph. The matched formulas are not vulnerable to such structures. Thus, we believe that part of the significance of this work lies in guiding the future development of polynomial-time solvable classes of Satisfiability.

Martin Furer, Pennsylvania State University

Title: Approximating Permanents of Complex Matrices

A wide variety of algorithms for approximating permanents of non-negative matrices have been proposed and analyzed before. Usually, these approximation algorithms have been presented for 0-1 matrices, and it has often been remarked that they extend to other matrices as long as all entries are non-negative.

Here we present the first approximation algorithms for permanents of arbitrary complex matrices. Like several other approximation algorithms for permanents, they are based on easily computable unbiased estimators. These are random variables whose expected value is equal to the permanent. The random variables are computed by randomized algorithms that do nothing more complicated than evaluating a determinant. After many repetitions, the error is likely to be small compared to the sum of the absolute values of the monomials.

Leslie Ann Goldberg, University of Warwick

Title: An extension of path coupling and its application to the Glauber dynamics for graph colourings

One of the best-known methods for bounding the convergence rate of a Markov chain is coupling. In practice, however, couplings can be difficult to construct. A powerful tool which helps with this task is the "path-coupling" technique of Bubley and Dyer. In this work, we combine the path-coupling technique with multiple-step analysis. In particular, we obtain convergence-rate bounds by analysing the behaviour of the path coupling at a random stopping time. We apply the new method to analyse the mixing time of the Glauber dynamics for graph colourings. We show that the mixing time is O(n log n) for triangle-free Delta-regular n-vertex graphs if at least (2 - eta) Delta colours are used, for a small positive constant eta.

(Joint work with Martin Dyer, Catherine Greenhill, Mark Jerrum and Michael Mitzenmacher.)

Title: Case studies in torpid mixing

A Markov chain is rapidly mixing if, loosely, it converges to equilibrium in a number of steps that is polynomial in the size of its description. Torpid mixing is the opposite: it describes a situation in which convergence requires an exponential or at least super-polynomial number of steps. I'll cover some cases of provable torpidity, drawn from the following collection: independent sets (Glauber dynamics and its relatives), the Potts model (Swendsen-Wang dynamics), and the Burnside Process.''

(Joint work with Martin Dyer, Alan Frieze, Leslie Goldberg and Vivek Gore.)

Title: Average-Case Analysis of a Self-Organizing Bin Packing Algorithm

In a "discrete distribution" we are given a bin capacity B and a probability distribution over the item sizes from 1 to B-1. Such distributions model many real-world applications of bin packing, where our goal is typically to minimize the total number of bins used or, equivalently, to minimize the wasted space in the non-empty bins of the packing. If N is the number of items generated, it is known that for any discrete distribution, the total expected waste in the optimal packing must grow either as O(N), O(sqrt(N)), or O(1), depending on the distribution. Standard bin packing algorithms such as Best Fit can be far from optimum, for instance yielding linear waste under distributions where the optimal expected waste is O(1). This is even true for offline algorithms such as Best Fit Decreasing.

We describe a remarkably simple O(nB)-time online algorithm for which the wasted space provably has the same expected growth rate (to within a constant factor) as the optimal packing for all discrete distributions. We also present an O(nBlogB)-time variant for which the expected ratio of the number of bins used to the optimal number is asymptotically 1 for all discrete distributions, even those where the expected optimal waste grows as O(N).

This is joint work with Janos Csirik, Claire Kenyon, Jim Orlin, Peter Shor, and Richard Weber.

Richard Karp, University of Washington

Title: Matchings, Cores and Dense Subgraphs in Sparse Random Graphs

Let a graph H be drawn from the probability space G(n, c/n), where c is a constant. We consider the following questions:

(1) How is the size of the largest matching in H distributed?
(2) What is the probability that H has a nonempty subgraph in which all vertices are of degree at least k?
(3) What is the probability that H has no subgraph of average degree greater than or equal to 2k? This property is equivalent to the existence of an orientation of the edges of H such that no vertex has more than k edges oriented into it.

The first question was investigated nonrigorously in [2] and settled rigorously in [1]. The second question was settled in [3]. The third question has been investigated by Molloy et al (private communication) and by M. Saks and the speaker.

Such questions can often be attacked by analyzing computational processes, rather than by counting methods or nonalgorithmic probabilistic analysis. We present some heuristic principles for conducting such analyses. These include the following:

(1) Breadth-first search in a graph drawn from G(n, c/n) can be approximated as a branching process in which the number of offspring of an individual has the Poisson distribution with mean c.
(2) If we fix the number of vertices and edges in a sparse random graph, and condition on the event that all degrees are greater than or equal to k, then breadth-first search can be approximated as a branching process in which the number of offspring of an individual has a truncated Poisson distribution.
(3) Vertex deletion or edge deletion processes on a graph drawn from G(n, c/n) can often be described exactly by Markov chains, and their behavior as n tends to infinity can be described by differential equations.
As an application we sketch a (not completely rigorous) derivation due to M. Saks and the speaker of the threshold for the existence of an orientation of the edges of H such that no vertex has more than k edges oriented into it.

[1] J. Aronson, A. Frieze and B.G. Pittel, Random Structures and Algorithms (1998 )

[2] R.M. Karp and M. Sipser, Proc. Twenty-Second Annual IEEE Symposium on Foundations of Computing (1981)

[3] B. Pittel, S. Spencer and N. Wormald, Journal of Combinatorial Theory, Vol. 67, No. 1 (1996)

Jeong Han Kim, Microsoft Research

Title: The Scaling Window of the 2-SAT Transition

We consider the random 2-satisfiability problem, in which each instance is a formula that is the conjunction of $m$ clauses of the form $x\vee y$, chosen uniformly at random from among all 2-clauses on $n$ Boolean variables and their negations. As $m$ and $n$ tend to infinity in the ratio $m/n\rightarrow\alpha$, the problem is known to have a phase transition at $\alpha_c = 1$, below which the probability that the formula is satisfiable tends to one and above which it tends to zero. We determine the finite-size scaling about this transition, namely the scaling of the maximal window $W(n,\delta) = (\alpha_-(n,\delta), \alpha_+(n,\delta))$ such that the probability of satisfiability is greater than $1-\delta$ for $\alpha < \alpha_-$ and it is less than $\delta$ for $\alpha > \alpha_+$. We show

$$W(n,\delta)=(1-\Theta(n^{-1/3}), 1+\Theta(n^{-1/3})),$$

where the constants implicit in $\Theta$ depend on $\delta$. We also determine the rates at which the probability of satisfiability approaches one and zero at the boundaries of the window. Namely, for $m=(1+\varepsilon)n$, where $\varepsilon$ may depend on $n$ as long as $|\varepsilon|$ is sufficiently small and $|\varepsilon| n^{1/3}$ is sufficiently large, we show that the probability of satisfiability decays like $\exp\left(-\Theta\left({n\varepsilon^3}\right)\right)$ above the window, and goes to one like $1-\Theta\left( n^{-1}|\varepsilon|^{-3}\right)$ below the window. We prove these results by defining an order parameter for the transition and establishing its scaling behavior in $n$ both inside and outside the window. Using this order parameter, we prove that the 2-SAT phase transition is continuous with an order parameter critical exponent of 1. We also determine the values of two other critical exponents, showing that the exponents of 2-SAT are identical to those of the random graph. Joint work with B. Bollobas, C. Borgs, J. T. Chayes, and D. B. Wilson.

Leonid Levin, Boston University

Title: The Tale of One-Way Functions

This is a sequel to the talk "The Treacherous Averaging" I gave here two years ago. I concentrate now on the concept of OWF which differs subtly from that of Hard on Average problems. After an overview I will try to sell some open problems which seem important while not hopeless.

Title: Random Graphs and Positional Games

Abstract: Let $\Gamma(H;n,q)$ denote the game in which two players, Maker and Breaker, alternately claim edges from the complete graph $K_n$. In each round Maker picks one edge, then Breaker answers with $q$ edges, chosen among all pairs which have not been claimed so far. The aim of Maker is to build a copy of a graph $H$; if he fails to do that he loses. This game has been introduced by Chvat\'al and Erd\H{o}s, and similar types of biased positional games have been thoroughly studied by Beck. In particular, it turns out that many results proved for such games have their counterparts among theorems on the behaviour of the random graph $G(n,p)$, where $p=1/(q+1)$.

In the talk we use some results on random graphs and show that the threshold function'' $q_0(n)$ for the game $\Gamma(H;n,q)$ is $\Theta(n^{1/\alpha(H)})$, where $\alpha(H)=\max_F\{(e(F)-1)/(v(F)-2)\}$ and the maximum is taken over all subgraphs $F$ of $H$ with at least two edges. More precisely, we prove that if $q\le cn^{1/\alpha(H)}$ and $c>0$ is a small constant, then Maker has a winning strategy for $\Gamma(H;n,q)$, while for $q\ge Cn^{1/\alpha(H)}$ and large enough~$C$, Breaker can win $\Gamma(H;n,q)$. We also briefly discuss some related results on combinatorial games.

This is joint work with Ma{\l}gorzata Bednarska.

George Lueker, University of California - Irvine

Title: Expected Length of Longest Common Subsequences

David W. Matula, Southern Methodist University

Title: A Cell Occupancy Sieve Algorithm Resolving Hardest Case Roundings of Floating Point Functions

Hardware implementations of $n$-bit floating point functions, such as square root, reciprocal, and selected transcendentals, require well defined outputs, such as $n$-bit round-to-nearest, to provide deterministic portable and testable computations. Employment of extra-precise fast function approximations encounters problems, in that typically about $2^k$ input $n$-bit arguments require approximations accurate to one part in $2^{2n-k}$ to obtain correct roundings.We herein propose and analyze a probabilistic sieve algorithm for resolving the $2^k$ hardest to round cases (practical up to $k\approx 15$) allowing much less than double precision approximation to suffice for determining correctly rounded results. We model this hard-to-round'' problem by a sparse random boolean function $B(n,p)$ which assigns an $n$-bit vector hard-to-round label to a red ball (denoting round down) with probability $p/2$, and similarly to a blue ball (denoting round up) where $p\cdot n = 2^k < < 2^n$. The sieve algorithm employs cell occupancy trials to alternately sift-out red and blue balls. Each trial distributes all remaining balls into an array of $2^i$ cells according to an $i$-bit index from the ball's bit vector label, with each cell's {\it sift-out/pass-through} state contained in a $2^i\times 1$ bit table governing that trial.

We demonstrate that with high probability such cell occupancy sieves determine the rounding direction for any hard-to-round case, at a total table size cost amounting to just $2$ bits per case, with a related lower bound result suggesting at least nearly one bit per case is required. In current hardware, an acceptably small 8K byte ROM can thereby reduce the required function approximation accuracy by some $15$ bits, making efficient single precision rounded function implementation a serious possibility.

Andrzej Rucinski, Adam Mickiewicz University, Poznan, Poland and Emory University, Atlanta

Title: Embedding Sparse Graphs into Dense, Regular Graphs

In the talk I will address the problem stated in the title, beginning with the classic result of Chvatal, Rodl, Szemeredi and Trotter from 1983 and ending with a survey of several proofs of the so called "Blow-up Lemma" of Komlos, Sarkozy and Szemeredi. In the end, the algorithmic aspect of the problem will be discussed.

Gregory Sorkin, IBM T.J. Watson Research Center

Title: A Biased Walk Through Simulated Annealing

Sixteen years after its invention, what do we know about simulated annealing? The classical asymptotic analysis is based on an impractical "logarithmic" cooling schedule, and it fails to distinguish annealing from the fixed-temperature Metropolis algorithm. A small assortment of negative results helps delineate cases where no variant of annealing is efficient, while the few positive results for natural combinatorial optimization problems are all achieved by the simple Metropolis algorithm. I will describe one class of artificial problems for which annealing is efficient but Metropolis is not, and where the annealing result employs the "geometric" cooling schedule favored in practice.

Joel Spencer, Courant Institute, NYU

Title: Random Sequential Greedy

Analysis of Algorithms of the above type have been subtle, but there have been some notable successes. And failures.

Angelika Steger, Munich University of Technology

Title: Balanced Allocations: More balls than bins

Balls and bins algorithms using multiple choices for each ball have caused furor in recent years. In contrast to the classical balls and bins game in which each ball is placed into a randomly selected bin, these algorithms consider several random locations for each ball and place the ball into the bin with minimal load.

Almost all previous analyses focus on the lightly loaded case, i.e., $m = O(n)$. In this case, the multiple-choice algorithms improve greatly upon the classical one-choice algorithm. The known results for the heavily loaded case, i.e., $m \gg n$, however, were not impressive at all. Indeed, here the best known result is that the fullest bin contains $\frac{m}{n} + O(\sqrt{m \cdot \ln n / n})$ balls, which just corresponds to the maximum load produced by the classical one-choice algorithm.

In this talk, we show that the actual distribution produced by the multiple-choice algorithms is much better than this prediction. We explicitely calculate the distributions produced by different multiple-choice algorithms. For example we show that for the sequential two-choice algorithm the deviation of the fullest bin from the average is at most $(\ln \ln n)/\ln 2 + \Theta(1)$, independently of the number of balls.

This is joint work with Petra Berenbrink, Artur Czumaj and Berthold V\"ocking.

L. Stougie, Einhoven University of Technology

Fast Randomized Algorithms for Stochastic Programming Problems

We present a fast randomized algorithm for solving smooth convex programming problems. Smoothness is defined in a specific but intuitive way. As a main area of application of the resulting method we consider stochiastic programming problems. In particular, we design a fully polynomial randomized approximation scheme for the #P-hard stochastic recourse problem. The assumptions we need on the smoothness of the functions involved coincide with the assumptions needed for fast approximate evaluation of the objective value in feasible points of the problem.

Joint work with M. Dyer, R. Kannan and A. Nolte

C.R. Subramanian, Max-Planck Institute fur Informatik

Title: A BFS approach to partitioning problems on random graphs

Several graph partitioning problems (like $k$-coloring, finding a bisection of minimum width) are NP-hard but are solvable in polynomial time over random instances. We present a simple approach based on BFS trees to solve such partitioning problems on random graphs. In this talk, we illustrate this approach in the context of the $k$-coloring problem. Consider random $k$-colorable graphs drawn by choosing each allowed edge independently with probability $p$ after initially partitioning the vertex set into $k$ color classes. We show that growing BFS trees partially from each vertex is sufficient to separate the largest or smallest color class. Repeating this procedure $k-1$ times, one obtains a $k$-coloring of $G$ with high probability, even for very small values of $p$. We also show how to use this approach so that one gets even smaller failure probabilities at the cost of running time. Based on this, we present polynomial average time (\pavt) $k$-coloring algorithms for the same range of $p$. This approach is applicable to general partitioning problems in which vertices are to be split into parts with each part having specified densities.

Santosh Vempala, M.I.T.

Title: Inapproximability via the Probabilistic Method

We show that the traveling salesman problem with triangle inequality cannot be approximated within 41/40 when the edge lengths are allowed to be asymmetric and within 129/128 when the edge lengths are symmetric. The best previous lower bounds were 2805/2804 and 5381/5380 respectively. The reduction, from H\aa stad's maximum satisfiability of linear equations modulo 2, is nonconstructive and relies on a probabilistic proof of the existence of graphs with specialized expansion-like properties.

Joint work with Christos H. Papadimitriou

R. Venkatesan, Microsoft

Title: Random Cayley Graphs and Discrete Log

Santosh Venkatesh, University of Pennsylvania

Title: Information Storage in Finite Memory

Finite memory problems have a long history in statistics. In recent work it has been shown that randomised on-line algorithms can be efficiently deployed to store information about the entire past in a single bit of memory in a setting where all deterministic strategies can be shown to fail. I will show the application of these ideas in a binary integer programming setting and summarise a variety of results with a random graph flavour. The analysis tools may be of independent interest and involve delicate Poissonisation arguments for large deviations in row sums of triangular arrays of random variables. The general resolution along these lines of the efficacy of on-line storage of information when memory resources are finite has implications to optimal data compression and to the design of algorithms for hard problems with memory constraints.

Van Vu, Microsoft

Title: Approximation of the independence number in polynomial expected running time

Approximate the independence number (the size of the largest independent set) is a notoriously hard problem. Due to a recent result of Hastad, one expects that there is no polynomial time algorithm which can approximate this number up to the ratio $n^{1- \epsilon}$ for any positive constant $\epsilon$, in the worst case (where $n$ is the number of vertices of the graph).

We consider the average case of the problem. Take a graph uniformly randomly from the set of all graphs on $n$ points (i.e, $G(n, 1/2)$). We prove that there is a deterministic algorithm which can approximate the independence number up to a $O(n^{1/2})$ factor with polynomial expected running time. The result can be generalized to $G(n,p)$ with arbitrary $p$. The core of the proof is a new application of Talagrand's inequality and might be of independent interest.

Joint work with M. Krivelevich.

Peter Winkler, Bell Labs

Title: Optimality and Greed in Dynamic Allocation

In dynamic storage allocation objects arrive according to some sort of random process, are assigned space in a facility, and terminate randomly, freeing space for further use. Constraints on space may sometimes force an arrival to be rejected, costing money.

Often in such a problem intuition tells us where in the facility to place an arrival, but it is notoriously difficult to prove that an on-line algorithm is best possible unless the process is completely specified.

We will describe a new technique which can sometimes prove optimality without a specified arrival distribution, provided one assumes that it's wrong to reject an arrival when there is space for it. Problems to which this technique can be applied have already risen twice at Lucent Technologies; in each case we have shown that if it's right to be greedy, then it's right to be efficient.

Joseph Yukich, LeHigh University

Title: A survey of the probability theory of the Euclidean TSP

Abstract: This talk will survey recent methods and results describing the probabilistic behavior of the Euclidean TSP, as well as other classical problems in Euclidean optimization. Laws of large numbers, large deviation principles and central limit theorems will be discussed in the context of isoperimetric methods, boundary functionals, and superadditivity. The talk, which will include some open problems, will focus on ideas rather than detailed techniques.

DIMACS Homepage
Contacting the Center