This workshop is jointly sponosored by Princeton University and the Center for Computational Intractability.
Title: Very Local Self Correcting of Homomorphism and MPC Codes
Blum-Luby-Rubinfeld gave a local self correcting algorithm for homomorphism codes Hom(G,H) for any finite abelian groups $G,H$. The BLR algorithm, given an entry location $s$ and oracle access to a corrupted codeword w close to a codeword C, outputs the value of C on entry s, while reading only a constant number of entries in w. Although the number of {\em entries} read by the BLR algorithm is constant, the number of read {\em bits} is O(log|H|), as each alphabet symbol is represented by log|H| bits. This number of read bits may be quite large with respect to the information rate log|G|; in particular, for Hom(Z_N,Z_N) the number of read bits is larger than the information rate.
In this work we present a local self correcting algorithm for homomorphism codes Hom(Z_N,Z_N) that reads only a {\em constant number} of {\bf bits} to output a single bit of the value of the closest codeword on the given entry. Our algorithm improves over prior works in reducing the number of read bits from O(log N) to O(1).
In addition we use our algorithm to obtain the first local self correcting algorithm for MPC codes, where MPC codes are a class of binary error correcting codes introduced in Akavia-Goldwasser-Safra FOCS 2003.
In terms of techniques, we develop a new approach for local self correcting, in which we first reduce the self correcting problem into a property testing problem concerning the location of significant Fourier transform coefficients (aka, {\em SFT-testing}), and then give an efficient algorithm solving SFT-testing with a constant number of read bits. The SFT-testing algorithm may be of independent interest.
Title: How to beat the Monte Carlo method
To compute, or at least estimate, the integral of a complicated function, especially in higher dimensions, is very difficult. The only available option is to use a random sample (Monte Carlo method). It has a built-in square root error, which cannot be improved in general. Our first result is how to replace the Monte Carlo method with an ``almost regular sample", which has the same worst case error term, but has many advantages.
Our second result is to describe the ``most uniformly distributed curves in the unit square". The most surprising fact here is that the problem has a definite answer. Finally, we show applications.
Title: Testing by Implicit Learning
The goal of property testing is to distinguish between the case that an object has a pre-specified property and the case that it differs significantly from any such object. Property testing can be viewed as a relaxation of learning (i.e., obtaining an approximate representation of an object). We therefore expect testing algorithms to be significantly more efficient than learning algorithms. We describe a general method for testing whether a function on n variables has a concise representation (e.g. an s-term DNF, a size-s decision tree, or an s-sparse polynomial). Roughly speaking, our approach may be viewed as a query-efficient reduction of testing to implicit proper learning. Our generic algorithm combines ideas from the junta test of Fischer et al. with ideas from learning theory, and yields property testers that make poly(s/epsilon) queries for several function classes, including the ones above. The query complexity of the algorithm is independent of n -- as opposed to the best learning algorithms -- and the polynomial dependence in s is necessary in general. The generic algorithm is not time efficient (its running time is exponential in s). An enhancement of the technique, using more sophisticated learning technology, yields a testing algorithm for the class of s-sparse GF(2) polynomials that is both time efficient and query efficient.
Based on joint works with H. Lee, K. Matulef, K. Onak, R. Rubinfeld, R. Servedio, and A. Wan.
Title: Constant time distributed algorithms, parameter and property testing on very large graphs of small vertex degrees. A measure theoretical approach.
We use the theory of measurable equivalence relations for property and parameter testing of large graphs of small vertex degrees.
The classes of graphs we consider are the so-called hyperfinite classes and our method can be viewed as an alternative approach to the recent results of Czumaj/Sohler and Benjamini/Schramm/Shapira.
Besides testing certain parameters, say, the size of the maximal independent set, we show how to construct in constant time independent sets of approximately maximal size via distributed algorithms.
Title: On Distance to Monotonicity and Longest Increasing Subsequence of a Data Stream
In this talk we consider problems related to the sortedness of a data stream. First we investigate the problem of estimating the distance to monotonicity; given a sequence of length n, we give a deterministic 2+epsilon streaming algorithm for estimating its distance to monotonicity in space O((1/epsilon^2}log^2 n). This improves over the randomized 4+epsilon-approximation algorithm of {gopalan07}.
We then consider the problem of approximating the length of the longest increasing subsequence of an input stream of length n. We use techniques from multi-party communication complexity combined with a direct-sum approach to prove that any deterministic streaming algorithm that approximates the length of the longest increasing subsequence within 1+epsilon for an arbitrarily small constant epsilon requires at least Omega(sqrt(n)) space. This proves the conjecture in {gopalan07}.
Joint work with H. Jowhari
Title: Approximate hypergraph partitioning and applications to regularity
The celebrated result of Goldreich, Goldwasser and Ron provides a constant query complexity property tester for any property of graphs defined as having a partition of the vertex set with prescribed subset sizes and edge densities, as well as an algorithm for finding an approximate partition whose running time is linear in the output size (making it sublinear in the input size).
Here this result is generalized to hypergraphs, including "directed" ones with several sets of hyperedges (these also go by the name of relational structures). Still a partition of the vertex set is sought, but now hyperedge densities over k-tuples of vertex subsets are prescribed. Many previous hypergraph testing and CNF testing results turn out to be special cases of this generalization.
A new and important application is that of finding regular partitions (in the sense of Szemeredi's lemma) in graphs. Up to a polynomial slack in the parameter, the regularity of a pair of density eta can be described by a relation between the number of its edges and the number of the squares it spans. With this alternative formulation it is possible to test for the existence of a regular partition of a graph with constant query complexity, and produce it in time linear in the output size. The main advantage over previous approaches is that here it is possible to produce a small regular partition if one exists, rather than being only guaranteed a partition whose size is that of the theoretical worst case of the lemma.
This is joint work with Arie Matsliah and Asaf Shapira, and has appeared in FOCS 2007.
Title: Succinct representation of codes with applications to testing
Motivated by questions in property testing, we search for linear error-correcting codes that have the ``single local orbit'' property: i.e., they are specified by a single local constraint and its translations under the symmetry group of the code. We show that every ``sparse'' binary code whose coordinates are indexed by elements of F_{2^n} for prime n, and whose symmetry group includes the group of non-singular affine transformations of F_{2^n}, has the single local orbit property. (A code is said to be ``sparse'' if it contains polynomially many codewords in its block length.) In particular this class includes the dual-BCH codes for whose duals (i.e., for BCH codes) simple bases were not known. Our result gives the first short (O(n)-bit, as opposed to the natural exp(n)-bit) description of a low-weight basis for BCH codes. Since it is known that codes with a single local orbit under the affine symmetry group are locally testable, we get that all the codes we consider are locally testable, adding to the body of sparse locally testable codes.
If, in addition to n being prime, if 2^n-1 is also prime, then we get that every sparse cyclic code also has the single local orbit. In particular this implies that BCH codes of Mersenne prime length are generated by a single low-weight codeword and its cyclic shifts.
Our techniques involve the use of recent results from additive number theory to prove that the codes we consider, and related codes emerging from our proofs, have high distance. We then combine these with the MacWilliams identities and some careful analysis of the affine-invariance property to derive our results.
It is joint work with Tali Kaufman and Madhu Sudan
Title: Sparse Random Linear Codes are Locally Decodable and Testable
We show that random sparse binary linear codes are locally testable and locally decodable (under any linear encoding) with constant queries (with probability tending to one). By sparse, we mean that the code should have only polynomially many codewords. Our results are the first to show that local decodability and testability can be found in random, unstructured, codes. Previously known locally decodable or testable codes were either classical algebraic codes, or new ones constructed very carefully.
Joint work with Madhu Sudan
Title: Property testing in sparse and general graphs
When thinking about Property Testing in graphs, one frequently has in mind the so called adjacency matrix model, where an algorithm queries the adjacency matrix of an input graph. This model is perfectly tailored for the dense case where the input graph is tacitly assumed to be dense, i.e., to have a quadratic number of edges. However there is much more to it, and other models for graph property testing exist and are equally exciting. Most notable of them are: - the incidence lists model, where input graphs are represented by the incidence lists of their vertices; this model is most suitable for sparse or even bounded degree graphs; - the hybrid model, where queries to both adjacency matrix of the input graph and to its incidence lists are allowed; this model is suitable in principle for graphs of arbitrary density. In this talk I will survey these models and their ramifications, mentioning several striking results obtained for them and also indicating possible research directions.
Title: Property Testing in the Underlying Graph Model
We consider here the problem of property testing in a model that was introduced by Halevi et. al (2005) and is referred to as the underlying graph model. In this model the tester has full knowledge of an underlying undirected fixed graph G=(V,E). A property is then a collection of assignments on the edges (vertices). Such an assignment can be interpreted in several ways. One such interpretation is considering the assignment as a characteristic function of the edges of a subgraph of G. Hence, a property will be a collection of G-subgraphs, e.g. the subgraphs that are bipartite, 3-colorable and so on. Another is to interpret the Boolean assignment on the edges as an orientation of the edges (relative to some fixed predefined orientation). In this case, a property is a property of G-orientations e.g. being strongly connected, Eulerian etc. A third, assignment as an assignment to a collection of edge variables. In this case, we also assume some fixed predefined collection of vertex constraints; A property now is the set of all assignments that satisfy all the constraints simultaneously.
A known variation of this interpretation is the case in which the vertices are associated with variables, and the assignment to each edge is interpreted as a selection of constraint over the variables adjacent to it. In this case a property is a selection of constraints that are simultaneously satisfiable by some assignment to the vertex variables.
I will survey several positive results (namely, a construction of testers) for several natural properties, several impossibility results (lower bounds on the complexity of any tester) and some connections to other models. Among these I will discuss the relation to Proofs of Proximity.
Title: Testing Halfspaces
We address the problem of testing whether a Boolean-valued function $f$ is a halfspace, i.e. a function of the form $f(x)=$sign$(w \cdot x - \theta).$ We consider halfspaces over the continuous domain ${\bf R}^n$ (endowed with the standard multivariate Gaussian distribution) as well as halfspaces over the Boolean cube $\{-1,1\}^n$ (endowed with the uniform distribution). In both cases we give an algorithm that distinguishes halfspaces from functions that are $\epsilon$-far from any halfspace using only poly$(\frac{1}{\epsilon})$ queries, independent of the dimension $n$. These testing results are based on new structural results about the Fourier and Hermite coefficients of halfspaces.
Title: Tolerant testing for nearly-sortedness and applications
An array A of n reals is (k,l)-nearly sorted if it contains a set E of k elements, such that in A'=A\E for every i < j either A'[i]<=A'[j] or j-i<=l. In this talk we will see a (tolerant) tester that can distinguish (k,l)-nearly sorted arrays from those that are not even (ck,cl)-nearly sorted. Then we will describe an efficient algorithm for sorting (k,l)-nearly sorted arrays. Finally, we will discuss some potential applications of these algorithms in databases.
Joint work with Sagi Ben-Moshe, Eldar Fischer, Yaron Kanza.
Title: Transitive-closure Spanners
We define the notion of a transitive-closure spanner of a directed graph. A transitive-closure spanner of a given directed graph is a graph of small diameter that preserves connectivity of the original graph. Formally, given a directed graph G = (V,E) and an integer k>=1, a k-transitive-closure-spanner (k-TC-spanner) of G is a directed graph H = (V, E_H) that has (1) the same transitive-closure as G and (2) diameter at most k. These spanners were studied implicitly in sublinear algorithms, access control, and data structures, and properties of these spanners have been rediscovered over the span of 20 years. We bring these areas under the unifying framework of TC-spanners. We abstract the common task implicitly tackled in these diverse applications as the problem of constructing sparse TC-spanners. We study the size of the sparsest k-TC-spanners for specific graph families, as well as the computational task of finding the sparsest k-TC-spanner for a given input graph. We present new positive and negative results on both fronts. In the talk, we will explain some ramifications of these results for testing monotonicity of functions. Joint work with Arnab Bhattacharyya, Elena Grigorescu, Kyomin Jung and David Woodruff.
Title: Testing Global Properties of Distributions
We survey several works regarding the complexity of testing global properties of distributions, when given access to only a few samples from the distributions. Such properties might include testing if two distributions have small statistical distance, testing various independence properties, testing whether a distribution has a specific shape (such as monotone decreasing), and approximating the entropy. Classical techniques, such as the Chi-squared test or the straightforward use of Chernoff bounds, have sample complexities that are at least linear in the size of the support of the underlying discrete probability distributions. We will describe bounds for many such testing problems whose sample complexities are sublinear in the size of the support.
Title: Low degree testing
Given a multivariate function over a finite field we would like to decide whether this function is a polynomial of a given (low) degree, or, is in fact rather far from all such polynomials. This decision should be made very quickly - after querying the function in a small number of places. In other words, we want to test the functional property of being a multivariate low-degree polynomial.
This is a natural property testing question which turns out to have many interesting connections with other ares of theoretical computer science and mathematics, such as coding theory, complexity, and additive number theory.
In this survey talk we will discuss the problem and try to cover some of the related results and developments.
Title: Green's Conjecture and Testing Linear Invariant Properties
Given a set of linear equations Mx=b, we say that a set of integers S is (M,b)-free if it contains no solution to this system of equations. Motivated by questions related to testing linear-invariant Boolean functions, as well as recent investigations in additive number theory, the following conjecture was raised (implicitly) by Green and by Bhattacharyya, Chen, Sudan and Xie: we say that a set of integers S \subseteq [n], is \epsilon-far from being (M,b)-free if one needs to remove at least \epsilon*n elements from S in order to make it (M,b)-free. The conjecture was that for any system of homogeneous linear equations Mx=0 and for any \epsilon > 0 there is a constant time algorithm that can distinguish with high probability between sets of integers that are (M,0)- free from sets that are \epsilon-far from being (M,0)-free. Or in other words, that for any M there is an efficient testing algorithm for the property of being (M,0)-free. In this paper we confirm the above conjecture by showing that such a testing algorithm exists even for non-homogeneous linear equations. As opposed to most results on testing Boolean functions, which rely on algebraic and analytic arguments, our proof relies on results from extremal hypergraph theory, such as the recent removal lemmas of Gowers, R?odl et al. and Austin and Tao.
Title: Property Testing in the Dense Graph Model
One of the most well studied models of property testing is testing properties of graphs represented as an adjacency matrix. In the past decade there have been many exciting results in this area which have lead to an almost complete (qualitative) understanding of the testability of graph properties in this model. In this lecture I will give an overview of the many results related to this area.
Title: Property testing and graph limits
In this talk we review topological methods for property testing. We also present a general property testing result for a family of relations with axioms.
Title: Algebraic Property Testing: A Survey
Property testing emerged in the wake of the ``linearity test'' of Blum, Luby and Rubinfeld, and the ``multilinearity test'' of Babai, Fortnow and Lund. Subsequently a formal definition was proposed by Rubinfeld and myself. All original examples of properties being studied were algebraic in nature. The first extension of the field to non-algebraic properties started with the work of Goldreich, Goldwasser, and Ron; and since then the field has blossomed with a very diverse collection of properties having been shown to be testable, with a rich variety of tools.
In this talk we will return to some of the original motivations, of algebraic property testing. We will survey some of the known results, and consequences to probabilistically checkable proofs and locally testable codes. We will conclude by presenting a unifying analysis, based on joint work with Tali Kaufman, of many of the basic algebraic property tests by focussing on the underlying invariance features of the properties.
Title: Random matrices: The distribution of the smallest singular values
Let $x$ be a real-valued random variable of mean zero and variance $1$. Let $M_n(x)$ denote the $n \times n$ random matrix whose entries are iid copies of $x$ and $\sigma_n(M_n(x))$ denote the least singular value of $M_n(x)$.
The problem of understanding the distribution of the least singular value of a random matrix was first raised by von Neuman and Goldstein in the 1940s, in their studies on numerical linear algebra. Since then, it has become an important problem in the theory of random matrices, numerical analysis and smooth analysis of linear programming.
We show that (under a finite moment assumption) the probability distribution of $ n^{1/2} \sigma_n(M_n(x))$ is UNIVERSAL in the sense that it does not depend on the distribution of $x$. In particular, it converges to the same limiting distribution as in the special case when $x$ is real gaussian. (The limiting distribution was computed explicitly in this case by Edelman in 1988.)
Our approach is guided by the general idea of ``property testing'' from combinatorics and theoretical computer science. This seems to be a new approach in the study of spectra of random matrices and combines tools from various areas of mathematics.
Joint work with T. Tao (UCLA).
Title: New Direct Product Code Testers, and PCPs
The direct product code encodes the truth table of a function f:U-->R by a function f^k:U^k -->R^k, which lists the value of f in all k-tuples of inputs. We study its (approximate) proximity testing to this code in the "list decoding" regime. We give a 3-query tester for distance 1-exp(-k) (which is impossible with 2 queries). We also give a 2-query tester for distance 1-1/poly(k) for a derandomized version of this code (of polynomial rate).
We then use the same techniques to reduce soundness errors in 2-query PCPs which gives incomparable decay to the Parallel Repetition Theorem
Joint work with Valentine Kabanets and Russell Impagliazzo.