Title: Efficiency of Adaptive MCMC
Recently, many adaptive MCMC algorithms have been proposed partly as a solution to the parameter-tuning problem in classical MCMC. The idea is to design Monte Carlo samplers that can optimize their own parameters for a given sampling problem. This talk will discuss the efficiency of such procedures; the notion of "efficiency" concerns the possibility for adaptive MCMC samplers to behave like the "optimal" MCMC sampler. (This is joint work with Christophe Andrieu, Bristol University.)
Title: Using SIS to Speed Up MCMC
We describe strategies for speeding up MCMC so that it can be used for practical computation on hard problems, such as estimating the number of monomer-dimer covers of a lattice. Our approach is based first on using aggregation to get a simplified version of the problem. The simplified problem is then used to get approximate values that can be used to define fugacity for use in MCMC to obtain accurate results faster.
Our second technique is based on sequential importance sampling (SIS). We show how to use SIS to generate the transition matrices for the aggregated process and we comment on possible uses of SIS allowing direct approximation of the constants needed for MCMC without requiring aggregation.
(This is joint work with Francis Sullivan, IDA/Center for Computing Sciences.)
Title: Centers for Random Walks on Trees
We consider two distinct centers which arise in measuring how quickly a random walk on a tree mixes. Lovasz and Winkler have shown that stopping rules which "look where they are going" (rather than simply walking a fixed number of steps) can achieve a desired distribution exactly and efficiently. Considering an optimal stopping rule that reflects some aspect of mixing, we can use the expected length of this rule as a mixing measure. On trees, a number of these mixing measures identify particular nodes with central properties. In this context, we study a variety of natural notions of centrality. Each of these criteria identifies the barycenter of the tree as the "average" center and the newly defined focus as the "extremal" center.
Title: Binary Contingency Tables in Theory and Practice
The binary contingency tables problem, arising in the design of statistical tests, is to estimate the number of 0/1 matrices with prescribed row and column sums. Several approaches have been considered for solving the problem -- most notably sequential importance sampling (SIS) and simulated annealing. We will describe both methods and discuss their limitations.
The sequential importance sampling method of Chen, Diaconis, Holmes, and Liu is guaranteed to converge to the correct count and it appears to work well in practice. Recently, Blanchet showed that for n x n tables with row sums and column sums bounded by o(n^{1/4}) the method converges in O(n^2) trials. However, in the worst case the required time is exponential. More precisely, with Sinclair, Stefankovic, and Vigoda we give a family of examples in which the SIS procedure, if run for any subexponential number of trials, will underestimate the number of tables by an exponential factor. This result holds for any of the usual design choices in the SIS algorithm, namely, the ordering of the columns and rows.
On the other hand, a simulated annealing approach yields a fully polynomial approximation scheme (fpras) for any input, as was first shown by Jerrum, Sinclair, and Vigoda. In this talk we describe a joint result with Bhatnagar and Vigoda which uses a different and faster simulated annealing approach; the main idea is to start the annealing process with a nontrivial instance. This is the currently fastest fpras for the problem, running in time O(n^{11}) for n x n tables.
Title: Designing and Testing Efficient Sequential Importance Sampling Algorithms
Sequential importance sampling has been reported to give excellent empirical performance in practical problems -- some of them related to statistical inference and others related to performance analysis of stochastic systems in engineering. The theoretical analysis of importance samplers has a long tradition within the rare-event analysis community and some of the techniques that have been developed recently are useful in applications of interest to computer scientists and statisticians. In order to illustrate these techniques, I'll study in detail a sequential importance sampler proposed by Chen, Diaconis, Holmes, and Liu (2005) for the statistical analysis of binary contingency tables with fixed margins. Chen et al (2005) use their algorithm to count the number of such tables. We shall illustrate how our techniques, based on Lyapunov bounds for Markov chains, allow us to compute a quantitative bound for the variance of the estimator of Chen et al (2005) and also prove that such algorithm allows (assuming that the margins grow slowly as the size of the table increases) for the approximate counting of the number of such tables with basically quadratic complexity.
Title: Computing for Bayesian Spatial Estimation and Prediction with Application to Residential Radon
Because exposure to radon gas in buildings is a likely risk factor for lung cancer, estimation of residential radon levels is an important public health endeavor. We describe an efficient MCMC algorithm that enables fitting a Bayesian geostatistical model that appropriately combines areal radon data with point-source measurements of indoor home radon in the state of Iowa.
(This is joint work with Brian Smith and Jun Yan.)
Title: Exact and Approximate Monte Carlo for Spatial Models
Hierarchical Bayesian approaches to modeling spatial data are becoming increasingly common. However, even fairly standard spatial models often result in posterior distributions that present challenges to non-expert users of Markov chain Monte Carlo (MCMC) methods. In addition to having to figure out an appropriate MCMC algorithm for each new problem, MCMC users have two long-standing questions: How long should one run the chain and, once the algorithm is stopped, how accurate are the resulting estimates? I will describe some approaches for resolving these issues for a small but useful class of spatial models. The first approach involves the construction of exact sampling procedures for which these questions are moot since the draws are i.i.d. The second (closely related) approach involves constructing MCMC algorithms with good theoretical properties, which allows for the computation of consistent estimates of Monte Carlo standard errors. These standard errors can then be used to determine the run length of the MCMC algorithm. I will conclude with a comparison of these methods in the context of some data examples.
Title: A Theoretical Comparison of the Data Augmentation, Marginal Augmentation, and PX-DA Algorithms
The PX-DA algorithm of Liu and Wu (1999, JASA) and the marginal augmentation (MA) algorithm of Meng and van Dyk (1999, Biometrika) are recently developed alternatives to the standard data augmentation (DA) algorithm. PX-DA and MA often converge much faster than DA and are only slightly more computationally demanding. This talk concerns a family of generalizations of the DA algorithm that includes the PX-DA and MA algorithms. The main result is that each member of the family is more efficient than DA in the sense of smaller asymptotic variance in the central limit theorem. As an example, the DA algorithm of Albert and Chib (1993, JASA) for Bayesian probit regression is compared with the alternative PX-DA algorithm developed by Liu and Wu.
(This is joint work with Dobrin Marchev, Baruch College (CUNY) and Vivekananda Roy, University of Florida.)
Title: Fixed-Width Output Analysis for Markov Chain Monte Carlo
Markov chain Monte Carlo is a method of producing a correlated sample from a target distribution. Features of the target distribution are then estimated using this sample. Thus a fundamental question in MCMC is: When should the sampling stop? That is, when have we achieved good estimates? I will introduce a method that stops the MCMC sampling when the width of a confidence interval is less than a user-specified value. Hence calculating Monte Carlo standard errors is a critical step in assessing the output of the simulation. In this talk I will give an overview of fixed-width methodology and methods for calculating Monte Carlo standard errors. The main results will be illustrated in several examples.
Title: Markov Chain Monte Carlo: Practice; or: The Full Monte Carlo: A Live Performance (with Stars)
Markov chain Monte Carlo (MCMC) methods, originating in computational physics about half a century ago, have seen an enormous range of applications in quantitative scientific investigations. This is mainly due to their ability to simulate from very complex distributions such as the ones needed in realistic statistical models. The first part of this tutorial introduces the two most frequently used MCMC algorithms: the Gibbs sampler and the Metropolis-Hastings algorithm. Using simple yet non-trivial examples, we demonstrate, via live performance, the good, bad, and ugly implementations. Audience participations are required, though no prior experiences are needed. In the second part of this tutorial, a newly obtained interweaving Data Augmentation algorithm is presented for fitting a Cox Process model to an astrophysics dataset.
Title: MCMC Algorithms for Distributions with Intractable Normalizing Constants, with a View to Perfect Simulation and Non-parametric Bayesian Inference for Inhomogeneous Markov Point Processes
Sampling from Bayesian posterior distributions are problematic when the likelihood for the parameter of interest involves an intractable normalizing constant which is also a function of that parameter. In [3] we present new methodology for drawing samples from such a posterior distribution without approximation, in [1] we illustrate the method, and in [2] we discuss how to do perfect simulation, which is an important ingredient of the method in [3].
References:
[1] K.K. Berthelsen and J. Møller (2007). Non-parametric Bayesian inference for inhomogeneous Markov point processes. Research Report R-2007-9, Department of Mathematical Sciences, Aalborg University. [2] W.S. Kendall and J. Møller (2000). Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes. Advances in Applied Probability, 32, 844-865. [3] J. Møller, A.N. Pettitt, K.K. Berthelsen and R.W. Reeves (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93, 451-458.
Title: Mixing of Gibbs Sampling on Random Graphs
The tree-like structure of random graphs suggests that rapid mixing of Gibbs sampling on random graphs should depend on properties of the corresponding trees. Proving rigorous statements to that effect is a major challenge. I will describe two recent results where such statements were proven.
(This talk is based on joint projects with Dror Weitz and Nick Wormald, and with Allan Sly.)
Title: Can Extra Updates Delay Mixing?
We consider Glauber dynamics (starting from an extremal configuration) in a monotone spin system, and show that interjecting extra updates cannot increase the expected Hamming distance and the total variation distance to the stationary distribution. We deduce that for monotone Markov random fields, when block dynamics contracts a weighted Hamming metric, so does single-site dynamics started at extremal configurations. In particular, our result completes work of Kenyon, Mossel, and Peres concerning Glauber dynamics for the Ising model on trees. Our approach also shows that on bipartite $n$-vertex graphs, alternating updates systematically between odd and even vertices cannot improve the mixing time by more than a factor of $\log n$ compared to updates at uniform random locations.
(This is joint work with Peter Winkler.)
Title: Slow Mixing of Glauber Dynamics and Simulated Tempering Algorithms
Markov chains can be effective tools for sampling elements of large sets, including independent sets and matchings of a graph. These models arise in statistical physics in the context of lattice gases and monomer-dimer models, where there is typically a parameter corresponding to temperature and we are interested in sampling from the Gibbs distribution at various temperatures. For many of these models local Markov chains undergo a phase transition whereby they are efficient at high temperatures and inefficient at low temperatures. We will describe some recent results that rigorously show conditions under which certain Markov chains will require exponential time to converge to equilibrium, including Glauber (local) dynamics and simulated tempering.
Title: Markov Chain Monte Carlo: Synthesizing Theory and Practice
This tutorial will discuss ways in which theoretical Markov chain analysis can impact the application of MCMC algorithms to modern statistical inference problems. Specific topics will include: differences between discrete and continuous state space chains; general ergodicity theorems via phi-irreducibility; and the use of coupling and minorization conditions to bound convergence to stationarity. We will also discuss adaptive MCMC algorithms, which update their tuning parameters during their run in an effort to better converge. We will provide a coupling proof of their ergodicity (under appropriate conditions) and consider some examples.
Title: Implementing Gibbs-type Samplers Using Incompatible Draws, with Applications in High-Energy Astrophysics
Ensuring the compatibility of conditional distributions is commonly emphasized in the constructions of Gibbs samplers: Compatibility is a necessary condition for proper convergence. In this talk I discuss a systematic strategy for the construction of MCMC samplers involving incompatible conditional distributions. This strategy is designed not only to ensure proper convergence to the the target stationary distribution but also to provide chains with quicker convergence and smaller autocorrelations at stationarity. Our method can be viewed as a generalization of blocking in that the systematic strategy sometimes yields a blocked version of the parent Gibbs sampler. It is more general than blocking in that it sometimes yields a set of incompatible conditional distributions that do not correspond to any standard Gibbs sampler. The method was motivated by an applied problem in high-energy astrophysics, and we illustrate its use in three problems stemming from our applied work. Our strategy can be viewed as a stochastic version of the ECME and AECM algorithms. Changing the order of the steps of these EM-type algorithms can destroy their celebrated monotone convergence. Likewise, changing the order of the conditional draws of our samplers can destroy their target stationary distribution. Thus, some care must be taken in deriving and implementing these fast samplers.
Title: Markov Chain Monte Carlo: Theory
I'll begin the tutorial by describing the basic connection between approximate counting (or estimating the partition function) and randomly sampling from the corresponding Gibbs distribution. We'll then dive into the focus of this tutorial: an overview of techniques for bounding the mixing time of finite Markov chains. The emphasis will be on the canonical paths technique and its application to matchings/permanent, and the coupling technique.
Title: Biopolymer Structure Simulation and Optimization via Fragment Re-growth Monte Carlo
An efficient exploration of the configuration space of a biopolymer is essential for its structure modeling and prediction. In this study, we propose a new Monte Carlo method, Fragment Re-growth via Energy-guided Sequential Sampling (FRESS), which incorporates the idea of multi-grid Monte Carlo into the framework of configurational-bias Monte Carlo and is suitable for chain polymer simulations. As a byproduct, we also found a novel extension of the Metropolis Monte Carlo framework applicable to all Monte Carlo computations. We tested FRESS on hydrophobic-hydrophilic (HP) protein folding models in both two and three dimensions. For the benchmark sequences, FRESS not only found all the minimum energies obtained by previous studies with substantially less computation time, but also found new lower energies for all the three-dimensional HP models with sequence length longer than 80 residues.
(This is joint work with department colleagues Sam C. Kou and Jun S. Liu.)