Discrete Mathematics and Theoretical Computer Science

TITLE: "Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications"

EDITORS: Regina Liu, Robert Serfling and Diane Souvaine

To order through AMS contact the AMS Customer Services Department, P.O. Box 6248, Providence, Rhode Island 02940-6248 USA. For Visa, Mastercard, Discover, and American Express orders call 1-800-321-4AMS.

You may also visit the
AMS Bookstore and
order directly from there. DIMACS ** does not** distribute or sell these books.

This book is a collection of some of the research work presented in the workshop on "DATA DEPTH: ROBUST MULTIVARIATE STATISTICAL ANALYSIS, COMPUTATIONAL GEOMETRY & APPLICATIONS". The workshop was held from May 14 to 16, 2003, at Rutgers University in New Jersey, and it was sponsored by DIMACS with support from the National Science Foundation. The workshop was co-organized by Regina Liu, Robert Serfling, Diane Souvaine and Yehuda Vardi. There were more than 100 participants from various fields, including: statistics, computer sciences, mathematics and operations research. This workshop brought together not only two DIMACS special foci but two different communities: statisticians and computational geometers. The result was a very exciting interdisciplinary workshop that laid the foundations for further interfaces and collaborations.

Multivariate data analysis plays a role of ever-increasing importance in scientific studies. In current developments of multivariate analysis, a more geometric point of view is being emphasized. Descriptive measures and sample statistics that can capture properly the higher-dimensional features of multivariate data are needed. Several geometric approaches have been proposed recently. Especially promising is the one founded on the concept of data depth. This new concept provides centeroutward orderings of points in Euclidean space of any dimension. It also provides many new perspectives to aspects of probability as well as of computer science. In particular, the development of implementable computing algorithms for depthbased statistics has brought about many new challenges in computational geometry. The extensive development of data depth in recent years has spawned attractive depth-based tools for nonparametric multivariate data analysis, with a wide range of applications. The diversity in approaches, emphases, and concepts, however, makes it necessary to seek unified views and perspectives that would guide the further development of the depth-based approach.

There has been a flurry of activities in the development of data depth. Different notions of data depth include: the projection of half-space, the counting of random simplices, the peeling of convex hulls, the summing over volumes of random simplices, the summing over absolute distance among random pairs, and others. Each notion of data depth provides a distinctive center-outward ordering of sample points in a multidimensional space. The probabilistic geometry underlying the definition of each data depth produces a particular ordering of the points in the space and makes it especially suitable for certain types of applications. Variousstudies have been carried out to investigate and compare the structural properties of various data depths and their corresponding geometric layout such as contours and central quantiles. Data analysis methodologies based on data depth have also been developed for constructing confidence regions, computing p-values for testing hypotheses, constructing rank tests, and regression, etc. Applications to statistical quality control, aviation safety data analysis, and gene clustering have been demonstrated with success. Many more applications are currently on-going or continue to be discovered. Although research on the subject of data depth has been on-going for more than a decade, many problems are still open. Developing efficient and implementable computer algorithms is also a crucial element of the development of data depth analysis methodology, and it requires special expertise in computer science. The need for collaborations between statisticians and computer scientists in this regard is evident and so was the need for this special workshop to bring together experts from both communities.

The computational geometry community has long recognized that there are many important and challenging problems that lie at the interface of geometry and statistics. There are many instances in which geometric techniques have been utilized to give efficient algorithms for important problems in statistics. However, some of these algorithms, while guaranteeing good theoretical asymptotic performance, are difficult to implement in practice and thus are of limited utility for statisticians or general consumers of "scientific computing". There is a clear demand for modifying or creating implementable algorithms with good theoretical performance for statistical analysis of large scientific datasets.

The concept of data depth has been developed by statisticians for nonparametric multivariate data analysis. Among the numerous different notions of data depth, the so-called halfspace depth lends itself nicely to the creation of convex depth contours within the multivariate data set. Computational geometers and statisticians have provided algorithms and code for computing the halfspace depth contours in two dimensions. The running time is O(n2 log n) for a dataset of size n. The theoretical result that requires only O(n log5 n) does not lend itself easily to implementation. Recent improvements have used either the duality and topological sweep of an arrangement of lines in the dual or some randomized versions. Although deepest points have been widely studied in the computational geometry literature, the best known time for algorithms for center points in high dimensions is still of O(nd+1). So far most computational algorithms are primarily centered around the halfspace depth in lower dimensions. Efficient computational algorithms are needed for other existing notions of depth, especially for higher dimensional settings. The main impediment to the widespread use of depth-induced data analysis is the computational bottleneck. Exploiting geometry to create implementable algorithms, both exact and approximate, would be extremely valuable, as evidenced by several of the papers in this volume.

This workshop was held under the auspices of the DIMACS Special Focus on Data Analysis and Mining and the DIMACS Special Focus on Computational Geometry and Applications, with support from NSF DMS 02-05136 and NSA MDA 04-02-0101. Many people in DIMACS and the Department of Statistics of Rutgers University had worked hard to help make this workshop a reality. In particular, we would like to thank the Department of Statistics of Rutgers University and Ms. Pat Wolf from the Department of Statistics, Rutgers University, for her enduring patience and efficiency in handling all the details from abstracts to those papers in this volume.

Foreword ix Preface xi Depth functions in nonparametric multivariate inference 1 R. Serfling Rank tests for multivariate scale difference based on data depth 17 R.Y. Liu and K. Singh On scale curves for nonparametric description of dispersion 37 J. Wang and R. Serfling Data analysis and classification with the zonoid depth 49 K. Mosler and R. Hoberg On some parametric, nonparametric and semiparametric discrimination rules 61 A. Hartikainen and H. Oja Regression depth and support vector machine 71 A. Christmann Spherical data depth and a multivariate median 87 R.T. Elmore, T.P. Hettmansperger, and F. Xuan Depth-based classification for functional data 103 S. Lopez-Pintado and J. Romo Impartial trimmed means for functional data 121 J.A. Cuesta-Albertos and R. Fraiman Geometric measures of data depth 147 G. Aloupis Computation of half-space depth using simulated annealing 159 B. Chakraborty and P. Chaudhuri Primal-dual algorithms for data depth 171 D. Bremner, K. Fukuda, and V. Rosta Simplicial depth: An improved definition, analysis, and efficiency for the finite sample case 195 M.A. Burr, E. Rafalin, and D.L. Souvaine Fast algorithms for frames and point depth 211 J.H. Dul Statistical data depth and the graphics hardware 223 S. Krishnan, N.H. Mustafa, and S. Venkatasubramanian

Index of Volumes

DIMACS Homepage

Contacting the Center

Document last modified on December 12, 2006.