DIMACS Series in
Discrete Mathematics and Theoretical Computer Science

VOLUME Seventy
TITLE: "Discrete Methods in Epidemiology"
EDITORS: James Abello and Graham Cormode

Ordering Information

This volume may be obtained from the AMS or through bookstores in your area.

To order through AMS contact the AMS Customer Services Department, P.O. Box 6248, Providence, Rhode Island 02940-6248 USA. For Visa, Mastercard, Discover, and American Express orders call 1-800-321-4AMS.

You may also visit the AMS Bookstore and order directly from there. DIMACS does not distribute or sell these books.


Faced with the question of the intended audience for a collection such as the one assembled here, we have been asking ourselves the following questions:

Dave Ozonoff provided us with the following quote by Rothman, which sheds some light on possible answers to the questions posed above:

In general terms, epidemiology deals with populations rather than individuals. One of its goals is to study the frequency of occurrence of health-related events. It has a major but not exclusive concern with the causes and determinants of disease patterns in populations. The premise is that a systematic investigation of different populations can identify causal and preventive factors. Epidemiology is an observational rather than an experimental science. Sample questions take the form of:

We have observed that occurrence measures, causal inference and study designs play prominent roles in the daily endeavors of a typical epidemiologist. Descriptive and analytical epidemiology are two overlapping flavors of this discipline.

Descriptive epidemiology attempts to describe patterns of disease according to spatial and temporal information about the members of a population. These patterns are described by tabulations or summaries of surveys and polls, or by parametric or non-parametric population models. Models are, in general, global descriptions of the major part of a data set. Patterns, on the other hand, are local features of the data that can be described by association rules, modes or gaps in density functions, outliers, inflection points in regressions, symptom clusters, geographic hot spots, etc. Some epidemiologists appear more interested in local patterns than in global structure. This raises the question of how "realistic" certain patterns are.

Analytical epidemiology attempts to explain and predict the state of a population's health. A typical goal is to summarize the relationship between exposure and disease incidence by comparing two measures of disease frequency. These comparisons may be affected by chance, by bias, and by the presence or absence of a real effect. Because bias is a central preoccupation of its practitioners, it is natural that statistical methods play a major role in epidemiology. Bias means a systematic error that results in an incorrect or invalid estimate of the measure of association; it can create or mask associations. Selection bias and information bias are two of the main types. In particular, selection should be independent of exposure if the purpose of the study is to explain the relationship between exposure and disease occurrence. In summary, one of the central themes in analytical epidemiology is to understand the roles of bias, chance and real effects in shaping our picture of population health.
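A common way to compare two measures of disease frequency is to compute the incidence (risk) among the exposed and the unexposed and then take their ratio (a relative measure) or their difference (an absolute measure). The sketch below is a minimal illustration under that assumption; the function names and the cohort counts are hypothetical and are not drawn from any chapter of this volume.

```python
def incidence(cases: int, population: int) -> float:
    """Cumulative incidence (risk): new cases divided by the population at risk."""
    if population <= 0:
        raise ValueError("population at risk must be positive")
    return cases / population


def risk_ratio(exp_cases: int, exp_total: int,
               unexp_cases: int, unexp_total: int) -> float:
    """Relative comparison: risk among the exposed divided by risk among the unexposed."""
    return incidence(exp_cases, exp_total) / incidence(unexp_cases, unexp_total)


def risk_difference(exp_cases: int, exp_total: int,
                    unexp_cases: int, unexp_total: int) -> float:
    """Absolute comparison: excess risk among the exposed."""
    return incidence(exp_cases, exp_total) - incidence(unexp_cases, unexp_total)


if __name__ == "__main__":
    # Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
    print(f"risk ratio:      {risk_ratio(30, 100, 10, 100):.2f}")       # 3.00
    print(f"risk difference: {risk_difference(30, 100, 10, 100):.2f}")  # 0.20
```

Either measure is only as trustworthy as the study that produced the counts: if selection into the study depends on exposure, or if exposure or disease is misclassified, both the ratio and the difference can move away from the true association.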

To evaluate the role of chance, statistical hypothesis testing and estimation appear to be the tools of choice. Generative models, on the other hand, offer a way to describe infectious disease dynamics. Since disease patterns are of primary interest, data mining algorithms and the detection of rules for pattern formation have a lot to offer. Classification and taxonomies are useful tools for developing predictive models. In general, we believe that some questions addressed by epidemiologists benefit from being viewed in a mathematical and algorithmic context. This volume is a first attempt to bridge the gap between the two communities. Its main emphasis is on discrete methods that have successfully addressed epidemiological questions. We begin with two introductory chapters: one on key methods from discrete data mining, by a selection of researchers in this area, and one on descriptive epidemiology, by D. Schneider. These collect, in digested form, what we believe are among the most potentially useful concepts in data mining and epidemiology.
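As one minimal illustration of using hypothesis testing to assess the role of chance, the sketch below applies Pearson's chi-square test of independence to a 2x2 exposure-by-disease table. The table layout, the function name, and the example counts are hypothetical; no continuity correction is applied, and the p-value uses the fact that, with one degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 table.

    Assumed layout:
                   disease   no disease
      exposed         a          b
      unexposed       c          d

    Returns (statistic, p_value) without continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30/100 exposed and 10/100 unexposed with disease.
    stat, p = chi_square_2x2(30, 70, 10, 90)
    print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

A small p-value only argues against chance as the sole explanation; it says nothing about bias, which has to be addressed through study design.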

Next come two chapters reporting work in epidemiology that suggests a discrete, analytical approach: Shannon on challenges in molecular data analysis, and Hirschman and Damianos on a system for monitoring news wires for indications of disease outbreaks. The remainder of the volume further explores some of the key areas at the intersection of epidemiology and discrete methods. The technique of formal concept analysis, and the amazing depth of mathematical structure that arises from it, are explored in chapters by Ozonoff, Pogel and Hannan, and by Abello and Pogel. The dynamics of disease transmission can be modeled in a variety of ways, but modeling often involves setting up systems of differential equations to describe the ebb and flow of infection, as demonstrated by Desai, Boily, Masse and Anderson, and by Vazquez, in the context of quite different problems. Eubank, Kumar, Marathe, Srinivasan and Wang study massive interaction graphs and obtain results through a combination of combinatorial methods and simulation; Abello and Capalbo focus on properties of graphs generated by an appropriate random model; and Hartke studies a combinatorial model of disease spread on tree graphs. Finally, there are two applications of Support Vector Machines to epidemiological data sets: from Li, Muchnik and Schneider (using breast cancer data from the SEER database) and from Fradkin, Muchnik, Hermans and Morgan (using data on disease in chickens). Other potential areas of interest that this collection does not touch on include patient confidentiality, coding and cryptography, and multiscale inference.
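The chapters cited above each use their own models. As a generic illustration of what a system of differential equations for the ebb and flow of infection can look like, the sketch below integrates the classic susceptible-infected-recovered (SIR) equations with a simple forward-Euler step. The choice of the SIR model, the parameter names beta (transmission rate) and gamma (recovery rate), the step count, and the example values are all assumptions for illustration, not the method of any chapter in this volume.

```python
def simulate_sir(beta: float, gamma: float, s0: float, i0: float, r0: float,
                 days: int, steps_per_day: int = 10
                 ) -> list[tuple[int, float, float, float]]:
    """Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.

    S, I, R are population fractions. Forward Euler with step dt = 1/steps_per_day
    is used for brevity; one (day, S, I, R) row is returned per whole day.
    """
    dt = 1.0 / steps_per_day
    s, i, r = s0, i0, r0
    trajectory = [(0, s, i, r)]
    for step in range(1, days * steps_per_day + 1):
        new_infections = beta * s * i * dt  # flow S -> I during this step
        recoveries = gamma * i * dt         # flow I -> R during this step
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        if step % steps_per_day == 0:
            trajectory.append((step // steps_per_day, s, i, r))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters: basic reproduction number beta / gamma = 3.
    rows = simulate_sir(beta=0.3, gamma=0.1, s0=0.999, i0=0.001, r0=0.0, days=200)
    peak_day, _, peak_i, _ = max(rows, key=lambda row: row[2])
    print(f"peak infected fraction {peak_i:.3f} on day {peak_day}")
    print(f"final recovered fraction {rows[-1][3]:.3f}")
```

Because every unit leaving S enters I and every unit leaving I enters R, the three fractions keep their initial sum; a higher-order integrator would be preferable for accuracy-sensitive work.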

We hope the volume helps to foster cooperation between epidemiologists, computer scientists and mathematicians. We believe this will help elucidate the main algorithmic and mathematical issues. In a relatively brief period of time we noticed a variety of interconnections between the disciplines, far richer than we ever dreamed of. We trust that the papers included here are a good indicator of the possibilities that discrete mathematical thinking can offer to a variety of epidemiological questions.

James Abello
Graham Cormode
Piscataway, NJ, 2005



Foreword                                                vii

Preface                                                  ix

Acknowledgments                                          xi

Selected Data Mining Concepts
     J. Abello, G. Cormode, D. Fradkin, D. Madigan,
       O. Melnik, and I. Muchnik                          1

Descriptive Epidemiology: A Brief Introduction
     D. Schneider                                        41

Biostatistical Challenges in Molecular Data Analysis
     W.D. Shannon                                        63

Mining Online Media for Global Disease Outbreak
     L. Hirschman and L.E. Damianos                      73

Generalized Contingency Tables and Concept Lattices
     D. Ozonoff, A. Pogel, and T. Hannan                 93

Graph Partitions and Concept Lattices
     J. Abello and A. Pogel                             115

Using Transmission Dynamics Models to Validate
   Vaccine Efficacy Measures Prior to Conducting
   HIV Vaccine Efficacy Trials
     K. Desai, M-C. Boily, B. Masse, and R.M. Anderson  139

Causal Tree of Disease Transmission and The Spreading
   of Infectious Diseases
     A. Vazquez                                         163

Structure of Social Contact Networks and Their Impact
   on Epidemics
     S. Eubank, V.S.A. Kumar, M.V. Marathe,
       A. Srinivasan, and N. Wang                       181

Random Graphs (and the Spread of Infections in a
   Social Network)
     J. Abello and M. Capalbo                           215

Attempting to Narrow the Integrality Gap for the
   Firefighter Problem on Trees
     S.G. Hartke                                        225

Influences on Breast Cancer Survival via SVM
   Classification in the SEER Database
     J. Li, I. Muchnik, and D. Schneider                233

Validation of Epidemiological Models: Chicken
   Epidemiology in the UK
     D. Fradkin, I. Muchnik, P. Hermans, and K. Morgan  243

Index                                                   257

Document last modified on July 17, 2006.