2001-2006 Special Focus on Data Analysis and Mining: Overview

Theoretical and algorithmic approaches to data analysis have played a central role in the development of modern methods for handling data. Now, however, the massive amounts of data gathered in important modern applications ranging from the Internet to credit card fraud detection to astronomy and medicine have dramatically changed the requirements for algorithms and provide ample motivation for a great deal of new theoretical development. We need methods for data analysis and mining that scale to the huge volumes of data that such applications already produce and can be expected to produce. DIMACS is planning a special focus devoted to data analysis and data mining, with emphases on the development of theoretical and algorithmic approaches to the massive data-mining problems that we face today, on the increasingly abstract formulations and models of data mining questions that are being seen in current research, and on the connections between theoretical approaches and practical applications.

The emphasis of this special focus will be on unifying promising approaches to data analysis and data mining that come from many distinct communities of researchers. The topics of interest include methodologies and algorithms for data mining, including clustering, discriminant analysis, enumerative methods, and multidimensional scaling; the increasingly abstract formulations and models of data mining questions using logical methods, conceptual clustering, learning and discovery that are critical in data mining and in particular for automatic, intelligent decision making; and the special problems that arise from applications to such important areas as fraud and intrusion detection, web mining, medical and scientific databases, marketing, and natural language data.


The field of data analysis has a long history with roots in traditional statistical analysis and the development of artificial intelligence. Theoretical analysis of databases has allowed the precise formulation of questions about them and, coupled with a strong effort in algorithms, has in turn led to many powerful techniques for collecting, storing, consolidating, processing, correcting, and retrieving data, for learning, and for finding previously undiscovered patterns.

The emergence of new and powerful data collection technologies has led to the creation of massive amounts of data, often distributed, shared, partially unknown, or having specialized structures. Traditional data analysis tools are incapable of handling the sheer size and complexity of these gigantic data sets, so there is a great need for new methods and algorithms that can. We need to develop theoretical underpinnings for managing and reasoning about data, and we need new tools for finding patterns or displaying useful summaries of the data.

Among the topics we shall emphasize in this special focus are the following:

Opportunities to Participate: The Special Focus will include:
Document last modified on July 19, 2005.