DIMACS Workshop on Visualization and Data Mining

October 24 - 25, 2002
DIMACS Center, Rutgers University, Piscataway NJ, 08854-8018

Organizers:
Emden Gansner, AT&T, erg@research.att.com
Claudio Silva, Oregon Health & Science University, csilva@cse.ogi.edu
Presented under the auspices of the Special Focus on Data Analysis and Mining.

Abstracts



Gabriela Alexe, Sorin Alexe, and Peter L. Hammer, RUTCOR, Rutgers University

Title: Visualizing Knowledge Space in Logical Analysis of Data

The Logical Analysis of Data (LAD) is a method for the construction of data-driven classification models, based on combinatorics, optimization, and Boolean logic. LAD extracts from the dataset collections of rules (or "patterns") characteristic for the classes of observations, and uses this knowledge in the construction of classification models. Patterns play a central role in LAD: they define the "knowledge space" in which each observation is expressed as a characteristic vector of the patterns it triggers. We provide a consensus-type method to identify association rules, and to visualize many-to-many relationships between observations and patterns in the knowledge space. In addition, we will show how visualizing clusters in the knowledge space is relevant for new class discovery. All these concepts will be illustrated on several biomedical datasets.
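
As an illustration of the "knowledge space" idea only, the sketch below maps one observation to the characteristic vector of the patterns it triggers. It assumes each pattern is a conjunction of required values on binarized attributes. The pattern names, attribute names, and values are hypothetical and are not taken from the talk, and the way LAD generates patterns is not shown.

```python
# Hypothetical patterns: each is a conjunction of required attribute values.
patterns = {
    "P1": {"x1": 1, "x2": 0},
    "P2": {"x3": 1},
    "P3": {"x1": 0, "x3": 1},
}


def triggers(pattern, observation):
    """Return True if the observation satisfies every condition of the pattern."""
    return all(observation.get(attr) == val for attr, val in pattern.items())


def characteristic_vector(observation, patterns):
    """Return one 0/1 entry per pattern, in the patterns' insertion order."""
    return [int(triggers(p, observation)) for p in patterns.values()]


# Hypothetical observation over three binarized attributes.
observation = {"x1": 1, "x2": 0, "x3": 1}
vector = characteristic_vector(observation, patterns)
print(vector)  # [1, 1, 0]
```

Stacking such vectors for all observations gives an observation-by-pattern 0/1 matrix, which is one natural input for the many-to-many views and clustering the abstract mentions.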


George S. Almasi, Richard D. Lawrence, and Michael J. Rothman, IBM T. J. Watson Research Center and Michael Rothman & Associates LLC

Title: KMAP - A Visualizer for Kohonen Self-Organizing Map Data Mining Results

Kmap is a highly interactive visualizer for quickly exploring neural clustering (Kohonen Self-Organizing Map) results produced by data mining programs such as IBM's Intelligent Miner and NeuralWare's NeuralWorks Professional. Kohonen clustering separates similar records from a dataset into clusters using a Euclidean metric, and in addition places the multi-dimensional clusters on a two-dimensional map in such a way as to preserve the underlying topology; that is, similar clusters are near each other. Kmap's emphasis is on conveying the characteristics of the clusters as well as their map positions in one view, with a rich set of controls and drilldowns for alternate views and additional details. We present case studies involving data from the stock market, US census, drug prescriptions, supermarket shoppers, and credit bureau areas. As an example, we used actual bankruptcy histories to color a map of clusters of consumers who had been assigned a marginal credit rating. The coloring dramatically showed that the actual bankruptcies were limited to a small number of adjacent clusters corresponding to customers with a particular pattern of credit usage, and that the remainder were probably safe risks for additional credit.


Enrico Bertini and Giuseppe Santucci, Università di Roma "La Sapienza"

Title: Data Mining and Information Visualization: a result+data approach

There is common agreement that integrating Data Mining and Information Visualization techniques can provide the tools needed to handle the huge data collections we face today. Data Mining and Information Visualization represent different approaches to a common problem: extracting knowledge from data. Used jointly, each technique can compensate for what the other lacks in reaching that goal. In this paper we present a system in which the combined use of Data Mining and Information Visualization techniques provides a high degree of synergy and interaction, relating the results of Data Mining activities to the underlying data from which they derive in an intuitive and usable manner.


Simon Byers, AT&T Labs - Research

Title: Seeing the Network

We present the results of various efforts to collect and analyze data pertaining to the edge of "the network". We consider several access media, including voice telephony, dialup, DSL, wireless, HFC and VOIP in scenarios motivated by a variety of business and research considerations. Visualization of the data and the analyses resulting from them is a key element of our work.


Yves Chiricota, UQAC Chicoutimi, Canada, Fabien Jourdan and Guy Melancon, LIRMM UMR CNRS 5506, Montpellier, France

Title: Graph clustering as a strategy for the visualization of large graphs: a metric-based approach

Researchers in Information Visualization are often challenged by large amounts of data to be processed and visualized. Many efforts have recently been devoted to the design of clustering techniques to lower the size of the data to be presented on the screen. The work we briefly present here describes a method to cluster a graph based on a metric derived from its combinatorial structure. The technique is simple to implement, has a low computational cost and proves to be useful in the context of Graph Visualization. It can be complemented by other navigation techniques, such as semantic zooming. Our work is based on a metric called the "cluster measure of a vertex" which has been introduced in the context of the so-called small-world graphs. This metric simply computes, for each vertex v, the ratio of the actual number of edges connecting neighbours of v with the total possible number of edges that could connect them. We introduce an extension of this metric to edges, and show how it can be used to induce a clustering of a graph. Also, we establish the usefulness and sharpness of our approach by computing a quality measure for each clustering of a graph.
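
The vertex metric described above can be sketched as follows. This is a minimal illustration of the ratio as stated in the abstract, not the authors' implementation. The adjacency-set representation, the toy graph, and the choice to return 0.0 for vertices with fewer than two neighbours are assumptions made here. The extension to edges is not shown, since the abstract does not define it.

```python
from itertools import combinations


def cluster_measure(adj, v):
    """Ratio of edges that connect neighbours of v to the edges that could connect them."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        # Assumption: with fewer than two neighbours the ratio is undefined; report 0.0.
        return 0.0
    actual = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    possible = k * (k - 1) / 2
    return actual / possible


# Hypothetical undirected toy graph: triangle a-b-c plus a pendant vertex d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
measures = {v: cluster_measure(graph, v) for v in graph}
print(measures)
```

In this toy graph, a and b score 1.0 because their two neighbours are connected, while c scores 1/3 because only one of the three possible edges among its neighbours exists.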


Victoria Interrante, University of Minnesota

Title: The Application of Insights from Visual Perception, Art and Illustration to the Design of More Effective Techniques for Representing Data

Visualization research is the science and art of designing, implementing and evaluating methods for effectively communicating information through images, and feature extraction is critical to these efforts. Success depends on carefully choosing what to show, and carefully choosing how to show it, hierarchically guiding visual attention by selectively emphasizing the most important aspects of the data. In this talk I will discuss fundamental issues in effective visual representation, illustrated with examples from my research in 3D flow visualization and shape representation through texture. Questions I will address include: Where can we look to find insight into the science behind the art of effective visual representation? How can we begin to determine how best to portray a large, complicated set of data so that the essential information it contains can be easily and intuitively understood? And, how can we measure the success of our efforts?


Thomas Jackman, IBM T. J. Watson Research Center

Title: Deep View: A Cluster Based Rendering System with Network Delivery

We have architected a system from a cluster of workstations that functions as a parallel visualization and media server, which we call Deep View. This system can be used to meet the high fill rate requirements of large format and/or high resolution displays at a relatively economical price point. We discuss the results of our efforts to aggregate system components -- CPUs, graphics accelerators, I/O channels, hard drives, network adapters -- to deliver composite performance using parallel visualization strategies. A critical aspect of our approach has been an emphasis on the use of the network for delivery of raster pixel content to potentially remote displays. We discuss the technology for enabling network delivery, some of the current limitations this introduces, as well as some of the new directions this is taking us in the area of remote visualization. Some of the applications that will be discussed are from the areas of medical imaging, scientific visualization, geospatial imaging, and manufacturing design.


Kwan-Liu Ma, University of California, Davis

Title: An Emerging Interface Technology for Data Visualization

The process of scientific visualization is inherently iterative. A good visualization comes from experimenting with visualization and rendering parameters to bring out the most relevant information in the data. This raises a question. Considering the computer and human time we routinely invest for exploratory and production visualization, are there methodologies and mechanisms to enhance not only the productivity of scientists but also their understanding of the visualization process and data used? Recent advances in the field of data visualization have been made largely in rendering and display technologies (such as realtime volume rendering and immersive environments), but little in coherently managing, representing, and sharing information about the visualization process and results (images and insights). Naturally, the various information about data exploration should be shared and reused to leverage the knowledge and experience scientists gain from visualizing scientific data. A visual representation of the data exploration process along with expressive models for recording and querying task specific information can help scientists keep track of their visualization experience and findings, use it to generate new visualizations, and share it with others. Such a visual representation can be used as an interface to the data analysis and visualization process to dramatically improve explorability and facilitate collaboration. This talk introduces such a new user interface technology.


Curtis Rueden, University of Wisconsin

Title: VisBio - a biological visualization tool for animated 3D multispectral data

VisBio is a computer program for the interactive graphical display and mathematical analysis of multidimensional biological image data. VisBio harnesses the full functionality of the VisAD API, a scientific visualization toolkit written in 100% cross-platform Java. Through the VisBio interface, users can import microscope data of many file formats and interactively explore and measure that data within 3-dimensional time-varying (4D) recordings of specimens. VisBio is being developed to be specially tailored to the demands of handling and animating massive data sets fluidly, and to enable the interactive representation of recordings in which each spatiotemporal pixel element contains multiple dimensions: e.g., emission intensity, spectrum, and lifetime. In addition, VisBio provides a number of custom biological tools for analysis of biological events within the cell, such as endocytosis.


Chris Stolte, Stanford U.

Title: Polaris: A System for Query, Analysis and Visualization of Large, Hierarchical Relational Databases

In the last several years, large, hierarchical relational databases have become common in a variety of applications such as data warehousing and scientific computing. It has proven difficult to analyze and interactively visualize these databases. Recently, we have designed and built Polaris, a system for exploring such databases. Polaris is based on the Pivot Table interface found in statistics packages and popularized in Excel. Polaris allows the quick interactive specification of table-based graphical displays, automatically generating a precise set of queries and drawing commands from the specifications. In this talk I will give an overview of Polaris, and describe two underlying technologies. The first is the formal methods used to specify visualizations, which are based on earlier work by Bertin, Mackinlay and Wilkinson. The second is a set of techniques for specifying multiscale visualizations of hierarchical databases represented as datacubes. Finally, I will discuss the role of tools such as Polaris in the knowledge discovery process.


Jim J. Thomas, Pacific Northwest National Laboratory

Title: Discovering the Unexpected Through Visual Analytics

A major challenge facing the technical community today, in fields such as cell biology and national security for homeland defense, is the analysis of masses of flowing information of different types (text, video, imagery, signals, ...) from many different contexts. The likely solutions will come from a combination of technologies found within knowledge management, visualization, statistics, and language theory, to name a few. This talk will describe the driving factors of data types, scale, and analytical needs. Then I will describe specific examples of technologies in use today. We have discovered that a core analytic need is going beyond just "connecting the dots" to finding which dots to connect. This will require a new human information discourse for discovery of the unexpected. The lessons learned will lead to parts of a technical agenda for the future.


Brian Wylie, VisWave

Title: Large Scale Graph Layout: Approaches and Performance Metrics

The talk explores the use of force-directed layout for large-scale graphs in the area of information visualization. We present approaches to extend traditional force-directed algorithms from toy-sized problems to graphs containing over one hundred thousand nodes. We also present methods of measuring graph stability with respect to random starting conditions and perturbations in edge weights. The talk covers an array of performance metrics that facilitate the quantification of layout quality and capability. In the last section we demonstrate the coupling of graph-layout-based presentation to a more traditional data analysis tool.
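
For orientation, a traditional force-directed layout of the toy-sized kind mentioned above can be sketched as follows. This is a generic spring-embedder in the style of Fruchterman and Reingold, chosen here as an assumption; it is not the speaker's method and does nothing to reach large graphs, since it compares every pair of nodes in each iteration. The function name, parameters, cooling schedule, and example graph are all hypothetical.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: [x, y]} after a fixed number of spring-embedder iterations."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal edge length for a unit-area drawing
    temp = 0.1                       # maximum move per iteration, cooled below
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: O(n^2) per iteration.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction along each edge.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node along its net force, by at most the current temperature.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            step = min(length, temp)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temp *= 0.98  # simple geometric cooling
    return pos


# Hypothetical example: a cycle on four nodes.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
layout = force_directed_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(name, round(x, 3), round(y, 3))
```

Running the function with different `seed` values and comparing the resulting layouts is one simple way to probe sensitivity to random starting conditions, in the spirit of the stability measurements the abstract describes.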


Document last modified on September 25, 2002.