DIMACS TR: 97-16

Optimization algorithms for separable functions with tree-like adjacency of variables and their application to the analysis of massive data sets



Authors: Ilya Muchnik and Vadim Mottl

ABSTRACT

A massive data set is considered as a set of experimentally acquired values of a number of variables each of which is associated with the respective node of an unoriented conjugacy graph that presets the fixed structure of the data set. The class of data analysis problems under consideration is outlined by the assumption that the ultimate aim of processing can be represented as a transformation of the original data array into a secondary array of the same structure but with node variables of, generally speaking, different nature, i.e. different ranges. Such a generalized problem is set as the formal problem of optimization (minimization or maximization) of a real-valued objective function of all the node variables. The objective function is assumed to consist of additive constituents of one or two arguments, respectively, node and edge functions. The former of them carry the data-dependent information on the sought-for values of the secondary variables, whereas the latter ones are meant to express the a priori model constraints. For the case when the graph of the pair-wise adjacency of the data set elements has the form of a tree, an effective global optimization procedure is proposed which is based on a recurrent decomposition of the initial optimization problem over all the node variables into a succession of partial problems each of which consists in optimization of a function of only one variable like Bellman functions in the classical dynamic programming. Two kinds of numerical realization of the basic optitimization procedure are considered on the basis of parametrical representation of Bellman functions, respectively, for discretely defined and quadratic node and edge functions. The proposed theoretical approach to the analysis of massive data sets is illustarted with its applications to the problems of the analysis of long molecular sequences and large visual images.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1997/97-16.ps.gz
DIMACS Home Page