DIMACS TR: 2005-33
Estimating Entropy and Entropy Norm on Data Streams
Authors: Amit Chakrabarti, Khanh Do Ba and S. Muthukrishnan
ABSTRACT
We consider the problem of computing information theoretic functions
such as entropy on a data stream, using sublinear space.
Our first result deals with a measure we call the ``entropy norm''
of an input stream: it is closely related to entropy but is
structurally similar to the well-studied notion of frequency moments.
We give a polylogarithmic space one-pass algorithm for estimating
this norm under certain conditions on the input stream. We also
prove a lower bound that rules out such an algorithm if these
conditions do not hold.
Our second result is a sublinear space one-pass algorithm for estimating
the empirical entropy of an input stream. For a stream of $m$ items and
a given real parameter $\alpha$, our algorithm uses space
$\widetilde{O}(m^{2\alpha})$ and provides an approximation of
$O(1/\alpha)$ in the worst case and $(1+\eps)$ in ``most'' cases.
All our algorithms are quite simple.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2005/2005-33.pdf
DIMACS Home Page