On May 10, 1994 Michael Burrows and David Wheeler published the Technical Report "A block sorting lossless data compression algorithm" describing a new data compression algorithm based on a reversible transformation of the input. This transformation had a profound impact in several algorithmic fields and is today universally known as the BWT: the Burrows-Wheeler Transform.
The BWT changed the approach to data compression in a twofold way: it highlighted the role of input transformations to prepare data for achieving better compression, and showed that a pipeline of simple encoders may be more effective that a one-shot compression pass. The BWT is at the heart of the algorithm bzip2 which has become the standard tool for lossless compression, thanks to its 20% compression ratio wrt 33% of gzip. Ten years after its discovery, new algorithms and theoretical studies deploying the BWT continue to flourish, which indicates that further progress is to be expected.
The BWT has also revolutionized the field of indexing data structures: using the BWT it is possible to build the so called "compressed indexes", a new family of data structures which support powerful substring searches and occupy roughly the same space required by the best compressors. These results have disproved the belief that an efficient full-text index must use space superlinear in the indexed string length (cfr. suffix tree and array).
Recent theoretical results, for example on combinatorics on words, have shown that our understanding of the properties of the BWT and its potential applications is still incomplete. There is therefore the need of further studies on the theory and applications of this fascinating mathematical tool.
CONFIRMED INVITED SPEAKERS
Michael Burrows (Google)
Julian Seward (Microsoft Research)
Participation:
People interested in attending the workshop should send an email to bwt2004@di.unipi.it before July 15, 2004. Since DIMACS provides some support to students, the interested ones should contact the organizers at the email address above.