We consider the problems of compressing i.i.d. sequences and estimating the underlying distribution over unknown alphabets. A sequence can be described by separately conveying its symbols and its pattern, that is, the order in which the symbols appear. We show that the information contained in the pattern grows at the same asymptotic rate as that in the sequence, yet the worst-case per-symbol redundancies of compressing the pattern and the sequence behave very differently: while the latter is known to grow to infinity with the alphabet size, we show that the former diminishes to zero. More precisely, the patterns of i.i.d. sequences over all alphabets, including infinite and even unknown ones, can be compressed with diminishing per-symbol redundancy, both in block and sequentially, and the compression can be performed in linear time.

Employing the maximum-likelihood principle, we then look for the high-profile distributions that maximize the profile probability of the observed data. We derive conditions for local maxima of the profile probability over the distributions, design several practical algorithms for estimating the profile probability, and use a Monte Carlo EM algorithm to find the high-profile distributions. Experiments show that estimators based on the high-profile distributions perform well in estimating several statistics of the underlying distribution.
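To make the two central objects concrete, a short sketch follows: the pattern of a sequence replaces each symbol by the index of its first appearance, and the profile records the multiset of symbol multiplicities. The function names here are illustrative, not from the paper.

```python
def pattern(seq):
    """Replace each symbol by the order (1-based) of its first appearance."""
    first_seen = {}
    out = []
    for s in seq:
        if s not in first_seen:
            first_seen[s] = len(first_seen) + 1  # next unseen symbol gets the next index
        out.append(first_seen[s])
    return out

def profile(seq):
    """The multiset of symbol multiplicities, as a sorted list."""
    counts = {}
    for s in seq:
        counts[s] = counts.get(s, 0) + 1
    return sorted(counts.values())

# Sequences that differ only in symbol identities share a pattern and a profile:
print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
print(pattern("xyzxwxuxyzx"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
print(profile("abracadabra"))  # [1, 1, 2, 2, 5] -- one symbol seen 5 times, etc.
```

Both maps discard the symbol identities; this is why they remain meaningful over infinite or unknown alphabets, where the identities themselves cannot be compressed with vanishing redundancy.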