Information theory

Course objectives

  1. Introducing the main approaches to defining the notions of information and of quantity of information: the combinatorial approach (Hartley information), the probabilistic approach (Shannon entropy) and the algorithmic approach (Kolmogorov complexity).
  2. Studying the properties of these notions.
  3. Applying information theory in different areas: proving upper and lower bounds on the number of operations (in different contexts), coding and information transmission, and learning theory.

Tentative contents

  1. Hartley information (the binary logarithm of the number of outcomes).
  2. Applying Hartley information to sorting and other problems about the number of comparisons: the n log n lower bound on the number of comparisons needed to sort n numbers, and lower bounds on the number of weighings needed to find the fake coin (which is lighter or heavier than the genuine ones) among several coins (see the counting sketch after this list).
  3. The informational approach in communication complexity: the rectangle method.
  4. Informational analysis in the logic of knowledge (the problem of the dirty wise men, the problem of two-sided cards, the problem of the gentlemen's hats).
  5. Probability distributions over an alphabet (random variables taking values in a finite set). Variable-length uniquely decodable codes and prefix codes. Expected length of codewords.
  6. Shannon entropy and its relation to the minimal expected length of a prefix code. Shannon-Fano codes (see the prefix-code sketch after this list).
  7. The Kraft-McMillan theorem and the lower bound on the expected length of any uniquely decodable code.
  8. Real texts modeled as Markov chains of small order, and an upper bound on the quantity of information they contain. Redundancy.
  9. Pairs of jointly distributed random variables taking values in finite sets. The inequality for the Shannon entropy of a pair of random variables (the entropy of a pair does not exceed the sum of the entropies of its components).
  10. Conditional Shannon entropy and its properties.
  11. Independent random variables and their entropy. The quantity of information in one random variable about another (mutual information); see the joint-distribution sketch after this list.
  12. The bit-guessing game on a randomly chosen string. The cost of insider information and Shannon entropy.
  13. Information inequalities. Basic inequalities and their consequences (Shannon-type inequalities).
  14. Two random variables that take the same value with high probability and Fano's inequality.
  15. Shannon's noiseless coding theorem.
  16. Capacity of noisy channels and Shannon's theorem on block codes for noisy channels (see the channel-capacity sketch after this list).
  17. Information transmission in the case when the receiver has some partial information. The Slepian-Wolf theorem.
  18. Shannon entropy and mathematical statistics: the information in the data about the model and the minimal sufficient statistic.
  19. Information compression and universal decompressors. The amount of information in a given text (file) according to Kolmogorov (Kolmogorov complexity).
  20. Properties of Kolmogorov complexity: complexity does not exceed the length (up to an additive constant), complexity does not increase under algorithmic transformations, and complexity is not computable but is upper semicomputable (see the compression sketch after this list).
  21. The number of words of low complexity; incompressible words.
  22. An application of Kolmogorov complexity to one-tape Turing machines: a quadratic lower bound for the number of steps needed to copy a given input string.
  23. Conditional Kolmogorov complexity. The complexity of a pair of strings and Kolmogorov-Levin theorem.
  24. Similarities between Kolmogorov complexity, Shannon entropy and Hartley information. The relation between Kolmogorov complexity and Shannon entropy: with high probability, the Kolmogorov complexity of a word consisting of independently and randomly chosen symbols is close to the Shannon entropy of the random word (its length times the entropy of the distribution of one symbol).
  25. Applications of Kolmogorov complexity in mathematical statistics: separating the useful information in the data from accidental noise; the algorithmic minimal sufficient statistic.
  26. R. Solomonoff's theory of predicting the bits of a sequence that is random with respect to an unknown probability distribution; universal predictors.
  27. PAC learning and the Vapnik-Chervonenkis (VC) dimension.
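
Illustrative sketches

The short Python sketches below illustrate a few of the topics listed above; the concrete parameters, strings and distributions in them are chosen only for illustration. The first sketch, for item 2, is a minimal counting argument in the spirit of Hartley information: sorting n distinct numbers must distinguish n! orderings, and each comparison yields at most one bit, so at least log2(n!) comparisons are needed; the 12-coin version of the fake-coin puzzle is treated the same way with three-outcome weighings.

```python
import math

def sorting_lower_bound(n):
    # There are n! possible orderings of n distinct numbers, and each
    # comparison has at most 2 outcomes, so any correct comparison-based
    # sorting algorithm needs at least log2(n!) comparisons in the worst case.
    return math.ceil(math.log2(math.factorial(n)))

for n in (4, 8, 16, 64):
    print(f"n = {n:3d}: at least {sorting_lower_bound(n):4d} comparisons "
          f"(n*log2(n) = {n * math.log2(n):.0f})")

# Fake-coin puzzle with 12 coins, the fake being lighter or heavier:
# 24 possible outcomes, and each weighing on a balance has 3 outcomes,
# so at least ceil(log3(24)) = 3 weighings are needed.
print("weighings needed for 12 coins:", math.ceil(math.log(24, 3)))
```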
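
For items 6 and 7, the next sketch compares the Shannon entropy of a distribution with the expected length of a prefix code and checks the Kraft inequality for the codewords. It builds a Huffman code rather than a Shannon-Fano code (a substitution made here only because Huffman codes are short to implement and optimal among prefix codes); the four-letter distribution is an arbitrary example.

```python
import heapq
import itertools
import math

def entropy(p):
    # Shannon entropy (in bits) of a distribution given as {symbol: probability}.
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def huffman_code(p):
    # Repeatedly merge the two least probable subtrees; the counter only
    # breaks ties so that heapq never has to compare the codeword dicts.
    counter = itertools.count()
    heap = [(q, next(counter), {sym: ""}) for sym, q in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        q0, _, c0 = heapq.heappop(heap)
        q1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (q0 + q1, next(counter), merged))
    return heap[0][2]

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(p)
avg_len = sum(p[s] * len(w) for s, w in code.items())
kraft = sum(2 ** -len(w) for w in code.values())
print("code:", code)
print(f"H(p) = {entropy(p):.3f}   expected length = {avg_len:.3f}   Kraft sum = {kraft:.3f}")
```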
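
For items 9-11, this sketch takes a small, arbitrarily chosen joint distribution of two binary random variables, computes the entropies of the marginals and of the pair, and derives the conditional entropy and the mutual information from them via the chain rule; it also checks that the entropy of the pair does not exceed the sum of the individual entropies.

```python
import math

def H(dist):
    # Entropy in bits of a distribution given as {outcome: probability}.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A small joint distribution of (X, Y); the marginals are obtained by summing out.
joint = {("0", "0"): 0.4, ("0", "1"): 0.1, ("1", "0"): 0.1, ("1", "1"): 0.4}
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

Hx, Hy, Hxy = H(px), H(py), H(joint)
H_x_given_y = Hxy - Hy            # chain rule: H(X,Y) = H(Y) + H(X|Y)
I_xy = Hx + Hy - Hxy              # mutual information I(X;Y)

print(f"H(X) = {Hx:.3f}   H(Y) = {Hy:.3f}   H(X,Y) = {Hxy:.3f}")
print(f"H(X|Y) = {H_x_given_y:.3f}   I(X;Y) = {I_xy:.3f}")
assert Hxy <= Hx + Hy + 1e-12     # the entropy of a pair is subadditive
```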
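
For item 16, the sketch below evaluates the standard formula C = 1 - h(p) for the capacity of a binary symmetric channel with crossover probability p, where h is the binary entropy function; the particular values of p are arbitrary.

```python
import math

def h2(p):
    # Binary entropy function in bits.
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    # Capacity of a binary symmetric channel with crossover probability p.
    return 1.0 - h2(p)

for p in (0.0, 0.05, 0.11, 0.5):
    print(f"p = {p:.2f}   C = {bsc_capacity(p):.3f} bits per channel use")
```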
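
For item 20, Kolmogorov complexity itself is not computable, but any compressor gives an upper bound of the same flavor: the length of a compressed file, plus the fixed size of the decompressor, bounds the complexity of the file. The last sketch uses Python's zlib module purely as such an illustration; compressed length is not Kolmogorov complexity, and the two strings below are just examples of a highly regular word and an (almost surely) incompressible one.

```python
import os
import zlib

def compressed_bits(data: bytes) -> int:
    # Length in bits of the zlib-compressed data: an upper bound on the
    # information content of `data`, up to the fixed overhead of the
    # decompressor and of the zlib format itself.
    return 8 * len(zlib.compress(data, 9))

regular = b"ab" * 50_000           # highly regular, hence highly compressible
random_like = os.urandom(100_000)  # incompressible with high probability

print("regular string:", compressed_bits(regular), "bits out of", 8 * len(regular))
print("random string :", compressed_bits(random_like), "bits out of", 8 * len(random_like))
```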