Information retrieval

Lecture 1. Introduction to IR tasks.
Lecture 2. IR systems architecture.
Lecture 3. IR models. Vector space retrieval model. Probabilistic IR. Binary Independence Model and BM25.
Lecture 4. Natural language processing and computational linguistics in IR tasks. Languages and code pages. Unicode. Computational morphology, stemming, syntactic parsing. Machine translation in IR. Lexical ambiguity resolution. Language detection. Synonyms.
Lecture 5. Web crawler (Part 1). Web crawler architecture. robots.txt. Freshness. Document storage. BigTable.
Lecture 6. Web crawler (Part 2). Link graph analysis. PageRank and its variations. Host rank. HITS: hubs and authorities.
Lecture 7. Index construction. Postings size estimation, sort-based indexing, dynamic indexing, positional indexes, n-gram indexes, distributed indexing, real-world issues. Index construction in MapReduce. Tiering.
Lecture 8. Index compression. Lexicon compression and postings list compression. Gap encoding, gamma codes, Zipf's law, variable-byte encoding. Blocking. Extreme compression.
Lecture 9. Duplicate and near-duplicate detection. Exact duplicates. Near-duplicates. Shingles. Locality-sensitive hashing. SimHash. Examples.
Lecture 10. Quality measures. Cranfield. ERR, pFound, DCG, NDCG, MAP. Tie-aware metrics.
Lecture 11. Social search. Deep web.
Lecture 12. Learning to rank.
Lecture 13. Spelling correction. Real-time search.
Lecture 14. Log analysis.