Parallel and distributed computing

The course offers an introduction to parallel computing and distributed systems and develops practical skills in using the related technologies. Homework assignments involve writing parallel programs and running them on a computer cluster.
Course outline:
  1. Concurrency. Application areas and problems. Ways of implementing concurrent systems: processes and threads, programming tools. Basics of multithreaded programming, using C++ and Java as examples. Common concurrent programming errors. Mutual exclusion and condition synchronization. Memory models and low-level synchronization primitives. Alternative approaches to concurrent programming.
  1. Parallel computing. Application areas and problems. Modern parallel computing systems. Theoretical foundations of parallel computing. Performance metrics for parallel algorithms. Design principles and typical structures of parallel algorithms. The PCAM (Partitioning, Communication, Agglomeration, Mapping) methodology. Parallel programming systems. Common parallel programming models and patterns. Parallel programming on shared-memory systems with OpenMP. Parallel programming on distributed-memory systems with MPI.
  1. Parallel processing of large data sets. The Big Data phenomenon. The MapReduce programming model. Principles of parallel data processing with MapReduce. Application areas and examples. Principles of the distributed implementation of MapReduce on computer clusters. The Apache Hadoop platform. Hadoop application programming interfaces and the implementation of Hadoop programs. Debugging programs locally and running them on a cluster. Common techniques and strategies for implementing MapReduce programs. High-level languages and tools for the Hadoop platform. Practical examples of MapReduce algorithms. Limitations of the MapReduce model, its extensions, and alternative approaches.
  1. Distributed systems and computing. Scope, characteristics and types of distributed systems. Challenges in building distributed systems. Theoretical foundations of distributed computing and examples of distributed algorithms. Paradigms for process interaction in distributed systems; network protocols. Distributed programming technologies. Introduction to the Erlang language. Distributed data storage, data replication, NoSQL systems. Distributed computing technologies: grid computing and volunteer computing. Cloud computing systems.