SEDS 542

Large-Scale Data Management

This course introduces the fundamental concepts and computational paradigms of large-scale data management, including the major methods for storing, updating, and querying large datasets and for data-intensive computing. It covers concepts, algorithms, and system issues in parallel and distributed databases, peer-to-peer data management, MapReduce and its ecosystem, Spark and dataflows, data lakes, and NoSQL databases.

Topics
Distributed Database Design
Distributed Query Processing
Distributed Transaction Processing
Parallel Architectures and Data Placement
Parallel Query Processing
Infrastructure and Schema Mapping
Querying and Replica Consistency
Blockchain
Distributed Storage Systems
MapReduce and its Ecosystem
Spark, Dataflows, and Data Lakes
Key-Value Stores and Document Stores
Wide-Column Stores and Graph DBMSs
Hybrid Data Stores and Polystores