SEDS 542

Large-Scale Data Management

This course introduces the fundamental concepts and computational paradigms of large-scale data management, including the major methods for storing, updating, and querying large datasets and for data-intensive computing. The course covers concepts, algorithms, and system issues in parallel and distributed databases, peer-to-peer data management, MapReduce and its ecosystem, Spark and dataflows, data lakes, and NoSQL databases.

Course Objectives

To introduce students to current trends in large-scale data management, covering concepts, architectures, algorithms, and system issues.

Recommended or Required Reading

M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems. Springer, 4th ed., 2020

H. Garcia-Molina, J. D. Ullman, J. Widom. Database Systems: The Complete Book. Prentice Hall, 2nd ed., 2008

L. Wiese. Advanced Data Management: For SQL, NoSQL, Cloud and Distributed Databases. De Gruyter, 2015

Learning Outcomes

1. Learn state-of-the-art research and industry trends in large-scale data management systems.

2. Understand the fundamental principles that govern all modern DBMSs.

3. Be able to make design decisions when deploying large-scale data processing applications and to identify the bottlenecks of such applications.

4. Learn how to install and use open-source systems and libraries to perform meaningful large-scale data management tasks.

Topics

Distributed Database Design

Distributed Query Processing

Distributed Transaction Processing

Parallel Architectures and Data Placement

Parallel Query Processing

Infrastructure and Schema Mapping

Querying and Replica Consistency

Blockchain

Distributed Storage Systems

MapReduce and its Ecosystem

Spark, Data Flows, and Data Lakes

Key-Value Stores and Document Stores

Wide-Column Stores and Graph DBMSs

Hybrid Data Stores and Polystores

Grading

Midterm 20%

Homework 20%

Attendance 20%

Final 40%

Instructor(s)

Assistant Professor / Vice Chair

Damla Oğuz
