CENG 465

Principles of Data-Intensive Systems

In this course, students will gain knowledge in the following areas: data models and query languages; storage and retrieval; replication and partitioning of distributed data; transactions; the challenges of distributed systems; consistency and consensus; batch and stream processing.

Objectives of the Course

The main aim of this course is to familiarize students with the fundamental concepts underlying a broad range of data-intensive systems. Students will learn to compare different approaches, techniques, and tools for storing and processing data, evaluate their strengths and weaknesses, and determine the most appropriate solutions for various application needs.

Recommended or Required Reading
M. Kleppmann. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly Media, Inc., 2017.
M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems. Springer, 4th ed., 2020.

Learning Outcomes

1. To be able to define the basic terminology used in data-intensive applications
2. To be able to compare different data models and query languages
3. To be able to select the technologies, techniques, and tools appropriate for a given task
4. To be able to design data-intensive applications

Topics
Reliable, Scalable, and Maintainable Applications
Data Models and Query Languages (1)
Data Models and Query Languages (2)
Data Storage and Retrieval
Distributed Data Replication
Distributed Data Partitioning
Midterm Exam
Transactions in Single-Node and Distributed Databases (1)
Transactions in Single-Node and Distributed Databases (2)
Challenges of Distributed Systems
Consistency and Consensus
Batch Processing
Stream Processing
Project Presentations

Assessments

Midterm 40%

Project 20%

Final 40%

Supplementary Course: CENG 315