CS 560: Large-Scale Data Management (FSS16)

Organization

  • Lecturer: Prof. Dr. Rainer Gemulla
  • Tutor: Kaustubh Beedkar
  • Type of course: Lecture and practical exercises (6 ECTS points)
  • Lecture: Wednesday, 8:30-10:00, A5, 6 - Room B 244
  • Tutorial: Thursday, 13:45-15:15, Castle, East Wing - Room O 151
  • Evaluation: Final exam or oral examination, regular exercises
  • Prerequisites: Database Systems I or equivalent

News

  • There will be a tutorial on Wednesday (25.05) in room A101 in B 6, 23-25, Bauteil A, from 3:30 PM to 5:00 PM.
  • We now have an ILIAS forum.
  • Note that this course is likely to be moved to HWS in 2017 (i.e., the next offering after FSS16 would be HWS17/18). If you plan to take this course, you may need to take it now.
  • To get notified about updates to the course website, you may use http://www.changedetection.com/

Content

This course introduces the fundamental concepts and computational paradigms of large-scale data management and Big Data. This includes methods for storing, updating, querying, and analyzing large datasets as well as for data-intensive computing in general. The course covers concepts, algorithms, and system issues; accompanying exercises provide hands-on experience.

Tentative list of topics:

  • Parallel and distributed databases
  • MapReduce and its ecosystem
  • Spark and dataflows
  • NoSQL databases
  • Stream processing
  • Graph databases

Lecture Notes

  • 00 Organization (pdf)
  • 01 Introduction (pdf, updated Mar 1)
  • 02 Parallel and Distributed Database Systems (pdf, updated Mar 9)
  • 03 Parallel Query Processing (pdf, updated Apr 5)
  • 04 MapReduce (pdf, updated May 11)
  • 05 Spark and Dataflows (pdf, updated May 3)
  • 06 Distributed Transactions (pdf, updated May 17)
  • 07 NoSQL (pdf, updated May 24)

Literature

  • H. Garcia-Molina, J. D. Ullman, J. Widom
    Database Systems: The Complete Book
    Prentice Hall, 2nd ed., 2008
  • T. Özsu, P. Valduriez
    Principles of Distributed Database Systems
    Springer, 3rd ed., 2011
  • T. White
    Hadoop – The Definitive Guide
    O’Reilly, 3rd ed., 2012
  • J. Lin, C. Dyer
    Data-Intensive Text Processing with MapReduce
    Morgan and Claypool, 1st ed., 2010
  • C. Strauch
    NoSQL Databases
    Stuttgart Media University, 2011
  • E. Redmond, J. R. Wilson
    Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
    Pragmatic Bookshelf, 1st ed., 2012
  • P. J. Sadalage, M. Fowler
    NoSQL Distilled
    Addison-Wesley, 2012
  • More in lecture notes