IE 673: Data Mining and Matrices (HWS 2015)

Organization

  • Lecturer: Prof. Dr. Rainer Gemulla
  • Type of course: Lecture and practical exercises (6 ECTS points)
  • Lecture: Tuesday, 08:30-10:00, A5 C013
  • Tutorial (held irregularly): Tuesday, 10:15-11:45, A5 C013
  • Evaluation: Final exam or oral examination, regular exercises
  • Prerequisites: Data Mining (recommended, not required)

Content

Many data mining tasks operate on dyadic data, i.e., data involving two types of entities (e.g., users and products, objects and attributes, points and coordinates, or vertices in a graph). Such dyadic data can be naturally represented as a matrix, which opens up a range of powerful data mining techniques. This course provides an introduction to matrix decomposition models and algorithms for analyzing dyadic data. It covers data mining tasks such as prediction, clustering, pattern mining, and dimension reduction, as well as application areas such as recommender systems, information retrieval, and topic modelling.

A document–term matrix:

            Data  Matrix  Mining
  Book 1       5       0       3
  Book 2       0       0       7
  Book 3       4       6       5

An incomplete rating matrix (movies: Avatar, The Matrix, Up; each user has rated only two of the three):

  Alice      4  2
  Bob        3  2
  Charlie    5  3

A student–course matrix:

               Hot Topics in IR  IR & DM  DM & Matrices
  Student A                   1        1              0
  Student B                   1        1              1
  Student C                   0        1              1

Cities and their average minimum temperatures:

                Jan.   June  Sept.
  Saarbrücken     –1     11     10
  Helsinki      –6.5   10.9    8.7
  Cape Town     15.7    7.8    8.7
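
As a small illustration of what a matrix decomposition does with such data, the sketch below takes the document–term matrix shown above and computes low-rank approximations with the singular value decomposition. It is not taken from the course material: it assumes NumPy is available, and the helper name `truncated_svd` is made up for this example.

```python
import numpy as np

# Document-term matrix from the example above:
# rows = Book 1..3, columns = terms "Data", "Matrix", "Mining".
A = np.array([[5.0, 0.0, 3.0],
              [0.0, 0.0, 7.0],
              [4.0, 6.0, 5.0]])


def truncated_svd(A, k):
    """Return a rank-k approximation of A and all singular values of A.

    Keeping the k largest singular values gives the best rank-k
    approximation in the Frobenius norm (Eckart-Young theorem).
    `truncated_svd` is a hypothetical helper name for this sketch.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return A_k, s


A1, singular_values = truncated_svd(A, k=1)
A2, _ = truncated_svd(A, k=2)

# Frobenius-norm reconstruction errors; the error shrinks as k grows.
err1 = np.linalg.norm(A - A1)
err2 = np.linalg.norm(A - A2)
print("singular values:", singular_values)
print("rank-1 error:", err1, "rank-2 error:", err2)
```

The rank-1 approximation compresses each book and each term to a single number. Its error equals the square root of the sum of the squared singular values that were dropped.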

List of topics (tentative):
  • Singular value decomposition (SVD)
  • Non-negative matrix factorization (NMF)
  • Semi-discrete decomposition (SDD)
  • Boolean matrix factorization (BMF)
  • Independent component analysis (ICA)
  • Matrix completion
  • Probabilistic matrix factorization
  • Spectral clustering
  • Graphs
  • Tensors
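
To give a flavour of one of the listed topics, here is a minimal sketch of non-negative matrix factorization using the well-known multiplicative update rules for the Frobenius-norm objective. It is only an illustration under stated assumptions, not the algorithm or code used in the course: the function name `nmf`, its parameters, and the choice of the student–course matrix as input are all picked for this example.

```python
import numpy as np


def nmf(A, r, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix A (m x n) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Frobenius error.
    `nmf`, `n_iter`, `eps`, and `seed` are hypothetical names chosen
    for this sketch.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Strictly positive random start, so that updates stay well defined.
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise updates; eps guards against division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(A - W @ H))
    return W, H, errors


# Student-course matrix from the example in the Content section
# (rows = students A..C, columns = the three courses).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])

W, H, errors = nmf(A, r=2)
print("final reconstruction error:", errors[-1])
```

Because the updates only multiply by non-negative ratios, `W` and `H` remain non-negative throughout, which is what makes the factors interpretable as additive parts.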

Lecture notes

  • 00: Organization (pdf)
  • 01: Introduction (pdf)
  • 02: Vectors & Matrices (pdf, updated Sep 21)
  • 03: Singular Value Decomposition (pdf, updated Oct 19)
  • 04: Matrix Completion (pdf, updated Oct 13)
  • 05: Non-Negative Matrix Factorization (pdf, updated Dec 8)
  • 06: Spectral clustering (pdf, updated Nov 3)
  • 07: Link analysis (pdf, updated Dec 8)

Exercises

  • 00 (15.9.): Introduction to R (R), Introduction to Python (Jupyter notebook, py, html, UNdata.csv)
  • 01 (22.9.): Vectors and matrices (pdf, updated 22.9.), solutions (pdf, R)
  • 02 (29.9.): Linear algebra & singular value decomposition (pdf), solution (pdf)
  • 03 (6.10.): Matrix and vector operations in R/Python (R), solution (R)
  • 04 (20.10.): Matrix calculus (pdf, updated Oct 19), solution (pdf)
  • 05 (3.11.+10.11.): Non-negative matrix factorization (pdf), solution (pdf)
  • 06 (17.11.): Spectral clustering (pdf, updated Nov 17), solution (pdf)

Assignments

  • 01: Singular value decomposition (zip), due Oct 14, 23:59
  • 02: Matrix completion (zip, updated Nov 3: added function "showcomps"), due Oct 28, 23:59
  • 03: Non-negative matrix factorization (zip, updated Nov 3: added function "showcomps"), due Nov 11, 23:59
  • 04: Spectral clustering (zip)

Literature

  • David Skillicorn
    Understanding Complex Datasets: Data Mining with Matrix Decompositions
    Chapman & Hall, 2007
  • See lecture notes for additional references.