IE 673: Data Mining and Matrices (HSS 2014)

  • Lecturer: Prof. Dr. Rainer Gemulla
  • Type of course: Lecture and practical exercises (6 ECTS points)
  • Lecture: Thursday, 15:30-17:00, B6 A305 (Portal)
  • Tutorial (held irregularly): Thursday, 17:15-18:45, B6 A305
  • Evaluation: Final exam or oral examination, regular exercises
  • Prerequisites: Data Mining (recommended, not required)

Content

Many data mining tasks operate on dyadic data, i.e., data involving two types of entities (e.g., users and products, objects and attributes, or points and coordinates); such data can be naturally represented in terms of a matrix. Matrix decompositions, with which we (approximately) represent the data matrix as a product of two (or more) factor matrices, can be used to perform many common data mining tasks. In this lecture, we explore the use of matrix decompositions for denoising, discovery of latent structure, and visualization, among others. We cover data mining tasks such as prediction, clustering and pattern mining, and application areas such as recommender systems and topic modelling.
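As a rough illustration of representing a data matrix as a product of two factor matrices, the sketch below computes a rank-k approximation with a truncated singular value decomposition in NumPy. It uses the small document–term matrix from the examples on this page as input. The function name `truncated_svd` and the choice k = 2 are assumptions made for this example, not part of the course material.

```python
import numpy as np

def truncated_svd(A, k):
    """Return factors L (m x k) and R (k x n) such that L @ R is the
    best rank-k approximation of A in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    L = U[:, :k] * s[:k]   # scale the first k left singular vectors
    R = Vt[:k, :]          # first k right singular vectors (as rows)
    return L, R

# Document-term matrix: rows are Book 1..3, columns are Data, Matrix, Mining.
A = np.array([[5.0, 0.0, 3.0],
              [0.0, 0.0, 7.0],
              [4.0, 6.0, 5.0]])

L, R = truncated_svd(A, k=2)
A_hat = L @ R                      # approximate reconstruction of A
err = np.linalg.norm(A - A_hat)    # Frobenius norm of the residual
print(np.round(A_hat, 2))
print("reconstruction error:", err)
```

For a rank-k truncation, the Frobenius error equals the square root of the sum of the squared discarded singular values, so with k = 2 on a 3×3 matrix it is simply the third singular value.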

A document–term matrix:

          Data  Matrix  Mining
  Book 1     5       0       3
  Book 2     0       0       7
  Book 3     4       6       5

An incomplete rating matrix (movies: Avatar, The Matrix, Up; each user has rated two of the three):

  Alice    4  2
  Bob      3  2
  Charlie  5  3
 
A student–course matrix:

             Hot Topics in IR  IR & DM  DM & Matrices
  Student A                 1        1              0
  Student B                 1        1              1
  Student C                 0        1              1

Cities and their average minimum temperatures (in °C):

               Jan.  June  Sept.
  Saarbrücken    –1    11     10
  Helsinki     –6.5  10.9    8.7
  Cape Town    15.7   7.8    8.7
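Dyadic data such as the examples above often arrives as a list of (row entity, column entity, value) triples rather than as a matrix. The following minimal sketch, using the city temperature values from this page, shows one way to turn such triples into a NumPy array in which unobserved entries are marked as NaN. The helper name `triples_to_matrix` is hypothetical and chosen only for illustration.

```python
import numpy as np

def triples_to_matrix(triples):
    """Build a dense matrix from (row, column, value) triples.
    Rows and columns keep their order of first appearance;
    entries without a triple are left as NaN (unobserved)."""
    rows = list(dict.fromkeys(r for r, _, _ in triples))
    cols = list(dict.fromkeys(c for _, c, _ in triples))
    M = np.full((len(rows), len(cols)), np.nan)
    for r, c, v in triples:
        M[rows.index(r), cols.index(c)] = v
    return rows, cols, M

# Average minimum temperatures, one triple per (city, month) pair.
triples = [
    ("Saarbrücken", "Jan.", -1.0), ("Saarbrücken", "June", 11.0), ("Saarbrücken", "Sept.", 10.0),
    ("Helsinki", "Jan.", -6.5), ("Helsinki", "June", 10.9), ("Helsinki", "Sept.", 8.7),
    ("Cape Town", "Jan.", 15.7), ("Cape Town", "June", 7.8), ("Cape Town", "Sept.", 8.7),
]

rows, cols, M = triples_to_matrix(triples)
print(rows, cols)
print(M)

# Dropping a triple leaves the corresponding entry unobserved (NaN),
# which is how an incomplete matrix such as a rating matrix would look.
_, _, M_incomplete = triples_to_matrix(triples[:-1])
print(M_incomplete)
```

Marking missing entries with NaN rather than 0 keeps "unobserved" distinct from "observed value of zero", a distinction that matters for incomplete data such as ratings.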

List of topics (tentative):
  • Singular value decomposition (SVD)
  • Non-negative matrix factorization (NMF)
  • Semi-discrete decomposition (SDD)
  • Boolean matrix factorization (BMF)
  • Independent component analysis (ICA)
  • Matrix completion
  • Probabilistic matrix factorization
  • Spectral clustering
  • Graphs
  • Tensors

Lecture notes

  • 00: Organization (pdf)
  • 01: Introduction (pdf)
  • 02: Linear Algebra Basics (pdf)
  • 03: Singular Value Decomposition (pdf)
  • 04: Matrix completion (pdf)
  • 05: Non-negative matrix factorization (pdf)
  • Guest lecture: Boolean matrix factorization (pdf)
  • 06: Spectral clustering (pdf)
  • 07: Probabilistic matrix factorization (pdf)

Literature

  • David Skillicorn
    Understanding Complex Datasets: Data Mining with Matrix Decompositions
    Chapman & Hall, 2007
  • See lecture notes for additional references.