Data Mining II

Building on the Data Mining fundamentals course, this course provides a deeper treatment of the theory and practice of advanced data mining topics, such as:

  • Data Preprocessing
  • Regression and Forecasting
  • Dimensionality Reduction
  • Anomaly Detection
  • Time Series Analysis
  • Parameter Tuning
  • Ensemble Learning
  • Online Learning

The course consists of a lecture with accompanying practical exercises, as well as student team projects. In the exercises, participants will gain initial experience in applying state-of-the-art data mining tools to realistic data sets.

As in previous years, participants will take part in the annual Data Mining Cup (DMC), an international student competition in data mining, as part of the project work. In addition to the DMC submission, teams must document their approaches and results in a written project report and present them in a plenary session.

Time and Location

  • Lecture: Monday, 10.15 - 11.45, B6 23-25, A 103
  • Exercise: Tuesday, 13.45 - 15.15, B6 23-25, A 104

Instructors

Final exam

  • 50% written exam
  • 50% project work
  • 1st exam date: 17.6.
  • 2nd exam date: 31.8.

Slides and Exercises

Participation FSS 2016

  • The course is open to students of the Master in Business Informatics and Lehramt Informatik (teacher training in computer science) programs.
  • The course is restricted to 30 participants.
  • Registration is via the ILIAS group.
  • Registration opens on Monday, February 8, at 8:00 am.
  • Places are allocated on a first-come, first-served basis.

Outline (preliminary)

 

Week   | Session 1                  | Session 2                       | Important Dates
-------+----------------------------+---------------------------------+-------------------------
15.02. | Lecture: Preprocessing     | Exercise: Preprocessing         |
22.02. | Lecture: Regression        | Exercise: Regression            |
29.02. | Lecture: Anomaly Detection | Exercise: Anomaly Detection     |
07.03. | Lecture: Ensembles         | Exercise: Ensembles             | 09.03.: DMC Registration
14.03. | Lecture: Time Series       | Exercise: Time Series           |
21.03. | Easter Break               |                                 |
28.03. | Easter Break               |                                 |
04.04. | Lecture: Online Learning   | Exercise: Online Learning       |
11.04. | DMC Task Discussion        | Team Building and Brainstorming |
18.04. | Lecture: Parameter Tuning  | Exercise: Parameter Tuning      |
25.04. | Project Discussion         | Project Work                    |
02.05. | Project Discussion         | Project Work                    |
09.05. | Project Discussion         | Project Work                    |
16.05. | Project Discussion         | Project Work                    | 18.05.: DMC Submission
23.05. | Final Presentation         | RapidMiner Certification        |

 

Literature 

  1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson.
  2. Ian H. Witten, Eibe Frank, Mark A. Hall: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann.
  3. Bing Liu: Web Data Mining, 2nd Edition, Springer.

Further literature on specific topics will be announced in the lecture.

Software

  • We will use the most recent version of RapidMiner. The handling of licence keys will be explained in the first sessions of the course.

Lecture Videos

  • Video recordings of the Data Mining II lectures are available here (accessible from within the university network or VPN).